18. The Xerox Alto

Part of the 22C:122/55:132 Lecture Notes for Spring 2004
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Instruction Coding for Stack Machines

The clearest example of the application of information theory to the design of instruction sets comes from a pair of papers published in the proceedings of the Symposium on Architectural Support for Programming Languages and Operating Systems, ACM SIGARCH Computer Architecture News, Vol. 10, No. 2, Mar. 1982 and ACM SIGPLAN Notices, Vol. 17, No. 4, April 1982.

These papers describe the analysis and optimization of the stack instruction set for the Mesa language on the Alto (later Dorado and Dandelion) computer family designed by Xerox PARC. The same methodology could be applied to optimization of the JVM architecture supporting Java or to any similar stack-based instruction set.

Static Analysis of the MESA Instruction Set by Sweet and Sandman (page 158) begins the job, while An of a MESA Instruction Set Using Dynamic Instruction Frequencies by McDaniel (page 177) completes this.

It is worth noting that the MESA language was developed as an embedded systems development language. As such, floating point is not a priority, but strong support for character and integer types is essential. MESA was a predecessor language to the Ada project, with similar goals of supporting strong encapsulation of the implementations of abstract data types. It was not object oriented, but the philosophy was similar in many regards.

The context of the Alto design was important to this project. The Alto computer was designed at Xerox as part of their "automated office of the future" project in the 1970's. The Alto was user microprogrammable (the microcode was stored in fast RAM inside the CPU), and therefore, the compiler writers could custom write microcode to execute the object code produced by their compilers. Furthermore, all of the Alto machines were internal to Xerox corporation, so there were no paying customers to get upset if the development group for one of the languages decided to change the instruction set used for that language and force all of the users to recompile everything they'd written in that language.

The Xerox Automated Office of the Future project also produced the Ethernet protocols for local area networks, the Smalltalk programming language, and they developed the WIMP (Window Icon Mouse Pointer) user interface to the point where it would inspire the development of the Apple Macintosh.

Methodology

This is an empirical methodology for developing a near optimal encoding of an instruction set. It rests on examination of as much object code as possible the target architecture. The people at Xerox had access to all existing code in the Mesa language, and they used all of it repeatedly as they refined the instruction set!

1) The first step is to convert all object code to canonical form, the pure zero operand stack architecture form with single simple straightforward representations for operations such as push immediate, goto, conditional goto, etc. The operation set should match that of the source language, not that of the target microarchitecture.

2) Collect instruction frequency data.

Sweet and Sandman found:

	Frequency, Instruction
	   16%       Push Immediate
	   12%       Push 16-bit value from local variable
	    6%       Pop 16-bit into local variable
	    5%       Recover value just popped from stack
	    5%       Push 32-bit value from local variable
	    4%       Call function
	    4%       Jump
	    3%       Follow pointer on stack top
	    3%       Pop 32-bit value to local variable
	    2%       Push address of local variable
	    2%       Add 16-bit
	    2%       Jump on top 2 words not equal
	    2%       Return
	    2%       Jump on equal
	    2%       Push 16-bit value from global variable
	    2%       Call local function
	    2%       Add 32-bit
	    1%       Push 32-bit value from global variable

Using this information and some elementary information theory, you can quickly conclude that, for example, the Push immediate instruction should have a 3 bit opcode because it represents about 1 in 8 instructions. The push and pop operations that address local variables should, similarly, have 4 bit opcodes.

If the machine has an 8 bit instruction syllable (as the Xerox Mesa machine did), you are rapidly led to instruction formats such as:

	 ________ ______________ 
	|push imm|    const     |
	|__|__|__|__|__|__|__|__|
	 ___________ ___________ 
	|push local |   addr    |
	|__|__|__|__|__|__|__|__|
	 ___________ ___________ 
	| pop local |   addr    |
	|__|__|__|__|__|__|__|__|

Of course, if someone wants to push an arbitrary 16 or 32 bit constant, this should be supported, but with 5 bits, we can provide short representations of the 32 most popular constants. Similarly, some functions will probably have more than 16 local variables, but most do not.

The people at Xerox continued this analysis by doing frequency analysis of the operands to popular operations, concluding that, indeed, half of all immediate constants were zero, and beyond this, most were selected from a few small values less than 16.

Similarly, most functions don't have 16 local variables, and in those that have more, the local variables can be sorted in order of frequency of use, leading to code where the vast majority of local variables references are to the first few locals.

3) Apply peephole optimization to the compiler output, substituting these new short instructions werever they may be substituted into the instruction stream.

4) Analyze the instruction stream for common instruction sequences such as "push immediate zero" followed by "compare for equal". Wherever such an instruction sequence occurs with a frequency of more than 1 in 256, reserve an opcode for it. In this case, "compare with immediate zero".

5) Apply peephole optimization to the compiler output to utilize these new instructions, and then repeat the analysis again! Peephole optimization involves only local changes to the output of the compiler, replacing short sequences of instructions with equivalent shorter sequences. Global optimization is also importantm, but it does not hinge to a great extent on issues of instruciton encoding.

The result is a very efficient static encoding of the instruction set!

Dynamic Analysis

The dynamic analysis of the resulting instruction set verified that the static instruciton frequency was a good enough predictor of the dynamic instruciton frequency. The most important contribution of the dynamic analysis, however, was to the microcode and to the design of the register-transfer data paths.

The dynamic analysis shows which instructions are the important instructions, from the point of view of performance, allowing the microcode for those instructions to be tightened, and study of the dynamic frequencies shows which data paths in the machine are the bottlenecks to faster execution.