18. The Xerox Alto

Part of the 22C:122/55:132 Lecture Notes for Spring 2003
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Instruction Coding for Stack Machines

The clearest example of the application of information theory to the design of instruction sets comes from a pair of papers published in the proceedings of the Symposium on Architectural Support for Programming Languages and Operating Systems, ACM SIGARCH Computer Architecture News, Vol. 10, No. 2, Mar. 1982 and ACM SIGPLAN Notices, Vol. 17, No. 4, April 1982.

These papers describe the analysis and optimization of the stack instruction set for the Mesa language on the Alto (later Dorado and Dandelion) computer family designed by Xerox PARC. The same methodology could be applied to optimization of the JVM architecture supporting Java.

Static Analysis of the MESA Instruction Set by Sweet and Sandman (page 158) begins the job, while An of a MESA Instruction Set Using Dynamic Instruction Frequencies by McDaniel (page 177) completes this.

The context of the Alto design was important to this project. The Alto computer was designed at Xerox as part of their "automated office of the future" project in the 1970's. The Alto was user microprogrammable (the microcode was stored in fast RAM inside the CPU), and therefore, the compiler writers could custom write microcode to execute the object code produced by their compilers. Furthermore, all of the Alto machines were internal to Xerox corporation, so there were no paying customers to get upset if the development group for one of the languages decided to change the instruction set used for that language and force all of the users to recompile everything they'd written in that language.

The Xerox Automated Office of the Future project also produced the Ethernet protocols for local area networks, the Smalltalk programming language, and they developed the WIMP (Window Icon Mouse Pointer) user interface to the point where it would inspire the development of the Apple Macintosh.

Methodology

Examine as much object code as you can for the target architecture. The people at Xerox had access to all existing code in the Mesa language, and they used all of it!

1) convert all code to canonical form, the pure zero operand stack architecture form with single simple straightforward representations for operations such as push immediate, goto, conditional goto, etc. The operation set should match that of the source language, not that of the target microarchitecture.

2) Collect instruction frequency data.

Sweet and Sandman found:

	Frequency, Instruction
	   16%       Push Immediate
	   12%       Push 16-bit value from local variable
	    6%       Pop 16-bit into local variable
	    5%       Revover value just popped from stack
	    5%       Push 32-bit value from local variable
	    4%       Call function
	    4%       Jump
	    3%       Follow pointer on stack top
	    3%       Pop 32-bit value to local variable
	    2%       Push address of local variable
	    2%       Add 16-bit
	    2%       Jump on top 2 words not equal
	    2%       Return
	    2%       Jump on equal
	    2%       Push 16-bit value from global variable
	    2%       Call local function
	    2%       Add 32-bit
	    1%       Push 32-bit value from global variable

Using this information and some elementary information theory, you can quickly conclude that, for example, the Push immediate instruction should have a 3 bit opcode because it represents about 1 in 8 instructions. The push and pop operations that address local variables should, similarly, have 4 bit opcodes.

If the machine has an 8 bit instruction syllable (as the Xerox Mesa machine did), you are rapidly led to instruction formats such as:

	 ________ ______________ 
	|push imm|    const     |
	|__|__|__|__|__|__|__|__|
	 ___________ ___________ 
	|push local |   addr    |
	|__|__|__|__|__|__|__|__|
	 ___________ ___________ 
	| pop local |   addr    |
	|__|__|__|__|__|__|__|__|

Of course, if someone wants to push an arbitrary 16 or 32 bit constant, this should be supported, but with 5 bits, we can provide short representations of the 32 most popular constants. Similarly, some functions will probably have more than 16 local variables, but most do not.

The people at Xerox continued this analysis by doing frequency analysis of the operands to popular operations, concluding that, indeed, half of all immediate constants were zero, and beyond this, most were selected from a few small values.

Similarly, most functions don't have 16 local variables, and in those that have more, the local variables can be sorted in order of frequency of use, leading to code where the vast majority of local variables references are to the first few locals.

3) Apply peephole optimization to the compiler output, substituting these new short instructions werever they may be substituted into the instruction stream.

4) Analyze the instruction stream for common instruction sequences such as "push immediate zero" followed by "compare for equal". Wherever such an instruction sequence occurs with a frequency of more than 1 in 256, reserve an opcode for it. In this case, "compare with immediate zero".

5) Apply peephole optimization to the compiler output to utilize these new instructions, and then repeat the analysis again!

The result is a very efficient static encoding of the instruction set!

Dynamic Analysis

The dynamic analysis of the resulting instruction set verified that the static instruciton frequency was a good enough predictor of the dynamic instruciton frequency, and it provided a number of strong hints to the designers of the underlying register transfer logic, suggesting shortcuts in the data paths that would lead to noticably faster execution (for example, eliminating the ALU from address computation in the case where a pointer to a simple variable was being followed, since pointers don't require ALU processing).