22C:122, Lecture Notes, Lecture 13, Fall 1999

Douglas W. Jones
University of Iowa Department of Computer Science

Instruction Coding for Stack Machines
The clearest example of the application of information theory to the design of instruction sets comes from a pair of papers published in the proceedings of the Symposium on Architectural Support for Programming Languages and Operating Systems, ACM SIGARCH Computer Architecture News, Vol. 10, No. 2, Mar. 1982 and ACM SIGPLAN Notices, Vol. 17, No. 4, April 1982.
These papers describe the analysis and optimization of the stack instruction set for the Mesa language on the Alto (later Dorado and Dandelion) computer family designed by Xerox PARC. The same methodology could be applied to optimization of the JVM architecture supporting Java.
Static Analysis of the MESA Instruction Set by Sweet and Sandman (page 158) begins the job, while An of a MESA Instruction Set Using Dynamic Instruction Frequencies by McDaniel (page 177) completes this.
Methodology
Examine as much object code as you can for the target architecture. The people at Xerox had access to all existing code in the Mesa language, and they used all of it!
1) convert all code to canonical form, the pure zero operand stack architecture form with single simple straightforward representations for operations such as push immediate, goto, conditional goto, etc. The operation set should match that of the source language, not that of the target microarchitecture.
2) Collect instruction frequency data.
Sweet and Sandman found:
```
	Frequency, Instruction
	   16%       Push Immediate
	   12%       Push 16-bit value from local variable
	    6%       Pop 16-bit into local variable
	    5%       Revover value just popped from stack
	    5%       Push 32-bit value from local variable
	    4%       Call function
	    4%       Jump
	    3%       Follow pointer on stack top
	    3%       Pop 32-bit value to local variable
	    2%       Push address of local variable
	    2%       Add 16-bit
	    2%       Jump on top 2 words not equal
	    2%       Return
	    2%       Jump on equal
	    2%       Push 16-bit value from global variable
	    2%       Call local function
	    2%       Add 32-bit
	    1%       Push 32-bit value from global variable
	
```
Using this information and some elementary information theory, you can quickly conclude that, for example, the Push immediate instruction should have a 3 bit opcode because it represents about 1 in 8 instructions. The push and pop operations that address local variables should, similarly, have 4 bit opcodes.
If the machine has an 8 bit instruction syllable (as the Xerox Mesa machine did), you are rapidly led to instruction formats such as:
```
	 ________ ______________ 
	|push imm|    const     |
	|__|__|__|__|__|__|__|__|
	 ___________ ___________ 
	|push local |   addr    |
	|__|__|__|__|__|__|__|__|
	 ___________ ___________ 
	| pop local |   addr    |
	|__|__|__|__|__|__|__|__|
	
```
Of course, if someone wants to push an arbitrary 16 or 32 bit constant, this should be supported, but with 5 bits, we can provide short representations of the 32 most popular constants. Similarly, some functions will probably have more than 16 local variables, but most do not.
The people at Xerox continued this analysis by doing frequency analysis of the operands to popular operations, concluding that, indeed, half of all immediate constants were zero, and beyond this, most were selected from a few small values.
Similarly, most functions don't have 16 local variables, and in those that have more, the local variables can be sorted in order of frequency of use, leading to code where the vast majority of local variables references are to the first few locals.
3) Apply peephole optimization to the compiler output, substituting these new short instructions werever they may be substituted into the instruction stream.
4) Analyze the instruction stream for common instruction sequences such as "push immediate zero" followed by "compare for equal". Wherever such an instruction sequence occurs with a frequency of more than 1 in 256, reserve an opcode for it. In this case, "compare with immediate zero".
5) Apply peephole optimization to the compiler output to utilize these new instructions, and then repeat the analysis again!
The result is a very efficient static encoding of the instruction set!
Dynamic Analysis
The dynamic analysis of the resulting instruction set verified that the static instruciton frequency was a good enough predictor of the dynamic instruciton frequency, and it provided a number of strong hints to the designers of the underlying register transfer logic, suggesting shortcuts in the data paths that would lead to noticably faster execution (for example, eliminating the ALU from address computation in the case where a pointer to a simple variable was being followed, since pointers don't require ALU processing).