22C:18, Lecture 29, Summer 1997

Douglas W. Jones
University of Iowa Department of Computer Science

Hawk programming using RPN

One way to program the Hawk computer in RPN is to directly emulate the Burroughs 5000 architecture; for example, we could:

This would allow us to write macros such as the following (note the use of lower case to avoid conflict with predefined Hawk symbols):
MACRO add
  ADDSI  R4,-4
  LOADS  R1,R4
  ADDSI  R4,-4
  LOADS  R2,R4
  ADD    R2,R2,R1
  STORES R2,R4
  ADDSI  R4,4
ENDMAC

MACRO pushimm =c
  LIL    R1,c
  STORES R1,R4
  ADDSI  R4,-4
ENDMAC
A more pleasing version of the add macro shown above can written follows, although this is unlikely to make it run any faster because although it reduces the instruction count, it does not change the total number of halfwords of used to represent the instructions:
MACRO add
  ADDSI  R4,-4
  LOADS  R1,R4
  LOAD   R2,R4,-4
  ADD    R2,R2,R1
  STORE  R2,R4,-4
ENDMAC
The problem with this approach is that each B 5000 instruction translates into a long sequence of Hawk instructions, no effective use is made of most of the Hawk registers, and many of the instructions are "housekeeping" instructions that have to do with computing things at run-time that could better be computed at assembly time.

Consider what happens to the following short sequence of B 5000 instructions:

pushlocal a
pushlocal b
add
poplocal a
A decent Hawk programmer would be tempted to translate this to something like the following (using R3 as the frame pointer):
LOAD  R1,R3,a
LOAD  R2,R3,b
ADD   R1,R2,R1
STORE R1,R3,a
Using a macro set such as the above, however, will lead to something like the following:
; pushlocal a
   LOAD   R1,R3,a  ; --
   STORES R1,R4
   ADDSI  R4,4
; pushlocal b
   LOAD   R2,R3,b  ; --
   STORES R1,R4
   ADDSI  R4,4
; add
   ADDSI  R4,-4
   LOADS  R1,R4
   LOAD   R2,R4,-4
   ADD    R1,R2,R1 ; --
   STORE  R2,R4,-4
; poplocal b
   ADDSI  R4,-4
   LOADS  R1,R4
   STORE  R1,R3,a  ; --
This is awful! We only needed the 4 marked instructions, but we have added 10 extra instructions! Our programs will be huge and slow compared to what they could be!

It is worth noting that, on the B 5000, the stack pointer and frame pointer (activation record pointer) had an interesting relationship to each other. Within any particular procedure, whenever any particular instruction is executed, the difference between these two registers was always constant. Consider the following example from the previous lecture:

   D    Code
  ===   ==================
        multiply:
           -- assume the caller the parameters above the mark
           -- x is therefore local variable 0
           -- y is therefore local variable 1
   2       Push Local 0 -- get x
   3       Branch Not Equal else
   2       Push Immediate 0
   3       Function Return
        else:
   2       Push Mark -- setup to push parameters to call
   4       Push Local 1 -- get y
   5       Push Local 0 -- get x
   6       Push Immediate 1
   7       Subtract     -- we're passing x-1
   6       Parameterized Call multiply, 2  -- the call
   3       Push Local 1 -- get y again
   4       Add          -- we're returning multiply(y,x-1)+y
   3       Function Return

The numbers in the column labeled D give the displacement of the stack pointer from the frame pointer immediately before each instruction is executed. The initial value is 2 because the caller pushed two parameters on the stack. Each push increments this value, and each pop decrements it, but the control structures are such that every single time each instruction is executed, no matter where the function was called from, D will have the value shown!

A Static Approach to Stack Management

As a result of this observation, we can never justify computing the value of the stack pointer at run-time! What is essential is that the frame pointer be adjusted properly with each procedure call. This leads to the following alternative proposal for macros to imitate the B 5000 instruction set:

In effect, we are returning to the standard Hawk calling sequence we have used up to this point in the semester! Even our use of AR is compatable.
MACRO add
  AR = AR - 4
  LOAD  R1,R2,AR
  AR = AR - 4
  LOAD  R3,R2,AR
  ADD   R1,R1,R2
  STORE R1,R2,AR
  AR = AR + 4
ENDMAC

MACRO pushimm =c
  LIL    R1,c
  STORE  R1,R4,AR
  AR = AR + 4
ENDMAC
Again, we can improve our coding of the above add macro:
MACRO add
  AR = AR - 1
  LOAD  R1,R2,AR
  LOAD  R3,R2,AR-4
  ADD   R1,R1,R3
  STORE R1,R2,AR-4
ENDMAC
Now, consider what happens when we use these macros to implement the same short sequence of B 5000 instructions we discussed above:
pushlocal a
pushlocal b
add
poplocal a
Expanding these macro calls using our new macros gives something like the following; for the sake of example, we've assumed that AR was initially 12:
; pushlocal a
   LOAD   R1,R2,a  ; --
   STORE  R1,R2,12
; pushlocal b
   LOAD   R2,R3,b  ; --
   STORE  R1,R2,16
; add
   LOAD   R1,R2,16
   LOAD   R3,R2,12
   ADD    R1,R1,R3 ; --
   STORE  R1,R2,12
; poplocal b
   LOAD   R1,R2,12
   STORE  R1,R2,a  ; --
This code takes exactly the same number of halfwords of instruction space as was taken by the expansion of the previous macros, so it will probably run about as fast, even though it requires fewer instructions. The one advantage of this code is that it uses fewer registers by eliminating the need for a stack pointer register, but it does nothing to take advantage of the registers the Hawk machine already has, so this advantage is not put to any good use.

Putting the Stack Top in Registers

If stack machines like the descendants of the Burroughs 5000 or the Rockwell AAMP actually stored the entire stack in memory, the resulting performance would be bad enough that the machines would never have sold. One of the keys to the success of these machines is that they don't use the literal push and pop logic of:

	push(x) = { M[SP] = x; SP = SP + 1; }
	pop() =  { SP = SP - 1; return M[SP] }
Instead, these machines store, at minimum, the top two items on the stack in "hidden" registers inside the CPU. The programmer can always assume that the contents of the stack is in memory, but in fact, short sequences of pushes and pops, such as the sequence used to add 2 numbers in the above examples, can usually be executed with no extra memory references.

We can exploit exactly the same idea with macros on the Hawk machine. Fully developing this idea takes work, but for a start, consider the following idea. We'll use R3 to R15 as our stack, and we'll use the assembly time symbol SP to designate the lowest unused register on this stack. As with the previous set of macros, we'll allocate registers statically.

MACRO add
  SP = SP - 1
  ADD   SP-1,SP-1,SP
ENDMAC

MACRO pushimm =c
  LIL    SP,c
  SP = SP + 1
ENDMAC
Assuming R5 as an initial value of SP, macros such as the above allow our example to be coded as follows:
; pushlocal a
   LOAD   R5,R2,a  ; --
; pushlocal b
   LOAD   R6,R2,b  ; --
; add
   ADD    R5,R5,R6 ; --
; poplocal b
   STORE  R5,R2,a  ; --
This is exactly the quality of code we would expect from a decent compiler on the Hawk! There is a problem, though, because now, our "logical" activation record has the following format:
      On the Stack in Memory
   
	|                |
	|                |<----
	|----------------|   |
	|                |   | 
	|  Some Locals   |  AR
	|                |   | 
	|----------------|   |
	| Return Address |<---------
	|----------------|           |
	|                |           |
	|    previous    |           |
	|   activation   |           |
	|     record     |           |
	|                |           |
				     |
				     |
	   In Registers              |
				     |
	|                |           |
	|                |<---- SP   |
	|----------------|           |
	|                |           |
	|                |           |
	|  Other Locals  |           |
	|                |           |
	|                |           |
	|----------------|           |
	|       R2     0-+----------
	|----------------|
	|       R1       |
	 ----------------
The result of this is that the pushing of the caller's activation record for a procedure call is going to be more complex than it was when all of the locals of a procedure were in memory!