Lecture 30, Subroutine Calls
A naive designer of stack computers might conclude that the following basic call and return instructions are appropriate:
M[--sp] = pc; pc = dst;
pc = M[sp++];
To see what is wrong with these instructions, consider the following bit of code:
min: function int32 ( a:int32, b:int32 ) temp: var int32 temp = a if a > b then temp = b end return temp end
Our first attempt to compile this requires that we have some way to push the addresses of local variables on the stack. We'll use the PUSHLA high-level instruction to do this, ignoring for the moment how it works.
MIN: PUSHL 1 -- allocate space for temp PUSHLA TEMP PUSHLA A LOAD POPS -- temp = a PUSHLA A LOAD PUSHLA B LOAD GT BFALSE EI -- if a > b then PUSHLA TEMP PUSHLA B LOAD POPS -- temp = b EI: -- end PUSHLA RETURNVALUE PUSHLA TEMP LOAD POPS -- return temp POPL 1 -- deallocate temp RETURN
This code has several problems.
The classic way to handle local variable addressing is to set aside a register called the frame pointer or fp. This lets us define
M[--sp] = fp - x;
This definition shifts the problem from how do we push the value of a local variable to how does the frame pointer get set on entry to a subroutine and how does it get restored on return. An obvious idea is to make the CALL and RETURN instructions in our stack architecture do this. For example, we can define them as follows:
M[--sp] = pc; M[--sp] = fp; fp = sp; pc = dst;
fp = M[sp++]; pc = M[sp++];
There are several awkward problems with the above. First, consider the expression x+min(y,z). The code we would like to generate looks like this:
PUSHGA x LOAD PUSHGA y LOAD PUSHGA z LOAD CALL min ADD
Unfortunately, this won't work because the return from the call did not pop the parameters off the stack. We could solve this by adding a POPL command to get rid of one parameter right after the CALL, but this is awkward.
There is a second problem. The parameters got pushed onto the stack before the return address and saved frame pointer, but the local variables get pushed after the return address. If the frame pointer points to the saved value of the frame pointer (that's how our definition of CALL worked out above) this means that the addresses of local variables and parameters have the opposite sign relative to the frame pointer. There's nothing difficult to implement about this, but it is awkward.
Finally, what is the local address where we store the return value? We can easily solve this problem: It is the same as the address of the first parameter.
Numerous computers have gone to market with call and return instructions that follow the naive model described at the top of this section, including the Intel x86 architecture. On these machines, programmers typically must add instructions before each return operation to pop local variables from the stack, and after call instructions to pop parameters from the stack.
Sometimes, later models of the architecture include instructions, added as afterthoughts, to simplify stack cleanup. The PDP-11 computer is a good example. This machine has 8 registers; 6 are general purpose, while one is effectively reserved as the stack pointer and one serves as the program counter. Despite the fact that it has general purpose registers, the original PDP-11 model (the 11/20) had call and return instructions that basically followed the naive model given here (if you ignore irrelevant features).
Late in the history of the PDP-11, a new and truly remarkable instruction was added:
sp = pc pc = M[sp++] sp = sp + n
What is going on here? This instruction seems to be doing entirely random things! First, note that the PDP-11 stack grew down, just like the ARM and the Intel x86. Incrementing the stack pointer popped things from the stack, while push decremented the stack pointer. The key to understanding the mark instruction lies how it is used. Consider the expression x+min(y,z). The code we will generate for a call using a PDP-11ish MARK instruction looks like this:
PUSHLA X PUSHLA Y LOAD -- push parameter 1, y PUSHLA Z LOAD -- push parameter 2, z CALL MIN -- the MARK pops down to the result ADD -- compute x+min(y, z)
Now look at how the called function is coded:
MIN: PUSHI (MARK 1) -- push mark instruction to pop one parameter PUSHI 0 -- allocate space for temp -- nothing in the body changes POPS -- min = temp PUSHLA MARK_R RETURN -- jump to the mark
Above, we used the naive RETURN instruction, but we used it in a strange way. It's not returning to the point of call because we just pushed the address of the MARK instruction itself (effectively a local variable) onto the stack. So, the RETURN is going to jump to an instruction on the stack. Executing the MARK instruction in that context puts everything right, loading the following word (the return address) into the program counter and then setting the stack pointer 2 words farther into memory. This combination pops all the local variables off of the stack as well as the parameters.
The PDP-11 MARK instruction didn't solve all of our problems. It forced us to reserve space on the stack just below the parameters for the return value of the function, and it didn't automate the handling of the frame pointer.
The MARK instruction is genuinely absurd, but it is an impressively effective afterthought. The way the MARK instruction works comes close to being the definitive kluge. Jackson W. Granholm's classic 1962 article in Datamation entitled "How to Design a Kludge" gave the following definition: "An ill-assorted collection of poorly-matching parts, forming a distressing whole." The New Hacker's Dictionary offered these definitions for the word kluge: "A Rube Goldberg (or Heath Robinson) device, whether in hardware or software. A clever programming trick intended to solve a particular nasty case in an expedient, if not clear, manner; often used to repair bugs and often involves ad-hockery and verges on being a crock. Something that works for the wrong reason."
The MARK instruction takes its name from the B5000 architecture, where the term "stack mark" was used to refer to the block of stuff pushed on the stack between the caller's local variables and the part of the stack used by the called code. The Burroughs stack marks were carefully packaged bundles containing the return address and saved frame pointer.
The following table shows the layout of the activation record of the min function as we would expect in the naive model, at a point immediately after the values of the parameters a and b have been pushed on the stack, right before the GT instruction is executed.
|5||x||a = returnvalue|
|0||5||← top of stack|
In the above, we assumed that the stack grows down, toward low memory, and we assumed that the return value will go in the location occupied by the first parameter. The displacements from the top of stack are positive when the referent is in the stack. If the stack pointer points to the top item on the stack, the above displacements are the ones that could be used in the code assuming that PUSHL uses the stack pointer instead of the frame pointer. We have assumed word addressing here, for the sake of simplicity.
On a machine with a PDP-11 style MARK instruction, the activation record would look like this:
|6||x||a = returnvalue|
|0||5||← top of stack|
Again, we have used the location of the first parameter to hold the return value. It is crucial that the return value not be popped off the stack by the function return, so the MARK instruction only popped one of the two parameters from the stack.
Note that if the return value of the function is more than one word, it may be necessary to reserve space for the return value before pushing the parameters. Consider, for example, the case of a function that takes a one-word integer as a parameter and returns a two-word floating point value as a result.
The fundamental problem that led to the need for a MARK instruction on the PDP-11 was that the call and return instructions operated by directly pushing and popping the return address on the stack top. This is a fine way to do it on a pure stack machine, but on a machine with general purpose registers, it does not make sense. On machines with general purpose registers, reaching into the stack with indexed addressing is cheap. Therefore, we can use something like the following scheme:
M[sp + params] = pc pc = dst;
pc = M[sp + arsize] sp = sp + locals + 1
The above definitions assume that arsize is the size of the activation record, in words, including parameters, local variables, and anything else that might be on the stack. The following picture shows the activation record structure for this model, with the snapshot taken at the same instant (after pushing copies of a and b for comparison) as was illustrated in the previous examples.
|0||5||← top of stack|
It is useful, with this model, to have a PUSHL instruction that pushes uninitialized data onto the stack. There is no need to waste a memory cycle pushing a zero in the spot where the return address or the return value will go. For example, before calling a function that returns a one-word value, we should do PUSHL 2 in order to reserve space for both the return address and the return value of the function. Similarly, if our programming language does not provide default initialization of variables, then all of the local variables can be allocated with a single PUSHL instruction that allocates them as a block.
If we are generating code on a general register machine, taking advantage of the fact that the activation record size is always statically known, along with the displacement of all fields within it, we can eliminate the stack pointer entirely and keep a single index register to point to the current activation record. In that case, PUSHL merely does a compile-time computation, incrementing the current size of the activation record without emitting any run-time code. On the ARM, we can't do this because the ARM coding standards ask us to assure that the stack pointer always points to the last word we are using on the stack.