Lecture 28, Subroutine Calls

Part of the notes for 22C:196:002 (CS:4908:0002)
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

A Naive Model

A naive designer of stack computers might conclude that the following basic call and return instructions are appropriate:

CALL dst
CALL the subroutine at address dst, pushing the return address on the stack:
M[++sp] = pc;
pc = dst;
RETURN
Return to the caller by popping the stack top into the program counter.
pc = M[sp--];

To see what is wrong with these instructions, consider the following bit of code:

min: function int32 ( a:int32, b:int32 )
    temp: var int32
    temp = a
    if a > b then
        temp = b
    end
    return temp
end

Our first attempt to compile this gives the following:

MIN: PUSHI 0   -- allocate space for temp

     PUSHLA TEMP
     PUSHLA A
     LOAD
     POPS      -- temp = a

     PUSHLA A
     LOAD
     PUSHLA B
     LOAD
     GT
     BFALSE EI -- if a > b then

     PUSHLA TEMP
     PUSHLA B
     LOAD
     POPS      -- temp = b

EI:            -- end
     PUSHLA MIN_R
     PUSHLA TEMP
     LOAD
     POPS      -- return temp
               -- there is a problem here!
     RETURN

The note just above the return instruction indicates a serious problem. The local variable a is sitting on the stack above the return address, and it must somehow be popped off the stack before the RETURN instruction can be executed. Our (admittedly minimal) instruction set contains no mechanism for popping local variables off the stack prior to the return. The problem is worse, however. Consider the following code fragment from a user of the above:

-- ignore initialization of the local variable x
x = min(x,5)

The natural way to compile this is:

     PUSHLA X
     PUSHI  0  -- push a space to hold the return value
     PUSHLA X
     LOAD      -- push parameter 1, x
     PUSHI  5  -- push parameter 2, 5
     CALL   MIN
               -- there is a problem here!
     POPS      -- x = min(x, 5)

After the CALL instruction, the values of the two parameters are still on the stack, yet these must be popped before the POPS instruction can be used to store the result.

Kluges

Numerous computers have gone to market with call and return instructions that follow the naive model described above. On these machines, programmers typically must add instructions before each return operation to pop local variables from the stack, and after call instructions to pop parameters from the stack.

Sometimes, later models of the architecture include instructions, added as afterthoughts, to simplify stack cleanup. The PDP-11 computer is a good example. This machine has 8 registers; 6 are general purpose, while one is effectively reserved as the stack pointer and one serves as the program counter. Despite the fact that it has general purpose registers, the original PDP-11 model (the 11/20) had call and return instructions that basically followed the naive model given here (if you ignore irrelevant features).

Late in the history of the PDP-11, a new and truly remarkable instruction was added:

MARK n
sp = pc
pc = M[sp++]
sp = sp + n

What is going on here? This instruction seems to be doing entirely random things! First, note that the PDP-11 stack grew down. Incrementing the stack pointer popped things from the stack, while push decremented the stack pointer. The key to understanding the mark instruction lies in its application to the problem described above.

     PUSHLA X
     PUSHI  0  -- push a space to hold the return value
     PUSHLA X
     LOAD      -- push parameter 1, x
     PUSHI  5  -- push parameter 2, 5
     CALL   MIN
               -- the MARK solves the problem here
     POPS      -- x = min(x, 5)

Now look at how the called function is coded:

MIN: PUSHI (MARK 2) -- push a mark instruction on the stack
     PUSHI 0        -- allocate space for temp
     -- nothing in the body changes
     POPS           -- min = temp
     PUSHLA MARK_R
     POPPC

Above, we used the POPPC instruction to pop the stack top into the PC. We need something like this to call methods of objects -- where the code to call the method pushes the address of the method onto the stack, but here we used it in a distinctly odd way: We pushed the address of the MARK instruction itself (after all, it is effectively a local variable) onto the stack right after the return address (that is, right before it, since the stack grows by decrementing SP). To return, we jumped to the MARK instruction sitting on the stack top! Executing the MARK instruction in that context puts everything right, loading the following word (the return address) into the program counter and then setting the stack pointer 2 words farther into memory.

This is genuinely absurd, but as an afterthought is is impressively effective. The way the MARK instruction works comes close to being the definitive kluge. Jackson W. Granholm's classic 1962 article in Datamation entitled "How to Design a Kludge" gave the following definition: "An ill-assorted collection of poorly-matching parts, forming a distressing whole." The New Hacker's Dictionary offered these definitions for the word kluge: "A Rube Goldberg (or Heath Robinson) device, whether in hardware or software. A clever programming trick intended to solve a particular nasty case in an expedient, if not clear, manner; often used to repair bugs and often involves ad-hockery and verges on being a crock. Something that works for the wrong reason."

Activation Record Layout

The following table shows the layout of the activation record of the min function as we would expect in the naive model, at a point immediately after the values of the parameters a and b have been pushed on the stack, right before the GT instruction is executed.

     
caller's
 activation 
record
6   min_r (the return value)
5x   a
45   b
3   return address
2x   temp
1x  
05 ← top of stack
free
space
 

In the above, we assumed that the stack grows down, toward low memory, so that the displacements from the top of stack are positive when the referent is in the stack. If the stack pointer points to the top item on the stack, the above displacements are the ones that would be used in the code. We have assumed word addressing here, for the sake of simplicity.

On a machine with a PDP-11 style MARK instruction, the activation record would look like this:

     
caller's
 activation 
record
7   min_r (the return value)
6x   a
55   b
4   return address
3MARK 2 
2x   temp
1x  
05 ← top of stack
free
space
 

The name of the MARK instruction comes from a more generic term. In computers where activation records are pushed on a stack, the stack mark is the item that divides one activation record from the next. This term is based on an older term used on computers with a variable word length. On these machines, the word mark was an extra bit associated with each memory location to indicate whether that location was the last one in a word. Typical machines that used word marks had byte addressible memory with 6-bit bytes (excluding the bit used for the word mark). Of course, back then, the term byte was not yet current. Each 6-bit byte was commonly called a digit. The Burroughs architecture introduced the term mark stack control word for the word that divides activation records.

Note that these examples include space reserved on the stack for the function's return value. This space is reserved by pushing an empty word before the first parameter is pushed, and it is addressed as if it were a parameter or local variable, by addressing relative to the stack top or relative to the activation record pointer. It is crucial that the return value not be popped off the stack by the function return, since the value is needed by the caller.

Trying to do it right

The fundamental problem that led to the need for a MARK instruction on the PDP-11 was that the call and return instructions operated by directly pushing and popping the return address on the stack top. This is a fine way to do it on a pure stack machine, but on a machine with general purpose registers, it does not make sense. On machines with general purpose registers, reaching into the stack with indexed addressing is cheap. Therefore, we can use something like the following scheme:

CALL params,dst
CALL the subroutine at address dst, saving the return address on the stack below the parameters:
M[sp + params] = pc
pc = dst;
RETURN arsize
Return to the caller and pop the activation record off of the stack.
pc = M[sp + arsize]
sp = sp + locals + 1

The above definitions assume that arsize is the size of the activation record, in words, including parameters, local variables, and anything else that might be on the stack. The following picture shows the activation record structure for this model, with the snapshot taken at the same instant (after pushing copies of a and b for comparison) as was illustrated in the previous examples.

     
caller's
 activation 
record
6   min_r (the return value)
5   return address
4x   a
35   b
2x   temp
1x  
05 ← top of stack
free
space
 

It is extremely important, with this model, to have a PUSHU instruction that pushes an uninitialized word onto the stack. There is no need to waste a memory cycle pushing a zero in the spot where the return address will go. In fact, it is better to have a general PUSHU n instruction to push n consecutive uninitialized words. For example, before calling a function that returns a one-word value, we should do PUSHU 2 in order to reserve space for both the return address and the return value of the function. Similarly, if our programming language does not provide default initialization of variables, then all of the local variables can be allocated with a single PUSHU instruction that allocates them as a block.

If we are generating code on a general register machine, taking advantage of the fact that the activation record size is always statically known, along with the displacement of all fields within it, we can eliminate the stack pointer entirely and keep a single index register to point to the current activation record. In that case, PUSHU merely does a compile-time computation, incrementing the current size of the activation record without emitting any run-time code.