33. Calls, Interrupts and Traps
the 22C:122/55:132 Lecture Notes for Spring 2004
Another classic area where different architectural approaches lead in vastly different directions is the area of procedure, function and method calls.
Since the development of the Algol 60 programming language it has been common to think of calls as allocating objects called activation records on a run-time stack. (It is noteworthy that PL/I, Pascal, C and Simula 67 were all strongly influenced by Algol, and that C++ and Java combine large parts of Simula 67 semantics with C syntax.) The basic model for use of a stack was well documented in E. W. Dijkstra's paper on Recursive Programming (Numerische Mathematik, 2, 312-318, 1960), although Dijkstra did not use the term activation record. The activation record for a block of code contains:
In general, if we assume this model, two registers must be dedicated to holding the static link and dynamic link. A call is made as follows:
In general, it takes several instructions to do all of this! Some of these instructions will be part of the code in the calling block, and some will be part of the prologue on the called block. In general, on return from a block to its caller, it takes several instructions to restore the execution context and deallocate the activation record that is no longer needed. Again, these instructions are typically divided between the epilogue on the called block and the body of the calling block. In sum, we can identify 3 sequences of instructions:
Usually, when talking collectively about these three types of instruciton sequences, the term calling sequence is used broadly to refer to all three. There are obviously many ways to build calling sequences, but the design of a good calling sequence is sufficiently difficult that, on most machines, there is a standard calling sequence used by most programmers.
Machine architects have long seen calling sequences as a problem area. As a result, they have long attempted to produce general call instructions that package large parts of the calling sequence into one machine instruction. A typical medium complexity CALL instruction will push the return address on the stack and transfer control to a destination, but many machines have been designed with CALL instructions that attempted to do quite a bit more.
The DEC VAX, for example, included a CALL instruction that included a mask indicating what registers were being used, and selectively pushed these registers prior to pushing the return address and a stack mark word holding this mask so that the right registers can be restored on return. The corresponding RETURN instruction popped the stack mark word and then used it to restore the appropriate registers, including the saved program counter and restore the stack to its state immediately prior to the call. These instrucitons require large numbers of memory cycles and force a break from pipelined execution!
The designers of Gnu C++ discovered that they could almost always produce faster object code if they never used the standard VAX CALL instruction except for calling library routines that were obliged to adhere to the standard. This discovery led many machine architects to abandon such complex call instructions and go back to far simpler models that were easy to pipeline. Typically, this has meant a return to CALL instructions with the semantics of the old IBM System 360 BAL (branch and link) instruction:
CALL R DST R = PC PC = DST (the effective address)
The return address is simply left in a register, so that the receiving sequence can save it as needed. If the function is at a leaf of the call tree (it calls no other functions), the return address can remain in the register for the duration.
Given this basic calling mechanism, and the fully developed model of an activation record as outlined above, a typical calling sequence suite would be:
-- assume the following named index registers: -- AR - pointer to the current activation record -- NR - points to new activation record (temporarily) -- RA - register holding return address (temporarily) -- if P points to an activation record -- P->CP holds the continuation point -- P->DL holds the saved dynamic link -- P->SL holds the saved static link -- we assume that ARS, the size of each activation -- record, is known statically; it may vary from -- place to place depending on the number of -- temporaries currently needed, but each time -- any particular instruciton is executed, the -- size is fixed. -------------------- -- calling sequence ADD NR = AR + ARS -- (a) -- copy parameters into fields of *NR (b) -- save registers into fields of *AR (c) MOVE NR->SL = DST_CONTEXT -- (d) CALL RA,DST -- restore registers from *AR (e) -------------------- -- receiving sequence DST MOVE AR->CP = RA -- (f) MOVE NR->DL = AR -- (g) MOVE AR = NR -- (h) -------------------- -- return sequence MOVE AR = AR->DL -- (i) MOVE RA = AR->CP -- (j) JUMP *RA
Notes on the above pseudocode:
(a) The above code assumes that the called function will need to use RAM. Small functions where the entire activation record for the function fits in registers need no activation record in RAM and this line can be eliminated.
(b) The above code assumes that parameters are passed in memory, as must be the case if there are many or if large objects are passed by value. If there are only a few small parameters, all may be passed in registers and the code at b will not be needed.
(c,e) The above code assumes that the caller has values in registers that must be saved. If the called routine (and anything it calls) does not modify certain registers, this code can be omitted.
(d) If the called routine makes no reference to global variables, or rather, to variables declared in statically enclosing blocks other than the outermost block, it needs no static link and this line can be omitted.
(f,j) If the called routine does not call other routines and does not need extra registers, these lines may be omitted and the return address may remain in the RA register through the call.
(g,h,i) If the called routine does not call others and does not need extra registers, these lines may be omitted. If this is done, all references to local variables or parameters in memory must be indexed off of NR instead of AR.
The above notes indicate, in a fair amount of detail, the kind of optimizations that good compilers such as the GNU C++ compiler make in order to minimize the cost of calls. This minimization is only possible if the compiler knows all of the characteristics of the called routine at the time the code for the call is compiled. Therefore, calls to other functions in the same compilation unit will typically be more efficient to calls in other compilation units, and calls to previously compiled routines may be more efficient than calls to routines that have not yet been compiled, but only if the compiler has access to a sufficiently detailed description of the called routine at the time it compiles the call.
Interrupts have been described as calls to parameterless functions that return no results and are initiated not by the calling code but by events detected by hardware outside the CPU such as the completion of an input/output transfer.
In fact, interrupts are not so simple except in the simplest of computers! The original DEC PDP-8 had one of the simplest interrupt mechanisms. On interrupt:
It was up to interrupt handler, starting at location 1, to save the other registers in the CPU, determine what caused the interrupt, and eventually, return by reenabling interrupts and doing an indirect jump through location zero.
This simplicity is misleading! The PDP-8 CPU had an Interrupt Enable bit, IE. If IE was set to zero, the instruction execution cycle would not handle interrupts. The occurance of an interrupt automatically reset IE, but on return, it was up to the software to enable interrupts; this meant that there was a 1-instruction window between the instruction that enabled interrupts and the instruction that completed the return. An interrupt could not be allowed to occur within this window, so additional hardware had to be added to prevent interrupt processing between the enable interrupt instruction and the following instruction. Things got even more complex in the context of the PDP-8 memory management unit because interrupt processing always took place in bank 0 of memory, and this meant that the MMU state had to be saved and restored.
On the DEC PDP-11 and many stack architectures interrupts were handled quite differently! On interrupt, the previous program counter and processor status word were pushed onto the stack, and then the program counter (PC) and processor status word (PSW) were loaded from the interrupt vector. This again looks simpler than it is, particularly on systems with multiple processes, each with its own stack, and even more so when there is a memory management unit so that each process has its own address space.
The PDP-11 processor status word included the condition codes, an indication of the priority level at which the processor was running, and a field dedicated to controlling the optional memory management unit. The priority field was a 3-bit field. If the CPU was running at high priority, all but the non-maskable interrupts were disabled. at lower priority levels, more interrupts were enabled, until at the lowest level, all were enabled.
The interrupt vector contained 8 PC/PSW pairs, one corresponding to each of the 8 priority levels. The interrupt request line from a device could be attached to one of the 8 levels, so that when that device requested an interrupt, the PC and PSW would be loaded from the corresponding entry in the vector.
The PDP-11 had a special return-from-interrupt instruciton that popped the top two words form the stack into the PC and PSW. So long as there was sufficient space on the stack of the currently running application, interrupt service routines could operate entirely in the context of that stack, popping everything they pushed prior to return and restoring the stack to its state prior to the interrupt as they finished the returned.
The trend in more recent architectures is generally away from the complexity of CISC interrupt models and toward a simple model where the interrupt hardware has no knowledge of the structure of the stack, if there is a stack! Thus, the interrupt may simply save the program counter and processor status word (including the basic MMU state) in special registers, set the program counter to a constant that depends on the device that requested the interrupt, and clear the processor status word. From that point on, it is up to the interrupt service routine (application software) to save things in the appropriate places.
This does not mean that the state save and restore process is slow compared to earlier machines! When interrupts push on a stack, if this was the wrong place to save the state (as is frequently the case on machines with one stack and address space per process), the software may have to pop information from one stack and save it elsewhere. Furthermore, the expensive part of interrupt handling is frequently the job of saving and restoring other state information, for example, the contents of registers. Registers are relatively inexpensive, and since the 1970's, many machines have provided multiple register banks, reserving a field of the processor status word to select the currently active register bank. When this is done, the change of processor status word on entry and exit from the interrupt service routine is all that is needed to save and restore the bulk of the registers needed by an applicaton.
Of course, this raises the question of how many register banks a machine should have. Two banks allows applications to share one while interrupt service routines use the other. So long as interrupt service routines cannot be interrupted, this is straightforward. More banks add complexity to the system because register banks become another system resource that must be allocated. So long as the number of active processes is smaller than the number of regbister banks, the situation is not too difficult, but if there are, say, 16 register banks and 17 active processes, suddenly the presence of multiple register banks becomes a serious problem.
Another problem that arises when there are multiple register banks is that of system access to data in a different bank. Allocating a register bank to a process requires loading that processes registers in the bank, and while this is being done, typically, the operating system will be using some other bank. This requires a special set of instructions just for the operating system for moving data to and from registr banks other than the current bank. Generally, these instructions are very obscure, and like all special system instructions, they aren't supported by compilers and force the operating system designer to descend to the assembly language level, something nobody likes to do these days.