CS:2630 Notes, Chapter 6

The Hawk monitor provides several subroutines to output numbers but for the moment, assume that these functions are missing so that we must write our own. The simplest way to write a binary to decimal conversion routine is to write it recursively. At each step, the number is divided into two parts, the least significant digit and the other digits. After division by 10, the remainder gives the least significant digit while the quotient gives the rest.

We use recursion because, if we are working in a left-to-right character set such as Greek, Roman or Cyrillic, we need to output the most significant digits first, while we compute the least significant digit first. Here is C code that does the same thing as the Hawk monitor's putdecu routine.

**A decimal output routine**
void putdecu( unsigned int n, int w ) { /* display n in decimal, padding to a width of w / / local variables and initialization / int quo = n / 10; / the quotient -- more significant digits / int rem = n % 10; / the remainder -- least significant digit / / output things to the left first / if (quo > 0) { / higher digits / putdecu(quo, w - 1); } else { / any leading blanks / while (w > 1) { putchar(' '); w = w - 1; } } / and then display this digit */ putchar(rem + '0'); }

A decimal output routine

void putdecu( unsigned int n, int w ) {
    /* display n in decimal, padding to a width of w */

    /* local variables and initialization */
    int quo = n / 10; /* the quotient -- more significant digits */
    int rem = n % 10; /* the remainder -- least significant digit */

    /* output things to the left first */
    if (quo > 0) { /* higher digits */
        putdecu(quo, w - 1);
    } else { /* any leading blanks */
        while (w > 1) {
            putchar(' ');
            w = w - 1;
        }
    }

    /* and then display this digit */
    putchar(rem + '0');
}

Calling putdecu(123,0) causes the recursive call putdecu(12,-1) which calls putdecu(1,-2). The latter calls putchar('1') before it returns. In turn, putdecu(12,-1) calls putchar('2') before it returns, allowing the outermost activation of putdecu to call putchar('3') and return. For the integer parameter 123, no leading blanks will be output unless the field-width parameter is greater than 3.

Informally, it might be convenient to think that the computer makes a copy of putdecu for each recursive call, but no copies are actually made. Instead, each time control enters the routine, new locations for each variable are assigned, and each time the routine returns, those copies are discarded. We refer to an activation of the routine being created when control enters the routine and discarded when the routine returns. During a recursive call, a routine has multiple coexisting activations.

Before converting this code to assembly language, we must answer several questions. Where are the local variables quo and rem stored? How are the parameters n and w passed and where are they stored while putdecu is running? How do we divide by ten? How can one subroutine call another, for example, the call to putchar() made by putdecu, or the recursive call to putdecu from within itself?

Note that any recursive program can be rewritten into a purely iterative program using auxiliary variables instead of recursion, and this iterative code is frequently faster than any recursive code. Here, we are not interested in doing this. Rather, what we will do is translate the recursive algorithm to assembly language that exactly preserves the original recursive structure.

Exercises

a) In the above code, explain why the expression rem+'0' works to convert integers from 0 to 9 into characters from '0' to '9'. This depends on a very specific property of the representations of the characters in our character set, so to answer this question, you may have to refer to the details of ASCII given in Chapter 2.

b) Write a main program in C to test putdecu() for a variety of values of n and w in order to find and verify the general rule for how many leading blanks are output?

c) Modify the above code so it takes a third parameter, the number base. As a result, calling putdecu(100,0,8) should output 144 because 100₁₀ is 144₈. First, deal with number bases up to 10, then change the code to handle higher number bases up to at least 16.

C Structures

Languages like Python, Java and C++ have classes. All objects of a class share a common structure, where each instance of that class holds corresponding variables, and all instances share a set of methods that operate on instances of that class.

C does not have a full-blown concept of class, but it does have a weaker data organizing mechanism known as the struct, short for structure. A C struct is like a class without methods. C structs are misnamed, because the general term data structure encompasses a wide variety of concepts including arrays, graphs, lists, queues, stacks and trees. The term record is used in COBOL, Pascal and Ada, among other languages, to refer to what C calls a struct in order to avoid this confusion.

Records, or structs are indeed useful in forming many types of data structures. Consider this example:

**C definiton of a binary tree node**
	struct treenode { struct treenode * left; struct treenode * right; int value; };

This declares a new type, struct treenode, where each instance of a struct treenode contains 3 fields, 2 pointers and an integer. left and right are pointers to other struct treenode objects, and value is the integer value in this node. Using struct treenode as a type name is awkward, and you can avoid it using an alternative and equally awkward construction:

**Alternative C binary tree node**
	typedef struct treenode treenode; struct treenode { treenode * left; treenode * right; int value; };

The first line above defines the type name treenode as a synonym for the as yet undefined struct treenode, so that this name can be used throughout the rest of the program, including in the definition of struct treenode. Using this alternate definition, we can statically construct a small binary tree as follows:

**C construction of a binary tree**
treenode grandchild = { NULL, NULL, 1}; treenode leftchild = { &grandchild, NULL, 2}; treenode rightchild = { NULL, NULL, 4}; treenode parent = { &leftchild, &rightchild, 3}; treenode * root = &parent;

C construction of a binary tree

treenode grandchild = { NULL, NULL, 1};
treenode leftchild  = { &grandchild, NULL, 2};
treenode rightchild = { NULL, NULL, 4};
treenode parent = { &leftchild, &rightchild, 3};
treenode * root = &parent;

In the above, we have initialized grandchild.value to 1, The dot notation for fields of a C struct works just like dot notation for fields of a Pascal or Ada record or fields of a Java or Python class. In many object oriented languages, the term handle has emerged as a synonym for pointer.

The C unary & operator returns a pointer to its operand, that is, the memory address of its operand. So, root is initialized to point parent and parent.left points to leftchild.

Confusingly, when pointers are involved, C offers two notations for accessing fields of a struct. Given the above, (*root).value is the parent.value, which has the value 3, but we can also write root->value to refer to parent.value. We will usually use the latter notation. Here is a little routine to traverse a tree in lexicographic order:

**Recursive tree traversal in C**
void treewalk(treenode * n) { if (n != NULL) { treewalk(n->left); printf("%d\n", n->value); treewalk(n->right); } }

The call treewalk(root) will output the values 1, 2, 3, 4 in order, one number per line. Inside treewalk the call to printf outputs n->value in decimal, (as commanded by %d) followed by newline (the \n).

Locals, Activation Records and the Stack

When talking about high level languages, it is common to say that the local variables of a subroutine are pushed on the stack when the subroutine is called and popped from the stack when the subroutine returns. This applies to most forms of subroutines, including procedures functions and methods, but it ignores a key issue: How is this done, where is the stack, and how are variables on the stack referenced.

The basic operations on a stack are push and pop. Since 1962, stacks have been illustrated using a stack of dishes. Imagine each dish holding a variable. To push a variable onto the stack, pile the dish holding it onto the stack of dishes. To pop a variable from the stack, remove the top dish from the stack. Some cafeterias have dish stacks on springs so that only the top dish is visible above the counter, making the terms push and pop even more intuitive. Adding a dish pushes the stack down into its well in the counter, and removing a dish lets the stack pop up.

In discussing local variables and subroutine calling, we say that all of the local variables in a particular subroutine are grouped together into a data structure called an activation record. This records one activation of the subroutine. To use the stack-of-dishes analogy, each dish holds one subroutine activation record, and the top dish in the stack holds the activation record for the current subroutine.

To continue with the analogy, a call pushes a new blank dish onto the stack. If a subroutine has parameters, write their values on the dish (use a dry-erase marker). Also note what you were doing at the time of the call, so you can return when you're done. When the subroutine assigns to a local variable, erase the previous value on the dish, if any, and write in the new value. When a subroutine returns, take the top dish off the stack.

The stack on the Hawk

In the last chapter, we noted that the Hawk monitor allocates space for the stack and sets up R2 to hold the address of that space. A variable or register holding the address of a data structure is called a pointer, so we refer to R2 as the stack pointer, abbreviated SP. Several monitor routines used the stack, but aside from the boilerplate framework for a main program, we have ignored R2. Here, we will focus on it.

Just as we can number the bits in a word from left to right or right to left, we can build a stack that grows from low memory to high or from high to low. Thus, pushing can either increment or decrement the stack pointer. On the Intel 80x86/Pentium family, pushing decrements the stack ponter (this was copied from the DEC PDP-11), while on the Hawk, the IBM RS6000 and Power PC, pushing increments the stack pointer.

There is a second choice. On entry to a subroutine, the stack pointer can point to the top item on the stack, or it can point to the free location beyond the top. It works either way. On entry to a Hawk subroutine, we will have the stack pointer R2 always point to the first unused word above the top item. As a result, the stack looks like this on subroutine entry;

**The Hawk stack on entry to a subroutine**
low memory
activation records of of subroutines waiting to be returned to
- stack pointer (`R2`) points here - free space available for activation record of called subroutine
high memory

Nothing here commits us to where the stack pointer This convention does not say where the stack pointer points during subroutine execution. It only says where it points on subroutine entry and exit. Within a subroutine, the code may do anything it wants with the stack pointer, so long as it undoes it before return. Initially, the stack pointer will always point to the start of the activation record except during calls. Later, in optomized code, we will increment the stack pointer immediately when local variables are allocated and decrement it just before return.

low memory

activation records of
of subroutines waiting
to be returned to

- stack pointer (R2) points here -

activation record of this subroutine

free space

high memory

By convention, we will use the first word of the activation record to hold the return address. A good compiler or a careful programmer will not make unnecessary use of memory, so the return address only needs to be stored when a subroutines calls other subroutines.

Beyond the return address, the rest of the activation record is available for local variables. If a subroutine does not call other subroutines, the local variables may all fit in registers, but when one routines calls another or if there are many local variables, we must allocate space for local variables on the stack.

Our calling convention is that a subroutine may use registers R1 and R3 to R7 with impunity. This means that when a subroutine calls another, if it has any values currently in registers up to R7, it must save them in its activation record before the call and restore them afterwards. Our convention also requires that if a subroutine alters R8 to R15, it must first save their previous values in the activation record, and it must restore them to their previous values before return.

Access to Fields of a Record, Load and Store

We will use a variations on instructions we have already seen to access fields of records, including variables that are fields of an activation record. The versions of LOAD and STORE we looked at in the last chapter used program-counter-relative addressing, but in fact, these instrucitons can be used to construct a memory address from the contents of any of general purpose register within the Hawk processor:

07 06 05 04 03 02 01 00 15 14 13 12 11 10 09 08

1 1 1 1 dst 0 1 0 1 x

15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

const16

The versions of LOAD and STORE given in the last chapter had x set to zero, implying the use of the program counter. When x is nonzero, it specifies one of the 15 general purpose registers. The effective address is the sum of the contents of that register and the sign-extended second halfword of the instruction.

This approach to memory addressing is called indexed addressing. The register R[x] is referred to as the index register and the constant from the instruction is referred to as the displacement. Indexing is useful when the index register points to the first word of a data object such as a record and the displacement gives the location of a field in the object relative to the start of that object. Consider the problem of accessing records representing tree nodes such as those described in the C example given earlier.

0	`LEFT`
4	`RIGHT`
8	`VALUE`


; displacements of fields of one tree node record
LEFT    =       0
RIGHT   =       4
VALUE   =       8

; given the address of a tree node record in R3
LOAD    R4,R3,LEFT      ; get the LEFT pointer into R4
LOAD    R5,R3,RIGHT     ; get the RIGHT pointer into R5
LIS     R6,5
STORE   R6,R3,VALUE     ; set the VALUE field 5

If we use the stack pointer as the index register, and if it points to the first word of the topmost activation record on the stack, then we can use LOAD with appropriate displacements to examine variables of the activation record. Similarly, the STORE instruction can be used to change fields of the activation record.

Exercises

d) Look up the LOADCC instruction in the Hawk manual. How does it differ from LOAD? There are three differences, one in the binary code, two in the behavior. Give them all!

Addressing Modes

The Hawk architecture uses a variety of addressing modes, that is, modes of access to operands of instructions. Some of these addressing modes actually address memory, while others do not; the term addressing mode is used as a catch-all whether or not a memory address is used. In sum, the modes used on the Hawk are:

Immediate addressing
LIS R5,7: With immediate addressing, the operand is taken from a constant field of the instruction; in the example, the constant 7 is an 8-bit immediate constant.
Register addressing
MOVE R5,R7: With register addressing, the operand comes from or goes to a register; in the example, both the source and destination operands use register addressing.
Short-indexed addressing
LOADS R5,R7: Short indexed addressing is also called register-indirect addressing; here, an index register holds a pointer to the operand in memory. In this example, register 7 is used as an index register. A displacement of zero is implicit.
Indexed addressing
LOAD R5,R7,20: Indexed addressing was invented in the 1950s; in this mode, the operand address is the sum of the value in an index register and a constant, the displacement, taken from a field of the instruction. In this example, R7 points to an object and the displacement 20 identifies a field of that object.
Program-counter-relative addressing
LOAD R5,PHELLO: Program-counter relative addressing was invented in the early 1960s; in this mode, the operand address is the sum of the program counter and a displacement. In effect, this is indexed addressing using the program counter as an index register.

With the Hawk architecture, the addressing mode is generally determined by the instruction itself, that is, by the opcode. Thus the LIS, MOVE, LOADS and LOAD instructions are completely different, even though the primary difference between them is the addressing mode used to select the source operand. In contrast, LOAD using indexed mode and LOAD using PC-relative addressing are distinguished only by the contents of the index register field, where a zero indicates PC-relative addressing, while a nonzero value indicates indexing with one of the 15 general purpose registers.

Exercises

e) Look at the binary codes for LOAD, LOADS, STORE and STORES and having done so, explain how the Hawk CPU distinguishes between memory reference instructions that use indexed addressing and memory reference instructions that use short-indexed addressing.

f) Every hawk instruction that uses indexed addressing has a cousin that uses short indexed addressing. Furthermore, LOAD has a cousin called LEA that loads the effective address without going to memory. LOADS also has a cousin that does this that could have been named LEAS, the short-indexed cousin of LEA. What instruction is this? (Hint: There is a regular structure to the binary codes for these instructions.)

Addressing Modes on Some Other Machines

On some machines, a specific field of the instruction is used to encode the addressing mode for each operand. This idea was present in rudimentary form on the Digital Equipment Corporation PDP-5, introduced in 1964, and the PDP-8 family that grew from it.

`00`	`01`	`02`	`03`	`04`	`05`	`06`	`07`	`08`	`09`	`10`	`11`

opcode			i	z	address

i	z	mode	meaning
0	0	page zero	M[address]
0	1	current page	M[address \| (pc & 111110000000₂)]
1	0	indirect page zero	M[M[address]]
1	0	indirect current page	M[M[address \| (pc & 111110000000₂)]]

With a 12-bit memory address but only a 7-bit address field in the instruction, a PDP-8 instruction cannot directly address all of memory. To solve this, the PDP-8 designers broke memory into 128 word pages, where the most significant 5 bits of the address select the page and the least significant 7 bits select the word in that page. A PDP-8 instruction could only directly addresses page zero and the current page, determined by the high 5 bits of the program counter. Machines like the Hawk use Program-counter relative addressing to solve this problem.

If we ignore the complexity of the paged structure of the PDP-8 memory, we see that the PDP-8 had the following two basic addressing modes, neither of which is present on the Hawk, but both of which can be imitated with sequences of Hawk instructions:

Direct addressing

A constant field in the instruction holds the operand address. This is the oldest addressing mode, found on computers designed in the 1940s. A direct addressed store instruction in SMAL might look like this:

DSTORE R,ADDR

If we want to implement this on the Hawk, we need to set aside a register to hold the effective address. R1 will work. We can call it EA using the following definition:

EA = R1

Using this, we can implement DSTORE on the Hawk using the following SMAL macro:

MACRO DSTORE =r,=addr
LIW EA,addr
STORES r,EA
ENDMAC

Indirect addressing

A constant field in the instruction holds the address of the word in memory holding the operand address. That is, a word in memory points to the operand. Indirect addressing was invented in the 1950s. An indirect load instruction on the Hawk could look like this:

ILOAD R,ADDR

We cold implement this with the following Hawk macro:

MACRO ILOAD =r,=addr
LIW EA,addr
LOADS EA,EA
LOADS r,EA
ENDMAC

The Digital Equipment Corporation PDP-11 computer introduced in 1970 had a wide range of addressing modes. This was imitated by the Motorola 68000 microprocessor in the mid 1970s and further expanded on by the DEC VAX computer in the late 1970s and early 1980s. The VAX took this idea to the greatest extreme, with 16 distinct addressing modes. Here are the the original PDP-11 addressing modes.

15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

opcode m r m r

source destination

m	mode	meaning
000	register	R[r]
001	register indirect	M[R[r]]
010	auto-increment	M[R[r]++]
011	auto-increment indirect	M[M[R[r]++]]
100	auto-decrement	M[--R[r]]
101	auto-decrement indirect	M[M[--R[r]]]
110	indexed	M[R[r]+M[PC++]]
111	indexed indirect	M[M[R[r]+M[PC++]]]

The PDP-11 had a 16-bit word and 8 general-purpose registers, where R[7] was the program counter. Many instructions had both source and a destination fields, where each could use any addressing mode. The indexed mode on the PDP-11 uses the next 16-bit word in the instruction stream as the displacement, exactly as in the Hawk, but both the source and destination fields could be indexed. Indirection could be applied to any addressing mode, generalizing on what the PDP-8 had done. The new modes on the PDP-11 are:

Auto-increment addressing
Auto-decrement addressing: These are like register-indirect addressing except that the register holding the operand address is incremented or decremented as a side effect of use. The decrement was done before use, while the increment was done after, and the magnitude of the increment or decrement was set by the operand size. Memory was byte-addressable and the opcode determined whether the operand was one, two or four bytes.

It is convenient that languages descended from C such as C++ and Java allow us to write M[R[r]++] as a compact description of auto-increment addressing. This is not entirely a coincidence. This feature was not present in BCPL, the language that was the predecessor of C. The C language was originally designed as Unix operating system development moved to the PDP-11, and the C operators ++ and -- have frequently been described as responses to the use of auto-increment and auto-decrement addressing on the PDP-11.

Curiously, the PDP-11 allowed immediate and direct addressing modes as side-effects of other modes. Recall that register 7 was the program counter on the PDP-11. Auto-increment addressing in the source field using register 7 would fetch the next word from the instruction stream as an immediate constant. Similarly, auto-increment indirect addressing would fetch the next word from the instruction stream and use it as an address, that is, direct addressing.

There are other more obscure addressing modes. The IBM 360 architecture supports Double indexed addressing, where the contents of two different registers are added to a 12-bit constant from the instruction register in order to compute the effective address. The DEC VAX architecture once the most popular server architecture on the Internet, had a 4-bit addressing mode field, allowing all of the PDP-11 addressing modes plus others. Index constants could be 8, 16 or 32-bits long, and there were indirect-indexed addressing modes.

Exercises

g) Suppose you to translate an assembly language program for some other computer to Hawk asseembly language. The original contains a load to register 5 using direct addressing to reference a memory labeled LOC. Write an equivalent Hawk instruction sequence.

h) Suppose you to translate an assembly language program for some other computer to Hawk asseembly language. The original contains a load to register 5 using auto-increment addressing through register 3, incrementing after the memory reference, as on the PDP-11. The operand size is one word. Write an equivalent Hawk instruction sequence.

Setting up an Activation Record

Looking back at the numeric output routine at the start of this chapter, we are interested in constructing an activation record that can be used to store the return address, the parameters n (the number) and w (the width), and the local variables rem and quo during the execution of functions called from within the putdecu() function. This suggests the following five-word activation record:

**An activation record for `putdecu()`**
0	return address
4	`n`
8	`w`
12	`rem`
16	`quo`

A picture of a five-word data object is easy to draw, but assemblers cannot read pictures. What we need is symbolic names for each field of the object, where the value of each name gives the offset from the start of the object to that field. Here again, assembly language programmers must adopt conventions to clearly document program details that high-level languages enforce with their syntax. We will use the same conventions we used with the tree-node example earlier in this chapter:

**An activation record for `PUTDECU`**
; activation record format for PUTDECU numeric output routine RETAD = 0 ; the return address N = 4 ; the number to print W = 8 ; the width REM = 12 ; the remainder QUO = 16 ; the quotient ARSIZE = 20 ; size of the activation record in bytes

; activation record format for PUTDECU numeric output routine
RETAD   =       0       ; the return address
N       =       4       ; the number to print
W       =       8       ; the width
REM     =       12      ; the remainder
QUO     =       16      ; the quotient
ARSIZE  =       20      ; size of the activation record in bytes

Note, in the above, that we have also defined the symbol ARSIZE as the size of the activation record. We will always use this name for the size of the activation record for each subroutine, redefining it as needed before the body of that routine.

Given these definitions, and given that the stack pointer R2 points to the start of such an activation record, we can store a value from register 4 into the remainder using STORE R4,R2,REM and load it back into register 3 using LOAD R3,R2,REM. These two instructions use the contents of the index register R2 plus the constant REM to compute their effective address. Here are some other example uses of the activation record:

**Accessing fields of the activation record for `PUTDECU`**
STORE R1,R2,RETAD ; Store into the return address from R1 LOAD R1,R2,RETAD ; Load from the return address into R1 STORES R1,R2 ; Equivalent to the above LOADS R1,R2 ; because RETAD is zero STORE R3,R2,REM ; Store into the REM field from R3 LOAD R3,R2,REM ; Load from the REM field into R3 LEA R3,R2,REM ; Get the address of the REM field into R3

STORE   R1,R2,RETAD    ; Store into the return address from R1
LOAD    R1,R2,RETAD    ; Load from the return address into R1

STORES  R1,R2          ; Equivalent to the above
LOADS   R1,R2          ;   because RETAD is zero

STORE   R3,R2,REM      ; Store into the REM field from R3
LOAD    R3,R2,REM      ; Load from the REM field into R3
LEA     R3,R2,REM      ; Get the address of the REM field into R3

Note that any symbol defined by assembly-time assignment can be redefined by another assignment. As a result, if a single assembly source file holds two subroutines, and if the activation record structure for each is defined before the code, there is no problem if the same names are used in both activation records. One name in particular, ARSIZE, will be defined for each activation record, giving its size. Because we will always use LOADS and STORES to access the return address, we will rarely need to define the RETAD field, although we will always include comments to indicate that it is there.

The Hawk Monitor's Divide Routine

Our recursive numeric output routine began by dividing the number to be printed by ten in order to get the remainder and quotient. On some machines, this is done with a machine instruction, but on the Hawk, multiply and divide are both supported by software in the monitor with the interface definitions in the header file stdlib.h. Here is the documentation already given in Chapter 5:

The monitor also provides TIMES for multiplying signed integers, and TIMESU for unsigned integers. The definitive documentation for these routines is in the monitor.h file.

Our focus here is how one subroutine can call another. We begin by focusing on the non-recursive call from our decimal print routine to DIVIDEU before looking at the recursive call. As it turns out, the latter is not materially different.

Calling Sequences

The main programs we wrote in the previous chapter were simple, with no local variables and no calls other than to monitor routines. As a result, outside of the boilerplate framework given for a main program, we ignored the stack. Now, we must ask, how can we push the current activation record before a call and how can we pop it back afterwards.

The principal thing to remember is this: On entry and exit from a subroutine, the stack pointer R2 always points to the first word of that routine's activation record. Initially, we will leave the stack pointer alone in the routine's body. As a result, right before each call, we must update the stack pointer to point to the first word of the new activation record, and right after each return, we must undo this. The following skeleton for a call to DIVIDEU from within our numeric output routine does this:

**The `DIVIDEU` calling sequence**
; first put the dividend in R3 and the divisor in R4 ; and make sure nothing valuable is in R1 or R5-7 ADDI R2,R2,ARSIZE ; push the current activation record LIL R1,DIVIDEU JSRS R1,R1 ; call DIVIDEU ADDI R2,R2,-ARSIZE ; pop the current activation record ; get the quotient from R3 and the remainder from R4

;       first put the dividend in R3 and the divisor in R4
;       and make sure nothing valuable is in R1 or R5-7

        ADDI    R2,R2,ARSIZE    ; push the current activation record
        LIL     R1,DIVIDEU
        JSRS    R1,R1           ; call DIVIDEU
        ADDI    R2,R2,-ARSIZE   ; pop the current activation record

;       get the quotient from R3 and the remainder from R4

The sequence of instructions needed to call a subroutine is called the calling sequence for that routine; the calling sequence given above is the standard calling sequence for all external subroutines on the Hawk. The calling sequence begins with putting the parameters in the appropriate registers, an it ends with getting the results, if any. The code for those details depends on the details of the parameters and results.

External subroutines are those that are given in a separate source file. We will only make a slight change for calls to local or internal subroutines, those defined in the same source file. For the recursive calls in our print routine we will use this calling sequence:

**The `PUTDECU` calling sequence**
; first put the number to be printed in R3 and the width in R4 ; and make sure nothing valuable is in R1 or R5-7 ADDI R2,R2,ARSIZE ; push the current activation record JSR R1,PUTDECU ; call PUTDECU ADDI R2,R2,-ARSIZE ; pop the current activation record ; nothing was returned


;       first put the number to be printed in R3 and the width in R4
;       and make sure nothing valuable is in R1 or R5-7

        ADDI    R2,R2,ARSIZE    ; push the current activation record
        JSR     R1,PUTDECU      ; call PUTDECU
        ADDI    R2,R2,-ARSIZE   ; pop the current activation record

;       nothing was returned

There are only two differences between this calling sequence and the calling sequence for the unsigned divide routine. First, it uses the JSR instruction to jump directly to the subroutine, using program-counter relative addressing instead of going through the more awkward process of first loading the address of the subroutine in a register before use. This works because the destination address PUTDECU is local.

JSR is really the same instruction as JUMP, except that JUMP has an implicit destination field of zero, indicating that the return address should be discarded. As a rule, the Hawk has only two interpretations for register fields that are zero: Index register zero refers to the program counter, while operand register zero is the constant zero, as a source, or it causes the result to be discarded, as a destination.

As already mentioned, we always pass parameters and return results in R3 and up. Our conservative convention for any registers not used to pass parameters or return results is as follows: If the calling routine has anything of value in R3 to R7, it should save it before the call and restore it after return. The caller may use these registers with impunity. It is up to the called routine to return with R8 to R15 unchanged, so if these registers are needed by a subroutine, they should be saved before use and restored afterward. Any time a subroutine needs to save and restore the value in some register, it will need a local variable for this.

For every calling sequence, there is a matching receiving sequence, used at the entry point of the called routine to receive control from the caller, and a matching return sequence, used at the end of the called routine to return to the caller. The first half of the calling sequence works with the receiving sequence to transfer parameters and control, saving whatever neeeds saving, while the return sequence works with the second half of the calling sequence to return to the caller and restore what is needed. The receiving and return sequences for our example are:

**The `PUTDECU` receiving and return sequences**
; receiving sequence assumptions PUTDECU:; expects R3 -- the number to be printed ; R4 -- the field width STORES R1,R2 ; save the return address ... ; body of subroutine goes here ; return sequence assertions LOADS PC,R2 ; return

;       receiving sequence assumptions
PUTDECU:; expects R3 -- the number to be printed
        ;         R4 -- the field width
        STORES  R1,R2           ; save the return address

                ...             ; body of subroutine goes here

;       return sequence assertions
        LOADS   PC,R2           ; return

In the above code, we used STORES R1,R2 to save the return address in the activation record. We could have use STORE R1,R2,RETAD to do this, but because we have allocated the return address as the first field of the activation record, we can use the shorter instruction to access that field. As a general rule, the variable that is used most intensely in a subroutine should be put at the start of the activation record in order to speed all references to that variable; in our example, none of the variables are used very intensely, so this is not an issue.

The final return from the function is done with a LOADS instruction, loading the saved return address directly into the program counter. We could have loaded the return address into some other register and then used JUMPS, the short-indexed version of the JUMP instruction, which is to say, the JSRS instruction with the destination field set to zero to indicate that the return address should be discarded.

The receiving sequence stores the return address in the activation record, allowing the subroutine to call other subroutines. On exit, the return sequence restores the return address from the activation record. Had the subroutine made no other calls, or had other calls been linked through some other register than R1, there would have been no need to save and restore R1. A good compiler, when it generates machine code for a subroutine, will always make such optimizations when possible, but generally, assembly language programmers are advised to use the most general calling sequence during development, and only try such optimizations at the very end, after making sure that everything works.

Exercises

i) Suppose someone built a new version of the Hawk computer with the JUMP, JSR and JSRS instructions eliminated, leaving JUMPS as the only instruction in the jump family. The new machine still has LEA, LOAD and the various load immediate operations. Give an instruction sequence that does almost everything that JUMP does. Are there any important restrictions on this instruction sequence that you'd need to worry about when replacing JUMP instructions?

j) Suppose someone built a new version of the Hawk as suggested above. Given an instruction sequence that does almost everything that JSR does Can you formulate this sequence as a macro named JSR?

k) Suppose someone built a new version of the Hawk as suggested above. Given an instruction sequence that can do almost everything that JSRS does. Suppose you had to move some Hawk code to the new machine, and the code contained the instruction JSRS R1,R1; what problems does this pose when you use your instruction sequence?

Putting it all Together

We have covered all the parts needed to write a Hawk assembly language subroutine equivalent to the putdec() routine at the start of this chapter. The first version of the code is quite long because it is not optimized. It is just a translation of the C code from the start of the chapter.

Each C statement or small group of C statements has been directly translated into a block of assembly code without any examination of global context. Between blocks, all variables are stored in the activation record, and each block is written without any attention to what was already in registers at the end of the previous block.

This approach to assembly-language progrmming requires less thought and less knowledge than any other, but it produces large and slow code. In this code, whenever a register is needed, R3 is used except when some other register was dictated by the need to pass parameters or examine results.

**The `PUTDECU` subroutine (no optimization)**
; activation record format for PUTDECU numeric output routine RETAD = 0 ; the return address N = 4 ; the number to print W = 8 ; the width REM = 12 ; the remainder QUO = 16 ; the quotient ARSIZE = 20 ; size of activation record in bytes PUTDECU:; expects R3 -- the number to be printed ; R4 -- the field width STORES R1,R2 ; -- save return address and parameters STORE R3,R2,N ; -- local variables and initialization STORE R4,R2,W LOAD R3,R2,N ; -- parameter dividend = n LIS R4,10 ; -- parameter divisor = 10 ADDI R2,R2,ARSIZE ; -- push the current activation record LIL R1,DIVIDEU JSRS R1,R1 ; -- call DIVIDEU ADDI R2,R2,-ARSIZE ; -- pop the current activation record STORE R3,R2,QUO ; quo = n / 10 STORE R4,R2,REM ; rem = n % 10 (remainder) LOAD R3,R2,QUO ; -- output things to the left first CMPI R3,0 BZS ELSE ; if (quo > 0) { -- higher digits LOAD R3,R2,QUO ; -- parameter quo LOAD R4,R2,W ; -- parameter w - 1 ADDSI R4,-1 ADDI R2,R2,ARSIZE ; -- push the current activation record JSR R1,PUTDECU ; putdec(quo, w - 1); ADDI R2,R2,-ARSIZE ; -- pop the current activation record BR ENDIF ELSE: ; } else { -- any leading blanks WHILE: ; while (w > 1) { LOAD R3,R2,W ; -- test w CMPI R3,1 BLE ENDWHILE LIS R3,' ' ; -- parameter ' ' ADDI R2,R2,ARSIZE ; -- push the current activation record LIL R1,PUTCHAR JSRS R1,R1 ; putchar(' ') ADDI R2,R2,-ARSIZE ; -- pop the current activation record LOAD R3,R2,W ADDSI R3,-1 STORE R3,R2,W ; w = w - 1 BR WHILE ENDWHILE: ; } ENDIF: ; } LOAD R3,R2,REM ; -- and then display this digit ADDI R3,R3,'0' ; -- parameter rem + '0' ADDI R2,R2,ARSIZE ; -- push the current activation record LIL R1,PUTCHAR JSRS R1,R1 ; putchar( rem + '0' ); ADDI R2,R2,-ARSIZE ; -- pop the current activation record LOADS PC,R2 ; return

; activation record format for PUTDECU numeric output routine
RETAD   =       0       ; the return address
N       =       4       ; the number to print
W       =       8       ; the width
REM     =       12      ; the remainder
QUO     =       16      ; the quotient
ARSIZE  =       20      ; size of activation record in bytes

PUTDECU:; expects R3 -- the number to be printed
        ;         R4 -- the field width
        STORES  R1,R2           ; -- save return address and parameters
        STORE   R3,R2,N         ; -- local variables and initialization
        STORE   R4,R2,W

        LOAD    R3,R2,N         ; -- parameter dividend = n
        LIS     R4,10           ; -- parameter divisor = 10
        ADDI    R2,R2,ARSIZE    ; -- push the current activation record
        LIL     R1,DIVIDEU
        JSRS    R1,R1           ; -- call DIVIDEU
        ADDI    R2,R2,-ARSIZE   ; -- pop the current activation record
        STORE   R3,R2,QUO       ; quo = n / 10
        STORE   R4,R2,REM       ; rem = n % 10 (remainder)

        LOAD    R3,R2,QUO       ; -- output things to the left first 
        CMPI    R3,0
        BZS     ELSE            ; if (quo > 0) { -- higher digits

        LOAD    R3,R2,QUO       ;   -- parameter quo
        LOAD    R4,R2,W         ;   -- parameter w - 1
        ADDSI   R4,-1
        ADDI    R2,R2,ARSIZE    ;   -- push the current activation record
        JSR     R1,PUTDECU      ;   putdec(quo, w - 1);
        ADDI    R2,R2,-ARSIZE   ;   -- pop the current activation record
        BR      ENDIF
ELSE:                           ; } else { -- any leading blanks
WHILE:                          ;   while (w > 1) {
        LOAD    R3,R2,W         ;     -- test w
        CMPI    R3,1
        BLE     ENDWHILE

        LIS     R3,' '          ;     -- parameter ' '
        ADDI    R2,R2,ARSIZE    ;     -- push the current activation record
        LIL     R1,PUTCHAR
        JSRS    R1,R1           ;     putchar(' ')
        ADDI    R2,R2,-ARSIZE   ;     -- pop the current activation record

        LOAD    R3,R2,W
        ADDSI   R3,-1
        STORE   R3,R2,W         ;     w = w - 1
        BR      WHILE
ENDWHILE:                       ;   }
ENDIF:                          ; }
        LOAD    R3,R2,REM       ; -- and then display this digit
        ADDI    R3,R3,'0'       ; -- parameter rem + '0'
        ADDI    R2,R2,ARSIZE    ; -- push the current activation record
        LIL     R1,PUTCHAR
        JSRS    R1,R1           ; putchar( rem + '0' );
        ADDI    R2,R2,-ARSIZE   ; -- pop the current activation record

        LOADS   PC,R2           ; return

For example, in the above, the receiving sequence takes both parameters and stores them in the activation record. The first parameter is saved with STORE R3,R2,N. Just two instructions later, there is a LOAD R3,R2,N instruction at the start of the SMAL Hawk code for what was written as quo=n/10 in C. Nothing changed the value of R3 between these two instructions, so there is no need for the LOAD instruction. Furthermore, once the LOAD is deleted, the only reference to the N field of the activation record is the STORE instruction. If a variable has a value assigned to it that is never used, the variable itself is unnecessary, so we can delete that STORE and delete N from the activation record.

As a second example, the statement if(quo>0) in the C code was translated to SMAL Hawk code beginning with a LOAD R3,R2,QUO instruction to get the local variable QUO into R3, when in fact, R3 already held that value, so that LOAD instruction was comletely unneeded. The very next LOAD instruction in the code is another LOAD R3,R2,QUO, this time, preparing for the recursive call putdec(quo,w-1). Again, the desired value is already in R3 so this instruction is unneeded.

Once the two LOAD R3,R2,QUO instructions are eliminated, the only remaining instruction that references QUO is the STORE R3,R2,QUO immediately after the call to DIVIDEU. As in our first example, this means that we can delete this STORE and the variable QUO.

Exercises

l) Write a Hawk main program to test the above PUTDECU subroutine by using it to print out 10000₁₆ in decimal. How many calls to PUTDECU does this test perform, including both the call from the main program and the recursive calls? (Put the subroutine in the same source file as the main program, don't mess with separate assembly and linkage yet, and you will need to rename your PUTDECU to avoid conflict with the monitor.)

m) Modify PUTDECU to eliminate the redundant load instructions discussed in the text above. If you do the previous exercise, you can test your result.

n) Modify PUTDECU so it takes a additional parameter, the number base, and prints a number in any base between 2 and 10. (Initially, don't bother checking for illegal number bases.)

o) Write a Hawk assembly language function PUTDEC to print a signed integer. This should be equivalent to the following:

                void putdec( int n ) {
                    /* output the integer n */
                    if (n < 0) { n = -n; putchar('-'); }
                    putdecu( n );
                }

Errors

It is very likely that you will make errors in your programs. If you forget to save the return address or if you accidentally damage the saved return address on the stack, your return from a subroutine will jump to a wild memory location. If you accidentally change R2, attempts to use the activation record will access unpredictable memory locations.

The most common consequence of such errors is reported by the Hawk monitor as a bus trap. In general, traps are error conditions detected by hardware. When a program tries to store data in read-only memory or load data from nonexistant memory, the hardware detects this and reports a bus trap. The bus is the system component that connects the CPU to the memory. When the CPU tries to read or write memory, it puts the memory address on the bus and then waits for a reply. If there is no reply, the bus triggers the CPU trap mechanism to report the error.

Sometimes, when a program fails, it will jump to a location that is a legal memory address, either in read-only or read-write memory and then execute random data as instructions. While most possible data values are legal Hawk instructions, some are not. When the hardware encounters an opcode that is not defined, the result is an instruction trap.

In general, when the hardware detects a trap, it transfers control to the operating system. In our case, the hawk monitor takes control. The monitor's default trap service routine reports the trap PC and the trap MA. The trap PC is the address of the instruction that caused the trap, and in the case of a bus trap, the trap memory address is the illegal memory address that the program tried to use. The monitor then restores all of the CPU registers and halts.

If you are running a Hawk emulator or debugger, you can view the instruction that caused the trap by typing in the trap PC followed by the m (display memory) command. If the program that was running was yours, you should be able to relate the disassembled code in the debugger display to your code. Debugging is harder when the program takes a wild jump to an instruction that cannot be fetched. In that case, the contents of other registers may help identify what hapened.

In the case of bus traps, the trap MA gives the word address that caused the trap. If the trap was caused by an attempt to fetch an instruciton from nonexistant memory, this will equal the program counter. If a load or store instruction was the cause, trap PC will point to this instruction, and trap MA will be the effective address. Note that the least significant two bits of the trap memory address give more detail about the cause of the trap and should not be interpreted as part of the address.

The most difficult traps to debug are those that are the result of wild jumps. There are two common cases, wild jumps as the result of attempting to return from a subroutine after the stack pointer or activation record have been corrupted, or wild jumps as the result of incorrect computation of the destination address of a jump or subroutine call. In either case, the registers are likely to hold evidence of what was going on before the wild jump.

Optimization

As pointed out above, the code we have given is not very good. We can tighten this code up in several ways: First, as already mentioned, there are several places where we loaded a value from memory that was already in registers. Second, the Hawk has several instructions that can be used to tighten the code. Third, if you know about these instructions, you can redesign the algorithm to make them useful.

07 06 05 04 03 02 01 00 15 14 13 12 11 10 09 08

1 1 1 1 dst 0 1 0 0 x

15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

const16

The LOADCC instruction does the same thing as LOAD, but it also sets the N and Z condition codes to report whether the value loaded is negative or zero. As a result, it is like following a LOAD by a comparison with zero, although it sets the C condition code differently. This allows us to make the following optimization:

**Original code**
LOAD R3,R2,QUO CMPI R3,0 BZS ELSE

**Optimized code**
LOADCC R3,R2,QUO BZS ELSE

The LOADCC instruction works for BZS and BZR to test for equality or inequality with zero, and it works with BGT, BGE, BLE and BLT for testing signed integer values.

Sometimes, a program needs to test a value in memory but does not need to load it into a register. In that case LOADCC R0 can be used to inspect the value from memory, setting the condition codes to report on its value without loading the value The hawk.h header file defines TEST as a synonym for LOADCC R0 in order to allow programmers to more clearly document the fact that a load instruction is being used only to test a value.

Just as LOAD has a short-indexed cousin LOADS, LOADCC and TEST have short-indexed cousins LOADSCC and TESTS. If the value the program needs to test is in a register, MOVECC is a variant of MOVE that sets the condition codes:

**The Hawk MOVECC instruction**
`07`	`06`	`05`	`04`	`03`	`02`	`01`	`00`	`15`	`14`	`13`	`12`	`11`	`10`	`09`	`08`

1 1 1 1				dst				1 1 1 0				x

To test a register without moving its value anywhere, use R0 as a destination. As with TEST and TESTS, the hawk.h header file defines TESTR as an abbreviation for MOVECC R0.

The test instructions on the Hawk mean that the code for comparison with zero is faster and smaller code than the code for comparison with other values. This is true on many computers, so programmers should generally favor comparison with zero over other constants. We can rewrite putdecu, for example, to use a test for zero instead of while(w>1). Also, remember that, on the Hawk, arithmetic instructions such as ADD, ADDI, and ADDSI all set the condition codes. As a result, you will rarely need to use TESTR to check the result of an arithmetic operation.

**The `PUTDECU` subroutine, partially optimized**
; activation record format for PUTDECU numeric output routine RETAD = 0 ; the return address W = 4 ; the width REM = 8 ; the remainder ARSIZE = 12 ; size of activation record in bytes PUTDECU:; expects R3 -- n, the number to be printed ; R4 -- w, the field width STORES R1,R2 ; -- save return address and parameters STORE R4,R2,W ; -- parameter dividend = n (in R3) LIS R4,10 ; -- parameter divisor = 10 ADDI R2,R2,ARSIZE ; -- push the current activation record LIL R1,DIVIDEU JSRS R1,R1 ; -- call DIVIDEU ADDI R2,R2,-ARSIZE ; -- pop the current activation record STORE R4,R2,REM ; rem = n % 10 (remainder) ; quo = n / 10 TESTR R3 BZS ELSE ; if (quo > 0) { -- higher digits ; -- parameter quo LOAD R4,R2,W ; -- parameter w - 1 ADDSI R4,-1 ADDI R2,R2,ARSIZE ; -- push the current activation record JSR R1,PUTDECU ; putdec(quo, w - 1); ADDI R2,R2,-ARSIZE ; -- pop the current activation record BR ENDIF ELSE: ; } else { -- any leading blanks FOR: ; for (;;) { LOAD R3,R2,W ADDSI R3,-1 STORE R3,R2,W ; w = w - 1 BLE ENDFOR ; if (w <= 0) break; LIS R3,' ' ; -- parameter ' ' ADDI R2,R2,ARSIZE ; -- push the current activation record LIL R1,PUTCHAR JSRS R1,R1 ; putchar(' ') ADDI R2,R2,-ARSIZE ; -- pop the current activation record BR FOR ENDFOR: ; } ENDIF: ; } LOAD R3,R2,REM ; -- and then display this digit ADDI R3,R3,'0' ; -- parameter rem + '0' ADDI R2,R2,ARSIZE ; -- push the current activation record LIL R1,PUTCHAR JSRS R1,R1 ; putchar( rem + '0' ); ADDI R2,R2,-ARSIZE ; -- pop the current activation record LOADS PC,R2 ; -- get the return address

; activation record format for PUTDECU numeric output routine
RETAD   =       0       ; the return address
W       =       4       ; the width
REM     =       8       ; the remainder
ARSIZE  =       12      ; size of activation record in bytes

PUTDECU:; expects R3 -- n, the number to be printed
        ;         R4 -- w, the field width
        STORES  R1,R2           ; -- save return address and parameters
        STORE   R4,R2,W
                                ; -- parameter dividend = n (in R3)
        LIS     R4,10           ; -- parameter divisor = 10
        ADDI    R2,R2,ARSIZE    ; -- push the current activation record
        LIL     R1,DIVIDEU
        JSRS    R1,R1           ; -- call DIVIDEU
        ADDI    R2,R2,-ARSIZE   ; -- pop the current activation record
        STORE   R4,R2,REM       ; rem = n % 10 (remainder)
                                ; quo = n / 10
        TESTR   R3
        BZS     ELSE            ; if (quo > 0) { -- higher digits
                                ;   -- parameter quo
        LOAD    R4,R2,W         ;   -- parameter w - 1
        ADDSI   R4,-1
        ADDI    R2,R2,ARSIZE    ;   -- push the current activation record
        JSR     R1,PUTDECU      ;   putdec(quo, w - 1);
        ADDI    R2,R2,-ARSIZE   ;   -- pop the current activation record
        BR      ENDIF
ELSE:                           ; } else { -- any leading blanks
FOR:                            ;   for (;;) {
        LOAD    R3,R2,W
        ADDSI   R3,-1
        STORE   R3,R2,W         ;     w = w - 1
        BLE     ENDFOR          ;     if (w <= 0) break;

        LIS     R3,' '          ;     -- parameter ' '
        ADDI    R2,R2,ARSIZE    ;     -- push the current activation record
        LIL     R1,PUTCHAR
        JSRS    R1,R1           ;     putchar(' ')
        ADDI    R2,R2,-ARSIZE   ;     -- pop the current activation record

        BR      FOR 
ENDFOR:                         ;   }
ENDIF:                          ; }
        LOAD    R3,R2,REM       ; -- and then display this digit
        ADDI    R3,R3,'0'       ; -- parameter rem + '0'
        ADDI    R2,R2,ARSIZE    ; -- push the current activation record
        LIL     R1,PUTCHAR
        JSRS    R1,R1           ; putchar( rem + '0' );
        ADDI    R2,R2,-ARSIZE   ; -- pop the current activation record

        LOADS   PC,R2           ; -- get the return address

The somewhat optimized code above still makes poor use of registers. This is most obvious in the loop that outputs any required leading blanks, where the code is constantly reaching into the activation record for the field width w. The code given here makes no use of registers R8 and up. If we use these registers, the code of the loop can be significantly tightened.

We can do better. Every call in the code is bracketed by a pair of ADDI instructions to push and pop the activation record. Why not just push once on entry and pop once on return? This requires a change in the way we access local variables. If R2 has already been incremented by ARSIZE, we must subtract ARSIZE from every displacement into the activation record. Thus, we replace LOAD R3,R2,REM with LOAD R3,R2,REM-ARSIZE. This is ugly, but the payoff can be big.

**The `PUTDECU` subroutine, further optimized**
; activation record format for PUTDECU numeric output routine RETAD = 0 ; the return address R8SV = 4 ; saved value of R8 REM = 8 ; the remainder ARSIZE = 12 ; size of activation record in bytes PUTDECU:; expects R3 -- n, the number to be printed ; R4 -- w, the field width ; uses R8 -- w STORES R1,R2 ; -- save return address and parameters STORE R8,R2,R8SV ; -- save register ADDI R2,R2,ARSIZE ; -- push the current activation record MOVE R8,R4 ; -- parameter dividend = n (in R3) LIS R4,10 ; -- parameter divisor = 10 LIL R1,DIVIDEU JSRS R1,R1 ; -- call DIVIDEU STORE R4,R2,REM-ARSIZE; rem = n % 10 (remainder) ; quo = n / 10 TESTR R3 BZS ELSE ; if (quo > 0) { -- higher digits ; -- parameter quo MOVE R4,R8 ; -- parameter w - 1 ADDSI R4,-1 JSR R1,PUTDECU ; putdec(quo, w - 1); BR ENDIF ELSE: ; } else { -- any leading blanks FOR: ; for (;;) { ADDSI R8,-1 BLE ENDFOR ; if (w <= 0) break; LIS R3,' ' ; -- parameter ' ' LIL R1,PUTCHAR JSRS R1,R1 ; putchar(' ') BR FOR ENDFOR: ; } ENDIF: ; } LOAD R3,R2,REM-ARSIZE; -- and then display this digit ADDI R3,R3,'0' ; -- parameter rem + '0' LIL R1,PUTCHAR JSRS R1,R1 ; putchar( rem + '0' ); ADDI R2,R2,-ARSIZE ; -- pop the current activation record LOAD R8,R2,R8SV ; -- restore saved register LOADS PC,R2 ; -- get the return address

; activation record format for PUTDECU numeric output routine
RETAD   =       0       ; the return address
R8SV    =       4       ; saved value of R8
REM     =       8       ; the remainder
ARSIZE  =       12      ; size of activation record in bytes

PUTDECU:; expects R3 -- n, the number to be printed
        ;         R4 -- w, the field width
        ; uses    R8 -- w
        STORES  R1,R2           ; -- save return address and parameters
        STORE   R8,R2,R8SV      ; -- save register
        ADDI    R2,R2,ARSIZE    ; -- push the current activation record
        MOVE    R8,R4
                                ; -- parameter dividend = n (in R3)
        LIS     R4,10           ; -- parameter divisor = 10
        LIL     R1,DIVIDEU
        JSRS    R1,R1           ; -- call DIVIDEU
        STORE   R4,R2,REM-ARSIZE; rem = n % 10 (remainder)
                                ; quo = n / 10
        TESTR   R3
        BZS     ELSE            ; if (quo > 0) { -- higher digits
                                ;   -- parameter quo
        MOVE    R4,R8           ;   -- parameter w - 1
        ADDSI   R4,-1
        JSR     R1,PUTDECU      ;   putdec(quo, w - 1);
        BR      ENDIF
ELSE:                           ; } else { -- any leading blanks
FOR:                            ;   for (;;) {
        ADDSI   R8,-1
        BLE     ENDFOR          ;     if (w <= 0) break;

        LIS     R3,' '          ;     -- parameter ' '
        LIL     R1,PUTCHAR
        JSRS    R1,R1           ;     putchar(' ')

        BR      FOR
ENDFOR:                         ;   }
ENDIF:                          ; }
        LOAD    R3,R2,REM-ARSIZE; -- and then display this digit
        ADDI    R3,R3,'0'       ; -- parameter rem + '0'
        LIL     R1,PUTCHAR
        JSRS    R1,R1           ; putchar( rem + '0' );

        ADDI    R2,R2,-ARSIZE   ; -- pop the current activation record
        LOAD    R8,R2,R8SV      ; -- restore saved register
        LOADS   PC,R2           ; -- get the return address

In the above, we eliminated four instructions that adjusted the stack pointer. Wherever the stack pointer was left pointing off the end of the activation record, we had to subtract the activation record size from each index displacement used to access an activation record field. In effect, instead of subtracting the activation record size from the stack pointer at run time between calls, we did it at assembly time.

The code here has two pairs of consecutive labels, ELSE FOR and ENDFOR ENDIF. To the assembler, these are synonyms; the labels in each pair have the same value. Giving one location in the program two different labels can clarify what is going on. Here, it documents a for loop inside an else clause. Labels take some space in the data structures used by the assembler, but they have no cost when the program runs.

Exercises

p) Combine the optimizations suggested above for PUTDECU and test it.

q) Consider this alternative receiving and return sequence:

        ; receiving sequence
        PUTDECU:STORES  R1,R2           ; -- save the return address
                ADDI    R2,R2,ARSIZE    ; -- push my activation record

                                        ; body of subroutine goes here
        ; return sequence
                ADDI    R2,R2,-ARSIZE   ; -- pop my activation record
                LOADS   PC,R2           ; return

How does this relate to the receiving and return sequence used by the main program given in the previous chapter?

A Second Example

A classic examples of recursion is the Fibonacci series that begins 0 1 1 2 3 5 8 13 and continues with the rule that each member of the series is the sum of the two before it. Mathematically, we say that f(i) gives the i^th member of the series, where f(0) = 0, f(1) = 1 and for higher values of i, f(i) = f(i–1)+f(i–2). The ratio of successive members of the this series approximates the golden ratio, φ, an irrational number close to 1.618.

We can easily write C code for a function to compute the i^th member of the Fibonacci series. The code is not very different in Java or Python.

**A recursive function to compute Fibonacci numbers**
unsigned int fibonacci(unsigned int i) { /* return the i-th member of the Fibonacci series */ if (i <= 1) { return i; } else { return fibonacci(i - 1) + fibonacci(i - 2); } }

There is a faster iterative algorithm to compute the Fibonacci series, but we are explore calling sequences and recursion, so we will ignore it.

This code has two return statements and a long expression with two recursive calls. It helps to rewrite the high-level-language code so each line corresponds roughly to what you can do with a short assembly code block. In our rewrite, each call is on a line by itself, most of the arithmetic is separated from the calls, and there is just one return.

**The Fibonacci function, rewritten**
unsigned int fibonacci( unsigned int i ) { /* return the i-th member of the Fibonacci series / unsigned int value / the return value / unsigned int fib1, fib2; / intermediate values */ value = i; if (value > 1) { fib1 = fibonacci( value - 1 ); fib2 = fibonacci( value - 2 ); value = fib1 + fib2; } return value; }

The Fibonacci function, rewritten

unsigned int fibonacci( unsigned int i ) {
    /* return the i-th member of the Fibonacci series */
    unsigned int value       /* the return value */ 
    unsigned int fib1, fib2; /* intermediate values */

    value = i;
    if (value > 1) {
        fib1 = fibonacci( value - 1 );
        fib2 = fibonacci( value - 2 );
        value = fib1 + fib2;
    }
    return value;
}

**The `FIBONACCI` function, in Hawk Assembly Language**
; activation record format for the FIBONACCI function RETAD = 0 ; the return address VALUE = 4 ; the tentative unsigned integer return value FIB1 = 8 ; fibonacci(value-1) FIB2 = 12 ; fibonacci(value-2) ARSIZE = 16 ; size of activation record in bytes FIBONACCI: ; expects R3 -- i, an unsigned integer ; returns R3 -- fib(i), the ith Fibonacci number STORES R1,R2 ; -- save the return address STORE R3,R2,VALUE ; value = i LOAD R3,R2,VALUE CMPI R3,1 BLEU ENDIF ; if (value > 1) { LOAD R3,R2,VALUE ADDSI R3,-1 ; -- parameter value - 1 ADDI R2,R2,ARSIZE JSR R1,FIBONACCI ; -- call fibonacci ADDI R2,R2,-ARSIZE STORE R3,R2,FIB1 ; fib1 = fibonacci( value - 1 ) LOAD R3,R2,VALUE ADDSI R3,-2 ; -- parameter value - 2 ADDI R2,R2,ARSIZE JSR R1,FIBONACCI ; -- call fibonacci ADDI R2,R2,-ARSIZE STORE R3,R2,FIB2 ; fib2 = fibonacci( value - 2 ) LOAD R3,R2,FIB1 ; -- get fib1 LOAD R4,R2,FIB2 ; -- get fib2 ADD R3,R3,R4 ; -- add them STORE R3,R2,VALUE ; value = fib1 + fib2 ENDIF: } LOAD R3,R2,VALUE ; -- get return value LOADS PC,R2 ; return value

; activation record format for the FIBONACCI function
RETAD   =       0       ; the return address
VALUE	=       4       ; the tentative unsigned integer return value
FIB1    =       8       ; fibonacci(value-1) 
FIB2    =       12      ; fibonacci(value-2)
ARSIZE  =       16      ; size of activation record in bytes
FIBONACCI:
        ; expects R3 -- i, an unsigned integer
        ; returns R3 -- fib(i), the ith Fibonacci number
        STORES  R1,R2           ; -- save the return address
        STORE   R3,R2,VALUE     ; value = i

        LOAD    R3,R2,VALUE
        CMPI    R3,1
        BLEU    ENDIF           ; if (value > 1) {

        LOAD    R3,R2,VALUE
        ADDSI   R3,-1           ;    -- parameter value - 1
        ADDI    R2,R2,ARSIZE
        JSR     R1,FIBONACCI    ;    -- call fibonacci
        ADDI    R2,R2,-ARSIZE
        STORE   R3,R2,FIB1      ;    fib1 = fibonacci( value - 1 )

        LOAD    R3,R2,VALUE
        ADDSI   R3,-2           ;    -- parameter value - 2
        ADDI    R2,R2,ARSIZE
        JSR     R1,FIBONACCI    ;    -- call fibonacci
        ADDI    R2,R2,-ARSIZE
        STORE   R3,R2,FIB2      ;    fib2 = fibonacci( value - 2 )
        
        LOAD    R3,R2,FIB1      ;    -- get fib1
        LOAD    R4,R2,FIB2      ;    -- get fib2
        ADD     R3,R3,R4        ;    -- add them
        STORE   R3,R2,VALUE     ;    value = fib1 + fib2
ENDIF:				}
        LOAD    R3,R2,VALUE     ; -- get return value
        LOADS   PC,R2           ; return value

Each block of instructions in this code corresponds to one line of the rewritten C code, and these blocks were written without regard to each other. There may be one subtle point, in the use of the BLEU instruction. This is used because we are comparing unsigned operands. This is not really an issue here because the Fibonacci function will produce an overflow and an invalid results long before the difference between signed and unsigned values makes a difference!

Exercises

r) Write a C main program that tests the fibonacci function, by printing the first 20 Fibonacci numbers, and then write a SMAL Hawk main program to test the FIBONACCI function.

s) Optimize the code for FIBONACCI to eliminate extraneous loads and stores, taking maximum advantage of values already known to be in registers. If you do the previous exercise, you can test your result.

t) Optimize the code for FIBONACCI to eliminate extraneous changes to the stack pointer between the two recursive calls.

u) Optimize the code for FIBONACCI to eliminate extraneous variables from the activation record. In this case, the minimum activation activation record holds just a single variable plus a return address. You may need to use more registers to do this.

v) Write an iterative subroutine in C that computes the i^th Fibonacci number and then translate that subroutine to Hawk assembly language.

Global Variables

Our focus has been local variables that are allocated on entry to a subroutine and deallocated on exit. Sometimes, you need global or statically allocated variables. These variables are allocated when the program is started and persist until the the program terminates. In C, global variables are declared at the outermost nesting level, outside any function, and are visible to all functions declared later in the same source file. The SMAL assembler, working together with the SMAL linker, can allocate global variables using the COMMON directive.

Suppose you wanted to count the number of calls made to the recursive FIBONACCI routine we have just defined. To do this, we need a global variable that is incremented each time the body of FIBONACCI is entered. We can allocate this variable as follows:

int fibcount = 0; /* global */

; allocate global variable
        COMMON  FIBCOUNT,4

; initialize global variable
LCSAVE  =       .
.       =       FIBCOUNT
        W       0
.       =       LCSAVE

The COMMON directive takes two arguments, the name of the global variable, FIBCOUNT in this case, and the size of the variable in bytes, 4 in this case. This directive merely tells the linker to allocate the indicated space in RAM. If multiple declarations of the same global variable occur in separately assembled sections of the program, the linker will allocate only one copy, so all references to that variable refer to the same memory location.

Whenever the initial value of a global variable can be statically determined, a good compiler will arrange for the linker and loader to do the initialization. The code shown does this by setting aside the previous value of the asembler's location counter in LCSAVE, then setting the assembly origin into the common, assembling the word that goes there, and then restoring the location counter. The SMAL assembler passes these changes to the linker, which passes them to the loader so that the values are loaded where they belong. Thus, while the initialization of a global variable looks like a kind of assignment statement, it has no run-time cost if the value can be computed in advance.

The following code fragment illustrates how to increment a global variable. This code could be placed at the head of our FIBONACCI routine to count calls.

**Incrementing a global variable**
	LIL R4,FIBCOUNT ; -- load pointer to fibcount LOADS R5,R4 ; -- load its value ADDSI R5,1 ; fibcount = fibcount + 1 STORES R5,R4 ; -- save the incremented value

The above code is designed to go directly after the entry to the FIBONACCI routine. At that point in the code, R3 is in use to hold the value of the parameter to the subroutine. Because of this, the above code uses R4 and R5.

Note that the Hawk monitor contains some global variable declarations. For example, stdio.h defines the global variables TERMINFO, a 2-word record that gives the number of rows and columns on the Hawk display screen. TERMINFO+TERMROWS is the address of the global variable holding number of rows on the screen, while TERMINFO+TERMCOLS is the address of the variable holding the number of columns.

Multiple Routines in One Source File

Suppose we want to use our code for PUTDECU to test the FIBONACCI routine, or visa versa. An obvious way to do this is to combine both routines into one source file, along with the main program. Doing this poses potential problems because we have reused several identifiers in our initial versions of the code for each of these routines. Some of these identifiers are reused for the same meaning in both, for example RETAD and ARSIZE. In the case of RETAD, this causes no problems because we never actually use that identifier (we use STORES to access the return address field) and it is always set to zero. ARSIZE, however, has different values in the main program and in each of our subroutines. Furthermore, both PUTDECU and FIBONACCI contain labels named ENDIF.

Our assembly language provides some help. Symbols defined by assignment with an equals sign may be redefined any time. So, as long as the definition is before use, there is no problem at all with defining ARSIZE differently for each subroutine. The same applies for register-save locations, where it makes sense to adopt a uniform naming convention, for example using R8SV for the location where R8 is saved, yet different subroutines may save the same reigster in different fields of the activation record.

Labels pose the main problem. In any source file, no identifier may be used as a label with two different meanings. We can solve this by using naming conventions that encourage use of unique labels in each subroutine. There are two conflicting requirements at work here: Long labels can ruin the readability of the code, while short labels are hard to make unique or meaningful. Here are some style suggestions:

Labels for subroutines:: The label for a subroutine may be up to 14 characters long, and it should be defined on the first line of the block of comments that documents the receiving sequence. There should be a unique and fairly obvious three-letter abbreviation for the subroutine name. For example, FIBONACCI can be abbreviated FIB and PUTDECU can be abbreviated PUT (if there is only one put routine in the source file) or PDU (if PUTDEC and PUTDECU are both in the same source file).
Labels for control structures within subroutines:: Labels for local control structures should begin with the three letter abbreviation for the routine they are in. The last label before the return from a subroutine, if it is an end-loop or end-if label, can have the suffix QT or X meaning quit or exit. So, the last label in the fibonacci routine could be FIBQT.
Labels for loops:: Labels at the top of a loop be suffixed LP or LOOP and labels marking the exit from a loop can be suffixed LX (the X stands for exit). If there are multiple loops, consider adding a qualifying letter as a suffix, so you end up with labels like FIBLPA and FIBLPB.
Labels for if statements:: Labels marking the start of else clauses can be suffixed with EL or ELSE and the label at the end of the if statement can be suffixed EI (for endif). If there are multiple if statements, add a final distinguishing suffix, for example, FIBEIA.

These conventions are by no means perfect, but they are sufficient to allow us to combine the fibonacci and print routines given here into one somewhat legible source file. Of course, we also need to rename PUTDECU to something that doesn't conflict with the stdio.h header file. One important result of these naming conventions is that names that have global meaning are longer and more legible, while names with only local meaning, those associated with local control structures within one routine, are less readable but still likely to be easy to find simply because they are always defined fairly close to the point of use.

Exercises

w) When you combined the putdecu and fibonacci C routines into one source file, there were no label conflicts, while when you combine the PUTDECU and FIBONACCI SMAL Hawk subroutines into one source file, there are label conflicts. Why? The answer has to do with differences in the languages.

x) Combine the putdecu and fibonacci C routines into one source file, and then add a call to putchar to the body of fibonccci so that it outputs a begin paren on entry, and just before exit, make it output the decimal representation of the value it is about to return followed by an end paren. If you called fibonacci(2), for example, this would output ((1)(0)1) and if you called fibonacci(3) it would output (((1)(0)1)(1)2). See what you get when you run it for fibonacci(5)

y) Consider the series where s(i) defined by:

                unsigned int s( unsigned int i ) {
                    int j;
                    int v = 0;
                    if (i < 2) return i;
                    for (j = 0, j < i; j++) v = v + s(j);
                    return v;
                }

This begins 0 1 1 2, just like the Fibonacci series. Write a recursive assembly language program that computes this series, and test it with a main program that prints out the first 10 members of the series. You can also test the C code to see how they differ.

z) Instrument the Fibonacci code to count how many calls have been made to FIBONACCI and then write a main program to evaluate f(8) and report the number of calls.

6. Subroutines, Recursion and Local Variables

An Example, Binary to Decimal Conversion

Exercises

C Structures

Locals, Activation Records and the Stack

The stack on the Hawk

Access to Fields of a Record, Load and Store

Exercises

Addressing Modes

Exercises

Addressing Modes on Some Other Machines

Exercises

Setting up an Activation Record

The Hawk Monitor's Divide Routine

Calling Sequences

Exercises

Putting it all Together

Exercises

Errors

Optimization

Exercises

A Second Example

Exercises

Global Variables

Multiple Routines in One Source File

Exercises