# Lecture 27, Code Generation

## An Observation

Because all data objects in Falcon have a constant size that is statically determined, the stack pointer at any point in a block is at a constant displacement from the base of the activation record for that block. Consider this code.

```
procedure sort( ref i: int32, ref j: int32 )
var k: int32;
if i < j then
k = i;
i = j;
j = k;
end;
end;
```

If you imagine a totally sub-optimal implementation of this code, you might consider pushing each term in an expression on the stack as you evaluate the expression. This leads to the following picture of what is on the stack at each point in the above code:

```
procedure sort( ref i: int32, ref j: int32 )
-- @i @j
var k: int32;
-- @i @j k
if i
-- @i @j k i
< j
-- @i @j k i j
-- @i @j k b
then
-- @i @j k
k
-- @i @j k @k
= i
-- @i @j k @k i
;
-- @i @j k
i
-- @i @j k @i
= j
-- @i @j k @i j
;
-- @i @j k
j
-- @i @j k @j
= k
-- @i @j k @j k
;
end;
-- @i @j k
end;
--- Legend:
---        i,  j - the values of the actual parameters
---       @i, @j - the addresses of the actual parameters
---        k     - the local variable
---       @k     - the address of the local variable
---        b     - a boolean resulting from comparing two integers
```

Note that the storage allocated in the activation record for the above routine's parameters holds references to the actual parameters, since they were declared as reference parameters. The basic activation record for the procedure therefore holds 2 reference parameters along with the local variable k. The rest of the variables in the activation record are anonymous temporaries that are used during the evaluation of expressions and the execution of assignment statements.

For example, to evaluate if i < j then we push the value of i on the stack, then push the value of j on the stack, before comparing them and replacing the two items with just one item reflecting the result of the comparison. Finally, we pop the result from the stack as we do a conditional branch.

The observation with which we began is simple: we do not need a dynamically computed stack pointer for this stack. We can use static computation, where each push is essentially an allocation operation in the current activation record, and each pop is a deallocation. Instead of using a stack pointer, we can use static offsets from the base of the activation record, just as we do for local variables.
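This static computation can be sketched as a pair of compiler-side routines (the names alloc_temp and free_temp are invented here for illustration): at compile time, each push hands out the next static displacement in the activation record, and each pop retires it. No run-time stack-pointer arithmetic is involved.

```c
#include <assert.h>

/* The compiler's compile-time "stack pointer": a counter of slots
   allocated in the current activation record.  Illustrative sketch. */

static int cur_disp = 0;   /* next free slot, in words, from frame base */
static int max_disp = 0;   /* high-water mark = activation record size  */

int alloc_temp(void) {               /* compile-time "push" */
    int d = cur_disp++;
    if (cur_disp > max_disp) max_disp = cur_disp;
    return d;                        /* static offset from the frame base */
}

void free_temp(void) {               /* compile-time "pop" */
    cur_disp--;
}
```

At the end of the procedure, max_disp tells the compiler how large an activation record to reserve.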

## Stack Machines

Starting around 1960, Burroughs Corporation began making stack machines. Their product line had 6-bit bytes (they called them characters or syllables) and a 48-bit word. The Burroughs 5000 was the first of these machines, followed by the 6500, 6700 and eventually 8500. Collins Radio, in Cedar Rapids, Iowa, began making similar machines, with an 8-bit byte and 16 (later 32 and 64) bit words, around 1968. The Collins Advanced Processor System, or CAPS, was originally a minicomputer, but as the technology advanced, it was followed by the Advanced Architecture Microprocessor, or AAMP.

Aside: Why did the AAMP stay in production in a world of very cheap commodity microprocessors? Because most microprocessors come with a manufacturer's warranty that disclaims all responsibility for the use of the system in any safety-critical application. Collins (later Rockwell Collins) makes avionics, where processors are routinely used to directly fly both civilian and military aircraft. In such applications, being able to prove the correctness of both the processor implementation and any compilers used is very valuable, and stack architectures make such correctness proofs far easier than they would be with the kinds of architectures that dominate the commodity computing marketplace.

Similar architectures have been used as software-implemented virtual machines in order to produce low-cost portable compilers. For example, the Pascal P-machine, executing P-code, was a common way to implement the Pascal language in the 1970s. Later, J-code and the JVM (Java Virtual Machine) served the same purpose. In 1997, Rockwell Collins modified the AAMP to create the JEM1 processor to directly implement J-code without the need for a virtual machine.

Here are the instructions of a typical stack machine, described in terms of a stack stored in memory (the array M), and a stack-pointer register sp:

PUSHI c
PUSH Immediate; the constant c is pushed onto the stack.
```
M[++sp] = c;
```
LOAD
LOAD onto the stack top the contents of the memory location addressed by the stack top.
```
M[sp] = M[M[sp]];
```
POP
POP the data from the stack top and store it in the memory location addressed by the word below it. A total of two words are popped from the stack; the data was on top of the address.
```
temp = M[sp--];
M[M[sp--]] = temp;
```
DUP
DUPlicate the item on the stack top.
```
temp = M[sp];
M[++sp] = temp;
```
ADD
ADD the top two items on the stack, popping one operand and replacing the other operand with the sum.
```
temp = M[sp--];
M[sp] = M[sp] + temp;
```
SUB
SUBtract the top item popped from the stack from the item below it, replacing the latter with the difference. Other arithmetic operators follow the same pattern, as do logical operators such as AND and OR.
```
temp = M[sp--];
M[sp] = M[sp] - temp;
```
GT
LT
EQ
Greater Than (and other comparison operators) replace the top two items on the stack with the result of comparison, comparing the item under the stack top with the stack top and placing the Boolean result on the stack. The net effect is like all other binary operators: One item is popped from the stack while the result replaces the other operand.
```
temp = M[sp--];
M[sp] = M[sp] > temp; -- for example; could also be < or ==, etc.
```
BTRUE dst
BFALSE dst
Branch if TRUE (and Branch if FALSE) pop the top item from the stack and if it is true (or if it is false) they branch to the dst. The top item on the stack is always popped, whether or not the branch is taken.
```
if M[sp--] -- BFALSE branches on the opposite condition: if not M[sp--]
then pc = dst;
```
The above instruction set is sufficient to allow reference to global variables at static addresses, but for languages such as ALGOL 60 and its descendants, including C, C++ and Java, we need a way to load the addresses of local variables. The easiest way to do this is with a special register, the frame pointer fp. Within any procedure or function, this always points to the base of the stack frame (activation record) for the current context.
PUSHLA d
PUSH Local Address; the address of a local variable at displacement d in the activation record is pushed onto the stack.
```
M[++sp] = d + fp;
```

There are numerous alternatives to the above. You could have an instruction to push the frame pointer itself, then push the displacement, then add, if you wanted to eliminate instructions that combine arithmetic with a constant.
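Collecting the snippets above, the whole instruction set fits in a small software interpreter. The following C sketch is illustrative only: the enum names, the insn structure, and the HALT opcode are inventions here, and real stack machines encode instructions far more compactly.

```c
#include <assert.h>

/* A minimal interpreter for the stack machine described above.
   M is the memory, sp the stack pointer, fp the frame pointer. */

enum op { PUSHI, LOAD, POP, DUP, ADD, SUB, GT, BTRUE, BFALSE, PUSHLA, HALT };

struct insn { enum op op; int arg; };

int M[256];
int sp = -1, fp = 0;

void run(const struct insn *code) {
    int pc = 0, temp;
    for (;;) {
        struct insn i = code[pc++];
        switch (i.op) {
        case PUSHI:  M[++sp] = i.arg;                       break;
        case LOAD:   M[sp] = M[M[sp]];                      break;
        case POP:    temp = M[sp--];      /* data on top  */
                     M[M[sp--]] = temp;   /* address below*/ break;
        case DUP:    temp = M[sp]; M[++sp] = temp;          break;
        case ADD:    temp = M[sp--]; M[sp] = M[sp] + temp;  break;
        case SUB:    temp = M[sp--]; M[sp] = M[sp] - temp;  break;
        case GT:     temp = M[sp--]; M[sp] = M[sp] > temp;  break;
        case BTRUE:  if (M[sp--])  pc = i.arg;              break;
        case BFALSE: if (!M[sp--]) pc = i.arg;              break;
        case PUSHLA: M[++sp] = i.arg + fp;                  break;
        case HALT:   return;
        }
    }
}
```

For example, storing 2 + 3 into a local at displacement 0 is PUSHLA 0; PUSHI 2; PUSHI 3; ADD; POP, leaving 5 in the local and the stack back where it started.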

## Hand Translation to Stack Code

Now, consider the example at the head of this section. Ignoring the details of the procedure call and return, the code generated for the example code would be as follows:

```
procedure sort( ref i: int32, ref j: int32 )
var k: int32;
PUSHI  0 -- allocate space for k, initialization is a side effect.
if i < j then
PUSHLA I -- push the address of @i
LOAD     -- get the value of @i
LOAD     -- get the value of i
PUSHLA J -- push the address of @j
LOAD     -- get the value of @j
LOAD     -- get the value of j
LT
BFALSE ENDIF
k = i;
PUSHLA K -- get the address of k
PUSHLA I -- get the address of @i
LOAD     -- get the value of @i
LOAD     -- get the value of i
POP
i = j;
PUSHLA I -- get the address of @i
LOAD     -- get the value of @i
PUSHLA J -- get the address of @j
LOAD     -- get the value of @j
LOAD     -- get the value of j
POP
j = k;
PUSHLA J -- get the address of @j
LOAD     -- get the value of @j
PUSHLA K -- get the address of k
LOAD     -- get the value of k
POP
ENDIF:
end;
end;
```

If you imagine a totally sub-optimal implementation of a stack machine, each stack machine instruction involves one or more memory cycles to access the top elements of the stack. Looking at the above, this suggests that a simple assignment such as i = 1 could take 3 or more instruction cycles. Even the Burroughs 5000 computer avoided this by dedicating two registers inside the CPU to holding the top two elements of the stack. State bits in the CPU recorded whether the stack was entirely in memory, whether the top word of the stack was in a register, or whether the top two words were in registers. The net result was efficient execution with few memory cycles actually required. In later machines such as the AAMP, 4 registers were used to hold up to the top 4 elements of the stack.
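The effect of a top-of-stack register can be sketched in C with a single cached word and a validity flag standing in for the CPU state bits (a one-register simplification of the B5000's two; the names here are invented):

```c
#include <assert.h>

/* Top-of-stack caching: when tos_valid is set, the logical stack top
   lives in the register tos rather than in memory. */

int M[256];
int sp = -1;
int tos;            /* register caching the stack top */
int tos_valid = 0;  /* state bit: is tos live?        */

void push(int v) {
    if (tos_valid) M[++sp] = tos;   /* spill the old top to memory */
    tos = v;
    tos_valid = 1;
}

int pop(void) {
    if (tos_valid) { tos_valid = 0; return tos; }
    return M[sp--];
}

void add(void) {                    /* binary op through the cache */
    int b = pop();
    push(pop() + b);
}
```

With both operands recently pushed, only the spill of the first operand touches memory; with more cached registers, even that memory cycle disappears in the common case.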

## A Low-Quality Compiler

If you want to implement a low-quality compiler for a machine like the ARM, an easy way to do so is to generate stack code. Each of the stack-oriented instructions given above can be replaced with a simple sequence of ARM instructions.
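For instance, each stack instruction can be expanded by a fixed template. The following C sketch shows plausible 32-bit ARM expansions of PUSHI and ADD, assuming a full-descending stack addressed through sp; the routine names and the output buffer are inventions for illustration, not part of any real compiler:

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Expanding stack-machine operations into ARM (A32) sequences,
   one fixed template per stack instruction.  Output accumulates
   in buf so the expansion can be inspected. */

char buf[1024];

void gen_pushi(int c) {              /* PUSHI c */
    char line[64];
    sprintf(line, "\tmov r0, #%d\n", c);
    strcat(buf, line);
    strcat(buf, "\tstr r0, [sp, #-4]!\n");  /* push r0 (pre-indexed) */
}

void gen_add(void) {                 /* ADD */
    strcat(buf, "\tldr r0, [sp], #4\n");    /* pop the right operand */
    strcat(buf, "\tldr r1, [sp]\n");        /* fetch the left operand */
    strcat(buf, "\tadd r1, r1, r0\n");
    strcat(buf, "\tstr r1, [sp]\n");        /* replace it with the sum */
}
```

Every template begins and ends with the whole stack in memory, which is exactly what makes this easy and what makes the resulting code slow.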

## A Medium-Quality Compiler

Because the displacement from the stack top to the frame pointer at each point in the code is a constant, so long as the frame pointer is stored in an index register, there is no need to do any arithmetic on the stack pointer at run time. Instead, indexed addressing off of the frame pointer can be used to address the memory locations that the low-quality compiler would have addressed through the stack pointer.
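One way to picture this: the code generator keeps a compile-time depth counter and emits fp-relative loads and stores at statically computed offsets, so no sp arithmetic is ever emitted. A hedged C sketch, with invented names (depth, slot, frame_locals) and illustrative ARM syntax:

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

char buf[1024];
int depth = 0;            /* compile-time stack depth, in words      */
int frame_locals = 1;     /* words of named locals below the temps   */

static int slot(int d) {  /* byte offset of temp slot d from fp */
    return 4 * (frame_locals + d);
}

void gen_pushi(int c) {
    char line[80];
    sprintf(line, "\tmov r0, #%d\n\tstr r0, [fp, #%d]\n", c, slot(depth));
    strcat(buf, line);
    depth++;              /* static "push": no sp arithmetic emitted */
}

void gen_add(void) {
    char line[160];
    depth--;              /* static "pop" of the right operand */
    sprintf(line,
        "\tldr r0, [fp, #%d]\n\tldr r1, [fp, #%d]\n"
        "\tadd r1, r1, r0\n\tstr r1, [fp, #%d]\n",
        slot(depth), slot(depth - 1), slot(depth - 1));
    strcat(buf, line);
}
```

The emitted instruction count is the same as before, but the run-time stack pointer never moves, and a peephole pass can now easily spot that a value stored to a slot is reloaded from the same slot moments later.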

## Packaging these ideas

Even in a very high quality compiler, it may be useful to interface the parser to the code generator through the stack model. The parser can easily call pushi(const) when it needs the constant as an operand in an upcoming instruction. In an early version of the compiler, the pushi() routine could simply generate the corresponding simple stack-machine code (or instruction sequences that accomplish the same thing). Later, the body of pushi() can be improved to, for example, stage the constant in some internal memory of the code generator so that a later instruction, such as add(), could generate an add-immediate instruction as the optimal way to add a constant to whatever the other operand is. Thus a simple stack architecture may hide inside the compiler even when the underlying machine has a completely different architecture.
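A hedged C sketch of this staging idea, with invented names and illustrative ARM-flavored output: code_pushi() merely records the constant, and code_add() folds it into an add-immediate when it can, materializing a real push only when no such folding is possible.

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

char buf[1024];
int pending = 0, have_pending = 0;   /* a staged, not-yet-emitted constant */

void code_pushi(int c) {             /* stage, don't emit */
    pending = c;
    have_pending = 1;
}

void code_flush(void) {              /* materialize a staged constant */
    if (have_pending) {
        char line[64];
        sprintf(line, "\tmov r0, #%d\n\tstr r0, [sp, #-4]!\n", pending);
        strcat(buf, line);
        have_pending = 0;
    }
}

void code_add(void) {
    if (have_pending) {              /* fold into add-immediate */
        char line[64];
        sprintf(line, "\tldr r0, [sp]\n\tadd r0, r0, #%d\n\tstr r0, [sp]\n",
                pending);
        strcat(buf, line);
        have_pending = 0;
    } else {                         /* general two-operand case */
        strcat(buf, "\tldr r0, [sp], #4\n\tldr r1, [sp]\n"
                    "\tadd r1, r1, r0\n\tstr r1, [sp]\n");
    }
}
```

Every other code-generator entry point would begin by calling code_flush(), so the staged constant is never lost; only operations that can exploit it, like code_add(), get to consume it directly.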

## Connecting the Parser to Code Generation

Given a recursive-descent syntax directed parser and a stack-oriented interface to a code generator, we can generate code for assignment statements and expression evaluators in a very direct way. Consider the code to parse a reference as a starting point:

```
type * parse_reference( block * context ){
    /* returns the type of the object to which the top of stack refers */
    assert( lex_this.type == IDENT );

    /* does not handle up-level and parameter by reference addressing
       nor does it handle references to procedures and functions */

    binding * b = context->lookup( lex_this.value );
    assert( b->kind == VARIABLE );

    code_pushla( b->displacement );
    type * t = b->type;
    lex_advance();

    while (ispunc( lex_this, setX( P_DOT, P_AT ... ) )) {
        if (lex_this.value == P_AT) { /* @ -- pointer */
            assert( t->kind == POINTER );
            code_load(); /* replace the pointer's address with its value */
            t = t->reftype;
            lex_advance();

        } else if (lex_this.value == P_DOT) { /* . -- field */
            lex_advance();
            assert( lex_this.type == IDENT );
            assert( t->kind == RECORD );
            binding * b = t->block->lookup( lex_this.value );
            code_pushi( b->displacement );
            code_add(); /* add the field displacement to the record address */
            t = b->type;
            lex_advance();
        } else ...
    }
    return t;
}
```

In the same spirit, the code to parse a simplified expression grammar might look like this:

```
type * parse_expression( block * context ){
    type * t = parse_factor( context );

    while (ispunc( lex_this, setX( P_PLUS, P_MINUS ... ) )) {
        lexeme operator = lex_this;
        lex_advance();

        type * t1 = parse_factor( context );
        assert( compatible_arithmetic_types( t, t1 ) );

        if (operator.value == P_PLUS) {
            code_add();
            t = ... must compute result type ...
        } else if (operator.value == P_MINUS) {
            code_sub();
            t = ... must compute result type ...
        }
    }
    return t;
}
```

In the above examples, note that the assert() routine is used as shorthand for error processing. If assert() is used literally, as written, it would simply check its parameter. If the parameter is true, it would return with no action; if the parameter is false, it would issue a generic fatal error message such as "assertion not met". To be useful, the error message should be far more constructive, requiring a variety of different error interfaces. Typically, the lexeme that caused the error should be a parameter, since this is likely to carry information about the source line number from which that lexeme was parsed.
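As a sketch of one such interface, the following C fragment formats a message from a hypothetical lexeme that carries its source line number; the field names and routine name are invented here, and a real compiler would print the message, show the offending text, and attempt to recover rather than merely format a string.

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Hypothetical lexeme layout: the line field lets error messages
   point back at the source. */
typedef struct { int type; int value; int line; } lexeme;

/* Build a constructive error message into msg, in place of a bare
   assert(); callers would report it and then attempt recovery. */
void error_expected(const char *what, lexeme got, char *msg) {
    sprintf(msg, "line %d: expected %s", got.line, what);
}
```

A call like error_expected("an identifier", lex_this, msg) then replaces assert(lex_this.type == IDENT) wherever a user-visible diagnosis is wanted.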