Lecture 26, Code Generation

Part of the notes for CS:4908:0002 (22C:196:004)
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

A Basic Distinction

Many compilers do not directly generate machine code. Instead, the semantic action routines driven by the parser operate in terms of a high-level virtual machine provided by the interface to the code generator. The code generator then translates the stream of high-level virtual instructions into actual instructions for the target architecture. This has several benefits: the semantic routines stay simple and largely machine independent, and retargeting the compiler to a new machine requires rewriting only the code generator.

There are several ways to divide code generation from the semantic action routines:

Generate a file of code for the virtual machine

The Pascal P compiler from the very early 1970s generated its output as a file of P-code, and Java compilers today generate J-code. In both cases, these are relatively high-level machine languages that are interpreted, at run-time, by an interpreter that actually carries out the instructions of the program. Usually, this interpreter is a program written in assembly language or some other language that is translated to real machine code.

The most obvious way to write a code generator for such a compiler is to write a program that reads the high-level code as its input and generates assembly code or machine code for the target machine as its output, but there are other options.

In the 1960s, Burroughs Corporation built a series of computers, starting with the B5000, where the native machine language was designed specifically to support the language Algol 60. These machines incorporated a large number of important innovations, including the idea of a hardware stack and what we now call byte-coded instructions. The Burroughs stack machines had 6-bit bytes (they called them characters or syllables) and a 48-bit word, making them look very odd from a modern perspective. Burroughs is largely history, but the ideas of the Burroughs architecture were the basis for the Collins Adaptive Processor Architecture of the 1970s and the Rockwell-Collins AAMP (Advanced Architecture Micro Processor) of the 1980s. The latter is a modern machine, with 8-bit bytes and a 32-bit word. Rockwell Collins, of Cedar Rapids, Iowa, built a hardware J-machine called the JEM that was spun off around the year 2000 to a company called Ajile Systems Inc.

Back in the early 1970s, the first Pascal compiler for the 16-bit PDP-11 computer generated assembly source that was mostly something very close to the P-code generated by the Pascal-P compiler. This was converted to actual PDP-11 code by a very complex macro package that took care of all the details of register allocation and instruction selection. This was possible because Macro-11, the macro assembler for the PDP-11, was a very powerful assembler with excellent macro features.

IBM took a distinctly different approach. The IBM System 38, as originally sold in 1979, had a very complex instruction set that directly supported a highly secure, strongly-typed memory model. The original implementation was not very efficient, however. As the machine evolved, the programmer's specification for the machine language did not change (except for bug fixes), but the actual CPU architecture evolved toward a far more conventional one. Because the official architecture was strongly typed, a program could not examine its own code. As a result, IBM was able to move most of the complexity of the architecture into the loader. In effect, the loader became a code generator that translated the high-level System-38 instruction set into a far more conventional machine language. Eventually, the System-38 was replaced by the IBM AS/400. On that machine, the CPU was an off-the-shelf PowerPC chip, with a small amount of auxiliary logic, but it still supported the secure computing model of the System 38 because of the translation done by the loader.

The virtual machine is just an internal compiler component

Most compilers do not actually generate a file of virtual machine code that is processed by a second system. Instead, the virtual machine appears as a component inside the compiler. If we use a stack machine as our high-level code generation model, we could generate a file of high-level code that looks like the following when we compile the code i=i+1 where i is a global variable:

PUSHGA i    -- push the address of the global variable i
PUSHGA i    -- push the address of the global variable i
LOAD        -- replace the stack top with the value it points to
PUSHI 1     -- push the constant 1
ADD         -- add the top items on the stack
POPA        -- store the result using the address on the stack

Inside a compiler with an integrated code generator (call the code-generator object cg), the compiler might make the following sequence of calls to the code generator as it processes i=i+1:

cg_pushga( "i" )  -- called from assignment::compile
cg_pushga( "i" )  -- called from reference::compile
cg_load()         -- called from factor::compile
cg_pushi( 1 )     -- called from factor::compile
cg_add()          -- called from comparand::compile
cg_popa()         -- called from assignment::compile

Alternative High-Level Architectures

The high-level architecture used in the above example is just one possibility. It is a natural one, following from the work done by Burroughs Corporation in the early 1960s, Pascal's P-code, and Java's J-code. There are alternatives.

One idea is to have the semantic routines in the compiler explicitly allocate storage for every variable, including the anonymous variables that hold intermediate results during expression evaluation. These variables allocated by the semantic routines will not necessarily be used, but allocating them means that the semantic routines always operate in terms of either global variable addresses (a special case) or local variable addresses.

In this case, the assignment i=i+1 would translate to something like this:

add        const 1   global i  global i

Some compiler texts describe this type of high-level code as triples or three-address code, because the arithmetic operators all take three addresses: two source operands and a destination operand. Some operators, such as negation and assignment, work naturally in terms of two operands, so these operators would naturally have unused operand positions.

Stack Architectures

We can describe the semantics of a stack machine's instruction set in terms of a stack stored in memory (the array M) and a stack-pointer register sp, but we have some choices to make: the stack can grow toward higher or lower addresses, and sp can point either to the top item on the stack or to the first free location beyond it. These two choices combine to give four alternatives.

We can summarize the above alternatives with this picture:

       ---------low addresses---------

          1       2       3       4
      |       |       |       |       |
      | stack | stack |       |       |
      |       |       |       |       |
SP -->|__top__|_______|__top__|_______|
      |       |       |       |  top  |
      |       |       |       |       |
      |       |       | stack | stack |
      |       |       |       |       |
          1       2       3       4

       --------high addresses---------

On the Digital Equipment Corporation PDP-11, the stack grew down and the stack pointer always pointed to the item on the stack top. The Intel x86 family of machines continues in the same tradition (many details of the x86 architecture owe quite a bit to the PDP-11). Many more recent architectures (but by no means all) imitate the PDP-11 simply because the developers had no reason to change things, but there are also good reasons to do otherwise, so some architectures take other directions.

It is easy to experimentally test whether the architecture has a stack that grows up or down on your target machine. Try running a little C program like this, for example:

/* stacktest.c -- program to see which way the stack grows */

#include <stdio.h>
#include <stdint.h>

void probe( intptr_t a ) {
    int i;
    if (a < (intptr_t)&i) {
        puts( "stack grows up" );
    } else {
        puts( "stack grows down" );
    }
}

int main(void) {
    int i;
    probe( (intptr_t)&i );
    return 0;
}

It is harder to figure out whether the stack pointer points to the top item on the stack or to the free location just beyond the top item. This generally involves inspecting compiled code to determine how it works.

The ARM hardware permits all four stack implementations, but software must pick just one of them. The following document from arm.com gives the details of the standard function call on the ARM. Along the way, it completely defines the stack conventions of the machine:

-- Procedure Call Standard for the ARM Architecture

This document gives the details of the ARM stack; section 5.2.1 declares that the ARM stack is a full descending stack. That is, the stack pointer always points to the top item on the stack, and push decrements the stack pointer. Furthermore, operands on the ARM stack are always full words, so even a one-byte item on the stack occupies a whole word. At public interfaces, the stack is required to be 64-bit aligned (although this seems not to matter in many contexts).