22C:18, Lecture 25, Fall 1996

Douglas W. Jones
University of Iowa Department of Computer Science

Compilers versus Interpreters

The calculator example from the previous section is an example of an interpreter for a programming language. Admittedly, the language is very simple, with no control structure and a trivial syntax, but it is a language. Interpreters operate on programming languages by scanning the source text and executing each command as it is encountered in the source.

Most spreadsheets use interpretive execution, as do many implementations of programming languages such as LISP and BASIC. Macro languages are also typically processed interpretively, although many programmers don't usually view macro procesors as programming languages in their own right.

Languages such as FORTRAN, Pascal, C, and Ada are rarely implemented by interpreters. Instead, these languages are compiled, translated from source form to machine code. Presenting a compiler for one of these programming languages is well beyond the scope of this course, but it is worth presenting an example of a very simple compiler:

Compiling the RPN Calculator's Input Lanugage

Consider the problem of compiling the input language used in the calculator example. The main program for a compiler based version of this calculator might be described as follows, ignoring issues of display formatting:

        program Calculator;
        var line: string;
            code: array of machine instructions;

        begin { main program }
            repeat
                kbgets(line);
                compile(line,code);
		code;
            forever;
        end;
We cannot write this as legal Pascal or C, so the above main program must be interpreted as pseudocode. The idea is to replace the call to calculate from the previous example with a call to compile, where compile takes a line of text and uses it to create a procedure in the array called code. Following the execution of the compiler procedure, the main program then calls the procedure that was produced by the compiler.

When this compiled code executes, it produces the output we expect from the calculator. This poses the first design constraint on our compiler: It must produce a callable procedure. The second constraint is artificial: We will restrict our compiler to a trivial translation, with no optimization at all. Thus, each individual calculator command should be reduced to exactly the same sequence of machine instrucitons every time it is encountered, with no reference to context.

enter:	ADDSI	R2,4
	STORES	0,R2

digit:  LOADS	R3,R2
	ADDSL	R3,R3,2
	SL	R3,R3,2
	ADDI	R3,digit
	STORES  R3,R2

plus:	LOADS	R3,R2
	ADDSI	R2,-4
	LOADS	R4,R2
	ADD	R4,R4,R2
	STORES	R4,R2

minus:	LOADS	R3,R2
	ADDSI	R2,-4
	LOADS	R4,R2
	SUB	R4,R4,R2
	STORES	R4,R2

print:	LOADS	R3,R2
	LOAD	R1,PPRINT
	JSRS	R1,R1
	ADDSI	R2,-4
What we have defined above is a standard sequence of instructions to be executed for each possible calculator command. In the interpreter, we embedded these instructions in a control structure that executed them immediately whenever the corresponding calculator command was encountered in an input string. In the corresponding compiler, we will string together copies of these instructions to build the compiler's output. Consider the following calculator input string, for example:
	E3E5-P
When this is presented to the compiler, we would expect output something like the following:
	STORES	R1,R2	; prologue

	ADDSI   R2,4	; E
	STORES  0,R2

	LOADS   R3,R2	; 3
	ADDSL   R3,R3,2
	SL      R3,R3,2
	ADDI    R3,3
	STORES  R3,R2

	ADDSI   R2,4	; E
	STORES  0,R2

	LOADS   R3,R2	; 5
	ADDSL   R3,R3,2
	SL      R3,R3,2
	ADDI    R3,5
	STORES  R3,R2

	LOADS   R3,R2	; -
	ADDSI   R2,-4
	LOADS   R4,R2
	SUB     R4,R4,R2
	STORES  R4,R2

	LOADS   R3,R2	; P
	LOAD    R1,PPRINT
	JSRS    R1,R1
	ADDSI   R2,-4

	LOADS	R1,R2	; epilogue
	JUMPS	R1,R1
Note that the copies of the blocks of code for each calculator command are strung together in the order of the commands, and note the inclusion of a prologue and epilogue that converts the whole string of commands into a procedure with a well defined calling sequence.

This particular example is missing a few niceities! For example, it fails to detect stack overflow or underflow. Worse yet, if the calculator program has unbalanced push and pop operations, it will not return correctly. This last error is quite severe! Error checking was omitted here because it obscures the underlying operations, but it is not difficult to add.

Copying Machine Code

Some compilers produce output as assembly language; that is, they output a text file that must be assembled before it is run. The compiler for our example calculator does not do this; instead, is expected to write executable machine code directly into the array called CODE. Doing this requires copying machine instructions from one place to another in memory.

The STUFFH and EXTH instructions are central to this. These are exactly analogous to the STUFFB and EXTB instrucitons used for byte manipulation, but they manipulate halfwords. Consider, for example, the following crude bit of code for compiling an enter instruction, to be executed each time an E command is encountered.

;---------------------------
ENTER:				; R3 points into the CODE array
	LOAD	R4,ADDSIR24	; get first instruction
	LOADS	R5,R3
	STUFFH	R5,R4,R3	; stuff it into place
	STORES	R5,R3
	ADDSI	R3,2		; advance code pointer
	LOAD	R4,STORES0R2	; get second instruction
	LOADS	R5,R3
	STUFFH	R5,R4,R3	; stuff it into place
	STORES	R5,R3
	ADDSI	R3,2		; advance code pointer
	JUMP	LOOP		; go handle next instruciton

	ALIGN	4
ADDSIR24:
	ADDSI	R2,4
	ALIGN	4
STORES0R2:
	STORES	0,R2
This is cumbersome code, and use of this approach will lead to a very large compiler, but it will also be quite fast. It might be better to write a general purpose halfword copy routine, so that the above could be rewritten as:
;---------------------------
ENTER:				; R3 points into the CODE array
        LEA	R4,CODEE	; get pointer to first instruction
        JSR	R1,COPY
	JUMP	LOOP
CODEE:
        ADDSI	R2,4
        STORES	0,R2
	H	0
This assumes that COPY is a procedure that copies consecutive halfwords from the memory pointed to by R4 to the memory pointed to by R3, incrementing both registers as it goes and stopping as soon as a null halfword is encountered, but before the null is copied into the CODE array.

Either of the above schemes works well for all but two cases. One is the code to call the print routine. In the version given here, this code uses relative addressing, so the halfword that actually points to the PRINT procedure must be adjusted in terms of the current value of the location counter. The other troublesome case is the code for digits, where all of the instructions are the same but for one halfword containing the binary value of the digit.