The calculator example from the previous section is an example of an interpreter for a programming language. Admittedly, the language is very simple, with no control structure and a trivial syntax, but it is a language. Interpreters operate on programming languages by scanning the source text and executing each command as it is encountered in the source.
Most spreadsheets use interpretive execution, as do many implementations of programming languages such as LISP and BASIC. Macro languages are also typically processed interpretively, although many programmers don't usually view macro procesors as programming languages in their own right.
Languages such as FORTRAN, Pascal, C, and Ada are rarely implemented by interpreters. Instead, these languages are compiled, translated from source form to machine code. Presenting a compiler for one of these programming languages is well beyond the scope of this course, but it is worth presenting an example of a very simple compiler:
Consider the problem of compiling the input language used in the calculator example. The main program for a compiler based version of this calculator might be described as follows, ignoring issues of display formatting:
program Calculator; var line: string; code: array of machine instructions; begin { main program } repeat kbgets(line); compile(line,code); code; forever; end;We cannot write this as legal Pascal or C, so the above main program must be interpreted as pseudocode. The idea is to replace the call to calculate from the previous example with a call to compile, where compile takes a line of text and uses it to create a procedure in the array called code. Following the execution of the compiler procedure, the main program then calls the procedure that was produced by the compiler.
When this compiled code executes, it produces the output we expect from the calculator. This poses the first design constraint on our compiler: It must produce a callable procedure. The second constraint is artificial: We will restrict our compiler to a trivial translation, with no optimization at all. Thus, each individual calculator command should be reduced to exactly the same sequence of machine instrucitons every time it is encountered, with no reference to context.
enter: ADDSI R2,4 STORES 0,R2 digit: LOADS R3,R2 ADDSL R3,R3,2 SL R3,R3,2 ADDI R3,digit STORES R3,R2 plus: LOADS R3,R2 ADDSI R2,-4 LOADS R4,R2 ADD R4,R4,R2 STORES R4,R2 minus: LOADS R3,R2 ADDSI R2,-4 LOADS R4,R2 SUB R4,R4,R2 STORES R4,R2 print: LOADS R3,R2 LOAD R1,PPRINT JSRS R1,R1 ADDSI R2,-4What we have defined above is a standard sequence of instructions to be executed for each possible calculator command. In the interpreter, we embedded these instructions in a control structure that executed them immediately whenever the corresponding calculator command was encountered in an input string. In the corresponding compiler, we will string together copies of these instructions to build the compiler's output. Consider the following calculator input string, for example:
E3E5-PWhen this is presented to the compiler, we would expect output something like the following:
STORES R1,R2 ; prologue ADDSI R2,4 ; E STORES 0,R2 LOADS R3,R2 ; 3 ADDSL R3,R3,2 SL R3,R3,2 ADDI R3,3 STORES R3,R2 ADDSI R2,4 ; E STORES 0,R2 LOADS R3,R2 ; 5 ADDSL R3,R3,2 SL R3,R3,2 ADDI R3,5 STORES R3,R2 LOADS R3,R2 ; - ADDSI R2,-4 LOADS R4,R2 SUB R4,R4,R2 STORES R4,R2 LOADS R3,R2 ; P LOAD R1,PPRINT JSRS R1,R1 ADDSI R2,-4 LOADS R1,R2 ; epilogue JUMPS R1,R1Note that the copies of the blocks of code for each calculator command are strung together in the order of the commands, and note the inclusion of a prologue and epilogue that converts the whole string of commands into a procedure with a well defined calling sequence.
This particular example is missing a few niceities! For example, it fails to detect stack overflow or underflow. Worse yet, if the calculator program has unbalanced push and pop operations, it will not return correctly. This last error is quite severe! Error checking was omitted here because it obscures the underlying operations, but it is not difficult to add.
Some compilers produce output as assembly language; that is, they output a text file that must be assembled before it is run. The compiler for our example calculator does not do this; instead, is expected to write executable machine code directly into the array called CODE. Doing this requires copying machine instructions from one place to another in memory.
The STUFFH and EXTH instructions are central to this. These are exactly analogous to the STUFFB and EXTB instrucitons used for byte manipulation, but they manipulate halfwords. Consider, for example, the following crude bit of code for compiling an enter instruction, to be executed each time an E command is encountered.
;--------------------------- ENTER: ; R3 points into the CODE array LOAD R4,ADDSIR24 ; get first instruction LOADS R5,R3 STUFFH R5,R4,R3 ; stuff it into place STORES R5,R3 ADDSI R3,2 ; advance code pointer LOAD R4,STORES0R2 ; get second instruction LOADS R5,R3 STUFFH R5,R4,R3 ; stuff it into place STORES R5,R3 ADDSI R3,2 ; advance code pointer JUMP LOOP ; go handle next instruciton ALIGN 4 ADDSIR24: ADDSI R2,4 ALIGN 4 STORES0R2: STORES 0,R2This is cumbersome code, and use of this approach will lead to a very large compiler, but it will also be quite fast. It might be better to write a general purpose halfword copy routine, so that the above could be rewritten as:
;--------------------------- ENTER: ; R3 points into the CODE array LEA R4,CODEE ; get pointer to first instruction JSR R1,COPY JUMP LOOP CODEE: ADDSI R2,4 STORES 0,R2 H 0This assumes that COPY is a procedure that copies consecutive halfwords from the memory pointed to by R4 to the memory pointed to by R3, incrementing both registers as it goes and stopping as soon as a null halfword is encountered, but before the null is copied into the CODE array.
Either of the above schemes works well for all but two cases. One is the code to call the print routine. In the version given here, this code uses relative addressing, so the halfword that actually points to the PRINT procedure must be adjusted in terms of the current value of the location counter. The other troublesome case is the code for digits, where all of the instructions are the same but for one halfword containing the binary value of the digit.