In writing a compiler for the example calculator, an interesting tradeoff comes up, that of compiling in-line code versus compiling calls to procedures. Consider the following example, a block of code that a compiler can generate for the enter command:
STORES 0,R2 ADDSI R2,4The following code uses a procedure to do the same thing:
CLR R1 JSR R1,R1,ENTERThis assumes that ENTER, is an external symbol somewhere in the first 64K of memory. This procedure might be defined as follows inside your compiler:
;-------------------------- INT ENTER ENTER: STORES 0,R2 ADDSI R2,4 JUMPS R1 ; returnWhich of the above two approaches is better? In this case, there is never any reason to favor the second approach! Executing code based on the first approach takes only 2 instructions, while the latter approach adds 3 additional instructions to call the procedure and then return. In addition to the memory used to store the procedure itself, the call used in the second approach takes 3 halfwords in the compiler's output code space, while the first approach takes only 2 halfwords.
The procedure approach gets far more competitive when code to check for stack overflow is added! In that case, a compiler that generated in-line code for the calculator's enter instruction might duplicate this code for each enter command:
LOAD R1,PUNAVAIL CMP R2,R1 BLT OK CLR R1 JUMP R1,STKOVF OK: STORES 0,R2 ADDSI R2,4This bit of code assumes that PUNAVAIL is a pointer to unavailable memory somewhere within 32K of the call, while STKOVF is the address to which control transfers on stack overflow; presumably, the latter results in the output of an error message and the termination of the current calculator program. This could be replaced with the following procedure call:
CLR R1 JSR R1,R1,ENTERThis is identical to the original procedure call, because all logic associated with checking for stack overflow is now moved into the procedure:
;-------------------------- INT ENTER ENTER: LOAD R1,PUNAVAIL CMP R2,R1 BLT OK JUMP STKOVF OK: STORES 0,R2 ADDSI R2,4 JUMPS R1 ; returnNow, which approach is better? Clearly, using the procedure approach will result in far more compact compiler output at the expense of only a few extra instructions executed per enter command. There are, of course, many compromizes between these two extremes!
Our RPN calculator's programming language is very limited, but we can easily extend it to support enough features to allow serious programming. Consider, for example, the problems of adding memory and control structures to this calculator's language:
Many programmable calculator languages support an array of memory locations, and it is not hard to add such an array to our language; here, we will call it M. Consider adding two new commands to access this memory, save and recall:
E5E2S
E2RP
The sequence ESERER is a useful one! This duplicates the top item on the calculator's stack. Similarly, ESERERP prints the top item on the calculator's stack and restores the stack to what it held before the sequence began, making it useful for debugging.
The only warning worth mentioning about implementing these commands is that the calculator's memory should be a distinct array! Many early calculators only allowed 8 to 16 variables, while more modern calculators frequently allow as many as 1000 variables. Given that our calculator is limited to 80 letters per calculator program, it is not hard to compute the maximum size memory a program is likely to need, but it is also possible to write programs that use a much larger memory!
To make a programmable calculator truly interesting, it should have the ability to execute programs that contain loops! The simplest way to support this would be to allow conditional gotos of some kind, but it is more fun to support some kind of structured notion of iteration. Consider the following:
E{ES ERR P ERE1+}
How can these iteration commands be implemented in an interpreter for our calculator? There are two ways to do this, and each illustrates useful ideas.
Consider, as a basic framework, the following bit of Pascal code, an abridged version of the code given when the calculator example was originally presented:
procedure calculate(var line: string); const stacksize = 80; var stack: array[0..stacksize] of integer; sp: 0..stacksize; i: 0..stringsize; begin sp := 0; i := 0; while s[i] <> endofstring do begin case s[i] of 'E':begin stack[sp] := 0; sp := sp + 1; end; ... { other commands } '{': { code for begin loop } '}': { code for end loop } end; i := i + 1; end; end;To handle loops in the interpreter, we will simply need to move the input string pointer (the variable i above) back to the start of the loop whenever the end of the loop is encountered. One way to do this is to search back to the start of the loop:
'{': { ignore begin loop } '}': { code for end loop } repeat i := i - 1; until s[i] = '{';This code will work only with loops that are not nested. To handle nesting, we need an auxiliary integer variable, call it level, which is used to skip over inner loops. This leads to the following code:
'}': { code for end loop } begin level := 1; repeat i := i - 1; if s[i] = '{' then level := level - 1 else if s[i] = '}' then level := level + 1; until level = 0; end;This is ugly, but it will correctly skip over any number of nested loops to get to the start of the correct loop.
The above approach uses a search for the start of the loop, but it is clearly more elegant to use something analogous to a direct branch. This can be done quite easily by remembering the location of the loop start when the start of the loop is encountered, as illustrated by the following:
'{': { remember begin loop } looploc := i; '}': { code for end loop } i := looploc;Again, this code will work only with loops that are not nested. To handle nesting, we need to replace the above with code that uses a stack, for example:
'{': { remember begin loop } push( i ); '}': { code for end loop } i := pop;The stack used for pushing and popping loop start and end locations must not be the same stack that is used for pushing and popping the calculator's operands!
Loops without a loop exit mechanism are of only limited interest, so consider adding the following commands to the calculator:
{E1E2=} -- an infinite loop (1 = 2 is always false) {E1E2>} -- an infinite loop (1 > 2) is always false) {E1E2<} -- never iterates (1 < 2 is always true).
M[0] = vvv; for (;;) { if (M[0] < 1) break; printf( "%d ", M[0] ); M[0] = M[0] - 1; }Translating this to our calculator language gives:
E12ES { ERE1< ERP ERE1-ES }Spaces have been used to separate this into 6 blocks of calculator instructions, each of which corresponds exactly to one of the lines of the above C code!
How can we implement these loop exit commands in an interpretive implementation of our calculator? Consider using a simple search for the end of loop whenever a loop exit fails, and merging this with the first implementation of loops given above:
'{': { ignore begin loop } '}': { code for end loop } repeat i := i - 1; until s[i] = '{'; '=': { code for compare on equal } begin sp := sp - 2; if stack[sp] = stack[sp + 1] then begin { search for end loop } repeat i := i + 1; until s[i] = '}'; end; end;This code works only for non-nested loops. Adding support for nesting is trivial and is left as an exercise.