22C:18, Lecture 28, Fall 1996

Douglas W. Jones
University of Iowa Department of Computer Science

Design Tradeoffs

In writing a compiler for the example calculator, an interesting tradeoff comes up, that of compiling in-line code versus compiling calls to procedures. Consider the following example, a block of code that a compiler can generate for the enter command:

	STORES	0,R2
	ADDSI	R2,4
The following code uses a procedure to do the same thing:
	CLR	R1
	JSR	R1,R1,ENTER
This assumes that ENTER, is an external symbol somewhere in the first 64K of memory. This procedure might be defined as follows inside your compiler:
;--------------------------
	INT	ENTER
ENTER:
	STORES	0,R2
	ADDSI	R2,4
	JUMPS	R1	; return
Which of the above two approaches is better? In this case, there is never any reason to favor the second approach! Executing code based on the first approach takes only 2 instructions, while the latter approach adds 3 additional instructions to call the procedure and then return. In addition to the memory used to store the procedure itself, the call used in the second approach takes 3 halfwords in the compiler's output code space, while the first approach takes only 2 halfwords.

The procedure approach gets far more competitive when code to check for stack overflow is added! In that case, a compiler that generated in-line code for the calculator's enter instruction might duplicate this code for each enter command:

	LOAD	R1,PUNAVAIL
	CMP	R2,R1
	BLT	OK
	CLR	R1
	JUMP	R1,STKOVF
OK:	STORES	0,R2
	ADDSI	R2,4
This bit of code assumes that PUNAVAIL is a pointer to unavailable memory somewhere within 32K of the call, while STKOVF is the address to which control transfers on stack overflow; presumably, the latter results in the output of an error message and the termination of the current calculator program. This could be replaced with the following procedure call:
	CLR	R1
	JSR	R1,R1,ENTER
This is identical to the original procedure call, because all logic associated with checking for stack overflow is now moved into the procedure:
;--------------------------
	INT	ENTER
ENTER:
	LOAD	R1,PUNAVAIL
	CMP	R2,R1
	BLT	OK
	JUMP	STKOVF
OK:	STORES	0,R2
	ADDSI	R2,4
	JUMPS	R1	; return
Now, which approach is better? Clearly, using the procedure approach will result in far more compact compiler output at the expense of only a few extra instructions executed per enter command. There are, of course, many compromizes between these two extremes!

Extending the Calculator Language

Our RPN calculator's programming language is very limited, but we can easily extend it to support enough features to allow serious programming. Consider, for example, the problems of adding memory and control structures to this calculator's language:

Memory

Many programmable calculator languages support an array of memory locations, and it is not hard to add such an array to our language; here, we will call it M. Consider adding two new commands to access this memory, save and recall:

Save - S
M[pop] = pop;
Given an address on the stack top and a value under that, save the value in that address and pop both off of the stack. The following example will save 5 in memory location 2, popping both from the stack:
     E5E2S
Recall - R
push( M[pop] )
Given an address on the stack top, pop that address and push the contents of the corresponding memory location. The following example will print the contents of memory location 2:
     E2RP
Note that the command sequence ER pushes the contents of memory location zero, while ES pops the stack, storing the popped value in location zero. The ease of accessing location zero of the calculator's memory suggests that this location should be used for a scratch location, while locations one and up should be used for less frequently used variables.

The sequence ESERER is a useful one! This duplicates the top item on the calculator's stack. Similarly, ESERERP prints the top item on the calculator's stack and restores the stack to what it held before the sequence began, making it useful for debugging.

The only warning worth mentioning about implementing these commands is that the calculator's memory should be a distinct array! Many early calculators only allowed 8 to 16 variables, while more modern calculators frequently allow as many as 1000 variables. Given that our calculator is limited to 80 letters per calculator program, it is not hard to compute the maximum size memory a program is likely to need, but it is also possible to write programs that use a much larger memory!

Loops

To make a programmable calculator truly interesting, it should have the ability to execute programs that contain loops! The simplest way to support this would be to allow conditional gotos of some kind, but it is more fun to support some kind of structured notion of iteration. Consider the following:

Begin Loop - {
End Loop - }
Enclosing a block of calculator commands in curly braces causes that block of commands to be iterated. Braces may be nested arbitrarily, and as we will discuss later, there are loop exit commands The following example will print the contents of successive memory locations:
     E{ES ERR P ERE1+}
Note that the sequence ERR uses memory zero as an index into memory, so ERRP prints the contents of M[M[0]]. Each iteration of this loop ends by incrementing the value on the stack top, and this program will iterate until it encounters an error -- probably a memory error caused by addressing a nonexistant memory element.

How can these iteration commands be implemented in an interpreter for our calculator? There are two ways to do this, and each illustrates useful ideas.

Consider, as a basic framework, the following bit of Pascal code, an abridged version of the code given when the calculator example was originally presented:

        procedure calculate(var line: string);
        const stacksize = 80;
        var stack: array[0..stacksize] of integer;
            sp: 0..stacksize;
            i: 0..stringsize;
        begin
            sp := 0;
            i := 0;
            while s[i] <> endofstring do begin
                case s[i] of
                'E':begin
                        stack[sp] := 0;
                        sp := sp + 1;
                    end;
		    ... { other commands }

		'{': { code for begin loop }

		'}': { code for end loop }

                end;
                i := i + 1;
            end;
        end;
To handle loops in the interpreter, we will simply need to move the input string pointer (the variable i above) back to the start of the loop whenever the end of the loop is encountered. One way to do this is to search back to the start of the loop:
		'{': { ignore begin loop }

		'}': { code for end loop }
		    repeat
			i := i - 1;
		    until s[i] = '{';
This code will work only with loops that are not nested. To handle nesting, we need an auxiliary integer variable, call it level, which is used to skip over inner loops. This leads to the following code:
		'}': { code for end loop }
		    begin
			level := 1;
			repeat
			    i := i - 1;
			    if s[i] = '{'
			      then level := level - 1
			    else if s[i] = '}'
			      then level := level + 1;
			until level = 0;
		    end;
This is ugly, but it will correctly skip over any number of nested loops to get to the start of the correct loop.

The above approach uses a search for the start of the loop, but it is clearly more elegant to use something analogous to a direct branch. This can be done quite easily by remembering the location of the loop start when the start of the loop is encountered, as illustrated by the following:

		'{': { remember begin loop }
		    looploc := i;

		'}': { code for end loop }
		    i := looploc;
Again, this code will work only with loops that are not nested. To handle nesting, we need to replace the above with code that uses a stack, for example:
		'{': { remember begin loop }
		    push( i );

		'}': { code for end loop }
		    i := pop;
The stack used for pushing and popping loop start and end locations must not be the same stack that is used for pushing and popping the calculator's operands!

Conditionals

Loops without a loop exit mechanism are of only limited interest, so consider adding the following commands to the calculator:

Exit if equal - =
Exit if greater - >
Exit if less - <
The loop exit commands pop and compare the top two elements on the stack and exit the loop if they are related in the indicated way. Their use is illustrated by the following examples:
     {E1E2=}   -- an infinite loop (1 = 2 is always false)
     {E1E2>}   -- an infinite loop (1 > 2) is always false)
     {E1E2<}   -- never iterates (1 < 2 is always true).
For a more interesting example, consider writing a calculator program that counts down to zero using memory location zero. In C, we could write this as follows:
	M[0] = vvv;
	for (;;) {
		if (M[0] < 1) break;
		printf( "%d ", M[0] );
		M[0] = M[0] - 1;
	}
Translating this to our calculator language gives:
	E12ES  {  ERE1<  ERP  ERE1-ES  }
Spaces have been used to separate this into 6 blocks of calculator instructions, each of which corresponds exactly to one of the lines of the above C code!

How can we implement these loop exit commands in an interpretive implementation of our calculator? Consider using a simple search for the end of loop whenever a loop exit fails, and merging this with the first implementation of loops given above:

		'{': { ignore begin loop }

		'}': { code for end loop }
		    repeat
			i := i - 1;
		    until s[i] = '{';

		'=': { code for compare on equal }
		    begin
			sp := sp - 2;
			if stack[sp] = stack[sp + 1] then begin
			    { search for end loop }
			    repeat
				i := i + 1;
			    until s[i] = '}';
			end;
		    end;
This code works only for non-nested loops. Adding support for nesting is trivial and is left as an exercise.