22C:18, Lecture 29, Fall 1996

Douglas W. Jones
University of Iowa Department of Computer Science

Implementing Control Structures

Control structures pose a serious challenge to the author of a compiler for the example calculator programming language. In discussing ways of meeting this challenge, it is useful to slightly change the focus of the discussion and look at how it might be done in a macro assembler. Consider translating the calculator language to a series of macro calls, for example, translating {ERE1-ESERE<} to the following:

	LOOP
	ENTER
	RECALL
	ENTER
	DIGIT 1
	MINUS
	ENTER
	SAVE
	ENTER
	RECALL
	ENTER
	LESS
	ENDLOOP

It is easy to define macros for most of these:

	MACRO ENTER
	  STORES 0,R2
	  ADDSI  R2,4
	ENDMAC

	MACRO DIGIT x
	  LOAD   R3,R2,-4
	  ADDSL  R3,R3,2
	  SL     R3,1
          ADDI   R3,x
          STORE  R3,R2,-4
        ENDMAC

        MACRO MINUS
	  ADDSI  R2,-4
	  LOADS  R4,R2
	  LOAD   R3,R2,-4
          SUB    R3,R3,R4
          STORE  R3,R2,-4
        ENDMAC

Macros to handle loops and loop exit are harder. It is easiest to see how these macros can be constructed by first examining the case of a limited programming language. For example, a calculator language that allows only one loop. In that case, the following macros will suffice:

	MACRO LOOP
	  LOOPTOP:
	ENDMAC

	MACRO ENDLOOP
	  JUMP   LOOPTOP
	  LOOPEND:
	ENDMAC

	MACRO EQUALS
	  ADDSI  R2,-8
	  LOAD   R4,R2,4
	  LOADS  R3,R2
          CMP	 R3,R4
	  BNE	 .+6
          JUMP   LOOPEND
        ENDMAC

What happens in the above code when a calculator program contains multiple loops? The labels LOOPTOP and LOOPEND get reused for each loop, so assembly of this program will result in multiple label definitions.

Labels in Macros

In the above code, we used BNE .+3 for a local branch within a macro. This is cumbersome programming, but it avoids using a label within a macro, and so, avoids the problems with multiple label definitions that the the LOOP and ENDLOOP macros forced on us. All of these problems can be solved in assemblers that allow definition of labels at assembly time.

In the SMAL assembler, the following basic tools are used to generate labels: First, the assembler allows two macro parameters to be concatenated with no intervening spaces. This is done by using single quotes; when a macro is expanded, single quote marks are eliminated, as demonstrated here:

	MACRO CONCAT a,b
	  a'b
	ENDMAC

	CONCAT CATE,RPILLAR

The call to CONCAT produces the string CATERPILLAR, with no space between the text of the two actual parameters. (Actually, the assembler is a bit more sophisticated about using single quotes than the above text suggests; when it finds a string of consecutive single quotes, it does not eliminate all of them.)

The second feature we'll use for automatic label generation is call-by-value passing of macro parameters. Unlike most macro assemblers, the SMAL assembler has a variety of macro parameter passing modes. Most macro assemblers support only one mode, typically described as call-by-name or call-by-textual-substitution; in this mode, the full text of the actual parameter is substituted, verbatim, for every occurance of the corresponding formal parameter in the macro body when that macro is used.

Call-by-value parameters must be expressions; when a macro is called with a call-by-value parameter, the expression passed as an actual parameter is evaluated and its value is passed. Because this is a macro language, of course, the value must be passed in textual form, so it is converted to decimal and the string of decimal digits that represents the value is passed. Formal parameters that should be handled by value are marked with an equals sign, as demonstrated below:

	MACRO EVAL =x
	  x
	ENDMAC

	EVAL  1+2+3

This macro call expands to 6; had the equals sign been missing, it would have expanded to the string 1+2+3.

All of our macro-based solutions to problems involving multiple labels will be based on the following macro:

	MACRO CAT a,=b,c
	  a'b'c
	ENDMAC

Consider the following example:

	X = 5

	CAT L,X,L:

	CAT JUMP L,X,L

The first call to CAT above creates a line defining the label L5L, while the second call generates a JUMP to L5L. Note that it is perfectly legal to include spaces in a macro actual parameter; commas and semicolons may not be included.

Note that this approach to concatenation is specific to the SMAL language, but that most macro languages include some solution to this problem. For example, the ANSI standard C macro preprocessor uses the ## (double pound sign) symbol for concatenation, so the following C code defines a macro that concatenates its parameters:

	#define cat(a,b) a##b

In older versions of C where this does not work, the following definition creates an equivalent concatenation macro:

	#define subself__(t) t
	#define cat(a,b) subself__(a)b

(At this level, C compilers are very badly standardized; some C compilers that claim to be ANSI compliant, for example, do not support the first of the above definitions, and some fail to support either version of concatenation; furthermore, spaces surrounding actual parameters are handled very irregularly, so the macro call cat( a , b ) is not necessarily equivalent to cat(a,b).)

Multiple Loops Without Nesting

To support multiple loops, so long as we forbid nesting, we can do the following:

	LOOPNUM = 0

	MACRO LOOP
	  LOOPNUM = LOOPNUM + 1
	  CAT    L,LOOPNUM,T:
	ENDMAC

	MACRO ENDLOOP
	  CAT    JUMP L,LOOPNUM,T
	  CAT    L,LOOPNUM,E:
	ENDMAC

	MACRO EQUALS
	  ADDSI  R2,-8
	  LOAD   R4,R2,4
	  LOADS  R3,R2
          CMP	 R3,R4
	  BNE	 .+3
          CAT	 JUMP L,LOOPNUM,E
        ENDMAC

Here, we generate the labels L1T and L1E to mark the top and bottom of the first loop, while L2T and L2E mark the top and bottom of the second, L3T and L3E mark the top and bottom of the third, and so on.

Multiple Loops With Nesting

To support multiple loops with nesting, we need to save the identity of the current loop when we begin a nested loop and then restore the identity of the current loop when that nested loop ends. This can easily be done using a stack. Assuming we have PUSH and POP operations, this involves the following:

	LOOPNUM = 0

	MACRO LOOP
	  PUSH   CURLOOP
	  LOOPNUM = LOOPNUM + 1
	  CURLOOP = LOOPNUM
	  CAT    L,CURLOOP,T:
	ENDMAC

	MACRO ENDLOOP
	  CAT    JUMP L,CURLOOP,T
	  CAT    L,CURLOOP,E:
	  POP    CURLOOP
	ENDMAC

	MACRO EQUALS
	  ADDSI  R2,-8
	  LOAD   R4,R2,4
	  LOADS  R3,R2
          CMP	 R3,R4
	  BNE	 .+3
          CAT	 JUMP L,CURLOOP,E
        ENDMAC

Here, we use LOOPNUM only to assign new numbers to each loop, while CURLOOP is the number of the current loop, used to actually generate labels and jumps to those labels within the code. Whenever we begin any loop, we push CURLOOP onto a stack, and whenever we end any loop, we pop CURLOOP.

Of course, this leaves a problem: How can we implement a stack at assembly time? The answer is simple: Use an array and a stack pointer! We can use assembly time symbols to hold everything: STKPTR can be the name of the assembly time stack pointer, and symbols of the form SxE can hold the values of stack elements, where x is a decimal integer. The following macros use this idea:

	STKPTR = 0

	MACRO PUSH =x
	  CAT    S,STKPTR,E = x
	  STKPTR = STKPTR + 1
	END

	MACRO POP x
	  STKPTR = STKPTR - 1
	  CAT    x = S,STKPTR,E
	END

Translating from Macros to a Compiler

While the ideas discussed above were all presented as macros, they can be translated to a compiler in a straightforward way! We used three data structures above, the stack itself, made up of the variable STKPTR and the array SE, where we represented SE[x] by the identifier SxE.

Similarly, we can view LxT and LxE as two arrays, LT[x] and LE[x], where the LT[0] contains the memory address of the top of a loop, and LE[0] contains the memory address of the bottom of that loop.

The stack SE[x] is only used at compile time, but if our compiler does not make two passes to resolve forward references, we we need to preserve the arrays LT and LE until run time. For example, to jump to LT[x], we could use the following code:

	LIS 	R3,x<<2	 ; get 4x into R3
	LOAD	R3,R3,LT ; get LT[x] into R3
	JUMPS	R3       ; go there!

Assuming the compiler fills the global arrays LT and LE with the addresses of all loop tops and loop bottoms, and that the compiler correctly generates code to use the correct loop top or bottom, this approach allows single-pass compilation of our example calculator language, or in fact, of any other programming language!