22C:18, Lecture 9, Fall 1996

Douglas W. Jones
University of Iowa Department of Computer Science

Control Structures

One of the central jobs a compiler does for you is translate the control structures of your programs to sequences of assembly language instructions. If you compare the kinds of control structures found in high level languages with those found in machine languages, you will find a world of difference. In high level languages, you have the following local control structures:

sequential execution
goto
if statements
case or switch statements
definite loop statements
indefinite loop statements

In contrast, the typical machine language supports only:

sequential execution
branch -- equivalent to goto
conditional branch -- equivalent to if-goto

We will discuss higher level control structures such as procedure and function calls later, and we will also defer the discussion of case and switch statements, focusing here on the more elementary control structures.

The If Statement

Consider the following bit of Ada code (the equivalent C or Pascal code should be fairly obvious):

	if x > y then
	    temp := x;
	    x := y;
	    y := temp;
	end if;

This bit of code sorts the two variables x and y into order. This can be translated to the SMAL Hawk assembly language as follows, assuming that the variables x and y are in registers R1 and R2:

		CMP	R1,R2	; evaluate condition

		BLE	endif	; test condition

		MOVE	R3,R1	; body of if statement
		MOVE	R1,R2
		MOVE	R2,R3

	endif:			; end of if statement

All if statements can be translated to assembly language in exactly this way! At the head is a block of instructions that evaluates the boolean expression controlling the if statement and puts its value in the condition codes. Following this is a conditional branch instruction that branches to the end of the if statement if the tested condition is false. The body of the if statement follows this, and following the body is a label marking the end of the if statement. It is, of course, up to the programmer to pick unique labels to mark the end of each if statement.
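The same pattern can be sketched in C by restricting ourselves to if-goto, the C equivalent of a conditional branch. The function names sort2_structured and sort2_lowered are invented for this illustration, and the comments pairing each line with a Hawk instruction assume the register assignment used above.

```c
/* Sort two values into order, written two ways. */

/* The structured version, using an ordinary if statement. */
void sort2_structured(int *x, int *y)
{
    if (*x > *y) {
        int temp = *x;
        *x = *y;
        *y = temp;
    }
}

/* The same logic using only if-goto, mirroring the assembly code.
   Note that the branch tests the opposite of the original condition. */
void sort2_lowered(int *x, int *y)
{
    int temp;
    if (*x <= *y) goto endif;   /* CMP R1,R2 then BLE endif */
    temp = *x;                  /* MOVE R3,R1 */
    *x = *y;                    /* MOVE R1,R2 */
    *y = temp;                  /* MOVE R2,R3 */
endif:
    return;                     /* a C label must be followed by a statement */
}
```

Both functions should leave the smaller value in x and the larger in y; the second simply makes the inverted test and the end-of-if label visible.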

The If Statement With an Else Clause

Consider the following bit of Pascal code (the equivalent C should be fairly obvious):

        if x > y then begin
	    z := x;
	end else begin
	    z := y;
	end

This bit of code computes the maximum of the two variables x and y and puts it in z. This can be translated to the SMAL Hawk assembly language as follows, assuming that the variables x, y and z are in registers R1, R2 and R3:

		CMP	R1,R2	; evaluate condition

		BLE	else	; test condition

		MOVE	R3,R1	; body of then clause

		BR	endif	; end of then clause
	else:			; start of else clause

		MOVE	R3,R2	; body of else clause

	endif:			; end of if statement

This bit of code illustrates a general rule. There are 6 blocks of assembly language code here:

code to evaluate the boolean expression and put the result in the condition codes.
a conditional branch to the else clause if the boolean condition is false.
the body of the then clause, executed only if the condition is true. (Algol, Pascal and Ada give this its name; it has no natural name in C).
a branch to the end of the if statement, followed by a label marking the start of the else clause. These, together, make up the assembly language equivalent of the keyword else in a typical high level language.
the body of the else clause.
the label marking the end of the if statement.
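As a rough sketch, the six blocks can be laid out in C using only if-goto and goto. The function name max_lowered is made up for this example, and the label is called else_clause because else is a reserved word in C.

```c
/* Compute the maximum of x and y using only if-goto and goto.
   The comments number the blocks in the order listed above. */
int max_lowered(int x, int y)
{
    int z;
    if (x <= y) goto else_clause;  /* blocks 1 and 2: evaluate and test */
    z = x;                         /* block 3: body of the then clause */
    goto endif;                    /* block 4: branch past the else clause */
else_clause:                       /* block 4, continued: start of else */
    z = y;                         /* block 5: body of the else clause */
endif:                             /* block 6: end of the if statement */
    return z;
}
```

For example, max_lowered(3, 7) takes the branch to else_clause and returns 7, while max_lowered(7, 3) falls through the then clause and returns 7 by way of the goto.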

The While Loop

Consider the following bit of C code:

	z = 0;
        while (x > 0) {
	    z = z + y;
	    x--;
	}

This little block of code computes z=x*y, so long as the variables are unsigned, although it also counts x down to zero in the process. This is one of the worst known multiplication algorithms, so it should never be used! In the SMAL Hawk assembly language, this translates to:

		CLR	z		; code outside of loop
	
	loop:				; loop prefix

		TESTS	x		; evaluate boolean

		BLE	endloop		; exit loop if false

		ADD	z,z,y		; loop body
		ADDSI	x,-1

		BR	loop		; end of loop body
	endloop:

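Here again, we can identify distinct blocks of code that correspond to elements of the original code: a label marking the top of the loop, code to evaluate the boolean expression, a conditional branch that exits the loop if the condition is false, the loop body, and a branch back to the top of the loop followed by a label marking the end of the loop.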
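The structure of the loop can also be sketched in C using only if-goto and goto. The function name mul_lowered is invented for this illustration, and the variables are declared unsigned to match the assumption stated above.

```c
/* Multiply by repeated addition, using only if-goto and goto.
   This mirrors the assembly code; it is still a terrible way to multiply. */
unsigned mul_lowered(unsigned x, unsigned y)
{
    unsigned z = 0;                 /* code outside of loop */
loop:                               /* loop prefix */
    if (!(x > 0)) goto endloop;     /* evaluate boolean, exit loop if false */
    z = z + y;                      /* loop body */
    x--;
    goto loop;                      /* end of loop body */
endloop:
    return z;
}
```

Called as mul_lowered(6, 7), the loop body runs six times and the function returns 42.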

The For Loop

Most programming language manuals describe the semantics of the for loop by translation to an equivalent while loop. Thus, for example, the C reference manual defines:

	for (expression1; expression2; expression3) statement

as equivalent to:

	expression1;
	while ( expression2 ) {
		statement;
		expression3;
	}

Similarly, the Pascal for statement:

	for a := b to c do s;

can be defined as being almost equivalent to:

	a := b;
	while a <= c do begin
	    s;
	    a := a + 1;
	end;

Why is this code only "almost" equivalent to the Pascal for loop? Don't worry about that!

The key thing to remember is that, to hand translate a for loop to assembly language, first translate it to an equivalent indefinite loop (a while loop or an until loop of some kind) and then translate that to assembly language.
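The two-step translation can be sketched in C. The example loop, which sums the integers from 1 to n, and the function names sum_for, sum_while and sum_lowered are all made up for this illustration; the final version uses only if-goto and goto, so each line corresponds closely to one or two assembly language instructions.

```c
/* The for loop as a programmer would write it. */
int sum_for(int n)
{
    int sum = 0;
    int i;
    for (i = 1; i <= n; i++) sum += i;
    return sum;
}

/* Step 1: rewrite the for loop as the equivalent while loop. */
int sum_while(int n)
{
    int sum = 0;
    int i;
    i = 1;                      /* expression1 */
    while (i <= n) {            /* expression2 */
        sum += i;               /* statement */
        i++;                    /* expression3 */
    }
    return sum;
}

/* Step 2: rewrite the while loop using only if-goto and goto. */
int sum_lowered(int n)
{
    int sum = 0;
    int i;
    i = 1;                      /* code outside of loop */
loop:
    if (i > n) goto endloop;    /* exit loop if condition is false */
    sum += i;                   /* loop body */
    i++;
    goto loop;                  /* end of loop body */
endloop:
    return sum;
}
```

All three functions should return the same value for any n; for n equal to 10, that value is 55.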

Lots of Labels

In translating a program to assembly language, you must invent many symbolic labels that were not there in the source program. In the above examples, the names loop, endloop, else and endif were used. In general, those names would cause problems in programs with more than one control structure!

In a compiler that generates machine code, the solution to this problem is to generate a sequence of labels such as L000, L001, L002, L003 and so on. In desperation, assembly language programmers can use the same approach, but a bit of care in selecting label names can make programs easier to read.
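A label generator of this kind takes only a few lines. The following is a minimal sketch, not taken from any particular compiler; the names next_label and new_label are invented here, and the three-digit format simply matches the examples above.

```c
#include <stdio.h>

/* Number of the next unused label (a hypothetical global counter). */
static int next_label = 0;

/* Write the next unused label name, such as "L007", into buf.
   Each call yields a different name, so labels never collide. */
void new_label(char *buf, size_t size)
{
    snprintf(buf, size, "L%03d", next_label++);
}
```

A compiler translating an if statement would call new_label once to get the name it uses for the end of that statement, so nested or consecutive if statements each get their own label.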

Inventing sensible label names can be difficult, given the desire both to convey information and to keep labels short enough that they don't mess up the columnar structure of the assembly language code. The usual way to solve these problems is to use a labeling system. Consider the following:

All labels in the same procedure or function begin with the same two or three letter prefix. Within a typical procedure or function, there aren't many control structures, so you can use a suffix to indicate which control structure each label belongs to. For example, if you are writing a function to count array elements that meet some criterion, you might use labels such as the following:

Count   -- the function name
CntQuit -- the place you go to return from the function
CntLoop -- the top of a loop inside the function
CntEndl -- the end of a loop inside the function
CntElse -- the top of an else clause
CntEndf -- the end of an if statement

This kind of systematic naming scheme works well, but in large assembly language programs, even it begins to break down and labels begin to look like alphabet soup.
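To see the naming scheme in use, here is a sketch of such a counting function, written in C with if-goto in place of structured control so that the labels show up as they would in assembly language. The criterion (elements greater than zero) is an assumption made for this example, and CntElse is not used because this if statement has no else clause. C labels are local to a function, so the prefix is not strictly needed here; it is kept only to illustrate the convention.

```c
/* Count the elements of a[0..n-1] that are greater than zero. */
int Count(const int *a, int n)
{
    int count = 0;
    int i = 0;
    if (n <= 0) goto CntQuit;       /* nothing to count */
CntLoop:                            /* top of the loop */
    if (i >= n) goto CntEndl;       /* exit loop when i reaches n */
    if (a[i] <= 0) goto CntEndf;    /* skip the if body when the test fails */
    count++;                        /* body of the if statement */
CntEndf:                            /* end of the if statement */
    i++;
    goto CntLoop;
CntEndl:                            /* end of the loop */
CntQuit:                            /* return from the function */
    return count;
}
```

Given an array holding 3, -1, 0 and 7, the function counts the two positive elements.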