One of the central jobs a compiler does for you is to translate the control structures of your programs into sequences of assembly language instructions. If you compare the kinds of control structures found in high-level languages with those found in machine languages, you will find a world of difference. High-level languages offer local control structures such as if statements, if-else statements, while loops and for loops, each of which is translated below.
Consider the following bit of Ada code (the equivalent C or Pascal code should be fairly obvious):
    if x > y then
        temp := x;
        x := y;
        y := temp;
    end if;

This bit of code sorts the two variables x and y into order. This can be translated to the SMAL Hawk assembly language as follows, assuming that the variables x and y are in registers R1 and R2:
            CMP     R1,R2       ; evaluate condition
            BLE     endif       ; test condition
            MOVE    R3,R1       ; body of if statement
            MOVE    R1,R2
            MOVE    R2,R3
    endif:                      ; end of if statement

All if statements can be translated to assembly language in exactly this way! At the head is a block of instructions that evaluates the boolean expression controlling the if statement and puts its value in the condition codes. Following this is a conditional branch instruction that branches to the end of the if statement if the tested condition is false. The body of the if statement follows this, and following the body is a label marking the end of the if statement. It is, of course, up to the programmer to pick unique labels to mark the end of each if statement.
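The same pattern can be sketched in C by writing the branch explicitly with goto. This is only an illustration under stated assumptions, not the output of any compiler: the function name sort_pair is hypothetical, and the local variable temp stands in for register R3.

```c
/* Hypothetical helper: puts *x and *y into ascending order.
   The goto form mirrors the assembly code: branch around the
   body when the condition x > y is false. */
void sort_pair(int *x, int *y)
{
    int temp;                   /* plays the role of R3 */

    if (*x <= *y) goto endif;   /* CMP R1,R2 then BLE endif */
    temp = *x;                  /* MOVE R3,R1 */
    *x = *y;                    /* MOVE R1,R2 */
    *y = temp;                  /* MOVE R2,R3 */
endif:
    return;                     /* before C23, a label must precede a statement */
}
```

Note that the test is inverted: the source says "if x > y", but the branch is taken when x <= y, because the branch skips the body rather than entering it.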
Consider the following bit of Pascal code (the equivalent C should be fairly obvious):
    if x > y then begin
        z := x;
    end else begin
        z := y;
    end

This bit of code computes the maximum of the two variables x and y and puts it in z. This can be translated to the SMAL Hawk assembly language as follows, assuming that the variables x, y and z are in registers R1, R2 and R3:
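            CMP     R1,R2       ; evaluate condition
            BLE     else        ; test condition
            MOVE    R3,R1       ; body of then clause
            BR      endif       ; end of then clause
    else:                       ; start of else clause
            MOVE    R3,R2       ; body of else clause
    endif:                      ; end of if statement

This bit of code illustrates a general rule. The assembly language code here falls into 6 blocks, which the comments identify: the code that evaluates the condition, the conditional branch that tests it, the body of the then clause, the unconditional branch that ends the then clause, the else label with the body of the else clause, and the label marking the end of the if statement.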
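As a rough sketch, the if-else translation can also be written in C with explicit branches. The function name max_of and the label else_part are hypothetical (else is a reserved word in C, so it cannot be used as a label).

```c
/* Hypothetical helper: returns the larger of x and y, written
   with explicit branches to mirror the assembly code. */
int max_of(int x, int y)
{
    int z;

    if (x <= y) goto else_part; /* CMP R1,R2 then BLE else */
    z = x;                      /* body of then clause */
    goto endif;                 /* BR endif: skip the else clause */
else_part:                      /* start of else clause */
    z = y;                      /* body of else clause */
endif:
    return z;                   /* end of if statement */
}
```

The unconditional goto at the end of the then clause is what keeps control from falling through into the else clause.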
Consider the following bit of C code:
    z = 0;
    while (x > 0) {
        z = z + y;
        x--;
    }

This little block of code is equivalent to z = x*y (it also leaves x equal to zero), so long as the variables are unsigned. This is one of the worst known multiplication algorithms, so it should never be used! In the SMAL Hawk assembly language, this translates to:
            CLR     z           ; code outside of loop
    loop:                       ; loop prefix
            TESTS   x           ; evaluate boolean
            BLE     endloop     ; exit loop if false
            ADD     z,z,y       ; loop body
            ADDSI   x,-1
            BR      loop        ; end of loop body
    endloop:

Here again, we can identify distinct blocks of code that correspond to elements of the original code.
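The structure of the translated loop may be easier to see in a C sketch that uses goto in place of the branch instructions. The function name slow_multiply is hypothetical, and the variables are declared unsigned to match the assumption stated above.

```c
/* Hypothetical helper: multiplies x by y by repeated addition,
   with the while loop written as explicit branches. */
unsigned int slow_multiply(unsigned int x, unsigned int y)
{
    unsigned int z;

    z = 0;                      /* code outside of loop */
loop:                           /* loop prefix */
    if (x == 0) goto endloop;   /* exit loop if x > 0 is false */
    z = z + y;                  /* loop body */
    x--;
    goto loop;                  /* end of loop body */
endloop:
    return z;
}
```

The loop test sits at the top, so a call with x equal to zero never executes the body at all.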
Most programming language manuals describe the semantics of the for loop by translation to an equivalent while loop. Thus, for example, the C reference manual defines:
    for (expression1; expression2; expression3) statement

as equivalent to:

    expression1;
    while ( expression2 ) {
        statement;
        expression3;
    }

Similarly, the Pascal for statement:

    for a := b to c do s;

can be defined as being almost equivalent to:

    a := b;
    while a <= c do begin
        s;
        a := a + 1;
    end;

Why is this code only "almost" equivalent to the Pascal for loop? Don't worry about that!
The key thing to remember is that, to hand translate a for loop to assembly language, first translate it to an equivalent indefinite loop (a while loop or an until loop of some kind) and then translate that to assembly language.
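The two-step translation can be sketched in C as three versions of the same function. The example (summing the integers from b to c) and the names sum_for, sum_while and sum_goto are hypothetical, chosen only to show each stage.

```c
/* Step 0: the original for loop. */
int sum_for(int b, int c)
{
    int sum = 0;
    for (int a = b; a <= c; a++) sum += a;
    return sum;
}

/* Step 1: the same loop rewritten as a while loop. */
int sum_while(int b, int c)
{
    int sum = 0;
    int a = b;              /* expression1 */
    while (a <= c) {        /* expression2 */
        sum += a;           /* statement */
        a++;                /* expression3 */
    }
    return sum;
}

/* Step 2: the while loop with explicit branches, ready to be
   transcribed into assembly language one line at a time. */
int sum_goto(int b, int c)
{
    int sum = 0;
    int a = b;
loop:
    if (a > c) goto endloop;    /* exit loop if a <= c is false */
    sum += a;
    a++;
    goto loop;
endloop:
    return sum;
}
```

All three versions should return the same result for any range small enough that the sum does not overflow.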
In translating a program to assembly language, you must invent many symbolic labels that were not there in the source program. In the above examples, the names loop, endloop, else and endif were used. In general, those names would cause problems in programs with more than one control structure!
In a compiler that generates machine code, the solution to this problem is to generate a sequence of labels such as L000, L001, L002, L003 and so on. In desperation, assembly language programmers can use the same approach, but a bit of care in selecting label names can make programs easier to read.
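A label generator of this kind takes only a few lines. The following is a minimal sketch, not taken from any particular compiler; the function name new_label and its parameters are assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: writes the next label in the sequence
   L000, L001, L002, ... into buf (which holds size bytes) and
   advances the caller's counter. */
void new_label(char *buf, size_t size, unsigned int *counter)
{
    snprintf(buf, size, "L%03u", *counter);
    (*counter)++;
}
```

Because every call advances the counter, no two control structures can ever receive the same label, which is exactly the property that hand-picked names like loop and endif lack.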
Inventing sensible label names can be difficult, given the desire both to convey information and to keep labels short enough that they don't mess up the columnar structure of the assembly language code. The usual way to solve these problems is to use a labeling system. Consider the following:
All labels in the same procedure or function begin with the same two- or three-letter prefix. Within a typical procedure or function, there aren't many control structures, so you can use a suffix to indicate which control structure a label belongs to. For example, if you are writing a function to count array elements that meet some criterion, you might use labels such as the following:
    Count   -- the function name
    CntQuit -- the place you go to return from the function
    CntLoop -- the top of a loop inside the function
    CntEndl -- the end of a loop inside the function
    CntElse -- the top of an else clause
    CntEndf -- the end of an if statement

This kind of systematic naming scheme works well, but in large assembly language programs, even it begins to break down and labels begin to look like alphabet soup.
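To see the naming scheme in use, here is a sketch in C with goto standing in for branch instructions. The criterion (counting positive elements) and the parameter names are assumptions chosen for illustration; CntElse does not appear because this example has no else clause.

```c
/* Hypothetical function: counts the positive elements of
   a[0] through a[n-1]. Every label carries the Cnt prefix. */
int Count(const int *a, int n)
{
    int count = 0;
    int i = 0;

    if (n <= 0) goto CntQuit;       /* early return: nothing to count */
CntLoop:                            /* top of the loop */
    if (i >= n) goto CntEndl;       /* exit loop when i < n is false */
    if (a[i] <= 0) goto CntEndf;    /* skip the body of the if */
    count++;
CntEndf:                            /* end of the if statement */
    i++;
    goto CntLoop;
CntEndl:                            /* end of the loop */
CntQuit:                            /* return from the function */
    return count;
}
```

A second function in the same file would use a different prefix, so its loop and if labels could not collide with these.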