Chapter 4, Forward References

Two-pass assembly and other tricks

Part of the notes for 22C:50, Introduction to System Software
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

What is a Forward Reference?

The material covered in the previous chapters provides a sufficient basis for writing a simple assembler, but the assembler would have an important shortcoming. Forward references are not resolved. That is, the use of a symbol at one point in the text which is not defined until some later point gives the wrong result. Backward references, on the other hand, are handled correctly. Consider the example in Figure 4.1.

1                    |X = 1
2                    |Y = 2
3  0000: 01          | Z: B X
4  0001: 02          | W: B Y
5  0002: 00          |    B Z
6  0003: 01          |    B W

Figure 4.1. An assembly listing.

This example is presented in a very typical assembler output format. On the right half of each line, the assembler has listed the source text it was processing; here we the example assembler placed a vertical bar between the halves of the line. On the left part of each line, the assembler listed the source line number, and if processing the sorce text caused some value to be placed in memory, the assembler added the memory address and the value placed there.

In Figure 4.1, all symbols used as operands were defined on lines prior to the point they were used, so a trivial assembler built according to the model described in Chapters 1 and 2 will be able to produce this output.

Figure 4.2 shows the output which might result when a simpleminded assembler is given a file containing forward references, that is, uses of symbols prior to the point where they are defined.

1                    |X = Y
2  0000: ??          | Y: B Z
3  0001: ??          | Z: B X

Figure 4.2. Incorrect assembly of forward references.

Note the error messages in the listing! Even though each of the symbols used in this input does indeed receive a definition, the definitions of the symbols used here appear after their uses! During the processing of line 1, for example, the symbol Y is not yet defined, since the definition follows on line 2. Similarly, in processing line 2, Z is not defined. The error on line 3 is somewhat more complex! X is defined on line 1, but it is defined as being equal to Y, and Y was unknown at the time line 1 was processed, so X was defined to an undefined value. The errors on lines 1 and 2 are said to be due to forward references, that is, to references to symbols which will only be defined later in the program. The error on line 3 can be described as being indirectly due to a forward reference.

The programmer who wrote the source code used in Figure 4.2 probably expected the output shown in Figure 4.3.

1                    |X = Y
2  0000: 01          | Y: B Z
3  0001: 00          | Z: B X

Figure 4.3. Correct assembly of forward references.

Similar forward reference problems occur in all high level languages. For example, consider the problems of compiling the C code shown in Figure 4.4. This small program fragment computes the sum of all odd integers between 1 and 10 inclusive.

/* 1 */ j = 0;
/* 2 */ for (i = 1; i <= 10; i++)
/* 3 */   if (i & 1)
/* 4 */      j = j + i;
Figure 4.4. Input to a compiler

A compiler, on reading the code from Figure 4.4, will arrive at an intermediate level description of the same program before it generates machine code. This intermediate representation might be equivalent to the C code given in Figure 4.5.

/* 1  */   j = 0;
/* 2a */   i = 1;
/* 2b */ loop_top:
/* 2c */   if (!(i <= 10)) goto loop_exit;
/* 3a */   if (!(i & 1)) goto end_if;
/* 4  */   j = j + i;
/* 3b */ end_if:
/* 2d */   i = i + 1;
/* 2e */   goto loop_top;
/* 2f */ loop_exit:
Figure 4.5: Intermediate version of code from Figure 4.4

In introductory computer science courses, where languages with goto statements are taught, their use is usually forbidden, and frequently, the languages used in introductory courses do not even contain goto statements. One job of a compiler is to translate the high-level control structures of the high level language to more primitive control structures, and this translation is usually done by replacing each high-level control structure with some combination of labels, conditional gotos and unconditional gotos.

In Figure 4.5, for example, the simple if statement from line 3 of Figure 4.4 has been replaced by a pair of new lines, a conditional goto on line 3a, and a label on line 3b. In effect, these combine to say "if the condition specified in the if statement is not met, goto the label after the if statement body."

The for loop on line 2 of Figure 4.4 is even more complex, being divided into 6 lines in Figure 4.5. Line 2a is the initialization part of the for loop, line 2b is the label to which control transfers at the start of each iteration, and line 2c is the loop termination test, done before each iteration of the loop body. After the loop body, we find the loop control variable update on line 2d, the control transfer back to the top of the loop on line 2e, and the loop exit label on line 2f.

Looking at the gotos and labels used in Figure 4.5, we find that only one, loop_top, is defined before it is used. The others, loop_exit and end_if are defined after the lines that reference them. Therefore, we can conclude that the compilation of this simple program fragment will produce two forward references that must be resolved before the resulting machine language can be executed. Many compilers simply leave this problem to the assembler, producing output such as the DEC PDP-8 assembly language code shown in Figure 4.6.

	CLA		/ 1  AC=0
	DCA	J	/ 1  j=AC		a = 0;
	NL0001		/ 2a AC=1
	DCA	I	/ 2a i=AC		i = 1;
L00019,			/ 2b		    loop_top:
	CLA CLL		/ 2c AC=0
	TAD	I	/ 2c AC=AC+i
	TAD	[-10]	/ 2c AC=AC+(-10)	
	SZL		/ 2c skip if no carry	if (!(i <= 10))
	JMP	L00020	/ 2c			   goto loop_exit;
	NL0001		/ 3a AC=1
	AND	I	/ 3a AC=AC&i
	SNA		/ 3a skip if AC nonzero	if (!(i & 1))
	JMP	L00021	/ 3a			   goto end_if;
	CLA		/ 4  AC=0
	TAD	J	/ 4  AC=AC+j
	TAD	I	/ 4  AC=AC+i
	DCA	J	/ 4  j=AC		j = j + i;
L00021,			/ 3b		    end_if:
	ISZ	I	/ 2d i=i+1		i = i + 1;
	JMP	L00019	/ 2e			goto loop_top;
L00020,			/ 2f		    loop_exit:

Figure 4.6: PDP-8 Assembly language translation of Figure 4.4

When compilers generate labels as they expand control structures, they typically introduce them sequentially, making no effort to use sensible names. In the example shown in Figure 4.6, the compiler simply generated the label L00019 as it passed the loop top, L00020 as it generated the branch to the end of the loop, and L00021 as it generated the branch to the end of the if statement. Of these, L00020 and L00021 are both forward references, since the assembler will need the values of these symbols prior to reaching the points where these symbols are used as labels.

The Two-Pass Solution

Perhaps the oldest, and certainly the most general solution to the forward reference problem rests on the observation that all of the referenced symbols should be defined after processing a program even though they may not be defined when they are needed during processing. Therefore, if the output of the assembler after processing the input once is discarded, except for the contents of the symbol table, and then the input is processed again, with the symbol table already loaded.

If parse_program is the top-level entry point to a recursive descent parser for the assembly language syntax, and st_init initializes the symbol table, the body of the main program for a two-pass assembler might have the form shown in Figure 4.7:

begin { main program }
     reset( input );
     rewrite( output );
end { main program };
Figure 4.7. A two-pass assembler in Pascal.

The code in Figure 4.7 assumes that parse_program does not initialize the symbol table, and it assumes two functions from the standard Pascal library, reset and rewrite. The reset operation re-opens a file for reading from the beginning, while the rewrite operation re-opens a file for writing, erasing any material previously stored in that file. C programmers can use the fseek(stdin,0,0) and fseek(stdout,0,0) to achieve the same effects.

The biggest disadvantage of this brute force approach to two-pass assembly is that any computation which went into producing the listing or object code for the first pass is completely wasted! Thus, for example, if given the input used to produce Figures 4.2 and 4.3, it will first produce and discard the output shown in Figure 4.2, complete with error messages, before producing the output shown in Figure 4.3.

The extra computation needed to produce the listing which is discarded after the first pass can be suppressed by modifying the listing generator package (probably tightly coupled to the lexical analyzer) to include provisions for turning the listing on and off. During the first pass through the source, the listing should be suppressed, while during the second pass, the listing should be enabled. This might be done by a parameter to parse_program telling it whether a listing ought to be generated, or it might be done by a parameter on the initializer for the listing and or lexical analysis package, as illustrated in Figure 4.8.

void main( int argc, char * argv[] );
     get_args( argc, argv );
     lex_init( FALSE );
     lex_init( TRUE );
} /* end main */

Figure 4.8. An improved two-pass assembler, in C.

In Figure 4.8, instead of using the C input streams stdin and stdout, we have included provisions for a function get_args that is charged with figuring out what input and output files to use. This may process the command line arguments that were given to the main program using the parameters argc and argv, it may use the standard input and output streams stdin and stdout, or something else. Whatever it does, we assume that it informs the lexical analysis package of the streams to be read and written, and we assume that lex_init() is responsible for opening or reopening these streams, as required. The parameter to lex_init() informs the lexical analyzer whether or not output is to be generated during the following pass through the input.

In the simplest approach, this parameter is only used to control each statement that actually writes output to the output file. The parameter can be used far more effectively if it is used to turn off all computation involved in generating the listing when no listing is required. In the extreme case, possible only if the parameter is also passed to the parser, it can be used to suppress the parsing of all fields of each input line which do not matter during the first pass. Thus, most operand fields would be treated as comments during the first pass!

The extreme case mentioned in the previous paragraph leads to what has become the conventional view of two-pass assembly; in this view, the two passes are processed by entirely different parsers. During the first pass of such an assembler, only labels and those opcodes required to determine the operand size are processed, while all that remains is skipped as if it was commentary. Only during the second pass is the parser for the complete grammar used.

The second pass of an extreme two-pass assembler can completely ignore the label fields of instructions, paying attention only to the opcode and operand fields. There is some advantage, however, checking the label definitions during the second pass, since in complex assembly languages, doing so allows a class of assembly errors (called phase errors) to be detected. Phase errors are those where the value of the location counter is advanced differently during the two passes. This is not possible in the simple example assembly language being discussed, but features that we will add later will make it possible.

Sometimes, the first pass of an extreme two-pass assembler produces an output file which is then read in by the second pass. The advantage of this is that the first pass can record, with each line of input, the positions of some or all of the lexemes on that line and some of the results of parsing. For example, in the example assembly language, each line might be sent from one pass to the next with a record indicating whether the line was a definition, or a statement, and, for statements, whether the line contained a label and what the value of the location counter was prior to processing that line. The use of such an intermediate file can almost completely eliminate redundant computations in the two passes, but it adds to the input/output burden of the system.

The advantage of writing assemblers with a separate procedure for each pass is that this leads to very fast assembly, since no flag must be tested to determine what to evaluate. On the other hand, this approach results in two very similar procedures which must be maintained in parallel, and which occupy almost twice the memory space occupied by the single parse_program procedure we have used. Usually, we can only afford the overhead of maintaining two different versions of this parser if the language is extremely simple! The opportunities for programming errors escalate immensely as the complexity of the language grows!

It should be noted that, independently of how the two passes are implemented, two-pass assemblers cannot resolve all possible forward references! It is possible to construct chains of indirect forward references using symbol definitions which can only be resolved by additional passes through the source. The example in Figure 4.9 illustrates such a chain.

A = B
B = C
C = 1
  W A

Figure 4.9. Code requiring 3 passes to assemble.

After the first pass through the code in Figure 4.9, the assembler has a definiton for C, but both A and B would still be undefined. After the second pass, A would still be undefined, but because C was already defined, B will have been defined. Only after the third pass will all of A, B and C be defined! In fact, this assembly language allows programs to be written which require a number of passes equal to the number of definition lines in the program. Very few assemblers bother to support more than two passes, and some even forbid any indirect forward references.

The Chaining Solution

For some computer architectures, it is possible to write an assembler which solves the forward reference problem and yet does not require multiple passes. Such one-pass assemblers aren't fully general, they do not solve all forward references, but they solve only those that routinely occur in real programs. Thus, artifically convoluted code such as was given in Figure 4.9 is completely ignored, but solid attention is devoted to the problem of generating correct code for forward branches such as those illustrated in Figure 4.6.

The most naive way to write a one-pass assembler is to maintain two different classes of associations with symbols in the symbol table. For those symbols that have already been defined, we maintain the symbol's value. For those symbols where the definitions have been encountered but definitions are not yet known, we maintain a list of all the places that symbol's value should be put in memory when a definition is eventually encountered. Consider the example in Figure 4.10.

     B X	; 1
     B Y	; 2
  X: B Y	; 3
  Y: B X	; 4

Figure 4.10. An example program.

After assembling line 1 of the program shown in Figure 4.10, the symbol table will record that the value of X should be put in byte 0. After assembling line 2, the symbol table records that, in addition, the value of Y is needed in byte 1 of memory. As line 3 is assembled, a value for X is found (it is used as a label, so it gets the value of the location counter, 2, and this value is put in location 0 because the symbol table recorded this need.

Therefore, after assembing line 3, the symbol table holds the information shown in Figure 4.11:

Symbol   Value   Where Needed

  X        2
  Y     unknown    1 2

Figure 4.11. The symbol table after processing line 3 of Figure 4.10.

That is, we now know the value of X, and we need values for Y to put in memory locations 1 and 2.

It is relatively easy to patch up the contents of memory when definitions are found, but it is difficult to patch up the human-readable assembly listing. Patching up the listing is only possible under file abstractions that support random-access, as in C, where the ftell() primitive can be used to remember the position in an output stream, and the fseek() primitive can be used to return to that position to change the contents of the stream. Most one-pass assemblers make no effort to patch up the listing file!

This scheme does not generalize! For example, suppose our assembly language allowed arbitrary expressions such as a/2+b as operands. If a and b are already known, the assembler can simply divide the value of a by 2 and then add the value of b, but if one or both are forward references, there is no simple address we can record in the list associated with a or b saying "when you get a value for this symbol, put it here in memory also."

The question remains, where should the list of addresses that need to be fixed stored? The classical solution is to keep this list in the actual memory locations that need repair! The resulting solution to the forward reference problem is called chaining. Chaining is commonly used in one-pass compilers and in linking loaders. It is not as common in assemblers because it interferes with the production of a nice assembly listing, and assembly language programmers frequently need to examine the listing during the debugging process.

If we use chaining to assemble the example program from Figure 4.10, the a snapshot of the assembly process after assembling line 2 might appear as is illustrated in Figure 4.12.

1  0000: 00          |    B X        Symbol table
2  0001: 01          |    B Y
-----------------------------          X used at 00
                       X: B Y          Y used at 01
                       Y: B X

Figure 4.12. Partial chained assembly of the code from Figure 4.10

After processing one more line of input, the state of the assembly process would advance as shown in Figure 4.13.

1  0000: 02          |    B X        Symbol table
2  0001: 01          |    B Y
3  0002: 01          | X: B Y          X defined as 02
-----------------------------          Y used at 02
                       Y: B X

Figure 4.13. Partial chained assembly of the code from Figure 4.10

In advancing from Figure 4.12 to 4.13, the assembler first found X used as a label when the location counter was 2. Therefore, X has the value 2. Since X had previously been used at location 0, the new value for X is stored in location 0 before it is stored in the symbol table. After this is done, the assembler discovers that it needs a value for Y, but it does not yet have one. Therefore, it stores the previous chain address from the symbol table and updates the symbol table to point to the new location where Y's value is needed. This is called adding a link to the chain for Y. After the assembly is completed, the state of the system is as shown in Figure 4.14.

1  0000: 02          |    B X        Symbol table
2  0001: 03          |    B Y
3  0002: 03          | X: B Y          X defined as 02
4  0003: 02          | Y: B X          Y defined as 03

Figure 4.14. Partial chained assembly of the code from Figure 4.10

In advancing from the state shown in Figure 4.13 to the state shown in Figure 4.14, the assembler first finds a definition of Y. Since Y has been used before, it follows the chain back through memory from location 2 to location 1, replacing each with the new value for Y, 3. This is called unwinding the chain Then the assembler updates the symbol table to reflect the new definition of Y, and finally, it assembles the value of X into location 3.

The symbol table used to support chaining must associate an additional piece of information with each value in the table, a flag indicating whether that value field contains the address of the most recent use of the symbol or the value of the symbol. Aside from this, we need not change the symbol table. If a symbol table entry contains the address of the most recent use of the symbol, we say that the entry contains the head of the chain of addresses where the symbol was used.

The only real design problem encountered in implementing chained forward reference resolution involves how the chain list of forward references is terminated. One obvious solution, which costs memory, is to store a count of the number of previous forward references in the symbol table. The alternative is to somehow mark the end of the chain. In high level list processing languages, a reserved value is almost always provided for this purpose, frequently called nil or null. In an assembly language, on the other hand, it is hard to reserve a value, since all possible values are frequently legitimate addresses. There is, however, one value which can never continue a chain, and that is the address of the current link in the chain. This solution has been used in Figures 4.12 and 4.13.

Another problem with chaining is that it only works for forward references which involve enough bits to store a pointer. The example used in Figures 4.12 through 4.14, with single byte forward references, would not work if the memory continued beyond 256 bytes! If an address is 16 bits, allowing addressing of 64K memory locations, then each chain link must occupy 2 bytes, and single byte quantities may never involve forward references. If an address is 32 bits, allowing addressing of 4G memory locations, each link in the chain must occupy 4 bytes and forward references could never be used to set the values of individual bytes or words.

Chaining can be quite messy on machines with large words and small addresses; for example, on the PDP 10, where the word size was 36 bits but an address was only 18 bits. On such a machine, it is usual to limit the use of forward references to a standard format, for example, the position of the memory address field in instruction words (typically the least significant half of a word). If this is not done, there must be a record, for each link of the chain, of what field in the word is referenced by that link.


The reader should be familiar with the following general terms after reading this section:

forward references
indirect forward references
two-pass assembly


  1. Modify the two-pass assembler given in Figure 4.7 so that it can handle the input shown in Figure 4.9. The result will be a three-pass assembler.

  2. Consider the following code in our example assembly language:
    ; an example bit of assembly code
    ROOT:    W      FATHER
    FATHER:  W      SON1
             W      SON2
    SON1:    W      NIL
             W      NIL
    ; ----------------------
    SON2:    W      GRANDSON
             W      NIL
    	 W      NIL
    NIL      =      0

    a) Assemble this code by hand, using your freedom to look around and do things in a human way. Show the result in a reasonable listing format.

    b) Assemble this code carefully following the two pass model, and show the symbol table at the point marked by the dashed line during each pass..

    c) Assemble this code carefully following the chaining model, and show the symbol table and assembled code in memory when the assembler has reached the point marked by the dashed line.

  3. The intermediate file used for communication between the first and second pass of an extreme two-pass assembler will frequently be a file of fixed-length records. Assume that the maximum allowed length of an input line is 80 characters.

    a) Give a Pascal or Ada record type definition or a C or C++ structure definition for an appropriate record to be written to the intermediate file between passes. Assume that the information in each record will include the entire source line, the location counter for that line, a numeric code for the op-code, and the value of the operand field, if it was already evaluated during the first pass.

    b) How many bytes will your record type or structure from part a occupy? Assume a 16 bit word. (This is an unrealistic word size, and in fact, an unrealistic question! Extreme two-pass assemblers were most common in the era of 36-bit mainframes with 6-bit bytes.)

    c) Compare the cost of using such an intermediate file with the cost of having the second pass simply re-read the source file as in Figure 4.7. Specifically, how many more bytes will be transferred to or from files (per line assembled) using an intermediate file? What percentage increase in the amount of input/output per line does this represent? Assume that the average assembler input line is 45 characters, that the average listing line is 70 characters.

  4. In the era when few computers had more than 64K bytes of memory, chaining assemblers almost all embedded the chain of locations which need to be fixed up in those locations; why was this done? In this era of large inexpensive RAM, is there still any reason to do so?

  5. Consider the following two design alternatives:
    1. Chaining is done inside the symbol table package, perhaps as defined in Figure 3.8, so that a call to "define", as a side effect, unwinds the chain, and a call to "lookup" may, as a side effect, add a link to a chain.
    2. Chaining is done in the main body of the assembler statements which implement the B or W assembly instructions, so that the symbol table package simply treats the "defined" flag as a field of the value stored with each symbol and leaves it to the user to figure out what to do with it.

    a) Which of these is more compatible with the abstract data type or object oriented design philosophies?

    b) Which of these allows easy enforcement of which kinds of forward references are legal and which are not because they cannot be chained?

  6. Write a procedure called "unwind" which takes, as parameters, the index of the symbol table entry holding a symbol representing a chain head and the value to be placed in all entries of the chain.

  7. If expressions are allowed in the operand fields of instructions, explain why forward references cannot be allowed in such expressions when chaining is used.


The classic detailed description of how to write an assembler occurs in Section 3.2 of

Systems Programming by J. J. Donovan (McGraw Hill, 1972).
This includes a marvelous collection of ill-structured and completely anti-object oriented flowcharts of the two passes of an extreme two-pass assembler. Since this book predates many of the ideas of structured programming and almost all of the ideas of abstract data types, it is primarily of historic interest. Most later textbook descriptions of how an assembler works are, sadly, based on Donovan's description.

The extreme two-pass approach without an intermediate file is clearly but briefly described in Section 3.8 of the text

PDP-11 Structured Assembly Language Programming by R. W. Sebsta (Benjamin-Cummings, 1985).
This is particularly sad because the PDP-11 assembler, MACRO-11, used a 2-pass model quite close to that illustrated in Figure 4.7.

The two-pass assembler given in Appendix B of Gust's text,

Introduction to Machine and Assembly Language Programming by Gust (Prentice-Hall, 1986)
is a well written Pascal example illustrating a complete extreme two-pass assembler. Pass 1 is about 3 pages of code, pass 2 is about 5 pages long.

Figure 2.4 and the accompanying text in

System Software, An Introduction to Systems Programming by L. L. Beck (Addison-Wesley, 1985)
contains a clear description of the extreme two-pass assembler (with an intermediate file). An extended description of this approach is also included in Chapter 8 of J. L. Peterson's text
Computer Organization and Assembly Language Programming by J. L. Peterson (Academic Press, 1978)
Figure 7.7 and the accompanying text in Peterson's book provides a clear description of chaining.

Chapter 13 of The text

Assembly Language for the PDP-11 by C. Kapps and R. L. Stafford (Prindle, Weber & Schmidt, 1981)
includes a discussion of two-pass assembly and the forward reference problem. This discussion describes the PDP-11 assembler as using essentially the approach shown in Figure 4.7. Kapps and Stafford describe the use of chaining as a one-and-a-half pass assembly process, since, in effect, unwinding a chain involves something like a partial pass through the output of the assembler.