Assignment 2, Solved

Part of the homework for 22C:50, Summer 2003
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Homework 2

Clearly describe the errors mentioned at the end of the assignment for MP1. (Don't explain how to fix them, just explain what the EAL assembler, version 0.0, didn't do correctly.)
```
    44 0024: 00    |    B #0G                ; illegal hex digit
                            ^
comment expected
    45 0025: 16EA    |    B 9876543210         ; number way too large
    46 0026: 159    |    B 345                ; number too large
```
The lines in question are quoted from the assembly listing above. On line 44, the problem is that, to a human reader, #0G looks like it ought to be one lexeme, yet the lexical analyzer broke it in two when it found the letter G. It would have been nicer if the lexical analyzer had broken it the way a human reader does, counting it as one lexeme and then complaining about the illegal use of G as a hexadecimal digit.
On line 45, the number 9876543210 is obviously larger than can be represented as a 32-bit integer. The assembler ought to have noticed this and complained that the number was too large for it. It didn't. Then, to compound the injury, the output shows a 16-bit value where only an 8-bit value should have been shown.
The second error noted on line 45 is repeated in line 46, this time with an operand value that requires 9 bits. This clearly shows that the assembler doesn't check to see if one-byte values are within the range of values that fit in one byte.
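The values in the listing are consistent with simple wraparound arithmetic. The following is a minimal sketch in C, assuming (this is a guess that matches the listing, not something confirmed from the EAL source) that the assembler accumulates decimal digits in a 32-bit unsigned integer with no overflow check and then emits the low 16 bits. The names accumulate_decimal and fits_in_byte are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the unchecked conversion: accumulate decimal
   digits in 32 bits, letting the value wrap silently modulo 2^32. */
uint32_t accumulate_decimal(const char *s)
{
    uint32_t value = 0;
    while (*s >= '0' && *s <= '9') {
        value = value * 10u + (uint32_t)(*s - '0');
        s++;
    }
    return value;
}

/* The range check that version 0.0 omits for one-byte operands. */
int fits_in_byte(uint32_t value)
{
    return value <= 0xFFu;
}
```

Under this model, 9876543210 wraps to 4CB016EA in hexadecimal, and its low 16 bits are the 16EA shown on line 45. Likewise, 345 is 159 in hexadecimal, which fits_in_byte rejects because it needs 9 bits.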

Redo the definition of <number> from Figure 2.18 (on page 20 of the notes) to handle the requirements of MP1.

	<number> ::= # <basenum>
	          |  <digit> { <digit> } [ # <basenum> ]
	<basenum> ::= <letterordigit> { <letterordigit> }
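One way to read this grammar is as a recipe for a scanner. The following is a minimal sketch in C, not the EAL assembler's actual lexical analyzer; the function names scan_basenum and scan_number are hypothetical. It returns only the length of the lexeme, leaving the question of whether each character is a legal digit in the given base to a later step.

```c
#include <ctype.h>
#include <stddef.h>

/* <basenum> ::= <letterordigit> { <letterordigit> }
   Returns the number of letters and digits at the start of s. */
static size_t scan_basenum(const char *s)
{
    size_t n = 0;
    while (isalnum((unsigned char)s[n])) n++;
    return n;
}

/* Returns the length of the <number> lexeme at the start of s,
   or 0 if no number starts there. */
size_t scan_number(const char *s)
{
    size_t n = 0;
    if (s[0] == '#') {                       /* # <basenum> */
        size_t b = scan_basenum(s + 1);
        return b ? 1 + b : 0;
    }
    while (isdigit((unsigned char)s[n])) n++;  /* <digit> { <digit> } */
    if (n == 0) return 0;
    if (s[n] == '#') {                       /* optional # <basenum> */
        size_t b = scan_basenum(s + n + 1);
        if (b) n += 1 + b;
    }
    return n;
}
```

With this reading, scan_number("#0G") returns 3, so #0G is taken as a single lexeme, and the complaint about G being an illegal hexadecimal digit can be issued when the lexeme is converted to a value.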

Redraw figure 2.19 (on page 21 of the notes) to handle the requirements of MP1.

          ________________________________________________ 
        /                                                  \
start   \                                     identifier   /|
  -------->----------(letter)-------->--------------------  |
    /         \  \             /            \               |
    \         /   |           |\            /|              |
      (blank)     |           |  -(letter)-  |              |
                  |            \            /               |
                  |              -(digit)--                 |
                  |\                          number       /|
                  |  (#)----(hexdigit)---->---------------  |
                  |       /            \                    |
                  |       \____________/                    |
                  |\                          number       /|
                  |  ----(digit)---------->---------------  |
                  |    /         \ \          number       /|
                  |    \_________/   -(#)---(letordig)----  |
                  |                       /            \    |
                  |                       \____________/    |
                  |\                          punctuation  /|
                  |\ ---------(:)--------->--------------- /|
                  |\ -----(;)----------------------------- /|
                   \ --------------(line end)------------- / 
                     -(end of file)-----------------------

In the EAL assembler, the symbol table package includes a definition of table that holds the stringpool handles of symbols. Where are the associated values stored? Explain, in terms of data abstraction, why it might be a good idea to store the values somewhere else instead of storing them in a data structure declared locally to the symbol table package.

In the abstract, a symbol table associates symbols with values, not with symbol handles! A search through the code for the assembler shows that parser.h declares a table, value_table, with the comment "the value field of the symbol table". This is a table of items of type OBJECT_VALUE, and examples of its use show that it is indexed by values of type SYM_HANDLE.
This makes a little sense! Only the parser knows or cares about the types of values that are associated with symbols, so that half of the symbol table involves information which the symbol table abstract data type really doesn't need to know about.
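The division of labor can be sketched in C as follows. This is an illustration under stated assumptions, not the EAL source: only the names value_table, OBJECT_VALUE and SYM_HANDLE come from the assembler, while their definitions here, along with SYM_LIMIT, names, sym_lookup and set_value, are hypothetical.

```c
#include <string.h>

#define SYM_LIMIT 64                  /* hypothetical capacity */

/* ---- symbol table package: knows about names only ---- */
typedef int SYM_HANDLE;               /* assumed representation */
static const char *names[SYM_LIMIT];  /* stand-in for stringpool handles;
                                         the caller keeps the strings alive */
static int sym_count = 0;

/* Return the handle for name, entering it if new; -1 if the table is full. */
SYM_HANDLE sym_lookup(const char *name)
{
    for (int i = 0; i < sym_count; i++)
        if (strcmp(names[i], name) == 0) return i;
    if (sym_count == SYM_LIMIT) return -1;
    names[sym_count] = name;
    return sym_count++;
}

/* ---- parser: owns the values, indexed by handle ---- */
typedef struct { int defined; unsigned value; } OBJECT_VALUE;  /* assumed fields */
OBJECT_VALUE value_table[SYM_LIMIT];

/* Record a definition for the symbol with handle h. */
void set_value(SYM_HANDLE h, unsigned v)
{
    value_table[h].defined = 1;
    value_table[h].value = v;
}
```

In this arrangement, changing the fields of OBJECT_VALUE touches only the parser half; the symbol table half never needs to be recompiled or even read.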
A large fraction of the class ended up going on at length about the nature of symbol handles and how symbol handles are the values associated with symbols by the symbol table package. That is really evading the question, since from the outside, to a user of the assembler, the concept of "value associated with the symbol" clearly does not refer to symbol handles!

If you examine the result of assembling lines 17 to 21 of the test file distributed with the EAL assembler, you can determine, without ever looking at the source code, how the EAL assembler processes forward references. How does the EAL assembler process forward references, and what is it about lines 17 to 21 that discloses this?
```
    17 000A: 0040  |    W   LC     ; use of something defined later
    18             |LC  =   32     ; a definition
    19 000C: 0020  |    W   LC     ; use of that definition
    20             |LC  =   64     ; a redefinition
    21 000E: 0040  |    W   LC     ; use of the redefinition
```
The lines in question are quoted above. The backward references on lines 19 and 21 each use the most recent definition from the immediately preceding lines, so these tell us nothing special, but the forward reference on line 17 tells all.
With chaining, we expect the first definition encountered during the assembler's one pass to be used in resolving all forward references. This would have predicted that line 17 would use the definition from line 18, and it didn't.
With two passes, the forward references should use the final value from the first pass. In this case, that is the value given on line 20, and this is exactly how line 17 was assembled, so this must be a two-pass assembler.
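The two predictions can be compared with a small simulation. The following is a minimal sketch in C that models only uses and definitions of the single symbol LC; the types Kind and Line and the functions assemble_two_pass and assemble_chaining are hypothetical, and the chaining model simply rescans earlier lines instead of following a real chain of pending references.

```c
#include <stddef.h>

typedef enum { USE, DEF } Kind;
typedef struct { Kind kind; unsigned value; } Line;  /* value is ignored for USE */

/* Two passes: pass 1 leaves the final definition in the table,
   and pass 2 starts from that value. out[i] is set only for USE lines. */
void assemble_two_pass(const Line *src, size_t n, unsigned *out)
{
    unsigned lc = 0;
    for (size_t i = 0; i < n; i++)        /* pass 1: collect definitions */
        if (src[i].kind == DEF) lc = src[i].value;
    for (size_t i = 0; i < n; i++) {      /* pass 2: generate output */
        if (src[i].kind == DEF) lc = src[i].value;
        else out[i] = lc;
    }
}

/* One pass with chaining: uses seen before any definition stay pending
   and are patched by the first definition encountered. */
void assemble_chaining(const Line *src, size_t n, unsigned *out)
{
    unsigned lc = 0;
    int defined = 0;
    for (size_t i = 0; i < n; i++) {
        if (src[i].kind == DEF) {
            if (!defined)                 /* resolve pending forward references */
                for (size_t j = 0; j < i; j++)
                    if (src[j].kind == USE) out[j] = src[i].value;
            lc = src[i].value;
            defined = 1;
        } else if (defined) {
            out[i] = lc;                  /* backward reference */
        }
    }
}
```

Feeding lines 17 to 21 to both models (use, define 32, use, define 64, use), the two-pass model gives 64 for line 17, matching the 0040 in the listing, while the chaining model gives 32. Both agree on lines 19 and 21.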