Midterm Solutions and Commentary

Part of material for 22C:122/55:132, Spring 2004
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Grade Distributions

Midterm Exam

Mean   = 9.93
Median = 9.7          X       X
                      X X     X X X X
 _________X_X_X___X___X_X___X_X_X_X_X___X_X_X_X___X_____X___X_X______
   . 3 . 4 . 5 . 6 . 7 . 8 . 9 . 10. 11. 12. 13. 14. 15. 16. 17. 18.

Homework 1 to 7

Mean =   28.86                            X
Median = 31.7                             X
                                          X
                                          X
                                          X
                                  X     X X
            X               X     X     X X X
 _____X_____X_X_____X___X___X_X_X_X___X_X_X_X______
   14. 16. 18. 20. 22. 24. 26. 28. 30. 32. 34. 36

Overall Midterm + Homework

Mean   = 38.8
Median = 40.3                       X                   X
              X                     X X       X X   X   X   X
 ___________X_X_______X___X_X_X___X_X_X_____X_X_X_X_X___X___X_____X___
   20. 22. 24. 26. 28. 30. 32. 34. 36. 38. 40. 42. 44. 46. 48. 50. 52
          C       +      -       B       +      -       A      +

Background

The instruciton set of the example 16-bit computer can be compactly summarized as follows:

   Register 0 is the program counter!

  15___ _______ _______ ___________0
   |_._|_._._._|_._._._|_._._|_._|_|
   |op |r1     |r2     |op'  |sk |st

    0 0  operate      result = ALU( R[r1], R[R2], op' )
                      if SKIP(result, sk) then PC = PC + 1
                      if st then R[r1] = result

    ALU operations include a+b, a-b, b-a, a&b, a|b, and a^b
    SKIP tests are: no skip (00), <0 (01), =0 (10), and >0 (11)
		
  15___ _______ _______ ___________0
   |_._|_._._._|_._._._|_._._._._._|
   |op |r1     |r2     |dd         |
               |ea = R[r2]+dd      |

    0 1  load         R[r1] = M[ea]
    1 0  store        M[ea] = R[r1]
    1 1  load address R[r1] = ea

Questions

Given that a single-port edge-triggered register file is used to implement the CPU, what is the minimum number of clock cycles required to execute an operate instruction that both skips and saves its result? Start your count from the moment the instruction is in the instruction register ready for execution. Document your answer by showing, for each clock cycle, the operations done on the register file, making reasonable assumptions about auxiliary registers and data paths that may be required. (You ought to be able to do this with one or two lines of documentation per cycle!) (4 Points)
We assume instruction fetch, including incrementing the program counter, is already completed:
1. Temp = R[r2]
2. R[r1] = ALU( R[r1], Temp, op' )
3. R[0] = ALU( R[0], 1, add )
We had to assume a temporary register because the register file had only one port, but because the register file is edge-triggered, we can, in one cycle, update any one register, so we don't need a second temporary register.
Nobody got this perfectly, but 5 earned 3.5 out of 4 points. The usual problem was a failure to exploit the edge-triggered character of the registers. 21 others earned more than half credit.
The register transfer logic for my implementation of the example architecture has the following outputs to the control unit:

op
the 2-bit opcode from IR_15-14

skip
a one-bit input formed as a function of the sign of the ALU output, whether the alu output is zero, and the sk field, IR_2-1

a) Give the truth table for the skip tests, as a function of IR₂, IR₁, ALU<0 and ALU=0 (in that order). (2 Points)
```
   IR2  IR1  ALU<0  ALU=0  |  Skip
  -------------------------+--------
    0    0     0      0    |   0
    0    0     0      1    |   0
    0    0     1      0    |   0
    0    0     1      1    |   -
    0    1     0      0    |   0
    0    1     0      1    |   0
    0    1     1      0    |   1
    0    1     1      1    |   -
    1    0     0      0    |   0
    1    0     0      1    |   1
    1    0     1      0    |   0
    1    0     1      1    |   -
    1    1     0      0    |   1
    1    1     0      1    |   0
    1    1     1      0    |   0
    1    1     1      1    |   -
```
When in doubt, write out the truth table in full with the entires in canonical binary order. Many students who had problems here had them because they tried to abbreviate. The don't-care outputs are because the two ALU status lines can never both be true.
19 did well here, while 4 earned partial credit.
b) Given that no microinstruction needs to test both the op and skip outputs from the register-transfer logic, propose a sensible way to allow the microcode to control wich condition is tested and to allow unconditional execution where no condition needs testing. (2 Points)
```
                               /|-----------------/---- op
                 __      _ 2  | |            _    2
                |  |--/-|--/--| |-------/---|---------- skip
            ____|or|  n |_0   | |        2  |_0
           |    |  |           \|-----00
          n/    |__|--/-----   |
           |          n     |  |
           |    ________    |  |
           |   |        |   |  |
            ---| µstore |   |  |
       µaddr   |________|   |  |
              _____|______  |  /2
       ------|>_____|_|___| |  |
                     |  |___|  |
                     |_________|
```
There were no perfect scores; only 7 had 1.5 out of 2 points, and 3 had partial credit. At least 10 students wrote elaborate microcode instead of proposing a way to allow microcode to do the job efficiently. Common smaller errors included using a 2-way multiplexor instead of a 3-way one, or having a one-bit output from the multiplexor.

c) The st bit, IR₀, is not listed here because it can be anded with the register-clock line inside the data half of the system. Similarly, the ALU function select lines can be handled this way, being derived directly from IR_5-3. The problem is, we also need to use the ALU for indexed addressing. Therefore the microinstruction has a 1-bit field indicating whether the ALU should add or use IR_5-3 as its operation. Draw a register-transfer level logic diagram that explains how this output from the control unit is used to make this selection. (2 Points)
```
         IR5-3  IR5-0  other data
           |      |_____  |  |
   from    |           _|_|_ |
 control---|-------o---\___/ |
   unit    |       |    _|___|_
            ------|\   |       |
                  | |--|f ALU  |
           add ---|/   |_______|
                           |
```
4 earned full credit, 6 came very close, and 6 others earned partial credit. The second multiplexor for the data input to the ALU was worth 0.2 points. A surprising number simply routed everything into the ALU and declared that the solution was inside the box, requiring no explanation.
The indexed addressing facilities of this machine could be implemented using either unsigned arithmetic, so index displacements run from 0 to 63, or using signed displacements, so index displacements run from -32 to +31. Given that we can't do both, you must select one or the other. Explain your reasoning in making this selection, with reference to each of the following issues: Data structure sizes, PC-relative addressing, the need for efficient ways to load small constants, and the need for fast addressing of global variables. Your answer should take the form of a brief essay. (4 points)

Signed displacements allow "Load-address R0 R0 disp" to be used for forward and backward PC-relative branches. With unsigned displacements, backward branches could not be done this way. Since small loops are very common, signed displacements are desirable for this!
Indexed addressing is important in accessing fields of an object relative to a pointer to that object. The natural way to arrante pointers to objects is to the first word of the object, in which case, signed displacements would limit the object size to 32 words, while unsigned displacements would allow 64 word objects. Small objects are the most common, so signed displacements should not be a serious liability, and nothing prevents us from using the convention that pointers to an object point to some place in the middle of a larger object, allowing use of signed displacements to address fields of larger objects.
To load a small constant, we could set aside one of our 16 registers and set it to zero, thus allowing constants from 0 to 63 to be loaded, in one case, or from -32 to +31 in the other case. The most important constants are between 0 and 16, so neither approach is obviously better! Furthermore, if we set aside a register that always holds the value 20 in it, for example, we could use signed displacements from this to load constants between -12 and +51, a range that is very likely to encompass the most common negative constants while extending the positive range.
If we reserve one register to always hold zero, we can quickly reference absolute memory locations 0 to 63 or -32 to +31. The former suggests some kind of page zero, while the latter allows access to the same number of static global addresses, but divided between the two ends of memory. All of the suggestions above for small constants apply here too.
In sum, the need to efficiently encode small loops probably will dominate these other issues, leading us to use signed displacements. This poses small problems in the other areas mentioned in the question, but as shown above, there are easy solutions to these problems.
Only one student addressed all the issues; many simply ignored the outline of the issues that should be addressed, hitting on some issues at random. Others appear to have had severe difficulty understanding this architecture. 15 earned more than half credit.
Give that A is an arbitrary fixed 16-bit memory address.
a) How would you encode a jump to the code at this address? It's easy to give the answer to this question in binary! (2 points)
```
  15___ _______ _______ ___________0
   |0.1|0.0.0.0|0.0.0.0|0.0.0.0.0.0|   load PC PC (next location)
   |op |r1     |r2     |dd         |
    _______________________________
   |               A               |   A  
```
2 did well here, 2 were off by one, using a displacement of 1 because they assumed that the PC was incremented later in the process. 6 earned more than half credit, either limiting A to 6 bits, failing to give the effective address computation that gives A or something that indeed loaded an arbitrary address by using a bizarrely inefficient algorithm (in one case, involving up to 1024 instructions!). 7 gave answers that led to an infinite regress (to load the value, first load the value).

b) How would you load this address into a register? This question is a bit harder because it's harder to find a good place to put A in order to load it; as a result, your answer will involve more English and less binary! (2 points)
```
  15___ _______ _______ ___________0
   |0.1| . . . |0.0.0.0| . . . . . |   load r1 PC (some nearby location)
   |op |r1     |r2     |dd         |
```
Here, we use PC-relative addressing to address some nearby word of the program that holds the constant A. This nearby word must be within the 64-word range of addresses reachable by PC-relative addressing, and it must (except in bizarre circumstances) not be in line to be executed. Usually, this means that it will be after a nearby unconditional branch. Such a branch will usually be somewhere nearby, but if it isn't, we can introduce an arbitrary branch just to jump over a batch of large constants.
3 did well here, and 1 more had an off-by-one error in the displacement field. 8 others had more than half credit, including some bizarrely inefficient solutions, vague solutions, or solutions that did not generalize to allow more than one constant. Again, there were solutions involving infinite regress, such as, to load A, first load the address of A into a register. 10 earned no credit.

c) Compare the efficiency of the instruction encodings you explored in parts a and b with the efficiency of the instruction encodings for these same operations on the PDP-11. What you remember from doing assignment 6 ought to suffice as background for this problem. (Space and time efficiency are reasonable grounds for comparison.) (2 points)
The PDP-11 unconditional jump to an absolute address is identical in complexity to that given here -- it is a 16-bit jump instruction, using pc-auto-increment-indirect addressing to get the destination from the immediate following word. The PDP-11 load-immediate instruction is a memory-to-register move using pc-auto-increment addressing for its memory operand. This uses the same number of words of memory as the solution suggested here, but they are always consecutive, eliminating the challenge of finding a nearby place for the constant. It would be harder to make a PDP-11 execute these instructions as quickly as our example machine because of the complexity of the PDP-11 fetch execute cycle.
In sum, these two examples do not expose huge differences in potential between the PDP-11 and the example architecture.
2 gave excellent answers, and 2 earned half credit. Others earned fractional credit while failing to give much if any evidence and offering slapdash arguments, at best. 10 offered blank answers or answers that failed to compare with the PDP-11 and so failed to earn any credit.