Exam 3: Final

Part of the homework for 22C:60 (CS:2630), Spring 2014
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Grade Distributions

Exam 3

Mean   =  9.58                         X
Median = 10.2                          X X
               X     X         X   X   X X   X
               X   X X X       X   X   X X   X
             X X   X X X     X X   X   X X X X   X
  2 . 3 . 4 . 5 . 6 . 7 . 8 . 9 . 10. 11. 12. 13. 14. 15. 16. 17. 18

Exams 1 to 3

Mean   = 22.45     X       X
Median = 23.2      X       X     X
                   X       X     X X X
               X   X       X X   X X X       X
             X X   X   X   X X   X X X X X X X   X
  8 . 10. 12. 14. 16. 18. 20. 22. 24. 26. 28. 30. 34. 36. 38. 40

Machine Problems 1 to 5

                             X       X
Mean   = 17.29               X     X X
Median = 18.5                X     X X       X
                 X X       X X X   X X X X   X X
                 X X       X X X   X X X X X X X X
  2 . 4 . 6 . 8 . 10. 12. 14. 16. 18. 20. 22. 24. 26. 28

Top 10 of 12 Homework Scores

                                  X     X
Mean   = 22.95                X   X X   X
Median = 22.9                 X   X X   X       X X   X
                      X       X X X X   X   X X X X X X
 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 30

Total Scores

Mean   = 62.70      X               X
Median = 62.8   X X X       X     X X
                X X X     X X     X X         X
      X       X X X X X   X X X X X X X X     X
 36. 40. 44. 48. 52. 56. 60. 64. 68. 72. 76. 80. 84. 88. 92
      + +|- -   C   + +|- -   B   + +|- -   A   + +

Exam Solutions and Commentary

Global background: Many species of small hawks are called sparrowhawks. The Sparrowhawk subset of the Hawk architecture has no long instructions. For example, it has just LOADS to load from memory, and no LOAD instruction. Similarly, there is no LIL for long constants, but combinations of LIS and ORIS still work. Aside from being the basis of nice exam questions, the Sparrowhawk is an appropriate architecture for low price, low performance computers.

If the Hawk load instruction is used on a Sparrowhawk, it causes an undefined instruction trap. A trap handler for undefined instructions on the Sparrowhawk can be written that implements the full Hawk instruction set. Macro definitions in sparrowhawk.h allow some SMAL Hawk instructions to assemble into using only Sparrowhawk instructions.

  1. The trap handlers on the Sparrowhawk must be rewritten to avoid long instructions. Here is the start of the undefined instruction trap handler for the Sparrowhawk; it is based on the code in section 13.3 of the Hawk manual, but with significant changes:
    .       =       INSTRTRAP
            CPUSET  R2,TSV
            LIS     R2,(INSTRSAVEAREA + svR1) >> 16
            ORIS    R2,(INSTRSAVEAREA + svR1) >> 8 & #FF
            ORIS    R2,(INSTRSAVEAREA + svR1)      & #FF
            STORES  R1,R2
            LIS     R1,INSTRAPSAVE >> 8
            ORIS    R1,INSTRAPSAVE & FF
            JUMPS   R1

    The above is exactly 8 halfword Sparrowhawk instructions. INSTRTRAPSAVE must be a label in the first 64K of the address space. INSTRSAVEAREA must be a label in the first 8 meg of the address space, and it must refer to a save area where the fields svPC, svR1 through svR15 are consecutive words, and svPSW is also a field of the save area.

    a) When control reaches INSTRTRAPSAVE, only R1 has been properly saved in svR1. Some of the registers that must still be saved are obvious. R3 through R15 must be saved in svR3 through svR15. What other registers must be saved, and in what fields of the save area? (2 points)

    ____TSV______ goes in _____svR2______

    ____TPC______ goes in _____svPC______

    ____PSW______ goes in _____svPSW_____

    _____________ goes in _______________

    1/5 did perfect work, 1/5 earned no credit. Among the others, the most common mistakes were to suggest that R2 should be saved in svR2 when in fact it has already been saved in TSV, or to suggest that PC should be saved in svPC when in fact the trap already saved it in TPC. There was no credit or penalty for saving TMA in svMA, since the TMA register is irrelevant during illegal instruction traps but saving it routinely in all trap handlers is reasonable.

    b) You might be tempted to begin the code at INSTRTRAPSAVE as follows:

    INSTRTRAPSAVE:  ; on arrival R2 points to svR1
            ADDSI   R2,4
            STORES  R3,R2   ; save R3 in svR3
            ADDSI   R2,4
            STORES  R3,R2   ; save R3 in svR3

    This does not work because it irreparably destroys information that must be saved. What crucial unsaved information is destroyed by the above code, and what instruction destroys it? (1 point)

    ____PSW______ destroyed by _____ADDSI_____

    Only 1/16 got this, but 1/3 understood that the ADDSI was the problem. 1/3 earned no credit.

    The exam contained an error here. The first ADDSI R2,4 should have been ADDSI R2,8. The result of this error is that R3 is saved in svR2. This does no irreparable damage since the improperly asved data could be moved to the correct save location later; therefore, it does not change the correct answer, but some people seem to have been distracted by it.

  2. Background: The Hawk CPU accesses the MMU through a pair of special registers, MMUDATA and TMA. In the Hawk, these are accessed using the CPUGET and CPUSET instructions.  

    a) How can a Hawk operating system prevent unauthorized access to these registers. (1 point)



    1/4 got this. A few more implied it vaguely for partial credit. Wrong answers were very strange. How, for example, does the use of multiple stacks change anything?

    b) Alternately, the MMU interface registers could have been addressed as I/O device interface registers. How can a Hawk operating system block access to device interface registers? (1 point)



    1/16 got this. 1/5 earned partial credit by suggesting changing the access rights to the I/O interface registers -- at least this uses the MMU to control access. Again, there were strange suggestions about the use of multiple stacks.

    c) Alternatively, the MMU could have been accessed as if it was a coprocessor. How can a Hawk operating system block unauthorized access to a coprocessor, if it can? (1 point)



    1/5 got this. Nothing in the Hawk coprocessor interface allows control over access to the coprocessors. Turning off a coprocessor does nothing, since user programs are free to use COSET to turn it back on.

  3. A quick question: What is the relationship between the virtual and the physical address when the MMU is turned off? (1 point)



    7/10 got this right. A few more did not say this outright but gave answers that implied it, for partial credit. The most puzzling wrong answers implied that memory addressing was impossible with the MMU turned off. This would make the machine useless since the MMU is initially off when the machine starts, and in the case of the Hawk, the MMU is turned off every time there is an interrupt or trap.

  4. Background: Consider a machine with a cache memory able to hold copies of 1024 of the more recently referenced words from main memory. It uses the least significant 10 bits of the word address to select an entry in its internal RAM, so technically, it is a 1-way set associative cache. Now consider this program fragment:
        int a[1024][1024]; /* a 1024 by 1024 array of integers */
                           /* a[i] is an array of 1024 integers */
        for (i = 0; i < 1023; i++) {
           for (j = 0; j < 1024; j++) {
              if (a[i+1][j] == (a[i][j] + 1)) a[i][j] = a[i+1][j];

    a) Would this program run faster or slower if we change the order of the array subscripts, that is, replace all occurences of a[i][j] by a[j][i] and similarly, replace a[i+1][j] by a[j][i+1]?

    Faster or Slower? (0.5 point) ___Faster_____

    1/2 got this. The reasoning is straightforward. The inner loop body references both a[i+1][j] and a[i][j], and given the storage layout of the array a these will be exactly 1024 words apart, so the least significant bits of the word address will be the same, implying that the cache will only be able to store one or the other. This defeats the effectiveness of the cache. If, instead, the code used a[j][i] and a[j][i+1], the two words are consecutive and therefore the addresses differ in their least significant 10 bits.

    b) Would this program run faster or slower if we change the declaration to int a[1025][1025], assuming that our CPU allows multiplication by 1025 to be almost as fast as multiplication by 1024.

    Faster or Slower? (0.5 point) ______________

    1/6 got this. The central issue here, assuming that multiplying by 1025 is fast, is that the addresses a[i+1][j] and a[i][j] now differ by 1025, so they are different in their least significant 10 bits, allowing the cache to do its job.

    c) One phrase in the above description of a cache required careful wording. It would have been tempting to write "hold copies of the 1024 most recently referenced." What is wrong with this alternative wording? (1 point)



    1/4 got this, and 1/4 gave vague answers implying this for partial credit. A number of students decided to focus on the difference between "least recently used" and "least frequently used", an odd decision since the word "frequently" was never mentioned and none of the cache schemes discussed in class focused on the frequency of use.

    One student commented that "this is an English major question." It is. When reading specifications for computer systems (hardware or software) fine shades of meaning have very real consequences. This is an area where computer scientists and lawyers have quite a bit in common.

    d) To compare the potential speed of multiplication by 1025 versus multiplication by 1024 on the Hawk, give the best code you can for each of these: (2 point)

    R3 times 1024: ____SL______R3,10____________________________


    R3 times 1025: ____ADDSL___R3,R3,10_________________________


    2/5 got both right, while 1/3 got the first correct. 1/7 had both right, except for using the wrong shift count. These were not off-by-one errors such as shifting by 9; they tended to be way off. 1/10 gave grossly inefficient solutions, involving more than two instructions.

    Of greatest concern were the 3/10 who earned no credit for their attempts to multiply by 1025 because they got the math wrong. This kind of elementary math ought to be easy, and only 5 students stayed the entire two hours, so time pressure is not the problem here.

  5. Background: Consider the sparrowhawk.h macro for ADDI; the macros for CMPI and even LOAD and STORE will be similar, except where PC-relative addressing is used.
         MACRO   ADDI =rd, =rs, =const
           LIS   R1, const >>> 8
           ORIS  R1, const & #FF
           ADD   rd, rs, R1

    The >>> operator in SMAL is a signed right shift. This and many other macros in sparrowhawk.h use R1 as an auxiliary register, so programmers using sparrowhawk.h must keep in mind that using a Hawk long instructions will sometimes wipe out R1.  

    a) On the Sparrowhawk, programs that use the sparrowhawk.h macros will be faster than programs that use the trap handler to implement Hawk long instructions.

    True or False? (0.5 points) ___True_______

    4/5 got this right. All you need to do is compare the 3 instructions in the example macro with the incomplete code for the trap handler given in problem 1 and you can see that the macro must be faster.

    b) On the Sparrowhawk, programs that use the sparrowhawk.h macros will have smaller object code than programs that use the trap handler to implement Hawk long instructions.

    True or False? (0.5 points) ______________

    1/3 got this right, while 2/3 got it wrong. Again, the answer should have been straightforward. The example Sparrowhawk macro takes 3 halfword instructions to do something that the Hawk does with a single instruction occupying a double-halfword. Therefore, since 3 is greater than 2, Sparrowhawk code will be bigger. Even if the trap handler is included as part of the object code, it is fixed size, so for large applications, the bloat caused by the larger instructions will dominate.

    If people had flipped coins here, they would have done better.

    c) Any program that works correctly when assembled with hawk.h (using the trap handler to implement Hawk long instructions) can be assembled using sparrowhawk.h and it will still work correctly.
    True or False? (0.5 points) ____False_____

    3/4 got this right.

    One problem is that some Hawk program-counter-relative addressing will not work correctly because some Sparrowhawk macros are bigger than their Hawk equivalents. This will have its most immediate impace on branch instructions.

    A second problem has to do with the fact that some Sparrowhawk macros use R1 as a scratch register. This will break some programs that use R1.

    d) Any program that works correctly when assembled with sparrowhawk.h can be assembled using hawk.h and it will work correctly using the trap handler to implement Hawk long instructions.
    True or False? (0.5 points) ____True______

    7/10 got this right.

    e) A slightly more efficient macro, using conditional assembly, could use just an LIS instruction without an ORIS instruction when const is in the range from:

    __-128___ to ___127___ (inclusive) (0.5 points)

    3/4 got this right. Among wrong answers, 1/10 gave 0 to 127 (evidently forgetting that both ADDI and LIS sign extend their operands. Even more extreme errors demonstrated serious failures to understand two's complement arighmetic: 1/17 gave -129 to 128, 1/25 gave -256 to +255. All of these answers earned a small amount of partial credit. 1/6 earned no credit, giving answers that did not hint at an understanding that the LIS instruction had an 8-bit constant operand.

    f) Why did our macro need to use the >>> operator instead of the >> operator on the LIS instruction? (1 point)



    1/5 got this, and 2/5 earned partial credit for mentioning the issue of signed versus unsigned operands. It is difficult to categorize the answers that earned no credit.

  6. Background: The simplest macro definition for LIL in sparrowhawk.h uses three consecutive Sparrowhawk instructions. The structure of the standard definition given in the previous problem for ADDI hints about how it might be written, although there is no need to use R1.

    a) Complete this macro for sparrowhawk.h: (2 point)

         MACRO   LIL =dst, =const

    1/5 got this. 1/7 earned no credit. 1/3 used LIS followed by ORIS (imitating the ADDI macro from the previous question) to give an answer that worked for 16-bit constants, ignoring the fact that LIL permits a 24-bit constant, for half credit. 1/4 made a mess of the shifting and masking, and 1/6, usually after giving a good answer for 16-bit constants, added gratuitous nonsense (frequently LOAD instructions) in what might have been a failed attempt to extend the answer to 24 bits.

    b) Will the sparrowhawk.h LIL work for absolute symbols? _Yes_
    (0.5 points)

    6/7 got this.

    c) Will the sparrowhawk.h LIL work for relocatable symbols? _No__
    (0.5 points)

    2/5 got this. This question built on question 5 from the second midterm. That was a hard question, so giving students a second chance made sense. In short, the problem with relocatable symbols is that the value must be broken up into 3 8-bit chunks using shift and mask operations, and the assembler only permits these operations to be applied to absolute symbols.

    c) Will the sparrowhawk.h LIL work for external symbols? _No__
    (0.5 points)

    1/2 got this. Again, this question built on question 5 from the second midterm, and the problem here is the same as in part c).

  7. Background: Consider the control system for a machine tool. The precision of the tool is fixed, so there is no need for floating point. Because it is an antique tool, everything about its desgin was done in binary fractions of an inch (1/4, 1/16, 1/64). At one point in the control system, you are given that:

    R3 contains the current position, in units of 1/1024 inch.

    R4 contains the position change, in units of 1/64 inch.

    A Problem: Write code to increment the current position by the position change. Your code must not destroy the contents of R4 and it must leave the new position in R3. (2 point)





    3/10 got this right, although many of these used separate MOVE and SL instructions. One gave a correct answer that involved on the order of 20 instructions; that was penalized for gross inefficiency. 3 left it blank, and 1/5 gave totally nonsensical answers.

    Among those earning partial credit, 1/5 threw in a gratuitous LOAD or STORE instruction, 1/5 used a shift instruction, but shifted the wrong number of bits or the wrong direction, and 1/6 omitted the shift entirely.