Exam 3: Final

Solutions and Commentary

Part of the homework for 22C:60, Fall 2007
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Grade Distributions

Exam III

Median = 9.0
                              X X           X X       X X
     _________________X___X_X_X_X_____X_X_X_X_X___X___X_X_X___X_X_______
       0 . 1 . 2 . 3 . 4 . 5 . 6 . 7 . 8 . 9 . 10. 11. 12. 13. 14. 15.

Total Exams I to II

Mean   = 23.55     X
                 X X X   X   X   X   X   X
 ________________X_X_X_X_X___X_X_X___X_X_X_X_X___________________
  10. 12. 14. 16. 18. 20. 22. 24. 26. 28. 30. 32. 34. 36. 38. 40

Machine Problems 1 to 5

mean   = 19.01                                        X        
                                            X       X X
                                  X       X X       X X
 _______X_______________X___X_X_X_X___X_X_X_X_______X_X_X__________
   0 . 2 . 4 . 6 . 8 . 10. 12. 14. 16. 18. 20. 22. 24. 26. 28. 30. 

Homeworks, the top 11 scores from assignments 1 to 13

mean  = 22.25                             X
                                          X
                                    X     X       X X     X       X
 _______________________X_____X___X_X___X_X___X___X_X_X___X_X_X___X_______
   0 . 2 . 4 . 6 . 8 . 10. 12. 14. 16. 18. 20. 22. 24. 26. 28. 30. 32. 34

Total Scores

mean   = 64.81                          X
                                        X   X X         X       X
__________________X_____X___X_____X_X_X_X_X_X_X_______X_X___X___X_X___X_
 20. 24. 28. 32. 36. 40. 44. 48. 52. 56. 60. 64. 68. 72. 76. 80. 84. 88 
                  D   +  |  -   C   +  |  -   B   +  |  -   A   +   

Solutions and Commentary

  1. Background: Consider this Hawk program fragment:
                    LOAD    R3,PARRAY
                    LOAD    R4,10
            LOOP:
                    STORES  R0,R3
                    ADDSI   R3,4
                    ADDSI   R4,-1
                    BGT     LOOP
    

    a) Given that ARRAY is a global array of 10 integers, write appropriate SMAL code to declare ARRAY and PARRAY (1 point)

    _____________COMMON__ARRAY,40_______________________________________________
    
    _____PARRAY:_W_______ARRAY__________________________________________________
    

    1/3 did perfectly here. 1/3 did well except for giving an odd size for ARRAY (the most common odd size was 10, forgetting that the size is in bytes, but several gave sizes in the thousands).

    b) Write high level language code equivalent in effect to the HAWK code fragment given above in some procedural high-level language. (1 point)

    _____for (i = 0; i < 10; i++) array[i] = 0;_________________________________
    
    ____________________________________________________________________________
    
    ____________________________________________________________________________
    
    ____________________________________________________________________________
    

    1/4 did well. Almost as many earned no credit at all. The most common solutions that earned partial credit involved deep focus on the control structure while forgetting the loop body (the STORES). Fully half the class made this mistake. Typically, these partial credit answers did not abstract very far from the assembly language code, retaining one line of code per line of assembly code.

    c) Loop unrolling might improve the performance of this code. Show the body of the unrolled loop that you would replicate once for each iteration. (1 point)

    _____________STORES___R0,R3_________________________________________________
    
    _____________ADDSI____R3,4__________________________________________________
    
    ____________________________________________________________________________
    

    1/3 did well. 1/10 earned no credit. The most common error was to include code in the unrolled version to manipulate R4. 1/3 made this mistake. Since R4 is the loop counter, it serves no purpose when the loop is fully unrolled.

    d) Suppose the CPU has a small I-cache. Explain how loop unrolling could actually slow the execution of this code. (1 point)

    _____In the unrolled loop, no instruction is ever executed twice____________
    
    _____so the cache cannot help as it does in the iterative version___________
    
    _____If the cache offers more than a 2 times speedup, iterative is faster.__
    

    1/4 did well here. 1/4 earned partial credit by saying that the unrolled version could overflow the instruction cache, but did not explain the performance consequences. 1/2 gave irrelevant answers.

  2. Background: Consider this reasonably well commented but not at all optimized Hawk subroutine:
            INT     POWER10
    POWER10:
            ; on entry, R3 = a, R4 = b
            ; on exit,  R3 = a * 10**b
    LOOP:
     a:     TESTR   R4     
     b:     BLE     QUIT            ; while (b > 0) {
     c:     SL      R3,1
     d:     ADDSL   R3,R3,2         ;   a = a * 10
     e:     ADDSI   R4,-1           ;   b = b - 1
     f:     BR      LOOP            ; }
    QUIT:
            JUMPS   R1              ; return a
    

    a) Suppose this is run on a pipelined machine. How could you rearrange the code above to minimize pipeline interlocks? (The labels a,b,c,d,e,f are included above so you can give your answer by listing the labels in the order you want the instructions rearranged.) (1 point)

    _____________a b c e d f____________________________________________________
    

    1/3 did well here. 2/3 earned no credit by proposing reorderings that changed the meaning of the code. There was no partial credit. Typical changes to the meaning involved changes to the number of times lines c and d are executed, so that the POWER10 routine would no longer compute the reqired value.

    b) Suppose you needed to call POWER10 from a separately assembled source file. Write the code you would include in the calling source file (or in a header file used by the calling source file) to support linking to POWER10. (1 point)

    _____________EXT_____POWER10________________________________________________
    
    _____PPOWER10:_W_____POWER10__________________________________________________
    
    ____________________________________________________________________________
    

    This was the easiest question on the exam. Only a few students had any difficulty.

    c) When a subroutine calls POWER10 is there any need to push and pop the activation record? Why? This is a short answer question! (1 point)

    _____________No.  POWER10 does not use its activation record_______
    
    _____________nor does it call any other subroutines_________________________
    

    1/4 did well here. 1/3 said No, and then failed to offer a useful explanation, earning half credit. 1/6 gave wrong answers.

    mux-based circuit with feedback

  3. A Problem The logic circuit shown to the right is a flipflop, but the inputs and outputs are not labeled correctly. It might have Q or Q outputs, and it might inputs more properly labeled R, S, D, or C, or perhaps they are inverted, R, S, D or C. Write the correct labels in the picture. This requires that you figure out what kind of flipflop this is, and then express your answer by the labels in the picture. (2 points).

    2/5 did well here. 1/4 earned no credit. More had difficulty with the inputs than the outputs. A significant number of those who had difficulty added R and S labels on the intermediate paths in the multiplexor that are neither inputs or outputs. These labels were not asked for in the question and are misleading in any attempt to understand the circuit.

  4. A Problem Write SMAL Hawk code to multiply R3 by 0.390625. Note that 0.390625 = 100/256, as a result, this is not a hard problem. (2 points)
    _____________ADDSL___R3,R3,2________________________________________________
    
    _____________ADDSL___R3,R3,2________________________________________________
    
    _____________SR______R3,6___________________________________________________
    
    ____________________________________________________________________________
    
    ____________________________________________________________________________
    
    ____________________________________________________________________________
    

    1/3 did well, 2/5 earned no credit. Partial credit was offered to those who divided before multiplying (for a significant loss of precision) or who managed to multiply by strange values not associated with the problem. Among those who did well, there was considerable variation in optimization, so full credit was given for the optimal answer, while small penalties were offered for solutions requiring 4 or 5 instructions.

  5. Background Exceptions, as implemented in chapter 13, were global variables holding two fields: The address of the current handler and the address of the current handler's activation record. This made it expensive to save and restore exceptions at the start and end of a try-catch block. In the world of computing, there is always an alternative way to do anything. Consider this alternative implementation of exceptions:

    An exception is a one-word global variable holding a pointer to the activation record of the current handler. If an activation record contains a handler, it must begin as follows:

    RETAD   =       0       ; return address
    HANDLER =       4       ; address of the handler's code
    OLDEX   =       8       ; previous value of the exception
    
    Assume that PRANGE is the (constant) pointer to the global variable RANGE holding the range exception. a) Write code to raise the range exception using this new model. (1 point)
    _____________LOAD____R2,PRANGE______________________________________________
    
    _____________LOADS___R2,R2__________________________________________________
    
    _____________LOAD____R1,R2,HANDLER__________________________________________
    
    _____________JUMPS___R1_____________________________________________________
    
    ____________________________________________________________________________
    

    This was a hard problem. One student had a good answer. Many students clearly started writing too soon, copying irrelevant code from the text even though the problem statement said that this design was an alternative to the design in the text.

    In thinking about this code, it is important to understand that the text in the problem statement really is a complete description of an alternative to the scheme in the text. Single words carry heavy import here.

    b) Given that MYHAND is the label on a handler, write code to save the old range exception and install MYHAND as the current handler. (2 points)

    _____________LOAD____R1,PRANGE______________________________________________
    
    _____________LOADS___R1,R3__________________________________________________
    
    _____________STORE___R3,R2,OLDEX____________________________________________
    
    _____________STORES__R2,R1__________________________________________________
    
    _____________LEA_____R1,MYHAND______________________________________________
    
    _____________STORE___R1,R2,HANDLER__________________________________________
    
    ____________________________________________________________________________
    
    ____________________________________________________________________________
    
    ____________________________________________________________________________
    

    As above, this was a hard question. Nobody got it right, although tere was significant partial credit. Certain elements, such as the initial loading of an index register with the pointer to the exception variable, were fairly obvious, as were the loading of the address of the handler and the need to store that address somewhere.

    Some students opted to copy the code from the notes, although it used a completely different data structure. That was not worthless, since it did include these key parts.

    c) In comparing the exception handling mechanism in the text with the one described here, there are space-time tradeoffs, but there is also a difference in expressive power. How is this scheme less flexible than the scheme discussed in the notes? (2 points)

    _____________This alternative model does not permit one subroutine__________
    
    _____________to install handlers simultaneously for multiple exceptions,____
    
    _____________and it makes it messy to nest try/catch blocks for a single____
    
    _____________exception in one subroutine.___________________________________
    
    ____________________________________________________________________________
    
    ____________________________________________________________________________
    

    One got this right. Many answers suggested that students had not taken the time to understand the question. The fact that an exception is a one-word variable, for example, does not limit its expressive power. That word is a pointer to a block of memory (the activation record) that has many fields. This exception handling model is, incidentally, perfectly appropriate for languages such as C++ and Java where there is only one exception, and the (logical) type of the exception is passed as a parameter to the handler.

  6. Background: Consider the following fragment of code from the illegal instruction trap handler in Chapter 13 of the notes (the caption was "checking what instruction caused the trap):
    ; see if the protection violation was CPUGET
    ; all registers have been saved
    ; R2 points to register save area
      a:    LOAD    R3,R2,PCSAVE    ; get the saved program counter
      b:    LOADS   R4,R3
      c:    EXTH    R5,R4,R3        ; the the halfword it points to
                    ; R4 is the instruction that caused the trap!
      d:    LIL     R6,#F0F0
      e:    AND     R5,R6           ; clear all but the opcode bits
      f:    CMPI    R5,OPCPUGET     ; is it a CPUGET instruction?
      g:    BEQ     ISCPUGET
    
      h:    LOAD    R3,R2,PSWSAVE   ; it is not CPUGET!
      i:    CPUSET  R3,PSW          ; restore saved PSW
      j:    LOAD    R1,PPROGERR
      k:    LOAD    R2,R1,EXAR
      l:    LOAD    R1,R1,EXHAND
      m:    CPUSET  R1,TPC
      n:    CPUGET  R0,TPC          ; raise PROGRAM_ERROR exception
    
    Recall that R2 points to the register save area, and that the identifiers R1SAVE, R2SAVE etc. are used for the offsets of the saved values of registers.

    a) Lines j through l begin to throw a program-error exception. In the normal sequence of instructions for throwing an exception, lines m and n would be replaced by JUMPS R1. Why did we need to use this strange code instead? (1 point)

    _____________JUMPS R1 and the CPUSET/CPUGET sequence above have the_________
    
    _____________same effect on the PC, but the CPUGET instruction also_________
    
    _____________restores the LEVEL field of the PSW.___________________________
    

    1/3 got this right. One got partial credit for an answere that was relevant but never got to the point.

    b) Suppose we decide that all trap handlers will return to the user program by a jump to TRAPEXIT (that was the code to restore registers and return). The above code doesn't do this! Instead, it does its own return in line n. Write replacement code for lines h through n that has the same net effect (raising the program error trap) but does so by altering the saved registers and then jumping to TRAPEXIT where they are restored. (3 points)

    _____________LOAD____R1,PROGERR_____________________________________________
    
    _____________LOAD____R3,R1,EXAR_____________________________________________
    
    _____________STORE___R3,R2,R2SAVE___________________________________________
    
    _____________LOAD____R3,R1,EXHAND___________________________________________
    
    _____________STORE___R3,R2,PCSAVE___________________________________________
    
    _____________JUMP____TRAPEXIT_______________________________________________
    
    ____________________________________________________________________________
    
    ____________________________________________________________________________
    

    One got this right, 1/3 got no credit. In general, troubles revolved around storing to R2SAVE and PCSAVE. A surprising number of students had difficulty with the final JUMP and tried replacing it with sequences ending with a JUMPS. This new code is significantly simpler to understand than the original code it replaces!