Assignment 8, due Mar 27

Solutions

Part of the homework for CS:2630, Spring 2015
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

  1. Background: In C, Java or Python, you can write expressions like 1<<i where the shift count is a constant. In contrast, the Hawk architecture (like many other computers) only has constant shift counts in its shift instructions. If you wanted to compute 1<<i on the Hawk, with the result in R3 and the unsigned variable i in R4, you could do it with the following simple iterative solution:
            LIS     R3,1
            TESTR   R4
            BZS     DONE
    LOOP:   SL      R3,1
            ADDSI   R4,-1
            BZR     LOOP
    DONE:
    

    Here is a clever alternative solution:

            LIS     R3,1
            BITTST  R4,0
            BBR     SKIP0
            SL      R3,1
    SKIP0:  BITTST  R4,1
            BBR     SKIP1
            SL      R3,2
    SKIP1:  BITTST  R4,2
            BBR     SKIP2
            SL      R3,4
    SKIP2:  BITTST  R4,3
            BBR     SKIP3
            SL      R3,8
    SKIP3:  BITTST  R4,4
            BBR     SKIP4
            SL      R3,16
    SKIP4:
    

    a) Assuming all instructions take the same amount of time, what is the smallest shift count (in R4) for which the clever alternative is faster? (0.4 points)

    For a shift count of 4, the clever solution takes 12 instructions, while the iterative solution takes 15 instructions.

    b) Under the same assumptions, what is the largest shift count (in R4) for which the simple iterative solution is faster? (0.4 points)

    For a shift count of 3, the iterative solution takes 12 instructions while the clever solution takes 13 instructions.

    In general, the clever solution takes 11 instructions plus one more instruction for each one bit in the shift count (between 0 and 31), while the iterative solution takes 3 instructions plus 3 more for each bit shifted.

    c) How many shift (or shift-add) instructions are included in the clever alternative solution? (Hint: You will probably have to look things up in the Hawk manual to find out which of the instructions are shift instructions. (0.2 points)

    There are 5 SL instructions, the obvious answer; these are actually ADDSL instructions. plus 5 BITTST instructins; these are really ADDSR instructions (because the bit numbers are less than 16). This makes a total of 10. See Chapter 6 of the Hawk manual for details.

  2. A problem:

    Give the fastest Hawk code you can to multiply by 2015. (1.0 points)

    First, 2015 is 5 × 13 × 31 or 5 × 403 or 13 × 155 or 31 × 65. Alternatively, 2015 is 11111011111 in binary. Since the binary representation has lots of one bits, we'll ignore it for now. The first obvious candidate is 5 × 13 × 31. This gives us:

            ADDSL  R3,R3,2    ;   R3 times  5
            MOVE   R1,R3      ; \ R3 times 13 = 1101
            ADDSL  R3,R1,1    ; |
            ADDSL  R3,R1,2    ; /
            NEG    R1,R3      ; \ R3 times 31 = 32 + (-1)
            ADDSL  R3,R3,5    ; /
    

    But this ignores the fact that 5 × 13 is 65, or 1000001 in binary. This gives the following:

            ADDSL  R3,R3,6    ;   R3 times 65
            NEG    R1,R3      ; \ R3 times 31 = 32 + (-1)
            ADDSL  R3,R3,5    ; /
    

    There are other approaches. For example 11111011111 is 100000000000 minus 100001. Neither of these have too many one bits, so we can write this code:

            MOVSL  R1,R3,11   ;   R1 = R3 times 100000000000
    	ADDSL  R3,R3,5    ;   R3 = R3 times 100001
            SUB    R3,R1,R3
    

    There are many variations on the above; consider this:

            NEG    R1,R3
    	ADDSL  R1,R1,5
    	ADDSL  R3,R1,11
    

    There don't seem to be any solutions shorter than 3 instructions.

  3. Background: Consider the first version of TIMESUL given in section 14.3 of the Hawk manual. This returns the low 32 bits of the unsigned product of two 32-bit numbers in R3.

    A problem: Modify this code so it returns the low 32 bits of the product in R3 and the high 32 bits of the product in R4. (1.0 points)

    Hint: Study how to do 64-bit shifts, a subject discussed in the section on Division in Chapter 9 of the notes and also in Section 10 of the Hawk manual. You should also look at which bits of R4 contain useful information as TIMESUL performs successive iterations.

    TIMESUL:; unsigned multiply, low 32-bits of the product
            ; expects:  R1 - the return address
            ;           R4 - the multiplier
            ;           R5 - the multiplicand
            ; returns:  R3 - the low 32 bits of product
            ; destroys: R4 - the high 32 bits of the product -- update comment
            ;           R6 - used as loop counter
    
            CLR     R3 
            LIS     R6,32   ; load loop counter
    TMULLP:                 ;   --- start of multiply step
            SL      R3,1    ;   shift product 
    ;       SL      R4,1    ;   shift multiplier -------- replaced instruction
            ADDC    R4,R4   ;   shift multiplier -------- new instruciton
            BCR     TMULCN  ;   if high bit was one
            ADD     R3,R3,R5;     add partial product to product
            ADDC    R4,R0   ;     keep the carry -------- new instruction
    TMULCN:                 ;   --- end of multiply step
            ADDSI   R6,-1   ;   count
            BGT     TMULLP  ; iterate
            JUMPS   R1      ; return
    

    It is easy to forget the second ADDC instruction, but adding the next partial product can cause a carry out of the low 32 bits of the result.