# Assignment 8, due Mar 27

## Solutions

1. Background: In C, Java or Python, you can write expressions like 1<<i where the shift count is a constant. In contrast, the Hawk architecture (like many other computers) only has constant shift counts in its shift instructions. If you wanted to compute 1<<i on the Hawk, with the result in R3 and the unsigned variable i in R4, you could do it with the following simple iterative solution:
```        LIS     R3,1
TESTR   R4
BZS     DONE
LOOP:   SL      R3,1
BZR     LOOP
DONE:
```

Here is a clever alternative solution:

```        LIS     R3,1
BITTST  R4,0
BBR     SKIP0
SL      R3,1
SKIP0:  BITTST  R4,1
BBR     SKIP1
SL      R3,2
SKIP1:  BITTST  R4,2
BBR     SKIP2
SL      R3,4
SKIP2:  BITTST  R4,3
BBR     SKIP3
SL      R3,8
SKIP3:  BITTST  R4,4
BBR     SKIP4
SL      R3,16
SKIP4:
```

a) Assuming all instructions take the same amount of time, what is the smallest shift count (in R4) for which the clever alternative is faster? (0.4 points)

For a shift count of 4, the clever solution takes 12 instructions, while the iterative solution takes 15 instructions.

b) Under the same assumptions, what is the largest shift count (in R4) for which the simple iterative solution is faster? (0.4 points)

For a shift count of 3, the iterative solution takes 12 instructions while the clever solution takes 13 instructions.

In general, the clever solution takes 11 instructions plus one more instruction for each one bit in the shift count (between 0 and 31), while the iterative solution takes 3 instructions plus 3 more for each bit shifted.

c) How many shift (or shift-add) instructions are included in the clever alternative solution? (Hint: You will probably have to look things up in the Hawk manual to find out which of the instructions are shift instructions. (0.2 points)

There are 5 SL instructions, the obvious answer; these are actually ADDSL instructions. plus 5 BITTST instructins; these are really ADDSR instructions (because the bit numbers are less than 16). This makes a total of 10. See Chapter 6 of the Hawk manual for details.

2. A problem:

Give the fastest Hawk code you can to multiply by 2015. (1.0 points)

First, 2015 is 5 × 13 × 31 or 5 × 403 or 13 × 155 or 31 × 65. Alternatively, 2015 is 11111011111 in binary. Since the binary representation has lots of one bits, we'll ignore it for now. The first obvious candidate is 5 × 13 × 31. This gives us:

```        ADDSL  R3,R3,2    ;   R3 times  5
MOVE   R1,R3      ; \ R3 times 13 = 1101
NEG    R1,R3      ; \ R3 times 31 = 32 + (-1)
```

But this ignores the fact that 5 × 13 is 65, or 1000001 in binary. This gives the following:

```        ADDSL  R3,R3,6    ;   R3 times 65
NEG    R1,R3      ; \ R3 times 31 = 32 + (-1)
```

There are other approaches. For example 11111011111 is 100000000000 minus 100001. Neither of these have too many one bits, so we can write this code:

```        MOVSL  R1,R3,11   ;   R1 = R3 times 100000000000
ADDSL  R3,R3,5    ;   R3 = R3 times 100001
SUB    R3,R1,R3
```

There are many variations on the above; consider this:

```        NEG    R1,R3
```

There don't seem to be any solutions shorter than 3 instructions.

3. Background: Consider the first version of TIMESUL given in section 14.3 of the Hawk manual. This returns the low 32 bits of the unsigned product of two 32-bit numbers in R3.

A problem: Modify this code so it returns the low 32 bits of the product in R3 and the high 32 bits of the product in R4. (1.0 points)

Hint: Study how to do 64-bit shifts, a subject discussed in the section on Division in Chapter 9 of the notes and also in Section 10 of the Hawk manual. You should also look at which bits of R4 contain useful information as TIMESUL performs successive iterations.

```TIMESUL:; unsigned multiply, low 32-bits of the product
; expects:  R1 - the return address
;           R4 - the multiplier
;           R5 - the multiplicand
; returns:  R3 - the low 32 bits of product
; destroys: R4 - the high 32 bits of the product -- update comment
;           R6 - used as loop counter

CLR     R3
LIS     R6,32   ; load loop counter
TMULLP:                 ;   --- start of multiply step
SL      R3,1    ;   shift product
;       SL      R4,1    ;   shift multiplier -------- replaced instruction
ADDC    R4,R4   ;   shift multiplier -------- new instruciton
BCR     TMULCN  ;   if high bit was one