Assignment 8, due Mar 27
LIS R3,1 TESTR R4 BZS DONE LOOP: SL R3,1 ADDSI R4,-1 BZR LOOP DONE:
Here is a clever alternative solution:
LIS R3,1 BITTST R4,0 BBR SKIP0 SL R3,1 SKIP0: BITTST R4,1 BBR SKIP1 SL R3,2 SKIP1: BITTST R4,2 BBR SKIP2 SL R3,4 SKIP2: BITTST R4,3 BBR SKIP3 SL R3,8 SKIP3: BITTST R4,4 BBR SKIP4 SL R3,16 SKIP4:
a) Assuming all instructions take the same amount of time, what is the smallest shift count (in R4) for which the clever alternative is faster? (0.4 points)
For a shift count of 4, the clever solution takes 12 instructions, while the iterative solution takes 15 instructions.
b) Under the same assumptions, what is the largest shift count (in R4) for which the simple iterative solution is faster? (0.4 points)
For a shift count of 3, the iterative solution takes 12 instructions while the clever solution takes 13 instructions.
In general, the clever solution takes 11 instructions plus one more instruction for each one bit in the shift count (between 0 and 31), while the iterative solution takes 3 instructions plus 3 more for each bit shifted.
c) How many shift (or shift-add) instructions are included in the clever alternative solution? (Hint: You will probably have to look things up in the Hawk manual to find out which of the instructions are shift instructions. (0.2 points)
There are 5 SL instructions, the obvious answer; these are actually ADDSL instructions. plus 5 BITTST instructins; these are really ADDSR instructions (because the bit numbers are less than 16). This makes a total of 10. See Chapter 6 of the Hawk manual for details.
Give the fastest Hawk code you can to multiply by 2015. (1.0 points)
First, 2015 is 5 × 13 × 31 or 5 × 403 or 13 × 155 or 31 × 65. Alternatively, 2015 is 11111011111 in binary. Since the binary representation has lots of one bits, we'll ignore it for now. The first obvious candidate is 5 × 13 × 31. This gives us:ADDSL R3,R3,2 ; R3 times 5 MOVE R1,R3 ; \ R3 times 13 = 1101 ADDSL R3,R1,1 ; | ADDSL R3,R1,2 ; / NEG R1,R3 ; \ R3 times 31 = 32 + (-1) ADDSL R3,R3,5 ; /
But this ignores the fact that 5 × 13 is 65, or 1000001 in binary. This gives the following:ADDSL R3,R3,6 ; R3 times 65 NEG R1,R3 ; \ R3 times 31 = 32 + (-1) ADDSL R3,R3,5 ; /
There are other approaches. For example 11111011111 is 100000000000 minus 100001. Neither of these have too many one bits, so we can write this code:MOVSL R1,R3,11 ; R1 = R3 times 100000000000 ADDSL R3,R3,5 ; R3 = R3 times 100001 SUB R3,R1,R3
There are many variations on the above; consider this:NEG R1,R3 ADDSL R1,R1,5 ADDSL R3,R1,11
There don't seem to be any solutions shorter than 3 instructions.
A problem: Modify this code so it returns the low 32 bits of the product in R3 and the high 32 bits of the product in R4. (1.0 points)
Hint: Study how to do 64-bit shifts, a subject discussed in the section on Division in Chapter 9 of the notes and also in Section 10 of the Hawk manual. You should also look at which bits of R4 contain useful information as TIMESUL performs successive iterations.
TIMESUL:; unsigned multiply, low 32-bits of the product ; expects: R1 - the return address ; R4 - the multiplier ; R5 - the multiplicand ; returns: R3 - the low 32 bits of product ; destroys: R4 - the high 32 bits of the product -- update comment ; R6 - used as loop counter CLR R3 LIS R6,32 ; load loop counter TMULLP: ; --- start of multiply step SL R3,1 ; shift product ; SL R4,1 ; shift multiplier -------- replaced instruction ADDC R4,R4 ; shift multiplier -------- new instruciton BCR TMULCN ; if high bit was one ADD R3,R3,R5; add partial product to product ADDC R4,R0 ; keep the carry -------- new instruction TMULCN: ; --- end of multiply step ADDSI R6,-1 ; count BGT TMULLP ; iterate JUMPS R1 ; return
It is easy to forget the second ADDC instruction, but adding the next partial product can cause a carry out of the low 32 bits of the result.