Assignment 8, Solutions
Part of
the homework for 22C:60 (CS:2630), Spring 2012
|
unsigned int timespi( unsigned int i ) { return (i * 333)/106; }
Note that, to 64 bits of precision in hex, 1/106 = 0.026A439F656F1826
a) Write the fastest SMAL Hawk instruction sequence you can find to multiply by 333, using the tricks discussed in Chapter 9 of the notes and Chapter 14 of the Hawk manual. (0.5 points)
333 = 3 × 111 = 3 × 3 × 37 -- prime factors
But, we have a fast trick for multiplying by 9, so we'll use 9 × 37. This gives the following, using the cookbook in Section 14.2 of the Hawk manual:
ADDSL R3,R3,3 ; R3 times 9 MOV R1,R3 ; \ R3 times 37 = 5 + 32 ADDSL R1,R1,2 ; | ADDSL R3,R1,5 ; /This looks very good, but it might be faster to use the binary representation, with one ADDSL per one bit, plus a single MOV instruction. On conversion, 33310 is 1010011012, which has 5 one bits. As such, this looks like a dead end. On careful inspection, though, this is the sum of two interesting numbers, 101000101 and 1000. We can use this to advantage:
MOVSL R1,R3,3 ; R1 = R3 * 8 (1000) ADDSL R3,R3,2 ; R3 = R3 * 5 (101) \ together, give 101000101 ADDSL R3,R3,6 ; R3 = R3 * 65 (1000001) / ADD R3,R3,R1 ; put it all togetherIn conclusion, we have a tie. The first answer was easy (using cookbook methods), while the second took inventing and was unlikely to show up in any solutions to the homework.
b) Write the fastest SMAL Hawk instruction sequence you can find to divide by 106, using all the tricks. (0.5 points)
Given that 1/106 = 0.026A439F656F1826, we need either the 32 most significant bits of the fraction, with correction, or or the 33 most significant nonzero bits:
0000 0010 0110 1010 0100 0011 1001 1111 -- most significant bits 0000 0010 0110 1010 0100 0011 1001 1111 0110 0101 \______________nonzero bits_____________/We need a shift and add for each one bit in either solution. The second solution has 3 more 1 bits, but as detailed in Section 14.6 of the Hawk manual, using only the 32 most significant bits would require multiplying by 106 to check if the result was off by one. 106 binary is 1101010, so it will take at least three instructions to do this, suggesting that we are better off using the second solution:
MOVE R1,R3 ; R3 = R1 * 1.0 SRU R3,2 ; R3 = R1 * 0.01 ADDSRU R3,R1,3 ; R3 = R1 * 0.00101 ADDSRU R3,R1,1 ; R3 = R1 * 0.100101 ADDSRU R3,R1,2 ; R3 = R1 * 0.01100101 ADDSRU R3,R1,1 ; R3 = R1 * 0.101100101 ADDSRU R3,R1,1 ; R3 = R1 * 0.1101100101 ADDSRU R3,R1,1 ; R3 = R1 * 0.11101100101 ADDSRU R3,R1,1 ; R3 = R1 * 0.111101100101 ADDSRU R3,R1,3 ; R3 = R1 * 0.001111101100101 ADDSRU R3,R1,1 ; R3 = R1 * 0.1001111101100101 ADDSRU R3,R1,1 ; R3 = R1 * 0.11001111101100101 ADDSRU R3,R1,5 ; R3 = R1 * 0.0000111001111101100101 ADDSRU R3,R1,3 ; R3 = R1 * 0.0010000111001111101100101 ADDSRU R3,R1,2 ; R3 = R1 * 0.010010000111001111101100101 ADDSRU R3,R1,2 ; R3 = R1 * 0.01010010000111001111101100101 ADDSRU R3,R1,1 ; R3 = R1 * 0.101010010000111001111101100101 ADDSRU R3,R1,3 ; R3 = R1 * 0.001101010010000111001111101100101 ADDSRU R3,R1,7 ; R3 = R1 * 0.0000001001101010010000111001111101100101
a) Shifting the value 1 place left. (0.2 points)
SL R3,1 ; the low half, shifts one bit into carry ADDC R3,R3 ; the high half, and capture the carry bit
b) Shifting the value 6 places left. Strong hint: Yes, 6 repetitions of the code from part a) will work, but the problem can be solved in half as many instructions. (0.8 points)
MOVE R1,R3 ; keep a copy of the low half SL R3,6 ; shift the low half, discarding 6 high bits SL R4,6 ; shift the high half SR R1,10 ; align the copy of the discarded bits of the low half OR R4,R1 ; align the copy of the discarded bits of the low half
a) Explain the most important difference between the two. Your explanation must be concise and at a high level. (0.4 points)
The version in Chapter 9 shifts the multiplier right while shifting the multiplicand left, using an indefinite loop and processing the least significant bit of the multiplier first. The version in Section 14 of the hawk manual shifts both of them left, using a definite loop and processing the most significant bit of the multiplier first.
The most important difference might be the difference in shift direction, because this implies the other two.
b) Modify the code from Section 14.3 so it computes the 64-bit product of two unsigned 32-bit operands. (Hint: the modification required involves changing exactly one instruction.) (0.6 points)
TIMESUL:; unsigned multiply, low 32-bits of the product ; inputs ; - R1 return address ; - R4 multiplier (unsigned!) ; - R5 multiplicand (unsigned!) ; outputs ; - R3 low 32 bits of product (unsigned!) ; - R4 high 32 bits of product (unsigned!) *** a ; - R5 multiplicand ; - R6 scratch -- used as loop counter CLR R3 ; LIS R6,32 ; load loop counter TMULLP: ; --- start of multiply step SL R3,1 ; shift product ADDC R4,R4 ; shift multiplier and capture top product bit *** b BCR TMULCN ; if high bit of multiplier was one ADD R3,R3,R5; add partial product to product ADDC R4,R0 ; capture any carry produced above *** c TMULCN: ; --- end of multiply step ADDSI R6,-1 ; count BGT TMULLP ; iterate JUMPS R1 ; returnThe changed lines are marked with stars. Line a is a changed comment to document the effect of the other two changes. Line b is the crucial change, converting two independent 32-bit shifts into a single 64-bit shift. Line c is important only when adding the multiplicand to the product produces a carry out of the low 32 bits. This will be true only for the very largest multiplicands and can easily be missed.