Assignment 8 Solutions
Part of
the homework for 22C:60, Fall 2010
|
h = 0;
l = b;
repeat the following n times {
if l is odd thebn h = h + a
hl = hl >> 1}
now hl = a × b
The problem: Write a Hawk subroutine to multiply two numbers using the above algorithm. The 32-bit multiplicands are passed in R3 and R4 and the 64-bit product is returned in R3 (the low half) and R4 (the high half). (1 point)
MULTIPLY: ; given R3: 32-bit multiplier b ; R4: 32-bit multiplicand a ; returns R3-R4: the 64-bit product hl, R3 l, R4 h. ; uses R5: copy of multiplicand a ; R6: iteration count MOVE R5,R4 ; -- copy multiplicand LIS R6,32 ; count = 32 MULTLOOP: ; do { BITTST R3,0 BCR MULTENDF ; if (l is odd) { -- multiplier odd ADD R4,R4,R5 ; h = h + a -- add partial product MULTENDF: ; } SRU R3,1 ; * SRU R4,1 ; * ADJUST R3,CMSB ; hl = hl >> 1 * ADDI R6,-1 ; count = count - 1 BGT MULTLOOP ; } while (count > 0) JUMPS R1The above solution has a subtle flaw: It is only guaranteed to work for unsigned multiplicands that are 31 bits long. Carry out of the topmost bit is lost. Nobody is expected to catch this for the homework.
- Background: The minimal standard pseudorandom number generator is a popular way to generate sequences of numbers that appear to be random. This uses a 32-bit global variable called, seed, an unsigned integer. Each time a new random value is needed, seed is updated using the parameterless subroutine random(). The algorithm is as follows:
a) Write the RANDOM routine for the Hawk, using the multiply and divide routines in the hawk monitor. You will probably need to compute the constants q and r using a calculator. (1 point)constant a = 16,807
constant m = 2,147,483,647
constant q = m / a
constant r = m % anote seed will always be in the range 1 to m-1, inclusive.
random() is {
seed = a * (seed % q) - r * (seed / q)
if seed < 0 then seed = seed + m}
; Constants for RANDOM A = 16807 M = 2147483647 Q = 127773 ; m / a R = 2836 ; m % a ; Activation record for RANDOM ;RETAD = 0 SEEDMQ = 4 ; (SEED % Q) RSEEDQ = 4 ; R * (SEED / Q) ARSIZE = 8 RANDOM: ; parameterless, updates SEED, a global variable ; uses R3 to R7 STORES R1,R2 ADDI R2,R2,ARSIZE LIL R3,SEED LOADS R3,R3 LIL R4,Q LIL R1,DIVU JSRS R1,R1 ; get (seed / q) and (seed % q) STORE R4,R2,SEEDMQ-ARSIZE LIL R4,R LIL R1,MUL JSRS R1,R1 ; get r*(seed / q) STORE R3,R2,RSEEDQ-ARSIZE LOAD R3,R2,SEEDMQ-ARSIZE LIL R4,A LIL R1,MUL JSRS R1,R1 ; get a*(seed % q) LOAD R4,R2,RSEEDQ-ARSIZE SUB R3,R3,R4 ; s = a*(seed % q) - r*(seed / q) BGE RANDONE ; if (s < 0) { LIW R4,M ADD R3,R3,R4 ; s = s + m RANDONE: ; } LIL R4,SEED STORES R3,R4 ; seed = s ADDI R2,R2,-ARSIZE LOADS R1,R2 JUMPS R1b) Can you come up with an efficient way to multiply by the constant a using shift and add sequences? (0.5 points)
a = 16,807 = 100,0001,1010,01112
This leads directly to the following, the best solution:MOVE R4,R3 ADDSL R3,R4,6 ; times 100 0001 ADDSL R3,R4,1 ; times 100 0001 1 ADDSL R3,R4,2 ; times 100 0001 101 ADDSL R3,R4,3 ; times 100 0001 1010 01 ADDSL R3,R4,1 ; times 100 0001 1010 011 ADDSL R3,R4,1 ; times 100 0001 1010 0111a = 16,807 = 75
This leads directly to the following somewhat slower solution:NEG R4,R3 ; \ times 7 = 8 + (-1) ADDSL R3,R4,3 ; / NEG R4,R3 ; \ times 7 = 8 + (-1) ADDSL R3,R4,3 ; / NEG R4,R3 ; \ times 7 = 8 + (-1) ADDSL R3,R4,3 ; / NEG R4,R3 ; \ times 7 = 8 + (-1) ADDSL R3,R4,3 ; / NEG R4,R3 ; \ times 7 = 8 + (-1) ADDSL R3,R4,3 ; /a = 16,807 = 7 × 492
This leads directly to the following pretty good solution:NEG R4,R3 ; \ times 7 = 8 + (-1) ADDSL R3,R4,3 ; / MOVE R4,R3 ADDSL R3,R4,1 ; times 11 ADDSL R3,R4,4 ; times 110001 = 49 MOVE R4,R3 ADDSL R3,R4,1 ; times 11 ADDSL R3,R4,4 ; times 110001 = 49c) Can you come up with an efficient way to multiply by the constant r using shift and add sequences? (0.5 points)
r = 2836 = 1011,0001,01002
This leads directly to this solution:MOVE R4,R3 ADDSL R3,R4,2 ; times 101 ADDSL R3,R4,1 ; times 1011 ADDSL R3,R4,4 ; times 1011 0001 ADDSL R3,R4,2 ; times 1011 0001 01 SL R3,2 ; times 1011 0001 0100r = 2836 = 4 × 709 where 709 = 10,1100,01012
This leads to a slightly different but no faster solution:SL R3,2 ; times 4 MOVE R4,R3 ADDSL R3,R4,2 ; times 101 ADDSL R3,R4,1 ; times 1011 ADDSL R3,R4,4 ; times 1011 0001 ADDSL R3,R4,2 ; times 1011 0001 01Although it was not part of the assignment, to complete the job, it would also be good to have a fast way to divide by the constant q:
q = 127,773 1/q = 0.0000,0000,0000,0000,1000,0011,0100,1110,0000,1011,0101,1110,00 ...
This means we can get an exact quotient as follows:MOVE R4,R3 SRU R3,1 ; R3 = R4 * 0.10 ADDSRU R3,R4,1 ; R3 = R4 * 0.110 ADDSRU R3,R4,1 ; R3 = R4 * 0.1110 ADDSRU R3,R4,1 ; R3 = R4 * 0.1 1110 ADDSRU R3,R4,2 ; R3 = R4 * 0.101 1110 ADDSRU R3,R4,2 ; R3 = R4 * 0.1 0101 1110 ADDSRU R3,R4,1 ; R3 = R4 * 0.11 0101 1110 ADDSRU R3,R4,2 ; R3 = R4 * 0.1011 0101 1110 ADDSRU R3,R4,6 ; R3 = R4 * 0.10 0000 1011 0101 1110 ADDSRU R3,R4,1 ; R3 = R4 * 0.110 0000 1011 0101 1110 ADDSRU R3,R4,1 ; R3 = R4 * 0.1110 0000 1011 0101 1110 ADDSRU R3,R4,3 ; R3 = R4 * 0.100 1110 0000 1011 0101 1110 ADDSRU R3,R4,2 ; R3 = R4 * 0.1 0100 1110 0000 1011 0101 1110 ADDSRU R3,R4,1 ; R3 = R4 * 0.11 0100 1110 0000 1011 0101 1110 ADDSRU R3,R4,6 ; R3 = R4 * 0.1000 0011 0100 1110 0000 1011 0101 1110 SRU R3,16 ; R3 = R4 / 127,773Having done this, we now need to compute the remainder as well. To do that, we need to multiply by q.
q = 127,773 = 1,1111,0011,0001,11012 q = 127,773 = 9 × 14197 = 9 × 11,0111,0111,01012
There is no disadvantage to brute force here, so we get:MOVE R5,R3 ADDSL R5,R3,1 ; times 11 ADDSL R5,R3,2 ; times 11 01 ADDSL R5,R3,1 ; times 11 011 ADDSL R5,R3,1 ; times 11 0111 ADDSL R5,R3,2 ; times 11 0111 01 ADDSL R5,R3,1 ; times 11 0111 011 ADDSL R5,R3,1 ; times 11 0111 0111 ADDSL R5,R3,2 ; times 11 0111 0111 01 ADDSL R5,R3,2 ; times 11 0111 0111 0101Now, R3 holds R4 ÷ q, and, R5 holds (R4 ÷ q) × q, so, to complete taking the remainder, we just need to subtract.
SUB R4,R4,R5; compute remainderNote that, when we substitute this mess into the code for part a, we don't need local variables because we no longer call any subroutines and the intermediate results can all be saved in registers instead of locals.