Assignment 8 Solutions

Part of the homework for 22C:60, Fall 2010
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Homework

Background: The classic multiplication algorithm used on early computers took two one-word numbers, call them a and b, and produced a two-word product, call it hl where h is the high word of the product and l is the low word of the product. If the word size is n bits, the algorithm can be expressed as follows:

h = 0;
l = b;
repeat the following n times {

if l is odd thebn h = h + a
hl = hl >> 1

}
now hl = a × b

The problem: Write a Hawk subroutine to multiply two numbers using the above algorithm. The 32-bit multiplicands are passed in R3 and R4 and the 64-bit product is returned in R3 (the low half) and R4 (the high half). (1 point)

MULTIPLY:
        ; given   R3:    32-bit multiplier b
        ;         R4:    32-bit multiplicand a
        ; returns R3-R4: the 64-bit product hl, R3 l, R4 h.
        ; uses    R5:    copy of multiplicand a
        ;         R6:    iteration count
        MOVE    R5,R4           ; -- copy multiplicand
        LIS     R6,32           ; count = 32
MULTLOOP:                       ; do {
        BITTST  R3,0
        BCR     MULTENDF        ;    if (l is odd) { -- multiplier odd
        ADD     R4,R4,R5        ;       h = h + a -- add partial product
MULTENDF:                       ;    }  
        SRU     R3,1            ;                 *
        SRU     R4,1            ;                 *
        ADJUST  R3,CMSB         ;    hl = hl >> 1 *
        ADDI    R6,-1           ;    count = count - 1
        BGT     MULTLOOP        ; } while (count > 0)
        JUMPS   R1
The above solution has a subtle flaw: It is only guaranteed to work for unsigned multiplicands that are 31 bits long. Carry out of the topmost bit is lost. Nobody is expected to catch this for the homework.
Background: The minimal standard pseudorandom number generator is a popular way to generate sequences of numbers that appear to be random. This uses a 32-bit global variable called, seed, an unsigned integer. Each time a new random value is needed, seed is updated using the parameterless subroutine random(). The algorithm is as follows:

constant a = 16,807
constant m = 2,147,483,647
constant q = m / a
constant r = m % a
note seed will always be in the range 1 to m-1, inclusive.
random() is {

seed = a * (seed % q) - r * (seed / q)
if seed < 0 then seed = seed + m

}
a) Write the RANDOM routine for the Hawk, using the multiply and divide routines in the hawk monitor. You will probably need to compute the constants q and r using a calculator. (1 point)
; Constants for RANDOM
A       =       16807
M       =       2147483647
Q       =       127773  ; m / a
R       =       2836    ; m % a

; Activation record for RANDOM
;RETAD  = 0
SEEDMQ  = 4     ; (SEED % Q)
RSEEDQ  = 4     ; R * (SEED / Q)
ARSIZE  = 8

RANDOM:
	; parameterless, updates SEED, a global variable
	; uses R3 to R7
        STORES  R1,R2
        ADDI    R2,R2,ARSIZE
        LIL     R3,SEED
        LOADS   R3,R3
        LIL     R4,Q
        LIL     R1,DIVU
        JSRS    R1,R1           ; get (seed / q) and (seed % q)
        STORE   R4,R2,SEEDMQ-ARSIZE
        LIL     R4,R 
        LIL     R1,MUL
        JSRS    R1,R1           ; get r*(seed / q)
        STORE   R3,R2,RSEEDQ-ARSIZE
        LOAD    R3,R2,SEEDMQ-ARSIZE
        LIL     R4,A 
        LIL     R1,MUL
        JSRS    R1,R1           ; get a*(seed % q)
        LOAD    R4,R2,RSEEDQ-ARSIZE
        SUB     R3,R3,R4        ; s = a*(seed % q) - r*(seed / q)
        BGE     RANDONE         ; if (s < 0) {
        LIW     R4,M
        ADD     R3,R3,R4        ;    s = s + m
RANDONE:                        ; }
        LIL     R4,SEED
        STORES  R3,R4           ; seed = s
        ADDI    R2,R2,-ARSIZE
        LOADS   R1,R2
        JUMPS   R1
b) Can you come up with an efficient way to multiply by the constant a using shift and add sequences? (0.5 points)
a = 16,807 = 100,0001,1010,0111₂
This leads directly to the following, the best solution:
	MOVE    R4,R3
        ADDSL   R3,R4,6 ; times 100 0001
        ADDSL   R3,R4,1 ; times 100 0001 1
        ADDSL   R3,R4,2 ; times 100 0001 101
        ADDSL   R3,R4,3 ; times 100 0001 1010 01
        ADDSL   R3,R4,1 ; times 100 0001 1010 011
        ADDSL   R3,R4,1 ; times 100 0001 1010 0111
a = 16,807 = 7⁵
This leads directly to the following somewhat slower solution:
        NEG     R4,R3   ; \ times 7 = 8 + (-1)
        ADDSL   R3,R4,3 ; /
        NEG     R4,R3   ; \ times 7 = 8 + (-1)
        ADDSL   R3,R4,3 ; /
        NEG     R4,R3   ; \ times 7 = 8 + (-1)
        ADDSL   R3,R4,3 ; /
        NEG     R4,R3   ; \ times 7 = 8 + (-1)
        ADDSL   R3,R4,3 ; /
        NEG     R4,R3   ; \ times 7 = 8 + (-1)
        ADDSL   R3,R4,3 ; /
a = 16,807 = 7 × 49²
This leads directly to the following pretty good solution:
        NEG     R4,R3   ; \ times 7 = 8 + (-1)
        ADDSL   R3,R4,3 ; /
        MOVE    R4,R3
        ADDSL   R3,R4,1 ; times 11
        ADDSL   R3,R4,4 ; times 110001 = 49
        MOVE    R4,R3
        ADDSL   R3,R4,1 ; times 11
        ADDSL   R3,R4,4 ; times 110001 = 49
c) Can you come up with an efficient way to multiply by the constant r using shift and add sequences? (0.5 points)
r = 2836 = 1011,0001,0100₂
This leads directly to this solution:
        MOVE    R4,R3
        ADDSL   R3,R4,2 ; times 101
        ADDSL   R3,R4,1 ; times 1011
        ADDSL   R3,R4,4 ; times 1011 0001
        ADDSL   R3,R4,2 ; times 1011 0001 01
        SL      R3,2    ; times 1011 0001 0100
r = 2836 = 4 × 709 where 709 = 10,1100,0101₂
This leads to a slightly different but no faster solution:
        SL      R3,2    ; times 4
        MOVE    R4,R3
        ADDSL   R3,R4,2 ; times 101
        ADDSL   R3,R4,1 ; times 1011
        ADDSL   R3,R4,4 ; times 1011 0001
        ADDSL   R3,R4,2 ; times 1011 0001 01
Although it was not part of the assignment, to complete the job, it would also be good to have a fast way to divide by the constant q:
q = 127,773 1/q = 0.0000,0000,0000,0000,1000,0011,0100,1110,0000,1011,0101,1110,00 ...
This means we can get an exact quotient as follows:
        MOVE    R4,R3
        SRU     R3,1	; R3 = R4 * 0.10
        ADDSRU  R3,R4,1	; R3 = R4 * 0.110
        ADDSRU  R3,R4,1	; R3 = R4 * 0.1110
        ADDSRU  R3,R4,1	; R3 = R4 * 0.1 1110
        ADDSRU  R3,R4,2	; R3 = R4 * 0.101 1110
        ADDSRU  R3,R4,2	; R3 = R4 * 0.1 0101 1110
        ADDSRU  R3,R4,1	; R3 = R4 * 0.11 0101 1110
        ADDSRU  R3,R4,2	; R3 = R4 * 0.1011 0101 1110
        ADDSRU  R3,R4,6	; R3 = R4 * 0.10 0000 1011 0101 1110
        ADDSRU  R3,R4,1	; R3 = R4 * 0.110 0000 1011 0101 1110
        ADDSRU  R3,R4,1	; R3 = R4 * 0.1110 0000 1011 0101 1110
        ADDSRU  R3,R4,3	; R3 = R4 * 0.100 1110 0000 1011 0101 1110
        ADDSRU  R3,R4,2	; R3 = R4 * 0.1 0100 1110 0000 1011 0101 1110
        ADDSRU  R3,R4,1	; R3 = R4 * 0.11 0100 1110 0000 1011 0101 1110
        ADDSRU  R3,R4,6	; R3 = R4 * 0.1000 0011 0100 1110 0000 1011 0101 1110
        SRU     R3,16	; R3 = R4 / 127,773
Having done this, we now need to compute the remainder as well. To do that, we need to multiply by q.
q = 127,773 = 1,1111,0011,0001,1101₂ q = 127,773 = 9 × 14197 = 9 × 11,0111,0111,0101₂
There is no disadvantage to brute force here, so we get:
	MOVE    R5,R3
        ADDSL   R5,R3,1 ; times 11
        ADDSL   R5,R3,2 ; times 11 01
        ADDSL   R5,R3,1 ; times 11 011
        ADDSL   R5,R3,1 ; times 11 0111
        ADDSL   R5,R3,2 ; times 11 0111 01
        ADDSL   R5,R3,1 ; times 11 0111 011
        ADDSL   R5,R3,1 ; times 11 0111 0111
        ADDSL   R5,R3,2 ; times 11 0111 0111 01
        ADDSL   R5,R3,2 ; times 11 0111 0111 0101
Now, R3 holds R4 ÷ q, and, R5 holds (R4 ÷ q) × q, so, to complete taking the remainder, we just need to subtract.
	SUB	R4,R4,R5; compute remainder
Note that, when we substitute this mess into the code for part a, we don't need local variables because we no longer call any subroutines and the intermediate results can all be saved in registers instead of locals.