Assignment 10, due April 10

Solutions

Part of the homework for CS:2630, Spring 2015
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Background: 50 years ago, the best selling computer in the world was the PDP-8, a very small computer (even by the standards of 1965) with a 12-bit word. A common binary floating-point format on this machine uses 3 words for each floating point number: The first holds the exponent, as a two's complement binary number from -2048 to +2047. The second and third words are the mantissa, most significant bits first, a two's complement fixed point binary fraction with the point just to the right of the sign, normalized (if possible) so the magnitude is at least 0.5. There is no hidden bit. Here are some examples:
```
  1 = 000000000001 010000000000 000000000000 (exp = 1, mantissa = 1/2)
 10 = 000000000100 010100000000 000000000000 (exp = 4, mantissa = 5/8)
0.1 = 111111111101 011001100110 011001100110 (exp = -3, mantissa = 8/10)
```
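As a cross-check on these examples, here is a minimal Python sketch of the encoding. It is not part of the assignment: the names `pdp8_float` and `show` are made up for illustration, and the sketch assumes the mantissa is truncated (not rounded) to 24 bits. Python's `math.frexp` happens to use the same normalization, with the mantissa in the range from 0.5 up to (but not including) 1.

```python
import math

def pdp8_float(x):
    """Return (exponent, high mantissa word, low mantissa word), 12 bits each."""
    m, e = math.frexp(x)              # x == m * 2**e, with 0.5 <= |m| < 1 (or m == 0)
    mant = int(m * 2**23) & 0xFFFFFF  # 24-bit two's complement, point right of the sign
    return (e & 0xFFF, mant >> 12, mant & 0xFFF)

def show(x):
    """Format the three words as 12-bit binary strings."""
    return " ".join(format(w, "012b") for w in pdp8_float(x))

print(show(1.0))   # 000000000001 010000000000 000000000000
print(show(10.0))  # 000000000100 010100000000 000000000000
print(show(0.1))   # 111111111101 011001100110 011001100110
```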
a) Given a floating point number in IEEE format in R3, write code to extract the exponent from that number and convert it to PDP-8 format in the least significant 12 bits of R4, leaving R3 unchanged. You may use R5 as a scratch register. Assume that the number is neither unnormalized nor a NaN. (0.5 points)
Some preliminary work: Let's look at the exponent encoding used for 1.0: In IEEE format, the exponent is 01111111 (127), while in PDP-8 format, the exponent is as given above (1). So, we can convert IEEE exponents to PDP-8 exponents by subtracting 126. As a check, 10 has an IEEE exponent of 130 and a PDP-8 exponent of 4.
```
        MOVE    R4,R3   ; copy the number
        SL      R4,1    ; discard the sign of the mantissa
        SRU     R4,12
        SRU     R4,12   ; align the exponent as an 8-bit unsigned integer
        ADDI    R4,R4,-126 ; PDP-8 exponent = IEEE exponent - 126
```
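A quick way to check the offset between the two exponent encodings is to compare them for a few normalized values. The following Python sketch is only a cross-check, not Hawk code, and the helper names are invented here. The last column printed is the constant to subtract from the IEEE exponent field; it should be the same on every line.

```python
import math
import struct

def ieee_exponent_field(x):
    """Return the 8-bit biased exponent field of x as an IEEE single."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return (bits >> 23) & 0xFF

def pdp8_exponent(x):
    """Return the exponent that goes with a mantissa between 0.5 and 1."""
    return math.frexp(x)[1]

for x in (0.5, 1.0, 10.0, 0.1):
    ieee = ieee_exponent_field(x)
    pdp8 = pdp8_exponent(x)
    print(x, ieee, pdp8, ieee - pdp8)
```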
b) Given a floating point number in IEEE format in R3, write code to extract the mantissa from that number and convert it to PDP-8 format in the least significant 24 bits of R3 (the most significant 8 bits of R3 must be set to zero). You may use R5 and R6 as scratch registers, if necessary.
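The same logic used for the exponent field applies here. Really, all we need to do is recover the hidden bit, place it one bit below the sign bit (the point is just to the right of the sign), and change from signed magnitude representation to two's complement representation. Making room for the sign costs us the least significant bit of the IEEE fraction.
```
        MOVE    R5,R3   ; keep a copy of the number for the sign bit
        SL      R3,9
        SRU     R3,10   ; discard the exponent and sign, leave room for hidden bit
        LIW     R6,#00400000
        OR      R3,R6   ; set hidden bit, just below the 24-bit sign
        BITTST  R5,31
        BBR     NOTNEG
        NEG     R3,R3   ; if the number was negative, negate it
        SL      R3,8
        SRU     R3,8    ; and zero the most significant 8 bits again
NOTNEG:
```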
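The mantissa conversion can be checked against the examples at the top of this page with a small Python model. This is a sketch under stated assumptions, not the Hawk solution itself: the function names are hypothetical, and it takes the examples as the definition of the format, so the restored hidden bit lands one place below the 24-bit sign bit.

```python
import struct

def pdp8_mantissa(x):
    """Return the 24-bit two's complement PDP-8 mantissa of x (an IEEE single)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    magnitude = (bits & 0x007FFFFF) | 0x00800000  # fraction with hidden bit restored
    magnitude >>= 1                               # make room for the sign bit
    if bits & 0x80000000:                         # IEEE sign bit set?
        magnitude = -magnitude                    # signed magnitude to two's complement
    return magnitude & 0x00FFFFFF                 # top 8 bits of the result are zero

def words(x):
    """Format the mantissa as two 12-bit words, most significant first."""
    m = pdp8_mantissa(x)
    return format(m >> 12, "012b") + " " + format(m & 0xFFF, "012b")

print(words(1.0))   # 010000000000 000000000000
print(words(10.0))  # 010100000000 000000000000
print(words(0.1))   # 011001100110 011001100110
print(words(-1.0))  # 110000000000 000000000000
```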
Background: For small x, sin x ≅ x (where x is given in radians). For somewhat larger x, we can use the first few terms of the Taylor series, so:
sin x ≅ x – x³/6 + x⁵/120
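To get a feel for how good this approximation is, the following Python sketch compares it with the library sine for a few arguments. The name `sin_approx` is made up here, and the sketch uses Python's double precision rather than the Hawk's single precision, so it only shows the error of the truncated series itself.

```python
import math

def sin_approx(x):
    """Three-term Taylor approximation: x - x^3/6 + x^5/120."""
    return x - x**3 / 6 + x**5 / 120

for x in (0.1, 0.5, 1.0):
    error = abs(sin_approx(x) - math.sin(x))
    print(x, sin_approx(x), math.sin(x), error)
```

The error grows quickly with x, roughly as x⁷/5040, the first term left out of the series.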

A Problem: Given that the Hawk floating point coprocessor is already turned on, and given the value of x in R3, write Hawk code (not a subroutine, just straight line code) that computes the above approximation for sin x, leaving the result in R3. (1 point).
Note: You may need some place to store intermediate results. You can use R4 and up if necessary.
The first solution given here is done using brute-force methods, except that all integer constants were converted to floating point in advance and we compute the last term first; we use FPA0 to compute each term while accumulating the sum in FPA1:
```
        COSET   R3,FPA0
        COSET   R3,FPMUL+FPA0
        COSET   R3,FPMUL+FPA0   ; *
        COSET   R3,FPMUL+FPA0   ; *
        COSET   R3,FPMUL+FPA0   ; *- x**5
        LIW     R4,#42F00000    ; -- 120.0
        COSET   R4,FPDIV+FPA0   ; -- x**5/120
        COGET   R4,FPA0         ; *
        COSET   R4,FPA1         ; -- accumulate x**5/120

        COSET   R3,FPA0
        COSET   R3,FPMUL+FPA0   ; *
        COSET   R3,FPMUL+FPA0   ; *- x**3
        LIW     R4,#40C00000    ; -- 6.0
        COSET   R4,FPDIV+FPA0   ; -- x**3/6
        COGET   R4,FPA0         ; *
        COSET   R4,FPSUB+FPA1   ; -- accumulate -x**3/6 + x**5/120

        COSET   R3,FPADD+FPA1   ; *- accumulate x - x**3/6 + x**5/120
        COGET   R3,FPA1         ; *
```
If we do a bit of algebra first, we can do a better job:

x – x³/6 + x⁵/120 = x(1 – x²/6 + x⁴/120)
= x(1 + x²/–6 + x⁴/120)
= x(1 + (x²/–6)(1 + x²/–20))
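The algebra is easy to get wrong, so here is a short Python check that the factored form, read as x times (1 + (x²/–6)(1 + x²/–20)), agrees with the expanded polynomial. The function names are illustrative only.

```python
import math

def expanded(x):
    """x - x^3/6 + x^5/120, term by term."""
    return x - x**3 / 6 + x**5 / 120

def factored(x):
    """The same polynomial, factored so that x^2 is computed only once."""
    x2 = x * x
    return x * (1 + (x2 / -6) * (1 + x2 / -20))

for x in (0.1, 0.5, 1.0, -0.7):
    print(x, expanded(x), factored(x))
```

The two forms agree up to rounding, and the factored form needs far fewer multiplications.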

We also convert the constants and constant fractions to IEEE format:

1 = 3F800000₁₆
–1/6 = BE2AAAAB₁₆
–1/20 = BD4CCCCD₁₆
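These hexadecimal values can be reproduced with Python's `struct` module, which packs a number as an IEEE single-precision value, rounding to nearest. The helper name `ieee_hex` is made up for this sketch.

```python
import struct

def ieee_hex(x):
    """Return the IEEE single-precision encoding of x as 8 hex digits."""
    return struct.pack(">f", x).hex().upper()

for name, value in (("1", 1.0), ("-1/6", -1 / 6), ("-1/20", -1 / 20)):
    print(name, "=", ieee_hex(value))
# 1 = 3F800000
# -1/6 = BE2AAAAB
# -1/20 = BD4CCCCD
```

The same helper gives 40C00000 for 6.0 and 42F00000 for 120.0.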
```
        COSET   R3,FPA0
        COSET   R3,FPA0 + FPMUL ; -- x**2
        LIW     R4,#BD4CCCCD    ; -- -1/20
        COGET   R5,FPA0         ; -- set aside a copy of x**2
        COSET   R4,FPA0 + FPMUL ; -- x**2/-20
        LIW     R6,#3F800000    ; -- 1.0
        COSET   R6,FPA0 + FPADD ; -- 1 + x**2/-20
        LIW     R4,#BE2AAAAB    ; -- -1/6
        COSET   R5,FPA0 + FPMUL ; -- x**2(1 + x**2/-20)
        COSET   R4,FPA0 + FPMUL ; *- x**2/-6(1 + x**2/-20)
        COSET   R6,FPA0 + FPADD ; *- 1 + x**2/-6(1 + x**2/-20)
        COSET   R3,FPA0 + FPMUL ; *- x(1 + x**2/-6(1 + x**2/-20))
        COGET   R3,FPA0         ; *
```
The first solution was 20 machine instructions (each LIW assembles to two), while this version is only 16. The improvement is actually greater than that because, whenever a computationally intensive coprocessor instruction is followed by another coprocessor instruction that operates on the same floating point accumulator, the second instruction will almost certainly have to wait. These instructions are marked with stars in the comment fields above. There are 9 such instructions in the first solution, but only 4 in the second, so the second may be significantly faster.

Background: Hardware hackers used to love the IBM PC parallel port because, if you wanted to build a new device, perhaps a robotic machine of some sort, interfacing to the parallel port was the easiest way to go. If we wanted to add a parallel port to the Hawk, we might do it like this:

A Hawk Parallel Interface

FF100010: Parallel-port data register
    bits 07-00: data

FF100014: Parallel-port status and control register
    bit 07: IE = interrupt enable (control)
    bit 06: ER = error (status)
    bit 05: DR = direction (control, in = 0)
    bit 00: RD = data ready (status)

Parallel ports were frequently bidirectional, able to serve as either input or output ports, hence the addition of a DR control bit to set the data transfer direction. As an input port (DR = 0), RD = 1 indicates that the data register contains new input data; reading the data register will reset RD. As an output port (DR = 1), RD = 1 indicates that the data register is ready for new output data; writing the data register will reset RD.

A problem: Write SMAL Hawk code for a PUTPAR routine that outputs one 8-bit byte to the parallel port. This should not use interrupts; it should set the direction to output, wait for ready, and then transfer data to the device. (1 point)

First, here is a straightforward solution that will probably work correctly most of the time, at least if nothing complicated is going on.

; Parallel Port (PP) constant definitions:
PPBASE  =       #FF100010       ; base of I/O register block
PPDATA  =       0               ; displacement of data register
PPSTAT  =       4               ; displacement of control register
PPRDY   =       0               ; bit number of ready status bit
PPDIR   =       5               ; bit number of direction control bit

PUTPAR: ; given R3 = ch, the byte to output

        ; first, setup to address the parallel port
        LIW     R4,PPBASE               ; -- index all device regs from R4

        ; second, set the direction to 1 (output)
        LIS     R5,1<<PPDIR
        STORE   R5,R4,PPSTAT

        ; third, wait for device ready
PUTPPOLL:
        LOAD    R5,R4,PPSTAT
        BITTST  R5,PPRDY
        BBR     PUTPPOLL

        ; finally, output the data and return
        STORE   R3,R4,PPDATA
        JUMPS   R1

The weak point in the straightforward solution is that, as it sets the direction to out, it also resets all the other control bits; this is not necessarily a good idea, as some of these might matter. For example, it turns off the interrupt enable bit. As a result, carefully written I/O drivers frequently contain code to set bits far more carefully. Here is an example rewrite of step 2 above:

        ; second, set the direction to 1 (output)
        LOAD    R5,R4,PPSTAT
        BITTST  R5,PPDIR        ; -- only change direction if necessary
        BBS     PUTPOUT         ; if (status.dir != 1) {
        LIS     R6,1<<PPDIR
        OR      R5,R6           ;   -- preserves all other control bits
        STORE   R5,R4,PPSTAT    ;   status.dir = 1
PUTPOUT:                        ; }
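The difference between the two versions of step 2 can be seen in a small Python model of the status register. This is only an illustration: the function names are invented, and IE is taken to be bit 7, as its position in the register figure suggests. DR is bit 5, as in the PPDIR definition above.

```python
IE = 1 << 7  # interrupt enable (assumed bit position, from the register figure)
DR = 1 << 5  # direction, 1 = output (matches PPDIR)

def set_output_plain(status):
    """Model of the straightforward step 2: store just the DR bit."""
    return DR

def set_output_careful(status):
    """Model of the rewritten step 2: change DR only, and only if needed."""
    if status & DR:
        return status       # already an output port, nothing to store
    return status | DR      # preserves all other control bits

before = IE                                # interrupts enabled, direction = in
print(format(set_output_plain(before), "08b"))    # 00100000, IE is lost
print(format(set_output_careful(before), "08b"))  # 10100000, IE is kept
```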
