Assignment 10 Solutions

Part of the homework for 22C:60 (CS:2630), Spring 2012
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Background: Here is a floating-point representation that might have been of some small use on a 16-bit minicomputer. It is designed to explicitly avoid advanced concepts common in floating point representations today:
```
|_ _ _ _|_ _ _ _|_ _ _ _|_ _ _ _|
|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|
|s|   exp   |      mantissa     |
```
- s -- the sign bit; 1 = negative, 0 = positive.
- exp -- the exponent; biased so 00000 = -16 and 11111 = 15.
- mantissa -- a binary fraction, so that 10 0000 0000 = 0.5.
  If the exponent is greater than 00000, the mantissa should always be normalized in the range from 0.5 to just less than 1.0.
Each part is worth 0.2 points.
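
The field extraction used in the parts below can be sketched in Python, assuming the 16-bit word is held in an ordinary integer. The helper name `decode_fmt1` is hypothetical, and `Fraction` is used so the result is exact rather than a rounded float:

```python
from fractions import Fraction

BIAS = 16  # exponent field 00000 means 2**-16, 11111 means 2**15

def decode_fmt1(word: int) -> Fraction:
    """Exact value of a 16-bit word laid out as s eeeee mmmmmmmmmm (problem 1)."""
    sign = (word >> 15) & 0x1           # bit 15
    exp = (word >> 10) & 0x1F           # bits 14-10, still biased
    mant = word & 0x3FF                 # bits 9-0, binary point at the left
    value = Fraction(mant, 1 << 10) * Fraction(2) ** (exp - BIAS)
    return -value if sign else value

print(decode_fmt1(0xFFFF))   # part a
print(decode_fmt1(0x1234))   # part b
```

The first call gives -32736 and the second gives 141/1048576, that is, 141/2²⁰.
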
a) What is the approximate decimal equivalent of FFFF₁₆ in this number system?

FFFF₁₆, in binary, is 1111 1111 1111 1111. Breaking this into fields for the given number representation, we have:
s = 1 — the number is negative.
exp = 11111 corresponding to positive 15.
mant = 0.1111111111 which is almost, but not quite, 1.0
So, the number is approximately -1.0 × 2¹⁵. That is, approximately -32,768.
The error in this approximation is 0.0000000001 × 2¹⁵, that is, 2⁻¹⁰ × 2¹⁵, which is 2⁵, or 32. So, the exact answer is -32,736.
We can estimate the error much more simply. A ten-bit binary fraction is accurate to one part in 2¹⁰, which is about the same as one part in 10³, so we expect our first approximation to be off by about 1/1000. That is, we need to correct it by about 32 or 33, depending on how you round 1/1000 times 32,768. Our initial approximation was good enough for most purposes, and this second-order approximation is close to perfect.

b) What is the exact decimal equivalent of 1234₁₆ in this number system?

1234₁₆, in binary, is 0001 0010 0011 0100. Breaking this into fields for the given number representation, we have:
s = 0 — the number is positive.
exp = 00100 corresponding to negative 12, so the scale factor is 2⁻¹², that is, 1/4096
mant = 0.1000110100 = 0.5 + 0.03125 + 0.015625 + 0.00390625 = 0.55078125
so, 0.55078125/4096 = 141/2²⁰, which is exactly 0.00013446807861328125 (a calculator display rounds this to 0.000134468078613).
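
Since 2⁻²⁰ = 5²⁰/10²⁰, any value with at most 20 binary places after the point has at most 20 decimal places, so an exact decimal answer always exists. A short check with Python's standard `decimal` module (the 50-digit precision is an arbitrary but ample choice):

```python
from decimal import Decimal, getcontext

getcontext().prec = 50  # far more digits than this value needs

mantissa = Decimal(141) / Decimal(256)   # 0.1000110100 in binary
value = mantissa / Decimal(4096)         # times 2**-12
print(mantissa)
print(value)
```

Both divisions are exact, so this prints 0.55078125 and 0.00013446807861328125.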

c) What is the normalized binary representation of 1 in this number system?

It will be 0.5 × 2¹ since that is the only solution that puts the mantissa in the correct normalized range.
s = 0 — the number is positive.
exp = 10001 corresponding to positive 1.
mant = 1000000000 corresponding to 0.5
Put these pieces together and we get 0 10001 1000000000 or 4600₁₆.

d) What is the normalized binary representation of 10₁₀ in this number system?

s = 0 — the number is positive.
exp = 10100 corresponding to positive 4 because we need to multiply by 16 which is 2⁴.
mant = 1010000000 corresponding to 10/16 which is 5/8
Put these pieces together and we get 0 10100 1010000000 or 5280₁₆.

e) What is the normalized approximate binary representation of 0.1₁₀ in this number system?

Note that, in binary, 1/10 is 0.000110011001100 (repeating).
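
The repeating pattern comes from multiplying the fraction by 2 over and over: each time the product reaches 1, the next bit is 1 and that 1 is subtracted off. A small Python sketch of this doubling method (`binary_fraction_digits` is a hypothetical helper name), using `Fraction` so no rounding creeps in:

```python
from fractions import Fraction

def binary_fraction_digits(x: Fraction, count: int) -> str:
    """First `count` bits after the binary point of 0 <= x < 1, by repeated doubling."""
    bits = []
    for _ in range(count):
        x *= 2
        if x >= 1:            # the integer part that pops out is the next bit
            bits.append("1")
            x -= 1
        else:
            bits.append("0")
    return "".join(bits)

print("0." + binary_fraction_digits(Fraction(1, 10), 16))
```

After the first bit, the remainder cycles through 1/5, 2/5, 4/5, 3/5 and back to 1/5, which is why the pattern 0011 repeats forever.
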
s = 0 — the number is positive.
exp = 01101 corresponding to negative 3, to account for the leading zeros in the binary equivalent of one tenth.
mant = 1100110011 corresponding to the normalized fraction part of one tenth.
Put these pieces together and we get 0 01101 1100110011 or 3733₁₆.
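
Parts c through e all follow the same recipe: halve or double until the mantissa lies in [0.5, 1.0), counting the steps in the exponent, then keep 10 mantissa bits. A minimal Python sketch of that recipe, assuming the extra mantissa bits are truncated (for 0.1 the next bit is 0, so rounding would give the same answer). The name `encode_fmt1` is hypothetical, and the sketch simply rejects zero and out-of-range exponents rather than handling them:

```python
from fractions import Fraction

BIAS = 16  # exponent field = true exponent + 16

def encode_fmt1(x: Fraction) -> int:
    """Normalize x to 0.5 <= mantissa < 1.0 and pack it as s eeeee mmmmmmmmmm."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("zero is not handled by this sketch")
    sign = 1 if x < 0 else 0
    x = abs(x)
    exp = 0
    while x >= 1:                  # too big: halve, bump the exponent up
        x /= 2
        exp += 1
    while x < Fraction(1, 2):      # too small: double, bump the exponent down
        x *= 2
        exp -= 1
    if not -BIAS <= exp <= 31 - BIAS:
        raise ValueError("exponent out of range")
    mant = int(x * 1024)           # keep the first 10 bits after the point (truncate)
    return (sign << 15) | ((exp + BIAS) << 10) | mant

for value in (Fraction(1), Fraction(10), Fraction(1, 10)):
    print(value, hex(encode_fmt1(value)))
```
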

Background: Here is a floating-point representation that might have been of some small use on a 16-bit minicomputer. It is designed to make use of all of the advanced concepts typical of modern floating-point numbers:
```
|_ _ _ _|_ _ _ _|_ _ _ _|_ _ _ _|
|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|
|s|   exp   |      mantissa     |
```
- s -- the sign bit; 1 = negative, 0 = positive.
- exp -- the exponent; biased so 00001 = -15 and 11110 = 14.
  The special exponent value 00000 also represents -15, but indicates that the mantissa is not normalized.
  The special exponent value 11111 means not a number or infinity.
- mantissa -- a binary fraction.
  For normalized numbers, there is a hidden bit just to the left of the point, so the mantissa 00 0000 0000 = 1.0 and 10 0000 0000 = 1.5 (the hidden bit is always one as a consequence of normalization).
  For non-normalized numbers, the hidden bit is zero, so 00 0000 0000 = 0.0 and 10 0000 0000 = 0.5.
Each part is worth 0.2 points.
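
As with the first format, a small Python sketch can make the decoding rules concrete. It assumes the word is an ordinary integer; `decode_fmt2` is a hypothetical name, and since the format does not say how to tell NaN from infinity, the sketch returns `float("nan")` for every 11111 exponent. Giving the non-normalized exponent 00000 the same value, -15, as 00001 means the non-normalized numbers continue with the same spacing, 2⁻²⁵, right up to the smallest normalized number.

```python
from fractions import Fraction

BIAS = 16  # exponent field 10000 means 2**0

def decode_fmt2(word: int):
    """Exact value of a 16-bit word in the problem 2 format.

    Returns a Fraction, or float("nan") for the reserved exponent 11111.
    """
    sign = -1 if (word >> 15) & 1 else 1
    exp = (word >> 10) & 0x1F
    frac = Fraction(word & 0x3FF, 1 << 10)
    if exp == 0x1F:
        # The format only says 11111 is "not a number or infinity";
        # this sketch does not try to tell the two apart.
        return float("nan")
    if exp == 0:
        # Not normalized: hidden bit is 0, exponent is -15 (same as 00001).
        return sign * frac * Fraction(2) ** -15
    # Normalized: hidden 1 bit to the left of the point.
    return sign * (1 + frac) * Fraction(2) ** (exp - BIAS)

for word in (0x4000, 0x7BFF, 0x0001):
    print(hex(word), decode_fmt2(word))
```
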
a) What is the binary equivalent of 1.0₁₀ in this number system?

It will be 1.0 × 2⁰ since that is the only solution that puts the mantissa in the correct normalized range.
s = 0 — the number is positive.
exp = 10000 corresponding to zero.
mant = .0000000000 corresponding to 1.0 (the one bit to the left of the point is hidden).
Put these pieces together and we get 0 10000 0000000000 or 4000₁₆.

b) What is the binary representation of the largest non-infinite positive number in this system?

s = 0 — the number is positive.
exp = 11110 (one less than 11111, which is reserved for not a number or infinity).
mant = .1111111111, which is almost 2 (counting the hidden one bit to the left of the point).
Put these pieces together and we get 0 11110 1111111111 or 7BFF₁₆.

c) What is the approximate decimal equivalent of your answer to part a? (That was a typo, this should have asked about part b!)

Almost 2.0 × 2¹⁴, which is almost 32,768. Exactly, it is (2 - 2⁻¹⁰) × 2¹⁴ = 32,768 - 16 = 32,752.

d) What is the binary representation of the smallest non-zero positive number in this system?

s = 0 — the number is positive.
exp = 00000 corresponding to a non-normalized -15.
mant = .0000000001 which is 2⁻¹⁰ (the hidden bit is zero because the number is not normalized).
Put these pieces together and we get 0 00000 0000000001 or 0001₁₆.

e) What is the approximate decimal equivalent of your answer to part c? (That was a typo, this should have asked about part d!)

The exact value is 2⁻¹⁰ × 2⁻¹⁵ = 2⁻²⁵. Note that, in decimal, 2¹⁰ is approximately 1000, and 2²⁰ is approximately one million. So, we have approximately 2⁻⁵ × 0.000001 (that is, times one millionth). 2⁻⁵ is 0.03125, so the approximate answer is 0.00000003125; my calculator gives 0.000000029802322, and the exact decimal expansion is 0.0000000298023223876953125.
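
The shortcut treats 2²⁰ as 10⁶, so the estimate is too large by exactly 2²⁰/10⁶ = 1.048576, about 5%. A small Python check, using exact `Fraction` arithmetic and the standard `decimal` module to print the full expansion of 2⁻²⁵ (the 40-digit precision is an arbitrary but sufficient choice):

```python
from decimal import Decimal, getcontext
from fractions import Fraction

getcontext().prec = 40  # enough digits to hold 2**-25 exactly

exact = Fraction(1, 2 ** 25)                          # 2**-10 * 2**-15
approx = Fraction(1, 2 ** 5) * Fraction(1, 10 ** 6)   # pretend 2**20 is one million

exact_decimal = Decimal(exact.numerator) / Decimal(exact.denominator)
print(format(exact_decimal, "f"))   # plain notation, no exponent
print(float(approx / exact))        # factor by which the shortcut overshoots
```
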

A Problem: Write a SMAL Hawk subroutine that takes, as an argument, a floating point number in the least significant 16 bits of R3 encoded in the format defined for problem 1. The subroutine should convert this number to the smallest integer greater than or equal to that number and return the result in R3 when it is done. Write textbook-quality code. Insufficient or excessive comments will be penalized. (1 point)

```
        SUBTITLE "FLTTOINT, float to int convert"

BIAS    =       16      ; the bias on the exponent field

FLTTOINT:       ; given    R3 = f, a floating point number
                ; returns  R3 = i, f rounded toward zero
                ; uses     R4 = a copy of f
                ; uses     R3 = mmmmmmmmmm the mantissa
                ; uses     R5 = eeeee the exponent

        ; the floating point format is s eeeee mmmmmmmmmm

        MOVE    R4,R3           ; -- copy f so we can check bit 15, the sign
        TRUNC   R3,10           ; mmmmmmmmmm = f, bits 9 - 0
        MOVE    R5,R4           ; -- take the exponent from the copy
        SR      R5,10           ; -- since R3 no longer holds it
        TRUNC   R5,5            ; eeeee = f, bits 14-10
        CMPI    R5,10+BIAS
        BEQ     FTOIDEN         ; if (eeeee != 10) { -- need to denormalize
        BGT     FTOIGT          ;   if (eeeee < 10) {
FTOILT:                         ;     do {
        SR      R3,1            ;       mmmmmmmmmm = mmmmmmmmmm / 2
        ADDSI   R5,1            ;       eeeee = eeeee + 1
        CMPI    R5,10+BIAS
        BLT     FTOILT          ;     } while (eeeee < 10)
        BR      FTOIDEN         ;     -- now eeeee = 10
FTOIGT:                         ;   } else {
                                ;     do {
        SL      R3,1            ;       mmmmmmmmmm = mmmmmmmmmm * 2
        ADDSI   R5,-1           ;       eeeee = eeeee - 1
        CMPI    R5,10+BIAS
        BGT     FTOIGT          ;     } while (eeeee > 10)
                                ;     -- now eeeee = 10
                                ;   }
FTOIDEN:                        ; }
                                ; -- now eeeee = 10 and mmmmmmmmmm is integer
        TBIT    R4,15
        BBR     FTOIQT          ; if (f < 0) {
        NEG     R3              ;   mmmmmmmmmm = -mmmmmmmmmm
FTOIQT:                         ; }
        JUMPS   R1              ; return i
```
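
To check FLTTOINT on a host machine, a Python reference model may help. This is a sketch under stated assumptions: `exact_value`, `flt_to_int_ceiling`, and `flt_to_int_truncate` are hypothetical names, and Python integers stand in for Hawk registers. `flt_to_int_ceiling` computes the smallest integer greater than or equal to the value, as the problem statement asks, using the identity ceil(v / 2ᵏ) = -floor(-v / 2ᵏ). `flt_to_int_truncate` mirrors the shift-then-negate strategy of the SMAL code, which rounds toward zero. The two agree on every negative input and on every exact integer, but differ on positive values with a fractional part.

```python
import math
from fractions import Fraction

BIAS = 16  # problem 1 format: exponent field = true exponent + 16

def exact_value(word: int) -> Fraction:
    """Exact value of a problem 1 word, used only to cross-check the models below."""
    word &= 0xFFFF
    sign = -1 if (word >> 15) & 1 else 1
    exp = ((word >> 10) & 0x1F) - BIAS
    return sign * Fraction(word & 0x3FF, 1 << 10) * Fraction(2) ** exp

def flt_to_int_ceiling(word: int) -> int:
    """Smallest integer >= the value, as the problem statement asks."""
    word &= 0xFFFF                               # only the low 16 bits carry the float
    mant = word & 0x3FF
    shift = ((word >> 10) & 0x1F) - BIAS - 10    # value = +/- mant * 2**shift
    signed = -mant if (word >> 15) & 1 else mant
    if shift >= 0:
        return signed << shift                   # exact: nothing to round
    # Python's >> rounds toward minus infinity, so -((-v) >> k) == ceil(v / 2**k)
    return -((-signed) >> -shift)

def flt_to_int_truncate(word: int) -> int:
    """Mirror of FLTTOINT: shift the magnitude, then negate (rounds toward zero)."""
    word &= 0xFFFF
    mant = word & 0x3FF
    shift = ((word >> 10) & 0x1F) - BIAS - 10
    mag = mant << shift if shift >= 0 else mant >> -shift
    return -mag if (word >> 15) & 1 else mag

for word in (0x4200, 0xC200, 0x3733, 0x5280, 0xFFFF):
    value = exact_value(word)
    print(hex(word), value, math.ceil(value),
          flt_to_int_ceiling(word), flt_to_int_truncate(word))
```

For 4200₁₆ (0.5) the ceiling model returns 1 while the truncating model returns 0; for C200₁₆ (-0.5) both return 0.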