Assignment 10 Solutions

Part of the homework for 22C:60 (CS:2630), Spring 2012
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

  1. Background: Here is a floating-point representation that might have been of some small use on a 16-bit minicomputer. It is designed to explicitly avoid advanced concepts common in floating point representations today:
    |_ _ _ _|_ _ _ _|_ _ _ _|_ _ _ _|
    |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|
    |s|   exp   |      mantissa     |
    

    Each part is worth 0.2 points.

    a) What is the approximate decimal equivalent of FFFF_16 in this number system?

    FFFF_16, in binary, is 1111 1111 1111 1111. Breaking this into fields for the given number representation, we have:

    s = 1 — the number is negative.
    exp = 11111 corresponding to positive 15.
    mant = 0.1111111111 which is almost, but not quite, 1.0

    So, the number is approximately -1.0 × 2^15. That is, approximately -32,768.

    The error in this approximation is 0.0000000001 × 2^15. That is, 2^-10 × 2^15, which is 2^5, which is 32. So, the exact answer is -32,736.

    We can estimate the error much more simply. A ten-bit binary fraction is accurate to one part in 2^10, which is about the same as one part in 10^3, so we expect our first approximation to be off by about 1/1000. That is, we need to correct it by about 32 or 33, depending on how you round 1/1000 times 32,768. Our initial approximation was good enough for most purposes, and this second-order approximation is close to perfect.

    b) What is the exact decimal equivalent of 1234_16 in this number system?

    1234_16, in binary, is 0001 0010 0011 0100. Breaking this into fields for the given number representation, we have:

    s = 0 — the number is positive.
    exp = 00100 corresponding to negative 12, that is, a scale factor of 2^-12 = 1/4096
    mant = 0.1000110100 = 0.5 + 0.03125 + 0.015625 + 0.00390625 = 0.55078125

    So, using a calculator, 0.55078125/4096 = 0.00013446807861328125 exactly (this is 141/2^20).
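The field-by-field decoding used in parts a and b can be checked mechanically. The following is only a sketch: the name decode1 is made up for illustration, and the bias of 16 is an assumption inferred from the worked answers (11111 corresponds to +15, 00100 to -12). Exact rational arithmetic avoids any calculator rounding.

```python
from fractions import Fraction

BIAS = 16  # assumed exponent bias, inferred from the worked answers

def decode1(word):
    """Exact value of a 16-bit word in the problem 1 format (hypothetical helper)."""
    sign = -1 if (word >> 15) & 1 else 1          # bit 15
    exp = ((word >> 10) & 0x1F) - BIAS            # bits 14-10, unbiased
    mant = Fraction(word & 0x3FF, 1 << 10)        # bits 9-0 as 0.mmmmmmmmmm
    return sign * mant * Fraction(2) ** exp

print(decode1(0xFFFF))   # -32736
print(decode1(0x1234))   # 141/1048576
```

The second result, 141/1048576, is the same number as 0.55078125/4096.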

    c) What is the normalized binary representation of 1 in this number system?

    It will be 0.5 × 2^1 since that is the only solution that puts the mantissa in the correct normalized range.

    s = 0 — the number is positive.
    exp = 10001 corresponding to positive 1.
    mant = 1000000000 corresponding to 0.5

    Put these pieces together and we get 0 10001 1000000000 or 4600_16.

    d) What is the normalized binary representation of 10_10 in this number system?

    s = 0 — the number is positive.
    exp = 10100 corresponding to positive 4 because we need to multiply by 16, which is 2^4.
    mant = 1010000000 corresponding to 10/16 which is 5/8

    Put these pieces together and we get 0 10100 1010000000 or 5280_16.

    e) What is the normalized approximate binary representation of 0.1_10 in this number system?

    Note that, in binary, 1/10 is 0.000110011001100 (repeating).

    s = 0 — the number is positive.
    exp = 01101 corresponding to negative 3, to account for the leading zeros in the binary equivalent of one tenth.
    mant = 1100110011 corresponding to the normalized fraction part of one tenth.

    Put these pieces together and we get 0 01101 1100110011 or 3733_16.
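Parts c through e all follow the same recipe: scale the magnitude by powers of two until it lies in the range 0.5 up to but not including 1.0, record the power of two as the exponent, and keep ten fraction bits. Here is a sketch of that recipe. The name encode1 is hypothetical, the bias of 16 is inferred from the answers above, and extra mantissa bits are simply truncated (an assumption; the problem does not say how to round).

```python
from fractions import Fraction

BIAS = 16  # assumed exponent bias, inferred from the worked answers

def encode1(value):
    """Encode a nonzero value in the problem 1 format (hypothetical helper)."""
    x = Fraction(value)
    if x == 0:
        raise ValueError("zero has no normalized representation")
    sign = 1 if x < 0 else 0
    x = abs(x)
    exp = 0
    while x >= 1:                  # too big: halve and count up
        x /= 2
        exp += 1
    while x < Fraction(1, 2):      # too small: double and count down
        x *= 2
        exp -= 1
    if not 0 <= exp + BIAS <= 31:
        raise ValueError("exponent does not fit in 5 bits")
    mant = int(x * 1024)           # keep 10 fraction bits, truncating the rest
    return (sign << 15) | ((exp + BIAS) << 10) | mant

print(format(encode1(1), "04X"))                 # 4600
print(format(encode1(10), "04X"))                # 5280
print(format(encode1(Fraction(1, 10)), "04X"))   # 3733
```

Passing Fraction(1, 10) rather than the float 0.1 keeps the input exactly one tenth.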

  2. Background: Here is a floating-point representation that might have been of some small use on a 16-bit minicomputer. It is designed to make use of all of the advanced concepts typical of modern floating-point numbers:
    |_ _ _ _|_ _ _ _|_ _ _ _|_ _ _ _|
    |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|
    |s|   exp   |      mantissa     |
    

    Each part is worth 0.2 points.

    a) What is the binary equivalent of 1.0_10 in this number system?

    It will be 1.0 × 2^0 since that is the only solution that puts the mantissa in the correct normalized range.

    s = 0 — the number is positive.
    exp = 10000 corresponding to zero.
    mant = .0000000000 corresponding to 1.0 (the one bit to the left of the point is hidden).

    Put these pieces together and we get 0 10000 0000000000 or 4000_16.

    b) What is the binary representation of the largest non-infinite positive number in this system?

    s = 0 — the number is positive.
    exp = 11110 (one less than 11111 which is not a number).
    mant = .1111111111 which is almost 2 (counting the hidden one bit to the left of the point).

    Put these pieces together and we get 0 11110 1111111111 or 7BFF_16.

    c) What is the approximate decimal equivalent of your answer to part a? (That was a typo, this should have asked about part b!)

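    Almost 2.0 × 2^14, which is almost 32,768. (The exponent field 11110 is 30, and 30 - 16 = 14, using the same bias that makes 10000 correspond to zero in part a.)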

    d) What is the binary representation of the smallest non-zero positive number in this system?

    s = 0 — the number is positive.
    exp = 00000 corresponding to a non-normalized -15.
    mant = .0000000001 which is 2^-10 (the hidden bit is zero because it is not normalized).

    Put these pieces together and we get 0 00000 0000000001 or 0001_16.

    e) What is the approximate decimal equivalent of your answer to part c? (That was a typo, this should have asked about part d!)

    2^-25 is exact. Note that, in decimal, 2^10 is approximately 1000, and 2^20 is approximately one million. So, we have approximately 2^-5 × 0.000001 (that is, times one millionth). 2^-5 is 0.03125, so the approximate answer is 0.00000003125; my calculator gives a more precise value of 0.000000029802322.
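The decoding rules used in this problem can be collected into one short sketch. The name decode2 is hypothetical, and the rules are assumptions read off the answers above: a bias of 16 (10000 corresponds to zero), a hidden one bit for normalized numbers, exponent field 00000 meaning a non-normalized number with exponent -15 and a hidden zero, and exponent field 11111 reserved.

```python
from fractions import Fraction

BIAS = 16  # assumed: exponent field 10000 corresponds to zero

def decode2(word):
    """Exact value of a 16-bit word in the problem 2 format (hypothetical helper)."""
    sign = -1 if (word >> 15) & 1 else 1
    e = (word >> 10) & 0x1F                    # raw exponent field
    frac = Fraction(word & 0x3FF, 1 << 10)     # the ten bits right of the point
    if e == 0x1F:
        raise ValueError("exponent 11111 is reserved (infinity or not a number)")
    if e == 0:
        # not normalized: hidden bit is 0, exponent is pinned at -15
        return sign * frac * Fraction(2) ** (1 - BIAS)
    # normalized: hidden bit is 1
    return sign * (1 + frac) * Fraction(2) ** (e - BIAS)

print(decode2(0x4000))   # 1
print(decode2(0x0001))   # 1/33554432
```

The second result is 2^-25, since 2^25 is 33,554,432.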

  3. A Problem: Write a SMAL Hawk subroutine that takes, as an argument, a floating point number in the least significant 16 bits of R3 encoded in the format defined for problem 1. The subroutine should convert this number to the smallest integer greater than or equal to that number and return the result in R3 when it is done. Write textbook-quality code. Insufficient or excessive comments will be penalized. (1 point)
            SUBTITLE "FLTTOINT, float to int convert"
    
    BIAS    =       16      ; the bias on the exponent field
    
    FLTTOINT:       ; given    R3 = f, a floating point number
                    ; returns  R3 = i, the smallest integer >= f
                    ; uses     R4 = a copy of f
                    ; uses     R3 = mmmmmmmmmm the mantissa
                    ; uses     R5 = eeeee the exponent
    
            ; the floating point format is s eeeee mmmmmmmmmm
    
            MOVE    R4,R3           ; -- copy f so we can check bit 15, the sign
            TRUNC   R3,10           ; mmmmmmmmmm = f, bits 9 - 0
            MOVE    R5,R4
            SR      R5,10
            TRUNC   R5,5            ; eeeee = f, bits 14-10
            TBIT    R4,15
            BBS     FTOINEG         ; if (f >= 0) {
            NEG     R3              ;   mmmmmmmmmm = -mmmmmmmmmm
    FTOINEG:                        ; }
                                    ; -- now mmmmmmmmmm is the mantissa of -f
            CMPI    R5,10+BIAS
            BEQ     FTOIDEN         ; if (eeeee != 10) { -- need to denormalize
            BGT     FTOIGT          ;   if (eeeee < 10) {
    FTOILT:                         ;     do {
            SR      R3,1            ;       mmmmmmmmmm = floor(mmmmmmmmmm / 2)
            ADDSI   R5,1            ;       eeeee = eeeee + 1
            CMPI    R5,10+BIAS
            BLT     FTOILT          ;     } while (eeeee < 10)
            BR      FTOIDEN         ;     -- now eeeee = 10
    FTOIGT:                         ;   } else {
                                    ;     do {
            SL      R3,1            ;       mmmmmmmmmm = mmmmmmmmmm * 2
            ADDSI   R5,-1           ;       eeeee = eeeee - 1
            CMPI    R5,10+BIAS
            BGT     FTOIGT          ;     } while (eeeee > 10)
                                    ;     -- now eeeee = 10
                                    ;   }
    FTOIDEN:                        ; }
                                    ; -- now mmmmmmmmmm = floor(-f), an integer
            NEG     R3              ; i = -floor(-f), that is, ceiling(f)
            JUMPS   R1              ; return i
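When testing a subroutine like this, it helps to have a reference model of what the problem asks for: the smallest integer greater than or equal to the number. The following sketch is such a model, not a translation of the Hawk code. The name flt_to_int is made up, and the bias of 16 matches the BIAS symbol in the listing.

```python
import math
from fractions import Fraction

BIAS = 16  # same bias as the BIAS symbol in the assembly listing

def flt_to_int(word):
    """Ceiling of a problem 1 format number held in the low 16 bits of word."""
    sign = (word >> 15) & 1
    exp = ((word >> 10) & 0x1F) - BIAS
    mant = Fraction(word & 0x3FF, 1 << 10)
    value = mant * Fraction(2) ** exp
    if sign:
        value = -value
    return math.ceil(value)        # exact, since value is a Fraction

print(flt_to_int(0x4A80))   # 3   (0x4A80 encodes 2.5)
print(flt_to_int(0xCA80))   # -2  (0xCA80 encodes -2.5)
```

Useful test inputs are values with a fractional part in both signs, as above, since those are the cases where rounding up and simple truncation disagree.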