22C:122/55:132, Lecture Notes, Lecture 10, Spring 2001

Douglas W. Jones
University of Iowa Department of Computer Science

  1. Bus Design Supplement for the Ultimate RISC

    The Ultimate RISC paper presents the following bus for the Ultimate RISC:

    	Data, 16 bits, bidirectional tristate ===================
    
    	Address, 16 bits, from IEU            ===================
    
    	Write, 1 bit, unidirectional          -------------------
    
    	Read, 1 bit, unidirectional           -------------------
    	
    A pulse on the write line transfers data to the attached memory or register if the address matches the address for that memory.

    When the read line is high, the contents of the addressed memory or register are transferred to the data line.

  2. A Minimal Ultimate RISC Register

    A single register R at address A on the ultimate RISC bus would look like the following:

    	Data, 16 bits     =========================o======o===
    	Address, 16 bits  ===o=====================|======|===
    	Write, 1 bit,     ---|----o----------------|------|---
    	Read, 1 bit,      ---|----|-o--------------|------|---
                               __|__  | |   ___        |      |
                              | = A | |  --|AND|-------|-----/_\
                      address |_____| |  --|___|       |      |
                      decoder    |    | |              |      |
                                  ----|-o              |      |
                                      | |   ___     ___|___   |
                                      |  --|AND|---|>  R   |  |
                                       ----|___|   |_______|  |
                                                       |      |
                                                        ------
    	
  3. An Output Device

    An output only centronics printer interface (an IBM PC compatable parallel port) would look like the following:

    	Data, 16 bits     =========================o======o===
    	Address, 16 bits  ===o=====================|======|===
    	Write, 1 bit,     ---|----o----------------|------|---
    	Read, 1 bit,      ---|----|-o--------------|------|---
                               __|__  | |   ___        |      |
                              | = A | |  --|AND|-------|-----/_\
                      address |_____| |  --|___|       |      |
                      decoder    |    | |              /16    /16
                                  ----|-o              |     _|_
                                      | |   ___     ___|___  | |  5
                                      |  --|AND|---|>  R   | |  --/-| 
                                       ----|___|   |_______| |      |
                                                       |     o--/---| printer
                                                       |     |  8   | connector
                                                       | |--/_\     |
                                                        -|   |      |
                                                         |---|---/--|
                                                         |   |   4  |
                                                         |-/-     
                                                           8
    	
    The printer or other device attached to the I/O interface is responsible for interpreting the signals from the host computer, but the following interface documentation gives typical interpretations:
             __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __
    output  |\/|\/|OE|\/|SL|IN|LF|ST|  |  |  |  |  |  |  |  |
            |__|__|__|__|__|__|__|__|__|__|__|__|__|__|__|__|
            unused|  |  | control   |        data out       |
    
             OE -- enable tri-state driver for data to connector
             SL -- select output to device
             IN -- initialize device
             LF -- control a device option (linefeed on printer)
             ST -- strobe to device
             __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __
    input   |BS|AK|PE|OL|ER|\/|\/|\/|  |  |  |  |  |  |  |  |
            |__|__|__|__|__|__|__|__|__|__|__|__|__|__|__|__|
                 status    | unused |        data in        |
    
             BS -- device busy
             AK -- device acknowledges strobe pulse
             PE -- out of paper
             OL -- device on line
             ER -- device error
    	
    The designer of the interface is completely uninterested in all of the inputs and outputs other than the OE line. The designer of the printer itself and the designer of the software to control the printer must, however, agree on the use of the other lines. The definitions given above are compatable with a large number of IBM PC parallel port interfaces, except that the PC interface is based on 8-bit registers, with distinct addresses defined for data in and data out to each.

    This design allows something that Centronics didn't anticipate, but it is a feature common on PC compatables -- that is, the use of tri-state data lines in the parallel port, so they may be used for both input and output.

  4. Memory

    If we have a 16 bit address bus, we have, at most, 64K of address space. Consider the following design of a 16K memory module for this bus:

    	Data, 16 bits     =========================o=====o===
    	Address, 16 bits  ===o=====================|=====|===
    	Write, 1 bit,     ---|-------o-------------|-----|---
    	Read, 1 bit,      ---|-------|-o-----------|-----|---
                               __|__     | |  ___      |     |
                              /|| | \    |  -|AND|-----|----/_\
               High (+5)  ----- | |      |  -|___|  ___|___  |
                     | | |  ----  |      | |  14   |  in   | |
                     R R | |  ___  ------|-|-/-----|Addr   | |
                     | | |  -|EQU|  ___  | |  ___  | 16Kx16| |
               -oS0o-|-o-|---|___|-|AND|-|-o-|AND|-|>      | |
            Low      |    ---|EQU|-|___|  ---|___| |__out__| |
            (0)-oS1o-o-------|___|                     |     |
                                                        -----
    	
    This diagram shows the details (at an electrical level) of the address decoder for a typical memory such as this. The pair of equivalence gates (EQU) and the AND gate make the comparison between the 2 most significant bits of the address on the bus and the 2-bit value set on the switches (or jumpers) S0 and S1. If the switch is closed (or the jumper is present), the value input to the comparator is 0, while if the switch is open (or the jumper has been cut) the value is 1. An open switch allows the resistor R (usually a few thousand ohms) to pull the input to the comparator up to logic 1, typically 5 volts, while a closed switch short circuits the input to logic zero, typically ground or 0 volts.

    Many devices have option select or address select jumpers or switches that operate as outlined above.

  5. A multi-register ALU

    Adding multiple registers to the ALU allows a far more interesting programming environment than is possible with the single register of the original Ultimate RISC. Of course, merely adding extra ALU modules to the bus, each with a different address, will make the system useful, but this means duplicating the entire ALU instead of just adding registers. A good design for a multi-register ALU for the ultimate RISC might be:

    	Data, 16 bits     =====================o==========o===
    	Address, 16 bits  ===o=================|==========|===
    	Write, 1 bit,     ---|-----o-----------|----------|---
    	Read, 1 bit,      ---|-----|-o---------|----------|---
                               __|___  | |  ___    |          |
                              /|  | |\ |  -|AND|---|---------/_\
                              _|_ | |  |  -|___|   |        __|__
                             |=X || |  | |       --|-------| g(x)|
                             |___|| |  | |  4   |  |       |__x__|
                               |  |  --|-|-/----o  |          |
                               |  |    | |      |  |      ----o
                               |  |    | |      |  |_____|    |
                               |  |    | |      | |a     b|   |
                               |  |    | |      | |  ALU  |   |
                               |  |    | |       -|f      |   |
                               |  |    | |        |_f(a,b)|   |
                               |  |    | |            |       |
                               |  |    | |  4      ___|___    |
                               |   ----|-|-/------|Addr   |   |
                               |       | |  ___   |       |   |
                                -------|-o-|AND|--|>16x16 |   |
                                        ---|___|  |__out__|   |
                                                     |        |
                                                      --------
    	
    The minimum necessary set of ALU functions is: Additional functions make programming far more convenient: It is not difficult to invent long lists of additional operations that are attractive candidates for addition to this system:

    The decision to use 16 registers in this ALU was arbitrary, but it is based on the fact that there are many successful 16 register machines. The decision to use 16 operations for this ALU was far more arbitrary. It is easy to find about 8 operations that are useful on reading and writing the ALU, but it is a challenge to find 16.

  6. Getting an ALU to deliver multiple operations

    Multi-function ALU's can be built many ways, but at heart, all are based on a full adder:

          A  B  Cin  |   S  Cout
        -------------+------------
          0  0  0    |   0  0
          0  0  1    |   1  0
          0  1  0    |   1  0
          0  1  1    |   0  1
          1  0  0    |   1  0
          1  0  1    |   0  1
          1  1  0    |   0  1
          1  1  1    |   1  1
    
    To add two 2-bit numbers A and B, we string these together as follows
         A1 B1           A0 B0
          | |    ____     | |    ____Cin
         _|_|___|_   |   _|_|___|_ 
        | A B Cin |  |  | A B Cin |
        |         |  |  |         |
        | Cout  S |  |  | Cout  S |
        |_________|  |  |_________|
      ____|     |    |____|     |
    Cout        |               |
                S1              S0
    
    Of course, the "ripple carry" logic used here makes the adder speed rather poor if the word size is large, but, as it turns out, increasing the adder speed is an independent problem from increasing its functionality.

    To increase its functionality, we must look at the implementation of the adder. Typically, the function of each full adder is explained as:

       Cout = (A and B) or (A and Cin) or (B and Cin)
       S    = A xor B xor Cin
    
    High speed adder design rests on improving the speed of the carry chain, that is, computing Cout quickly. The logic for the sum S, however, remains largely unchanged in high speed adders, and this is where we will concentrate our effort in improving functionality.

    The underlying logic function used in sum computation is the exclusive or. One of the most common implementations of the exclusive or function in digital logic is the following:

                                ____
         A ---o----------------|    |
              |                |NAND|---
              |    ____     ---|____|   |    ____
               ---|    |   |             ---|    |
                  |NAND|---o                |NAND|--- A xor B
               ---|____|   |    ____     ---|____|
              |             ---|    |   |
              |                |NAND|---
         B ---o----------------|____|
    
    We can augment this basic circuit with the following auxiliary inputs:
                                ____
                      Aenab ---|    |
         A ---o----------------|NAND|---
              |    ____     ---|____|   |    ____
               ---|    |   |             ---|    |
         Xenab ---|NAND|---o                |NAND|--- F
               ---|____|   |    ____     ---|____|
              |             ---|    |   |
         B ---o----------------|NAND|---
                      Benab ---|____|
    
    
    Having done this, we can view the result as computing 8 different functions of A and B. THese are:
         Xenab  Aenab  Benab  |  Function
       -----------------------+------------
           0      0      0    |  F = 0
           0      0      1    |  F = B
           0      1      0    |  F = A
           0      1      1    |  F = A or B
           1      0      0    |  F = 0     
           1      0      1    |  F = B and not A
           1      1      0    |  F = A and not B
           1      1      1    |  F = A xor B
    
    If we now build an the sum component for one bit of the adder as follows, we get a general purpose ALU:
                A        B   Cin
                |        |    |
        Bnot  --|----    |    | 
                |   _|___|_   |
                |  |       |  |
                |  |  XOR  |  |
                |  |___ ___|  |
                |__    |      |    
                  _|___|_     |
                 | A   B |    |
        Xenab ---|Xenab  |    |
        Aenab ---|Aenab  |    |
        Benab ---|Benab  |    |
                 |___F___|    |
                     |    ____|
                    _|___|_
                   | A   B |
                ---|Xenab  |
               | 1-|Aenab  |
        Cenab -o---|Benab  |
                   |___F___|
                       |
                       F
    
    An ALU based a complete full adder using this logic now performs the following set of useful functions, along with many useless functions:
       Xenab Aenab Benab Cenab Bnot |  function
     -------------------------------+-------------
         0     0     0     0     0  |    F = 0
         0     1     1     0     0  |    F = A or B
         1     0     1     0     0  |    F = B and not A
         1     1     0     0     0  |    F = A and not B
         1     1     1     0     0  |    F = A xor B
         1     1     1     1     0  |    F = A + B + Cin
         0     1     1     0     1  |    F = A or not B
         1     1     0     0     1  |    F = A and B
         1     1     1     1     1  |    F = (A - B - 1) + Cin
    
    Of course, we don't usually want to use 5 control signals to select among 9 useful alternative, particularly when some of those alternatives are of only marginal utility, so in a practical system, we will typically select eight of these, with a bit of logic (3-inputs and 5-outputs) to select the useful functions.