The Ultimate RISC paper presents the following bus for the Ultimate RISC:
Data, 16 bits, bidirectional tristate =================== Address, 16 bits, from IEU =================== Write, 1 bit, unidirectional ------------------- Read, 1 bit, unidirectional -------------------A pulse on the write line transfers data to the attached memory or register if the address matches the address for that memory.
When the read line is high, the contents of the addressed memory or register are transferred to the data line.
A single register R at address A on the ultimate RISC bus would look like the following:
Data, 16 bits =========================o======o===
Address, 16 bits ===o=====================|======|===
Write, 1 bit, ---|----o----------------|------|---
Read, 1 bit, ---|----|-o--------------|------|---
__|__ | | ___ | |
| = A | | --|AND|-------|-----/_\
address |_____| | --|___| | |
decoder | | | | |
----|-o | |
| | ___ ___|___ |
| --|AND|---|> R | |
----|___| |_______| |
| |
------
An output only centronics printer interface (an IBM PC compatable parallel port) would look like the following:
Data, 16 bits =========================o======o===
Address, 16 bits ===o=====================|======|===
Write, 1 bit, ---|----o----------------|------|---
Read, 1 bit, ---|----|-o--------------|------|---
__|__ | | ___ | |
| = A | | --|AND|-------|-----/_\
address |_____| | --|___| | |
decoder | | | /16 /16
----|-o | _|_
| | ___ ___|___ | | 5
| --|AND|---|> R | | --/-|
----|___| |_______| | |
| o--/---| printer
| | 8 | connector
| |--/_\ |
-| | |
|---|---/--|
| | 4 |
|-/-
8
The printer or other device attached to the I/O interface
is responsible for interpreting the signals from the host
computer, but the following interface documentation gives
typical interpretations:
__ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __
output |\/|\/|OE|\/|SL|IN|LF|ST| | | | | | | | |
|__|__|__|__|__|__|__|__|__|__|__|__|__|__|__|__|
unused| | | control | data out |
OE -- enable tri-state driver for data to connector
SL -- select output to device
IN -- initialize device
LF -- control a device option (linefeed on printer)
ST -- strobe to device
__ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __
input |BS|AK|PE|OL|ER|\/|\/|\/| | | | | | | | |
|__|__|__|__|__|__|__|__|__|__|__|__|__|__|__|__|
status | unused | data in |
BS -- device busy
AK -- device acknowledges strobe pulse
PE -- out of paper
OL -- device on line
ER -- device error
The designer of the interface is completely uninterested in
all of the inputs and outputs other than the OE line. The
designer of the printer itself and the designer of the software
to control the printer must, however, agree on the use of the
other lines. The definitions given above are compatable with
a large number of IBM PC parallel port interfaces, except that
the PC interface is based on 8-bit registers, with distinct
addresses defined for data in and data out to each.
This design allows something that Centronics didn't anticipate, but it is a feature common on PC compatables -- that is, the use of tri-state data lines in the parallel port, so they may be used for both input and output.
If we have a 16 bit address bus, we have, at most, 64K of address space. Consider the following design of a 16K memory module for this bus:
Data, 16 bits =========================o=====o===
Address, 16 bits ===o=====================|=====|===
Write, 1 bit, ---|-------o-------------|-----|---
Read, 1 bit, ---|-------|-o-----------|-----|---
__|__ | | ___ | |
/|| | \ | -|AND|-----|----/_\
High (+5) ----- | | | -|___| ___|___ |
| | | ---- | | | 14 | in | |
R R | | ___ ------|-|-/-----|Addr | |
| | | -|EQU| ___ | | ___ | 16Kx16| |
-oS0o-|-o-|---|___|-|AND|-|-o-|AND|-|> | |
Low | ---|EQU|-|___| ---|___| |__out__| |
(0)-oS1o-o-------|___| | |
-----
This diagram shows the details (at an electrical level)
of the address decoder for a typical memory such as this.
The pair of equivalence gates (EQU) and the AND gate make
the comparison between the 2 most significant bits of the
address on the bus and the 2-bit value set on the switches
(or jumpers) S0 and S1. If the switch is closed (or the jumper
is present), the value input to the comparator is 0, while if
the switch is open (or the jumper has been cut) the value
is 1. An open switch allows the resistor R (usually a few
thousand ohms) to pull the input to the comparator up to
logic 1, typically 5 volts, while a closed switch short
circuits the input to logic zero, typically ground or 0 volts.
Many devices have option select or address select jumpers or switches that operate as outlined above.
Adding multiple registers to the ALU allows a far more interesting programming environment than is possible with the single register of the original Ultimate RISC. Of course, merely adding extra ALU modules to the bus, each with a different address, will make the system useful, but this means duplicating the entire ALU instead of just adding registers. A good design for a multi-register ALU for the ultimate RISC might be:
Data, 16 bits =====================o==========o===
Address, 16 bits ===o=================|==========|===
Write, 1 bit, ---|-----o-----------|----------|---
Read, 1 bit, ---|-----|-o---------|----------|---
__|___ | | ___ | |
/| | |\ | -|AND|---|---------/_\
_|_ | | | -|___| | __|__
|=X || | | | --|-------| g(x)|
|___|| | | | 4 | | |__x__|
| | --|-|-/----o | |
| | | | | | ----o
| | | | | |_____| |
| | | | | |a b| |
| | | | | | ALU | |
| | | | -|f | |
| | | | |_f(a,b)| |
| | | | | |
| | | | 4 ___|___ |
| ----|-|-/------|Addr | |
| | | ___ | | |
-------|-o-|AND|--|>16x16 | |
---|___| |__out__| |
| |
--------
The minimum necessary set of ALU functions is:
The decision to use 16 registers in this ALU was arbitrary, but it is based on the fact that there are many successful 16 register machines. The decision to use 16 operations for this ALU was far more arbitrary. It is easy to find about 8 operations that are useful on reading and writing the ALU, but it is a challenge to find 16.
Multi-function ALU's can be built many ways, but at heart, all are based on a full adder:
A B Cin | S Cout
-------------+------------
0 0 0 | 0 0
0 0 1 | 1 0
0 1 0 | 1 0
0 1 1 | 0 1
1 0 0 | 1 0
1 0 1 | 0 1
1 1 0 | 0 1
1 1 1 | 1 1
To add two 2-bit numbers A and B, we string these together as follows
A1 B1 A0 B0
| | ____ | | ____Cin
_|_|___|_ | _|_|___|_
| A B Cin | | | A B Cin |
| | | | |
| Cout S | | | Cout S |
|_________| | |_________|
____| | |____| |
Cout | |
S1 S0
Of course, the "ripple carry" logic used here makes the adder speed
rather poor if the word size is large, but, as it turns out, increasing the
adder speed is an independent problem from increasing its functionality.
To increase its functionality, we must look at the implementation of the adder. Typically, the function of each full adder is explained as:
Cout = (A and B) or (A and Cin) or (B and Cin) S = A xor B xor CinHigh speed adder design rests on improving the speed of the carry chain, that is, computing Cout quickly. The logic for the sum S, however, remains largely unchanged in high speed adders, and this is where we will concentrate our effort in improving functionality.
The underlying logic function used in sum computation is the exclusive or. One of the most common implementations of the exclusive or function in digital logic is the following:
____
A ---o----------------| |
| |NAND|---
| ____ ---|____| | ____
---| | | ---| |
|NAND|---o |NAND|--- A xor B
---|____| | ____ ---|____|
| ---| | |
| |NAND|---
B ---o----------------|____|
We can augment this basic circuit with the following auxiliary inputs:
____
Aenab ---| |
A ---o----------------|NAND|---
| ____ ---|____| | ____
---| | | ---| |
Xenab ---|NAND|---o |NAND|--- F
---|____| | ____ ---|____|
| ---| | |
B ---o----------------|NAND|---
Benab ---|____|
Having done this, we can view the result as computing 8 different functions
of A and B. THese are:
Xenab Aenab Benab | Function
-----------------------+------------
0 0 0 | F = 0
0 0 1 | F = B
0 1 0 | F = A
0 1 1 | F = A or B
1 0 0 | F = 0
1 0 1 | F = B and not A
1 1 0 | F = A and not B
1 1 1 | F = A xor B
If we now build an the sum component for one bit of the adder as follows,
we get a general purpose ALU:
A B Cin
| | |
Bnot --|---- | |
| _|___|_ |
| | | |
| | XOR | |
| |___ ___| |
|__ | |
_|___|_ |
| A B | |
Xenab ---|Xenab | |
Aenab ---|Aenab | |
Benab ---|Benab | |
|___F___| |
| ____|
_|___|_
| A B |
---|Xenab |
| 1-|Aenab |
Cenab -o---|Benab |
|___F___|
|
F
An ALU based a complete full adder using this logic now performs the
following set of useful functions, along with many useless functions:
Xenab Aenab Benab Cenab Bnot | function
-------------------------------+-------------
0 0 0 0 0 | F = 0
0 1 1 0 0 | F = A or B
1 0 1 0 0 | F = B and not A
1 1 0 0 0 | F = A and not B
1 1 1 0 0 | F = A xor B
1 1 1 1 0 | F = A + B + Cin
0 1 1 0 1 | F = A or not B
1 1 0 0 1 | F = A and B
1 1 1 1 1 | F = (A - B - 1) + Cin
Of course, we don't usually want to use 5 control signals to select among
9 useful alternative, particularly when some of those alternatives are of
only marginal utility, so in a practical system, we will typically select
eight of these, with a bit of logic (3-inputs and 5-outputs) to select
the useful functions.