22C:122/55:132, Notes, Lecture 5, Spring 2001

Douglas W. Jones
University of Iowa Department of Computer Science

Random Access Memory

Random access memory plays an important role in many settings. In addition to the large RAM used for main memory, smaller random access memories serve many roles.

Any random access memory can be modelled as an array of registers, and small RAMs frequently follow this model quite closely. The most naive implementation of a memory is as follows:

	                         data in
                                    |
                                    /n
                                    |
                      ---------o----o----o---------
                   __|__     __|__     __|__     __|__
                  | _-_ |   | _-_ |   | _-_ |   | _-_ |
               /|-|>____|  -|>____|  -|>____|  -|>____|
           ___| |----|-----    |    |    |    |    |
         clock| |----|---------|----     |    |    |
               \|----|---------|---------|----     |
               |     |          ---   ---          |
               |      -----------  | |  -----------
            2  |                _|_|_|_|_
           --/-o----------------\0 1 2 3/
         address                 \_____/
                                    |
                                    /n
                                    |
                                 data out

The registers used to implement a small RAM are almost universally simple latches. The larger the memory, the more likely some other technology is to be used, both simpler (capacitive memory in DRAM) or more complex (DRAM refresh logic). In the above, the notation suggests positive pulse triggered latches. Normally, we will use the following notation to indicate such a RAM:

                      |
                      /n
                      |
	        -------------
	       |   data in   |
	       |             |
	 --/---|addr         |
	    2  |     RAM     |
	       |    4 x n    |
	       |   _         |
	 ------|>_| |_       |
	       |             |
	       |   data out  |
	        -------------
                      |
                      /n
                      |

Having indicated that the RAM has 4n bits, the labels stating that the data in and data out lines are n bits wide and that the address line is log₂4 bits wide are redundant, but such redundancy can prevent misunderstanding!

The usual implementation of RAM does not involve a distinct multiplexor and demultiplexor. Instead, the address lines are decoded once and the decoder outputs distributed to the memory cells as follows:

	address inputs                   data inputs
          A       A                           D
          |1      |0         clock            |i
          o--     o--          |              o--
          | _|_   | _|_        |              | _|_
          | \ /   | \ /        |              | \ /
          |  O    |  O         |    ___       |  O
          |  |    |  |         o---|   \      
          |  |    |  |         |   |    )-- row 0 clock
          |  |    |  |   ___   |  -|___/
         -|--o----|--|--|   \  | |
          |  |    |  |  |    )-|-o--------- row 0 select
         -|--|----|--o--|___/  |    ___
          |  |    |  |         o---|   \
          |  |    |  |         |   |    )-- row 1 clock
          |  |    |  |   ___   |  -|___/
         -|--o----|--|--|   \  | |
          |  |    |  |  |    )-|-o--------- row 1 select
         -|--|----o--|--|___/  |    ___
          |  |    |  |         o---|   \
          |  |    |  |         |   |    )-- row 2 clock
          |  |    |  |   ___   |  -|___/
         -o--|----|--|--|   \  | |
          |  |    |  |  |    )-|-o--------- row 2 select
         -|--|----|--o--|___/  |    ___
          |  |    |  |         o---|   \
          |  |    |  |         |   |    )-- row 3 clock
          |  |    |  |   ___   |  -|___/
         -o--|----|--|--|   \  | |
          |  |    |  |  |    )-|-o--------- row 3 select
         -|--|----o--|--|___/  |
          |  |    |  |         |

Each row of the memory is one register, with select and clock signals from the left and data and inverted data inputs for each bit from above. Each bit of a static RAM memory is typically implemented as follows:

	  data inputs
		  _     
	     D    D
	     |i   |i      ___                           |
	     o----|------|   \          ___             |
	     |    |      |    )O-------|   \       |\   |
	     |    |    --|___/         |    )O-o---| )--o
	     |    |   |   ___       ---|___/   |   |/   |
	     |    o---|--|   \     |    ___    |h   |   |
	     |    |   |  |    )O---|---|   \   |    |   |
	     |    |   o--|___/     |   |    )O-|-   |   |
	     |    |   |            |  -|___/   | |  |   |
	     |    |   |            | |         | |  |   |
	     |    |   |            |  ---------  |  |   |
	     |    |   |             -------------   |   |
row clock ---|----|---o-----------------------------|---|---
row select --|----|---------------------------------o---|---
	     |    |                                     |
						   data output

The memory cell shown above has a tri-state output, so that the multiplexor function is distributed between memory cells instead of being abstracted out. An outside observer cannot see this, although some memory designs also include a tri-state data output on the entire memory, so that many memory chips can be wired in parallel, with only one of them enabled at any time.

A Theorem
Any program written using only:
- Simple variables (of any precision)
- Arrays of simple variables
- Assignment statements
- Addition, subtraction, logical and shift operations
- If statements
- While loops (test before loop body)
- Until loops (test after loop body)
Can be directly reduced to a digital system that implements the semantics of that program!
Note that the above list does not contain subroutine calls of any kind (procedure or function), it does not contain pointers, local variables or scope rules, nor does it contain any programming language features that support data abstraction.
These omissions are of no importance! Any program that can be run in finite memory can be rewritten in a form that does not use any of the omitted features. All non-recursive subroutines can be treated as macros, simply replacing calls with the code for the body of the called routine. Local variables of non-recursive routines can be replaced by global variables, after simple renaming to eliminate conflicts. All recursive algorithms can be rewritten as iterative algorithms that push and pop data from stacks. All uses of pointers can always be rewritten as uses of array indexing, and so on.
Our proof will commence by looking at the registers and register transfers needed to realize any program before looking at the control unit needed to sequence those register transfers.
Proof, Simple Variables
A constructive proof of this theorem begins with the simple observation that each simple variable in the program can be realized by a register. The number of bits in the register must be sufficient to represent all possible values of that variable. Thus, we rewrite
int I;
as
| ___|___ Ick | | ---|> I | |_______| | |
Here, Ick is the clock signal to register I. This receives a clock pulse whenever we assign to I.
Proof, Arrays
The proof continues by observing that each array can be realized by a RAM. The number of bits in each word of the memory is set by the type of the array elements. The number of entries is the array dimension. Thus, we rewrite
int A[x];
as
| ___|___ | | -/-|addr | | | | RAM | Ack | | ---|> A | |_______| | |
Here, Ack is the clock for the RAM array A; it receives a pulse whenever we assign to A. The address line receives the array subscript.
Proof, Assignment
A assignment statement consists of a variable to which a value is assigned, a combinational function that combines the operands to the statement, and a set of variables that are used as arguments to the combinational function. In general, we can represent assignment statements as:
A = f(B, C ...)
We can realize this in hardware as:
| B | | C | |_______| |_______| | ... | -- -- _|_|_|_ | | | f | |_______| | ___|___ Ack | | ---|> A | |_______| |
The above realization applies only when we're dealing with a variable that occurs on the right-hand-side of only one assignment statement anywhere in the program.
If a variable occurs on multiple right-hand-sides, we need to add a multiplexor. Consider the case where the following two assignments occur somewhere in a program:
A = f(B, C) A = g(D, E)
In this case, we may realize this in hardware as:
| B | | C | | D | | E | |_______| |_______| |_______| |_______| | | | | -- -- -- -- _|___|_ _|___|_ | | | | | f | | g | |_______| |_______| | | ------- ------- _|___|_ Asel \ 0 1 / -----------\ Amux/ \___/ | ___|___ Ack | | ----------|> A | |_______| |
Here, the value present on Asel determines which of two possible assignments to A are to be performed at the time a clock pulse is placed on Ack.
If an array occurs with different subscripts at different points in a program, a similar multiplexor is also required on the address input to the RAM representing that array.
Proof, RAM and register types
If some variable or array occurs on both the left hand and right hand side of the same assignment, the register (or RAM cells) used to implement that variable must be edge-triggered. In all other cases, simple latches (pulse triggered registers or RAM cells) suffice.
Note that any assignment where a variable occurs on both sides of that assignment can be rewritten, at the cost of one temporary variable, to allow simple latches to be used. For example:
A[i] = A[i]+1
Can be rewritten as:
T = A[i]+1 A[i] = T
Given the fact that most RAM implementations are not edge-triggered, we will apply this to all assignments where RAM is involved. Note that the temporary variable introduced by this transformation is essentially serving as the master stage in a master-slave register, in this case, A[i] serves as the slave stage!
RAM offers a few other constraints. Most RAM implementations have only one address input, so if we find an expression that involves two different array subscripts in the same array, we will typically need to introduce either an expensive 2-port memory or a temporary variable, as:
X = A[i]+A[j]
Can be rewritten as:
T = A[i] X = T + A[j]
Boolean expressions
The boolean expressions tested by if, while and until statements can be evaluated using AND, OR and NOT gates based on the component terms of the expressions. Where 2-valued variables (boolean or integer) are involved, these may be directly incorporated into the logic of the boolean expression.
Where a variable must be tested for zero, or where the sign bit or any other bits of the variable is tested for particular values, we will rewrite, for example:
A == 0
as
| A | |_______| | / ___|___ | | | =0? | |_______| A=0 | <-----
Here, we show the boolean variable exiting to the left in order to conform to the convention that the control unit is off to the left.
The Control Unit and the effect of Optimization
The control unit implements the control structure of the program by generating all clock and multiplexor control signals in the appropriate order, responding to the boolean signals from logic such as that shown above, as needed.
At the top level, the naive digital system that realizes a program will have the following form:
_________ _________ | | booleans | | | |<------------| | | | | register| | control | mux control | transfer| | unit |------------>| system | | | clocks | | | |------------>| | |_________| |_________|
In the above, the booleans include one wire carrying the result of each boolean test required in the program, the mux control lines include inputs to all the multiplexors used to select RAM addresses and register inputs, and the clocks include the clock lines to each register.
The naive digital system is rarely optimal. Frequently, for example, it is possible to directly select a register input by taking a boolean representing a condition tested by an if statement and using that directly to control a multiplexor. Consider the following program fragment:
if (a < 0) { b = -a; } else { b = a; }
If there are no other assignments to B in the program, we might optimize this to:
| A | |_______| | -------o-o------ ___|___ | ___|___ | | | | | | A<0? | | | g | |_______| | |_______| | | | | | ---- | _|___|_ | \ 0 1 / ------\ Amux/ \___/ | ___|___ Bck | | ----------|> B | |_______| |
Be careful making such optimizations! When used with care, they can materially simplify the specification of the control unit, but used carelessly, they can make a circuit far more difficult to debug.
The Control Unit as a Finite State Machine
Our goal is to build a control unit that evokes the register transfers indicated by the program, and to do so in the sequence required. The fact that we are building a digital system with a sequencing constraint suggests immediately that the solution will take the form of a sequential circuit, and this, in turn, suggests modelling the circuit as a finite state machine. This suggestion is exactly the path we will follow!
The naive, mechanical reduction of a program to a digital system produces a register transfer subsystem that allows for all assignments required to compute that program, and it includes outputs corresponding to every Boolean expression in that program, but it contains nothing related to the flow of control in that program. The control unit, on the other hand, will ignore all issues of what is assigned to what, and instead, it will contain only control information. Specifically:
The finite state machine will have one state per assignment statement. When in that state, exactly that assignment statement and no other will be executed. If the flowchart of the program has a flow arrow from assignment statement A directly to assignment statement B, the finite state machine will allow a transition from state A to state B, and this transition will be taken regardless of the inputs to the machine.
Where the program contains a conditional test, where statement A has multiple possible successors, depending on some set of boolean tests. State A will have multiple successors that depend on inputs to the machine.
Following the above rules leads to a moore machine, so the formal specification of our control unit will look like:
booleans current | next state | state -------------------|------- state | mux register | | control clock | -------|------------------ | | | | | |
Of course, a Mealy specification can also be realized.
Assuming all registers in the digital system are either positive pulse triggered or are triggered on the negative clock edge, we can use the following generic moore control unit to control any register transfer system:
-----<---/----------- | booleans from ---------o-|----------- register transfer | ____|_|____ ____|____ subsystem | | | | | | | next | | | | | state | | output | | | ROM | | ROM | _ / |___________| |_________| | Mux selects | | | |-/--->--- | _____|_____ ---/---| ___ Register | | _ | |-/--| \ clocks | -|>_| state | |_ | )-/- | | |___________| -|___/ _ | | | | |_ --|------- | clock| | -----o-------------------------------
This control unit will change state on each positive edge of the system clock; the multiplexor select inputs will immediately stabilize, so that, by the time the negative clock edge comes, the inputs to the registers are stable. The booleans required for the next cycle then have 1/2 clock cycle to stabilize before the next state change.
QED!
Microprogramming
Control units for complex systems must frequently interpret many signals from the register transfer unit; if there are N states in the control unit and M booleans returned from the register transfer unit, there will be N2^M rows in the truth table for the next state function. The result can get very unwieldy very quickly!
To simplify the design, it is common to add a multiplexor to the system that is used to select which of the booleans from the register transfer system are presented to the finite state control unig as inputs.
__________ /|------| | | |------|Boolean | ----/----| |------|Outputs | ____|____ | |------| | | | \|------| | | Control | | | | | | | | Register | | Unit | | | Transfer | |_________| | | System | | | | | | | | | ---/--- | | | | |Mux | | --------/---------|Controls | | | | -----------/---------|Clocks | |__________|
This scheme limits the number of states that may follow each state, so, for example, we might only allow 2 or 4 successors to each state in the control unit, and this has a cost! If an optimal design would have had a 4 way branch when we only allow 2 successors to each state, we must introduce a tree of states used only for branching.
This is frequently worthwhile despite this small cost. If we use a mealy control unit, the state table for the resulting finite state control unit will contain the following items in each row:
_______ __________ __________ _____________ N |_______|__________|__________|_____________| | next | register | mux | conditional | Row state clocks controls select
We can think of this as a machine instruction format, where the next-state field is concatenated with the boolean conditions we have selected to compute the next instruction to be executed. It is therefore common to refer to each row of the control unit truth table as a microinstruction.
Thus, designing the control unit becomes a task akin to machine language programming, and the control unit itself is reduced to a very simple system consisting of a state register and a ROM. What we have developed here is called a horizontal microcode! Horizontal because the words are very long, able to evoke many different register transfers in parallel.
If we further restrict the code so that only one register transfer may be evoked per microinstruciton, the word size of the microcode is far shorter, and we refer to it as a vertical microcode because it may take a large number of short microinstructions to do something that could have been done in one horizontal microinstruction.
As with any machine language programming task, we can simplify our job by writing an assembler or even a compiler, although few microprogrammed systems are of sufficient complexity to justify development of compilers. In many cases, off-the-shelf assemblers can be used to assemble the microcode, with only a small macro package required to make the user interface m