5. From Algorithm to Register Transfer Logic

Part of the 22C:122/55:132 Lecture Notes for Spring 2003
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

A Theorem

Any program can be mechanically reduced to a digital system that implements the semantics of that program! By mechanically reduced, we mean that there exist algorithms for this.

Correlaries

Computers are unnecessary. We could build hardware that directly implements the semantics of any program.

All algorithms, expressed in program form, are the specifications of mechanisms. The distinction between mechanism and algorithm that was once made in US patent law is entirely meaningless. (Patent lawyers have argued that algorithms cannot be patented because algorithms are mathematical abstractions and patent law applies only to mechanisms.

Outline of proof

First, many mechanisms in modern programming languages are redundant. All pointers can be replaced with integer indices into arrays, where each array entry contains one object of the type referenced by the pointer. All recursive algorithms can be reformulated in iterative form, perhaps with the addition of explicit stack objects with an explicit stack pointer. All object oriented programs can be rewritten in non-object-oriented form, replacing calls to methods of objects with calls to functions operating on those objects.

Once recursion is eliminated, all local variables can be replaced with global variables, using renaming schemes to deal with name conflicts. All function calls can then be eliminated by the inline substitution of the indicated function bodies in place of the function calls, with appropriate substitutions of actual parameters for formal parameters. This may require the introduction of temporary variables to hold copies of some of the actual parameters.

Complex primitives such as multiplication and division can be replaced by algorithms based on simple primitives such as addition and subtraction. Floating point numbers can be replaced by records with exponent and mantissa fields.

All record variables can be replaced by groups of simple variables, one to hold each field of the record. All arrays of records can be replaced by groups of arrays, one for each field of the records that were elements of the original array. In sum, we can undo all of the sophisticated software engineering that was applied to the original program, mechanically producing a program in the same language that is utterly unreadable and unmaintainable.

We take for granted here that the transformations needed to eliminate use of redundant programming language mechanisms have been done, so that the program implementing our algorithm uses only the following limited set of features:

Simple variables (of any precision)
Arrays of simple variables
Assignment statements
Addition, subtraction, logical and shift operations
If statements
While loops (test before loop body)
Until loops (test after loop body)

We will also assume that no array appears twice in the same statement; this may require adding temporary variables and extra assignments, but it eliminates the need for multiport RAM arrays and for edge-triggered RAM updates.

In the following, we will show, for each of these, how it is reduced to hardware. There are three basic stages to a constructive proof of our the4rem:

All variables can be reduced to registers and RAM arrays.
All expressions can be reduced to register-transfer logic and all assignment statements can be reduced to multiplexor-mediated connections to the inputs of registers and RAM arrays.
The control structure of a program can be reduced to a finite control unit to control the register transfer hardware produced in the first two steps above.

Proof, Registers and Memories

Registers

A constructive proof of this theorem begins with the simple observation that each simple variable in the program can be realized by a register and a multiplexor. The number of bits in the register must be sufficient to represent all possible values of that variable, and the multiplexor will have one input for each distinct assignment statement that might assign a value to the variable.

	int I;

        Isel  _|_|_|_ 
          ----\_____/ IselMUX
                 |
	      ___|___
	Ick  |       |
	  ---|>  I   |
	     |_______|
                 |
                 |

Here, we've adopted the convention that Ick is the name of the clock signal to register I. This receives a clock pulse whenever we assign to I. Similarly, Isel is the name of the multiplexor input that determines what value is assigned to the variable, and we call this multiplexor IselMUX.

Arrays

The proof continues by observing that each array can be realized by a RAM and two multiplexors. The number of bits in each word of the memory is set by the type of the array elements. The number of entries is the array dimension. One multiplexor is used to select between the different possible sources of values assigned to the array, while the other is used to select between the different possible values used as subscripts. Thus, we rewrite

	int A[x];

        Asub  _|_|_|_ 
          ----\_____/
        Asel     |AsubMUX _|_|_|_ 
          -------|--------\_____/
                 |           |AselMUX
                 |        ___|___
	         |       |       |
	          -----/-|addr   |
	                 |       |
	                 |  RAM  |
	Ack              |       |
	  ---------------|>  A   |
	                 |_______|
                             |
                             |

Here, again, we use the convention that Ack is the clock for the RAM array A; it receives a pulse whenever we assign to A. The address line receives the array subscript, as selected by the Asub control input, and the data input is controlled by Asel. The multiplexors are named, respectively AsubMUX and AselMUX.

Proof, Expressions and Assignment

Expressions

Every operator in an expression corresponds to a combinational register-transfer level operator, and the variables and arrays referenced in an expression correspond to the outputs of the registers and RAM arrays. Thus, wherever this expression occurs in the program

	(b + c) - d

It can be reduced to this hardware, using a single copy of the hardware for all instances of the expression:

	registers |   B   | |   C   | |   D   |
	or arrays |_______| |_______| |_______| 
                      |         |         |
	       -------o---------|---------|-----  to the other
	      |        ---------o---------|-----  users of these
	      |       |    ---------------o-----  register values
	     _|_______|_  |
	    |           | |
	    |     +     | |
	    |___________| |
	         _|_______|_
	        |           |
	        |     -     |
	        |___________|
                      |
                (B + C) - D

Note that, at this point, it does not matter whether the terms of the expression are simple variables or array references!

Assignment Statements

Locate all assignments in the program that assign to the same destination variable or the same destination array (regardless of the subscript). Gather them together, for example, as follows:

	a = i + j;
        a = i;
        a = j - i;

This implies that the data selection multiplexors for the register A, AselMUX, has 3 inputs, where these inputs are the register transfer subsystems that evaluate the expressions on the right hand side of the assignments:

	     |f = i + j|  |  f = i  |  |f = c - b|
             |____f____|  |____f____|  |____f____|
                  |            |            |
                --o------------|------------|---  to other users
	       |  -------------o------------|---  of the values of
	       | |  ------------------------o---  these expressions
        Asel  _|_|_|_ 
          ----\_____/ AselMUX
                 |
	      ___|___
	     |       |
                 A

This applies equally if the assignment is to an array or to a simple variable. Note that this allows us to label the inputs of the AselMUX multiplexor by the expressions selected. The actual binary encodings used for the multiplexor control input Asel are arbitrary and can be assigned later. It is useful to go back to the list of assignment statements and document the control signal values needed to make that assignment occur:

	a = i + j;   Asel="i+j"
        a = i;       Asel="i"
        a = j - i;   Asel="j-i"

Array Subscripts

Locate all expressions used as array subscripts in the program, regardless of whether the subscript is used in an expression or on the left-hand side of an assignment statement. Gather them together, for example, as follows:

	a[i + j]
        a[i]
        a[j - i]

This implies that the address selection multiplexors for the RAM array A, AsubMUX, has 3 inputs, where these inputs are the register transfer subsystems that evaluate the expressions used as subscripts:

	     |a = i + j|  |  a = i  |  |a = c - b|
             |____a____|  |____a____|  |____a____|
                  |            |            |
                --o------------|------------|---  to other users
	       |  -------------o------------|---  of the values of
	       | |  ------------------------o---  these expressions
        Asub  _|_|_|_ 
          ----\_____/ AsubMUX
                 |
                 |        ___|___
	         |       |       |
	          -----/-|addr   |
	                 |       |
	                 |  RAM  |
	                 |   A   |

Looking back at the list of all statements, we can now pin down all of the details needed to control the execution of some very complex code:

	a[i] = b[j] + k;  Asel="b[]+k", Asub="i"  , Bsub="j" 
	a[b[i]] = j;      Asel="j"    , Asub="b[]", Bsub="i"
	if (b[i+1])       Bsub="i+1"

Other Expressions

Because of the restrictions we have placed on the language, the only expressions that remain are Boolean expressions used in the control structures of the program. Each of these one-bit signals is available for use as an input to the control unit.

The boolean expressions tested by if, while and until statements can be evaluated using AND, OR and NOT gates based on the component terms of the expressions. Where 2-valued variables (boolean or integer) are involved, these may be directly incorporated into the logic of the boolean expression.

Where a variable must be tested for zero, or where the sign bit or any other bits of the variable is tested for particular values, we will rewrite, for example:

	A == 0

	     |   A   |
	     |_______|
                 |
                 /
              ___|___
	     |       |
	     |  =0?  |
	     |_______|
        A=0      |
           <-----

Here, we show the boolean variable exiting to the left in order to conform to the convention that the control unit is off to the left side of the register transfer components. Of course, to test a variable for zero, we just ask if all the bits are zero, requiring one and gate with inverted bits for all the inputs. To test for a variable equal to a specific constant, we use the same and gate and invert only those inputs that correspond to zeros in the constant, leaving the other bits un-inverted. Testing for a variable less than zero just involves bringing out the sign bit. Testing two variables for equality involves exclusive oring the bits to see which differ, and then combining the difference signals for each bit to get an indication of whether all the bits are equal.

Proof, From Control Structure to a Finite State Control Unit

Overview

The control unit implements the control structure of the program by generating all clock and multiplexor control signals in the appropriate order, responding to the boolean signals from logic when needed because of if or while statements in the program.

At the top level, the naive digital system that realizes a program will have the following form:

	 _________               _________
	|         | booleans    |         |
	| finite  |<------------|         |
	|  state  |             | register|
	| control | mux control | transfer|
	|   unit  |------------>|  system |
	|         | clocks      |         |
	|         |------------>|         |
	|_________|             |_________|

In the above, the booleans include one wire carrying the result of each boolean test required in the program, the mux control lines include inputs to all the multiplexors used to select RAM addresses and register inputs, and the clocks include the clock lines to each register.

The Control Unit as a Finite State Machine

Our goal is to build a control unit that evokes the register transfers indicated by the program, and to do so in the sequence required. The fact that we are building a digital system with a sequencing constraint suggests immediately that the solution will take the form of a sequential circuit, and this, in turn, suggests modelling the circuit as a finite state machine. This suggestion is exactly the path we will follow!

The naive, mechanical reduction of a program to a digital system produces a register transfer subsystem that allows for all assignments required to compute that program, and it includes outputs corresponding to every Boolean expression in that program, but it contains nothing related to the flow of control in that program. The control unit, on the other hand, will ignore all issues of what is assigned to what, and instead, it will contain only control information. Specifically:

The finite state machine will have one state per assignment statement. When in that state, exactly that assignment statement and no other will be executed. If the flowchart of the program has a flow arrow from assignment statement A directly to assignment statement B, the finite state machine will allow a transition from state A to state B, and this transition will be taken regardless of the inputs to the machine.

Where the program contains a conditional test, where statement A has multiple possible successors, depending on some set of boolean tests. State A will have multiple successors that depend on inputs to the machine.

Following the above rules leads to a moore machine, so the formal specification of our control unit will look like:

	 booleans  current |  next 
	            state  | state
        -------------------|-------  state |   mux   register
                           |               | control  clock
                           |        -------|------------------
                           |               |
                           |               |
                           |               |

Of course, a Mealy specification can also be realized.

Assuming all registers in the digital system are either positive pulse triggered or are triggered on the negative clock edge, we can use the following generic moore control unit to control any register transfer system:

                       -----<---/-----------
                      |                   booleans from
	   ---------o-|-----------        register transfer
	  |     ____|_|____   ____|____   subsystem
	  |    |           | |         |
	  |    | next      | |         |
	  |    |   state   | | output  |
	  |    |     ROM   | |     ROM |   _
	  /    |___________| |_________|  |  Mux selects
          |          |            |       |-/--->---
          |     _____|_____        ---/---|     ___  Register
          |    |   _       |              |-/--|   \ clocks
          |   -|>_|  state |              |_   |    )-/-
          |  | |___________|                  -|___/ _
          |  |       |                       |        |_
           --|-------                        |
        clock|                               |
        -----o-------------------------------

This control unit will change state on each positive edge of the system clock; the multiplexor select inputs will immediately stabilize, so that, by the time the negative clock edge comes, the inputs to the registers are stable. The booleans required for the next cycle then have 1/2 clock cycle to stabilize before the next state change.

QED!

Microprogramming

Control units for complex systems must frequently interpret many signals from the register transfer unit; if there are N states in the control unit and M booleans returned from the register transfer unit, there will be N2^M rows in the truth table for the next state function. The result can get very unwieldy very quickly!

To simplify the design, it is common to add a multiplexor to the system that is used to select which of the booleans from the register transfer system are presented to the finite state control unit as inputs.

                                 __________
                        /|------|          |
                       | |------|Boolean   |
              ----/----| |------|Outputs   |
         ____|____     | |------|          |
        |         |     \|------|          |
        | Control |     |       |          |
        |         |     |       | Register |
        |   Unit  |     |       | Transfer |
        |_________|     |       |  System  |
          |  |  |       |       |          |
          |  |   ---/---        |          |
          |  |                  |Mux       |
          |   --------/---------|Controls  |
          |                     |          |
           -----------/---------|Clocks    |
                                |__________|

This scheme limits the number of states that may follow each state, so, for example, we might only allow 2 or 4 successors to each state in the control unit, and this has a cost! If an optimal design would have had a 4 way branch when we only allow 2 successors to each state, we must introduce a tree of states used only for branching.

This is frequently worthwhile despite this small cost. If we use a mealy control unit, the state table for the resulting finite state control unit will contain the following items in each row:

               _______ __________ __________ _____________
          N   |_______|__________|__________|_____________|
              | next  | register |   mux    | conditional |
         Row    state    clocks    controls     select

We can think of this as a machine instruction format, where the next-state field is concatenated with the boolean conditions we have selected to compute the next instruction to be executed. It is therefore common to refer to each row of the control unit truth table as a microinstruction.

Thus, designing the control unit becomes a task akin to machine language programming, and the control unit itself is reduced to a very simple system consisting of a state register and a ROM. What we have developed here is called a horizontal microcode! Horizontal because the words are very long, able to evoke many different register transfers in parallel.

If we further restrict the code so that only one register transfer may be evoked per microinstruciton, the word size of the microcode is far shorter, and we refer to it as a vertical microcode because it may take a large number of short microinstructions to do something that could have been done in one horizontal microinstruction.

As with any machine language programming task, we can simplify our job by writing an assembler or even a compiler, although few microprogrammed systems are of sufficient complexity to justify development of compilers. In many cases, off-the-shelf assemblers can be used to assemble the microcode, with only a small macro package required to map from the assembly language to something allowing the assembler output to be reduced to the binary content of the microstore.