7. Reduction to Microcode, An Example
Part of
the 22C:122/55:132 Lecture Notes for Spring 2004

In general, given a set of registertransfer components that correspond to the operators of an imperitive programming languge, we can reduce any program in that language to a registertransfer system and a finitestate control unit. We do this as follows:
The net result of applying this second part of the algorithm is the state table for a (Moore) finitestate machine to drive the register transfer logic developed in the first part of this.
There is one little detail needed to reduce this to actual hardware: We need to make sure that the register transfers occur at the right times. Typically, this means something like the following:
___________ _______________ status  ____________ outputs ______________     Register clock o> Finite State   Transfer   Control Unit   Logic  ________________   ___  _________ select   and  ___________ inputs gates  _____   ________________ clock __________________ inputs ___________
This bit of logic guarantees that the clock inputs to the register transfer logic all go to zero between successive clock pulses, even if the same register is assigned to by successive states of the control unit. This is extremely safe if the control unit changes state on the positive going clock edge and the registers in the registertransfer half of the system are negative edge triggered. Faster performance is possible if all register transfers happen on the same clock edge as the state changes; because we used and gates above, this is best done by having everything happen on the falling edge of the clock, so the and gates cut off any changes in the clock inputs to the register transfer logic before the output of the control unit begins to wobble through its state changes.
For the sake of example, consider Euclid's algorithm to compute the greatest common divisor of two numbers. This can be expressed algorithmically as follows:
unsigned int gcd( unsigned int x, unsigned int y ) { if (x == y) { return x; } else if (x > y) { return gcd( xy, y ); } else /* x < y */ { return gcd( x, yx ); } }
The above algorithm is expressed recursively, so our first job is to remove the recursion:
unsigned int gcd( unsigned int x, unsigned int y ) { while (x != y) { if (x > y) { x = x  y; } else /* x < y */ { y = y  x; } } return x; }
The next step is to remove the function calls from the program; in this case, we have no main program calling for the greatest common divisor, instead, we have the specification, invented arbitrarily for this assignment, that our machine should compute the greatest common divisor of its two inputs whenever the input ready signal is asserted, and our machine should output result valid whenever the output of the machine holds the result.
/* these variables stand in for external inputs */ unsigned int input_a; input_b; BOOLEAN input_valid; /* these are the variables of our machine */ unsigned int x, y; BOOLEAN result_valid; /* the contents of x and y will be equal when result_valid is true */ /* either x or y can be sampled as the output when this holds */ main() { for (;;) { /* an infinite loop */ do { (void); while (input_ready == FALSE); result_valid = FALSE; x = input_a; y = input_b; while (x != y) { if (x > y) { x = x  y; } else /* x < y */ { y = y  x; } } result_valid = TRUE; } }
We have written the above with some extra variables added to the declarations up front, standing in for outputs from the outside world, and with mere comments stating which internal variables are visible to the outside world to hold outputs. We had to do this because the programming language view of how input and output are done differs in serious ways from the way mechanisms actually accomplish this.
Having reduced our algorithm to the final form shown above, we can now begin reducing this to registertransfer logic. We begin by extracting all the assignment statements from the code and grouping them by the name of the variable assigned to:
result_valid = FALSE; result_valid = TRUE; x = input_a; x = x  y; y = input_b; y = y  x;
Next, for each variable, we use a register, and we use a multiplexor to select which assignment is performed, with appropriate registertransfer functional units substituted for the operators in the expressions. For the resultvalid Boolean variable, this gives the following:
FALSE TRUE _______ F \ 0 1 / resultvalid \_____/ ______________ C > result valid  resultvalid _______________   resultvalid output
This example is constructed, deliberately, without any thoughtful optimization! Given the encoding of FALSE as zero and TRUE as one, it is trivial to optimize this as:
F  resultvalid  ______________ C > result valid  resultvalid _______________   resultvalid output
We can apply similar reductions to each of the other registers, so for the X register, we get:
input a ___________ value of y  _________  _____    x y     xy    _______   _____  _______  F \ 0 1 /  x \_____/  ______________  C > x   x _______________    o  gcd output
The diagram for the Y register is very similar, but this raises a question if we try to combine all of these: How do we make an organized drawing of the entire system? Here is one suggestion for the overall structure of the drawing of such a digital system:
______________________________________ ____________  inputs from the outside world   ______________________________________   distibution network for feedback    from internal registers  feedback  ______________________________________ data   combinational registertransfer  paths   elements   ______________________________________   multiplexors   ______________________________________   registers   __________________________________________________
Following this convention gives us the following diagram for the entire registertransfer half of the system:
input a input b    o x        o y   _____  _____     x y    y x      xy    yx     _______  _______    _____  _____   _______ _______   F \ 0 1 / F \ 0 1 / F    x \_____/ y \_____/ rv    ________ ________ ________   C > x  C > y  C > rv    x _________ y _________ rv _________         o  o    gcd alternate result valid output gcd output output
At this point, we are more than halfway there! We know, at this point, that the outputs from the finite state control unit will be no more and no less than F_{x}, C_{x}, F_{y}, C_{y}, F_{rv}, and C_{rv}. For compactness, we have renamed the resultvalid flipflop (and its control signals) with the abbreviated name rv.
What we are missing are the inputs to the control unit! To locate these, we extract, from the final textual version of our program, all of the expressions evaluated in the control structures of the program. These were;
(input_ready == FALSE) (x != y) (x > y)
Inputready is trivial, since this Boolean input to the system is simply passed directly to the control unit, unchanged. The other two are more interesting. We could add special combinational logic to compute these directly from the distribution network for the values of internal registers, as follows:
input a input b   ooo x         ooo y  _____ _____  _____  _____    x y   x y    x y    y x     x>y   x=y    xy    yx    _______ _______  _______  _______           <  x>y   < x=y
This, however, is completely unnecessary, because we are already computing both the difference xy and the difference yx; between these two differences, we can completely determine equality and the relative magnitudes of the numbers.
If we were dealing with signed numbers, we might be tempted to use the signs of the differences for this comparison. This would fail in the event of an overflow. Instead, we will take advantage of the fact that the subtraction xy is generally done using two's complement additioin, x+~y+1; when this is done, the carry out of the adder means "no borrow output from most significant bit", which, in turn, means x>y. Therefore, taking both the differences xy and yx gives us signals indicating both x>y and y>x. We can get all the possible comparisons of x and y from simple functions of these two carry signals.
input input a input b ready     o x          o y    _____  _____      x y    y x       xy    yx    IR <  _c_____  _c_____   /         x>y <O< o    \         ___                 x=y <and        ___     _____  _____   _______ _______   F \ 0 1 / F \ 0 1 / F    x \_____/ y \_____/ rv    ________ ________ ________   C > x  C > y  C > rv    x _________ y _________ rv _________         o  o    gcd alternate result valid
The control unit can now be described! We know that the outputs of the control unit are F_{x}, C_{x}, F_{y}, C_{y}, F_{rv}, and C_{rv}, and we know that the inputs are IR, x>y and x=y. The truth tables for the control unit will therefore have the following structure
this  next this  F C F C F C state IR x>y x=y  state state  x x y y rv rv  
All we need to do is fill in these tables. To do so, we examine the algorithm again, grouping all assignment statements into groups that can be carried out in parallel using the data paths outlined at the end of the previous seciton. These groups are identified by the labels in the following restatement of the algorithm:
main() { for (;;) { /* an infinite loop */ do { A: (void); while (input_ready == FALSE); B1: result_valid = FALSE; B2: x = input_a; B3: y = input_b; while (x != y) { if (x > y) { C: x = x  y; } else /* x < y */ { D: y = y  x; } } E: result_valid = TRUE; } }
Each letter used above corresponds to one state of the finite state control unit. In state A no assignments take place; in state B, we have three parallel assignments, while in states C, D and E, there is one assignment for each. We can formalize this in the table that relates the outputs of our finite state control unit to the current state:
this  F C F C F C state  x x y y rv rv  A   0  0  0 B  0 1 0 1 0 1 C  1 1  0  0 D   0 1 1  0 E   0  0 1 1
In the above truth table, we have not provided binary encodings for the state names, and we have used dashes in the output to indicate don't care values. As a general rule, if a register is not clocked during some clockcycle of the finite state control unit, the multiplexor controls and function select controls that govern the data inputs to that register do not matter. Designating don't care values is important when an effort is made to optimize the implementation of a boolean function, so it is polite to indicate it if the door to optimization is to remain open.
The second state table we must fill out follows directly from the control structure for our algorithm; we can express this as a picture, as follows:
______ / \x>y \ / >(C) _____ir=0 / / \ x=y \ / \ /x>y \___ /y>x \ \ / / \ / \ >(A)>(B) X >(E) / ir=1 \ ___/ \ / / \ /  \y>x / \x>y / / \ \  \ \ / x=y / / / \ x=y >(D) / / \  / \ / / \ \ \______/y>x / / \ \___________________________/ / \_______________________________________________/
There is nothing particularly pretty about this state diagram! The inner loop of the original code has been thoroughly obscured by the fact that we only have states for each block of assignment statements, and not for control structures such as the while loop and the if statement. Without these extra states, states B, C and D all test for loop termination,m and states C and D are connected in what might be described as a figureeight loop. This state diagram reduces directly to the following truthtable for the nextstate function:
this  next state IR x>y x=y  state  A 0    A A 1    B B  1   C B  0 0  D B  0 1  E C  1   C C  0 0  D C  0 1  E D  1   C D  0 0  D D  0 1  E E     A
Again, we have used dashes to indicate don't care values on the inputs. When mechanically reducing a truth table where there are don't care inputs, neither that input nor the inverted value of that input needs connection to the row decoder for that row of the input. Digital logic optimization is largely concerned with algorithms for combining rows of the truth table by identifying the rows for which the inputs are don't care values.
We can combine the above two truth tables into one table without changing any of the information expressed. If we do it as follows, we make clear that this table still describes a Moore machine:
this  next F C F C F C state IR x>y x=y  state x x y y rv rv  A 0    A  0  0  0 A 1    B  B  1   C 0 1 0 1 0 1 B  0 0  D B  0 1  E  C  1   C 1 1  0  0 C  0 0  D C  0 1  E  D  1   C  0 1 1  0 D  0 0  D D  0 1  E  E     A  0  0 1 1
In the above truth table, all rows that involve transitions from one state have been grouped into blocks, and for each block, only one set of outputs is given.
We can assign arbitrary Boolean encodings for the state names used here! Sometimes, the optimization of the logic for the finitestate control unit is simplified by careful encoding of the state names, but since we are largely ignoring such issues in this class, we will not bother with this.
The important thing to note is that, aside from questions of optimization, the reduction of an algorithm to a digital system that executes that algorithm is a mechanical process. Most of the information you need to evaluate the performance of that digital system is actually apparent in the final version of the source code for the algorithm, particularly if it is annotated to show the statements that can be executed in parallel.
Filling in the contents of this state table has a strong resemblance to programming! It is, in fact, quite reasonable to view the part of the truth table dealing with each state as a kind of machine instruction, where nextstate fields of the table give the control structure part of the behavior of that instruction, while the control unit outputs to the data part of the system give the computational part of the instruction.
If we think in these terms, the registertransfer logic of the digital system can be thought of as defining the instruction set of the control unit, and then the actual job of working out the control unit for that control unit can be thought of as programming in that instruction set. When the register transfer logic is designed with the intent of allowing general purpose computation, as opposed to designing it to meet the needs of one particular algorithm, we refer to the entire system as a microprogrammable system, and we refer to the contents of the state table of the control unit as the microprogram (µprogram, for short). This terminology is particularly applicable when the state table for the control unit is stored in a dedicated fast RAM or in some kind of programmable ROM (PROM, generally; for example, EEPROM or Flash EEPROM), so that the particular algorithm to be executed by the digital system can be changed after the system has been manufactured.