13. Microprogramming

Part of the 22C:122/55:132 Lecture Notes for Spring 2004
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Horizontal Microprograms

In its simplest form, the microprogram for a control unit is just the next-state/output table of a mealy machine that accomplishes the required control function. We interpret each row of this table as a microinstruction in the microprogramming language of this control unit. These terms were introduced by Wilkes and Stringer, in their 1953 paper Microprogramming and the design of the control circuits in an electronic digital computer (reprinted as Chapter 28 of Bell and Newell).

http://research.microsoft.com/~gbell/Computer_Structures__Readings_and_Examples/

There's no mandatory structure of such a state table, but if we consider the data-half of a register transfer system, for example:

              P           Q                      inputs
              |           |
            --|-----------|---------------       network to distribute
            --------------|---------------       inputs and register values
            --------------|---------------|---------------------------------
            -------|------|-----------|---|-------------------------------  |
            -------|------|--|--------|---|-----------------------------  | |
                 | |         | |      |     |                           | | |
  boolean     combinational logic for the various                       | | |
  outputs     expressions you need to evaluate                          | | |
  to control <--| |      |    |          |                              | | |
  unit       <----|------     |          |                              | | |
            ------|-----------|----------|----   network to distribute  | | |
            |-|---------|---|----------|------   values of expressions  | | |
          __|_|_|__   __|___|__       _|_|_|_|_                         | | |
          \_______/   \_______/  ...  \_______/  multiplexors           | | |
              |           |               |                             | | |
          ____|____   ____|____       ____|____                         | | |
         |____X____| |____Y____| ... |____W____| registers              | | |
              |           |               |_____________________________| | |
              |           |_______________|_______________________________| |
              |___________|_______________|_________________________________|
                          |               |
                          R               S      outputs

We find that there is a natural organization to the control inputs for this system, in which we group the multiplexor controls and clock signals for each register together:

          ____ ___ ____ ___ _   _ ____ ___ ____________
  state: |____|___|____|___|_   _|____|___|____________|
          Xmux Xck Ymux Yck  ...  Wmux Wck next state

If there is a combinational that has control inputs, where that combinational unit feeds only one register (a common case, for example, with an ALU that feeds only to an accumulator), then the control lines to that combinational unit should be placed immediately before the clock for that register; if it feeds to several closely related registers, the control lines can be listed with the controls for that group of registers. Such details are secondary to this discussion.

The control unit that supports this microprogram can be described as follows:

                   _
                  |________________ conditions tested in
       ___________|                 data part of system
      |           |____________ 
      |           |_           |
      |     ______________     |
      |    | microprogram |    |
      |    |      ROM     |    |
      |____|addr          |    |
           |              |    | next state
           |______________|    |
            ______|______      |
  clock ---|>____________|     | 
             |         |_______|      control signals to
             |_______________________ data part of system

This is called a horizontal microprogram because the micro instructions on such a system are very long. It is also a pure Mealy finite-state machine. The advantage of pure horizontal microprogramming is that the microprogram can exploit a maximal amount of parallelism in the data part of the system, but the price is a requirement for a very large ROM. The number of bits per word grows with the number of control signals to the data part, and the number of words doubles for each condition that must be tested from the data part.

Prior to 1970, large ROMs were extremely expensive -- the available technologies included discrete-component diode matrices and such odd things as braided wire memory and capacitive read-out punched card ROMs. This led people to look for more compact ways of encoding the microprogram. Essentially all of these sacrifice some potential parallelism and add slightly to the complexity of the control unit; in addition, they deviate from the pure Mealy formalism for finite state automata.

Reducing the number of words in the Control ROM

One way to reduce the size of the control ROM is to reduce the number of conditions coming out of the control unit. We can do this at the register transfer level by re-specifying the logic to reduce the number of tests, but this has limited prospects. There is another way:

Note that, if there are n condition test lines from the data part to the control unit, then each state has 2npossible successors in a pure Mealy machine. Looking at the logic of most microprograms, it is common to find that few microprogram instructions actually have that many distinct successors because, at most points in the microprogram, many, and perhaps most of the condition tests are don't care conditions.

If this is the case, why not use a field of the microinstruction to select which conditions are actually to be tested at each microinstruction? For example, consider the following:

          ____ ___ ____ ___ _   _ ____ ___ ____ ____________
  state: |____|___|____|___|_   _|____|___|____|____________|
          Xmux Xck Ymux Yck  ...  Wmux Wck test  next state

Given this, we may use the test field to control a multiplexor that selects among the different condition reports from the data part to determine the next state:

                   _               /|__
                  |_______________| |__ conditions tested in
       ___________|  tested cond. | |__ data part of system
      |           |____________    \|
      |           |_ next state|   |
      |     ______________     |   |
      |    | microprogram |    |   |
      |    |      ROM     |    |   |
      |____|addr          |    |   |
           |              |    |   |
           |______________|    |   |
            ______|______      |   |
  clock ---|>____________|     |   |
             |      |  |_______|   |
             |      |______________|     control signals to
             |__________________________ data part of system
If we require the tested condition to be a single bit, then each microinstruction has only two possible successors, but it will take many instructions in sequence to decode a multi-bit condition. If we allow the tested condition to be several bits, each microinstruction will have more successors, but this may still be far fewer than it would have had if we did not multiplex the conditions to be tested.

There is a performance penalty when we do this! We have added a multiplexor to the feedback cycle between the control unit and the data part of the system. This can require a longer microcycle time if this feedback path is the critical path that limits the system performance. Fortunately, this is not commonly the case.

This still leads to a large number of words in the microstore, and many microinstruction set designers have reduced this by putting multiple next-state fields in each microinstruction:

          ____ ___ ____ ___ _   _ ____ ___ ____ ____________ ____________
  state: |____|___|____|___|_   _|____|___|____|____________|____________|
          Xmux Xck Ymux Yck  ...  Wmux Wck test      A             B     
                                                       next states

Given this, we must now add another multiplexor to select from among the next state fields depending on the output of the condition test multiplexor:

       ________________________ 
      |              next state|    
      |     ______________    _|_     /|__  
      |    | microprogram |  /   \___| |__ conditions tested in
      |    |      ROM     | /_____\  | |__ data part of system
      |____|addr          |   | |     \|
           |              |   | |     |
           |______________|   | |     |
            ______|_______    | |     |
  clock ---|>_____________|   | |     |
             |      |  | |____| |     |
             |      |  |________|     |
             |      |_________________|    control signals to
             |____________________________ data part of system

It is noteworthy that, as we move to this approach, we have abandoned the Mealy formalism for finite state control units and adopted what is essentially a Moore formulation!

Here, we have doubled the number of multiplexor delays in the feedback path from control unit to register transfer logic back to control unit. There is a serious chance that this will force us to slow the clock rate for the whole system. If maximal performance is our goal, we might hesitate to take this step.

The need to encode two next-address fields in the microprogram is annoying, and we can easily incorporate an idea from CPU design, sequential execution, to eliminate the need for one of the next-state fields! Thus, we arrive at a microinstruction format that assumes sequential execution, with a next-state field that is only needed when there is a choice to be made:

          ____ ___ ____ ___ _   _ ____ ___ ____ ____________
  state: |____|___|____|___|_   _|____|___|____|____________|
          Xmux Xck Ymux Yck  ...  Wmux Wck test  branch addr

With this scheme, it is useful if one of the conditions that can be tested is always zero, so that the branch address field is entirely unused when simple sequential execution of a sequence of microinstructions is needed.

       ______________________________
      |                       |      |
      |     ______________    |     _|_     /|__ 0
      |    | microprogram |   |    /   \___| |__ conditions tested in
      |    |      ROM     |  _|_  /_0_1_\  | |__ data part of system
      |____|addr          | |+1 |   | |     \|
           |              | |___|   | |     |
           |______________|   |     | |     |
            ______|_______ ___|___  | |     |
  clock ---|>_____________|__µpc__| | |     |
             |      |  |      |_____| |     |
             |      |  |______________|     |
             |      |_______________________|    control signals to
             |__________________________________ data part of system

Having introduced sequential execution, it is worth noting that we now have something analogous to a program counter; we call this the microprogram counter, abbreviated µpc. We have not necessarily imposed any performance penalty in this final reduction, so long as the increment time is comparable to the ROM access time.

Reducing the Microinstruction Size

Now, we face a question, what should we do with the unused next-state bits, the branch address field of the microinstruction, in the case when microinstructions are sequential and these bits are unused. There are several things we can do with them! We can, for example, use these bits as a source of constants for input to the data part of the system. Or, if we are willing to sacrifice the ability to perform any register transfer on any microinstruction, we can use the branch address field to control additional register transfers. Suppose, for example, that we are willing to sacrifice the ability to load the W register in microinstructions that involve branches or that provide constants to the data part. This leads us to the following three microinstruciton formats:

          ____ ___ ____ ___ _   _ ___ ____ ____________
  state: |____|___|____|___|_   _|_0_|____|____________|
          Xmux Xck Ymux Yck  ...  Wck test  branch addr
          ____ ___ ____ ___ _   _ ___ ____ ____________
  state: |____|___|____|___|_   _|_0_|0000|____________|
          Xmux Xck Ymux Yck  ...  Wck test    const
          ____ ___ ____ ___ _   _ ___ ____ ____ _______
  state: |____|___|____|___|_   _|_1_|0000|____|///////|
          Xmux Xck Ymux Yck  ...  Wck test Wmux

Here, we can think of the the test field and Wck fields as combining to suggest the form of the microinstruction, since only when the test field is zero are the constant or Wmux fields defined, and only when Wck is one is the Wmux field defined. The following control unit structure realizes this instruction format:

       ______________________________
      |                       |      |
      |     ______________    |     _|_     /|__ 0
      |    | microprogram |   |    /   \___| |__ conditions tested in
      |    |      ROM     |  _|_  /_0_1_\  | |__ data part of system
      |____|addr          | |+1 |   | |     \|
           |              | |___|   | |     |
           |______________|   |     | |     |
            ______|_______ ___|___  | |     |
  clock ---|>_____________|__µpc__| | |     |
             |      |  |      |_____| |     |
             |      |  |______________|____ | __ constant to data part
             |      |_____ | _______________|
             |             |____________________ Wmux to data part
             |__________________________________ other to data part

This approach sends nonsense constants to the data part when there is a microprogram branch or when the W register is being controlled, so it is important to make sure that none of the multiplexors are set to values that cause the constant to be used. Similarly, it is important not to clock the W register when a constant is or branch address is being encoded.

Vertical Microcode

We can continue compacting the microcode if we are willing to eliminate even more parallel register transfers in the data part of our system. We might end up with a microprogram format like the following if we fold together all of the multiplexor control fields into one, so all multiplexors are set to the same setting at each register transfer. If we're very clever in designing the data part, we can still clock multiple registers at once.

          ______ ____ ____________
  state: |000000|____|____________|
          clocks test  branch addr
          ______ ____ ____________
  state: |______|0000|____|_______|
          clocks test  mux  const

Here, it is becoming very clear that the clocks field plus the test field resemble some kind of inefficiently encoded opcode field in an machine language. A control unit implemented on these lines will resemble the following.

       ______________________________
      |                       |      |
      |     ______________    |     _|_     /|__ 0
      |    | microprogram |   |    /   \___| |__ conditions tested in
      |    |      ROM     |  _|_  /_0_1_\  | |__ data part of system
      |____|addr          | |+1 |   | |     \|
           |              | |___|   | |     |
           |______________|   |     | |     |
            ______|_______ ___|___  | |     |
  clock ---|>_____________|__µpc__| | |     |
             |  |      |      |_____| |     |
             |  |      |______________|     |
             |  |    __|__                  |
             |  |   | | | |                 |
             |  |____ | | __________________|
             |        | |_______________________ constant to data part
             |        |_________________________ multiplexors to data part
             |__________________________________ clocks to data part

It is only a short step from this to a tightly encoded vertical microcode where the clock and test fields of the above proposal are combined into a single opcode field using, perhaps, a small fast ROM to decode the opcode into the appropriate clock and test fields.

The term vertical microcode refers to the fact that an operation we might have done with a single microinstruction in the horizontal microcode at the start of this presentation might now take a long column of instructions.

Note

This tour through the possibilities of microcoded control units has not been exhaustive! There are many other possibile compromizes between purely horizontal and vertical microcode arrangements!