4. Finite State Automata, A Breakneck Review

Part of the 22C:122/55:132 Lecture Notes for Spring 2004
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

A Breakneck Review of Finite State Automata

When a digital system contains feedback, the result is a sequential circuit. Fundamental mode sequential circuit design, in which the entire system is viewed in these terms, is very difficult and well will avoid discussing it in this course! A fundamental mode sequential circuit designer thinks in the following terms:

                          _______________
               ----------|               |
	inputs ----------| combinational |----------- outputs
	       ----------|   circuitry   |------
	           ------|               |---   |
	          |   ---|_______________|   |  |
	          |  |                       |  |
	          |   -----------------------   |
	           -----------------------------

The RS flipflop presented in the previous lecture was analyzed as a fundamental mode design. Fundamental mode design is used to construct optimal implementations of type JK or type D edge triggered flipflops, but most digital system designers avoid it.

Most work in digital systems is done with clocked sequential circuits. This is a slightly higher higher approach to sequential circuits. It is only applicable to circuits that change their observable state when a clock edge appears on one of their inputs (either 0 to 1 for a leading-edge triggered clocked sequential circuit, or 1 to 0 for a trailing-edge triggered clocked sequential circuit).

Clocked mode circuits are generally built using an edge-triggered register and a combinational circuit, as illustrated in the following:

                          _______________
               ----------|               |
	inputs ----------| combinational |----------- outputs
	       ----------|   circuitry   |------
	           ------|               |---   |
	          |   ---|_______________|   |  |
	          |  |                       |  |
	          |  |                     __|__|__  state
	          |  |    clock ----------|>_______| register
	          |  |                       |  |
	          |  |                       |  |
	          |   -----------------------   |
	           -----------------------------

Here, the feedback from the output of the combinational circuitry to the input is stored in a state register. This register is made of one edge-triggered flipflop per bit of feedback, and all the flipflops share the same clock.

The design and analysis of a clocked mode sequential circuit is based on finite state automata theory, so we must review this before we continue with computer architecture.

State Diagrams and State Tables

The behavior of a sequential circuit can be described by a state diagram. In such a diagram, a circle represents the state of the machine, and an arc represents the change of state from one state to the next. Thus, we might diagram the behavior of a very simple system as follows:

		 --->---
               /         \
             (A)         (B)
               \         / 
                 ---<---

This system has 2 states, A and B. Whenever a clock pulse arrives, it changes state, so on successive clock pulses, it alternates between states A and B. Note that the arcs on the state diagram must be annotated with direction arrows!

The most trivial thing you can do when you examine a state diagram is count the states. If a state diagram has n states, the implementation of the state diagram will require ciel(log2(n)) flipflops. Thus, our example with 2 states requires one flipflop.

Finite State Machines with Inputs

The above example was for an automata with no inputs. More typically, we are interested in machines that respond to their inputs. In general, if a machine has k binary inputs, these may take on 2k distinct values. As a result, each state may have as many as 2k distinct successor states depending on the specific value of the inputs at the time of the state transition!

The following figure illustrates a 2-state finite state machine with an input that may have 2 values, either x or y:

           -<--     --->---     ----
         /    x \ / y       \ /      \
        (       (A)         (B)       )
         \      / \       y / \ x    /
           ----     ---<---     -->-

This system has 2 states. Whenever a clock pulse arrives, if the input is y, the state will change, but if the input is x, the state will remain unchanged. We can also describe this machine with a state table:

            input  current | next
                    state  | state
          -----------------|--------
              x       A    |   A
              y       A    |   B
              x       B    |   B
              y       B    |   A

Finite State Machines with Outputs

There are two ways of dealing with the outputs of a finite state machine. One approach is to associate the output with the state. In its simplest form, the output of this type of machine is its state! Moore was the first to clearly define this type of finite state automata, so we describe such machines as Moore machines.

An alternative is to associate the output with both the input and the current state. Mealy was the first to clearly define this alternative, so we describe such machines as Mealy machines. The following figure and state table define the same Mealy machine.

           -<--     --->---     ----
         /  x/a \ / y/b     \ /      \
        (       (A)         (B)       )
         \      / \     y/d / \ x/c  /
           ----     ---<---     -->-
						A Mealy Machine
            input  current | next   output
                    state  | state
          -----------------|---------------
              x       A    |   A      a
              y       A    |   B      b
              x       B    |   B      c
              y       B    |   A      d
In the state diagram for a Mealy machine, each arc (state transition) is labeled with the input that enables that transition, a slash, and the output that is associated with that transition. The output is always after or under the slash!

It is a theorem of finite state automata theory that any Moore machine can be converted to a Mealy machine, and visa versa. Clearly, if a Moore machine has outputs that depend only on the state, and the machine has n distinct output values, it must have n states! As a result, translation from Mealy to Moore frequently results in adding states! The above machine is equivalent to the following Moore machine:

           -<--        --->---        -->-
         /   x  \    /   y     \    / x    \
        (       (A1/a)   ---<--(B1/b)       )
         \      /       /    y /           / 
           ->--        /      /       --<-   
         /            / y    /      /      \ 
        (       (A2/d)-->---   (B2/c)       )
         \   x  /    \    y    /    \ x    /
           -<--        ---<---        -->-
						A Moore Machine
            input  current | next
                    state  | state
          -----------------|-------
              x       A1   |   A1
              y       A1   |   B1     state | output
              x       A2   |   A1    -------|--------
              y       A2   |   B1       A1  |   a
              x       B1   |   B2       A2  |   d
              y       B1   |   A2       B1  |   b
              x       B2   |   B2       B2  |   c
              y       B2   |   A2

Note that the description of a Moore machine frequently has two tables, one for the state transitions, and one giving the output associated with each state. In the above, the states that were split have retained their old names, but gained a suffix to distinguish the two Moore states resulting from each Mealy state. The output associated with each state in a Moore machine is written inside the circle for that state, separated from the state name by a slash in the same way that the output of a Mealy machine was separated from the input by a slash.

Binary Encodings

The above description of Moore and Mealy machines used symbolic names for inputs and outputs. Before converting a finite state machine specification to a digital system, we need to assign encodings to the inputs, outputs and states. Sometimes these encodings will be imposed by the context, but sometimes, we can be arbitrary. For example, we could code the inputs and outputs used in the above examples as:

	  input | code       output | code
	 -------|------     --------|------
	    x   |   0           a   |  00  
	    y   |   1           b   |  01  
	                        c   |  10  
	                        d   |  11  
Similarly, we can encode state names:
	  Mealy |            Moore |     
	  state | code       state | code
	 -------|------     -------|------
	    A   |   0          A1  |  00  
	    B   |   1          A2  |  01  
	                       B1  |  10  
	                       B2  |  11  
Having made these state assignments, we can convert the state tables and the output table previously given to give the following:
	                 Mealy

            input  current | next   output
                    state  | state
          -----------------|---------------
              0       0    |   0      00
              1       0    |   1      01
              0       1    |   1      10
              1       1    |   0      11
	

                         Moore

            input  current | next
                    state  | state
          -----------------|-------
              0       00   |   00
              1       00   |   10     state | output
              0       01   |   00    -------|--------
              1       01   |   10       00  |   00
              0       10   |   11       01  |   11
              1       10   |   01       10  |   01
              0       11   |   11       11  |   10
              1       11   |   01

The assignments of binary codes in the above example was entirely arbitrary, and this is not always a good idea. For example, the the output table in the Moore machine maps 4 states to 4 outputs; if the state names had been encoded directly in terms of the expected output values, or alternatively, if the output values had been encoded to match the state encoding, the need for hardware to compute the state to output mapping could have been completely eliminated. Sometimes, it pays to tinker with these encodings before committing to one or the other of them.

Reduction of a Finite State Specificaiton to Hardware

Given the encoded state table for a Moore or Mealy machine, we can reduce these to a functional digital systems as follows:

First, treat the state table (and output table, if it's a Moore machine) as the specifications for Boolean functions. Build hardware to implement these functions following the outline given in the notes for the previous lecture.

Then, substitute the hardware you built into one of the following outlines:

                             _______________         ___
                  ----------|               |-------|   |---
           inputs ----------|     Mealy     |-------|   |--- outputs
                  ----------|   next-state  |-------|   |---
                            |   and output  | next  |   |
                            |    function   | state\|   |
           current    ------|               |-------|   |----- 
            state    |   ---|_______________|-------|   |--   |
                     |  |                           |   |  |  |
                     |  |         clock ------------|>__|  |  |
                     |  |                                  |  |
                     |   ----------------------------------   |
                      ----------------------------------------
    
                        ____________  next       ________
                  -----|            | state     |        |
           inputs -----|   Moore    | /___      | Moore  |---
                  -----| next-state |-|   |---o-| output |--- outputs
                       |  function  |-|   |-o-|-|function|---
           current  ---|            | |   | | | |________|
            state  |  -|____________| |   | | |
                   | |                |   | | |
                   | |        clock --|>__| | |
                   | |                      | |
                   |  ----------------------  |
                    --------------------------
In comparing the Moore and Mealy machines, note that the number of bits in the next-state output will usually be more for a Moore machine than for an equivalent Mealy machine, but that, where the Moore machine only needs the state register to store the current state, the Mealy machine needs to store bothe the current state and the current output.

This requirement of a register to hold the current output is imposed on Mealy machines to prevent the output from changing except when the clock edge arrives indicating that the machine should advance to the next state. If this were not done, the output of the Mealy machine could change whenever any input to the machine changes, and this is generally undesirable.

Of course, if the state assignment job is done carefully, the Moore machine may have a trivial output function, where the state itself encodes the desired output. The examples given above were deliberately clumsy in this regard!

Here is the Mealy machine we get from the state table given:

input _____________________
        ___________        |
       |           |       |
       |           o---    o---
       |           |  _|_  |  _|_
       |           |  \ /  |  \ /
       |           |   O   |   O   __
       |           |   |   |   |  |  \     |       |       |   
       |         --o---|---|---o--|   )O---o-------|-------o---
       |           |1  |   |0  |  |__/     |1      |0      |1  
       |           |   |   |   |   __      |       |       |   
       |           |   |   |   |  |  \     |       |       |   
       |         --|---o---o---|--|   )O---o-------o-------|---
       |           |0  |   |1  |  |__/     |1      |1      |0
       |           |   |   |   |   __      |       |       |   
       |           |   |   |   |  |  \     |       |       |   
       |         --o---|---o---|--|   )O---|-------o-------o---
       |           |1  |   |1  |  |__/     |0      |1      |1  
       |                                 __|__   __|__   __|__
       |                                |     | |     | |     |
       |                                 \___/   \___/   \___/ 
       |                                   O       O       O
       |                                   |       |       |  _____
       |                                   |       |        -|D   Q|-- output
       |                                   |       |         |     |   lsb
       |                                   |       |    -----|>    |
       |                                   |       |   |     |_____|
       |                                   |       |   |      _____
       |                                   |        ---|-----|D   Q|-- output
       |                                   |           |     |     |   msb
       |                                   |           o-----|>    |
       |                                   |           |     |_____|
       |                                   |           |      _____
       |                                    -----------|-----|D   Q|--
       |                                               |     |     |  | state
       |                         clock ----------------o-----|>    |  |
       |                                                     |_____|  |
       |______________________________________________________________|

The above brute-force circuit can be optimized considerably!

Bit-Slice Design

For both sequential and combinational circuitry, it is always possible to use the design methods described above, but it is sometimes very complex to think in such terms. For example, consider the problem of designing a 16 bit binary counter that simply increments its output modulo 216 each time a clock pulse arrives.

It is easy to reduce this design to the following:

                         ---------           
	          ______|______    |
                 |      N      |   |
                 |             |   |
                 |  next state |   |
                 |   function  |   |
                 |             |   |
                 |_____N+1_____|   |
                        |          /16
                        /16        |
                  ______|______    |
	clock ---|>____________|   |
                state   |          |     output
               (16 bit)  ------/---o-----
                                16        
When drawing a circuit with many parallel data lines, it is common to draw one line and then annotate it saying this is one of many by putting a slash through the line with the number of lines written next to it. The slash can be thought of as a schematic symbol for a string tied around the bundle of wires.

The trouble with the above design is that the combinational function required has 16 inputs and 16 outputs. As a result, the truth table for this function has 216 rows, and writing it out takes an inordinate amount of time and paper!

Fortunately, this function has a very regular structure, so it is possible to simplify the design immensely by exploiting this structure. When we exploit the regularity in an n-bit function by dividing it into n sub-functions operating on one bit each, possibly with additional horizontal interconnections between these 1-bit sub-functions, we call the result a bit-slice realization of the target function.

In the case of the 16 bit increment function, the bit slice design requires sub-functions with the following structure:

	 slice 15  ...   slice 1   slice 0

            A      ...      A         A
             15              1         0
	     |    C         |    C    |     1
	     |  __ 15       |  __ 1   |  ___|
	   __|_|  |       __|_|  |  __|_|   
	  |  a c|        |  a c| | |  a c|   
	  |  +  |     C  |  +  | | |  +  |
	  |c_s__|      2 |c_s__| | |c_s__|
	 __| |         |__| |    |__| |
	     |              |         |
	    S      ...      S         S
             15              1         0

In this case, the function of the smaller units is obvious from an elementary school understanding of arithmetic. Each small box takes the carry in to one digit position in the number, adds that carry to the digit input in that position to compute the digit output, and also computes the carry out from that position to the next higher position. The carry in to the lowest digit of the number is one in order to force the number to increment with each evaluation of the entire function.

Here, we have numbered the digits of the input from A0, the least significant bit, to to A15, the most significant bit, and we have numbered the outputs from S0, to S15 correspondingly. The carry signals are similarly numbered, where Ci is the carry that will be added to Si, so the carry out of the entire circuit (ignored in this context) would be labeled C16.

The bit-slice design for the entire counter can be completed by adding one bit of the register to each of the bit-slices of the combinational function, giving the following structure for each bit-slice of the original counter:

               C
                i              one bit slice of
	      -|----------    next state function
	     | |          |
	   __|_|          |        in  +  out
	  |  a c|         |       a  c | c  s
	  |  +  |         |      ------|------
	  |c_s__|  ____   |       0  0 | 0  0
	   | |    |    |  |       0  1 | 0  1
	   |  ----|D  Q|--o       1  0 | 0  1
	   |      |    |  |       1  1 | 1  0
	---|------|>C  |  |
	   |      |____|  |
	   C              S
            i+1            i
The implementation of this counter stage can be optimized locally, but that isn't the point of this course! Of more importance, the speed of this counter is limited by the time taken by the combinatorial circuitry to propagate information from the Ci input to the Ci+1 output. This is a problem that has been addressed by the designers of digital systems since the mid 19th century. Charles Babbage, for example, put a considerable amount of effort into what he referred to as carry anticipation mechanisms in his later mechanical difference engines, and today, essentially all high performance computers rely on carry anticipation trees in their arithmetic logic units. We will look at this issue later.

The bit-slice approach to designing systems that operate on long words is quite common, and it has also been used to manufacture computers. Bit-slice computer systems were built in the late 1960's with one slice of the entire system on each on N circuit boards, for an N-bit computer. By the mid 1970's, bit-slice CPU and ALU chips were commonly available, sometimes constructed with four bit slices per chip. Nowdays, it is reasonable to use bit-slice design for many of the components found on VLSI implementations of computers.