Homework 10 Solved

22C:122/55:132, Spring 2001

Douglas W. Jones
  1. Background: A self-proclaimed computer expert asserts that, given two pipelined machines, the machine with more pipeline stages will usually be faster.

    A parallel argument about vector machines would hold that, given two vector machines, the machine with the _______ vector advantage would be faster.

    Part A: Complete and explain the analogy:

    The machine with the larger vector advantage would be faster. More pipeline stages implies more instructions being processed within the CPU in parallel; a larger vector advantage implies that more vector operands can be processed per instruction. In each case, the larger number implies more parallelism.

    Part B: Is the computer expert correct in his assertion? Explain why, or why not.

    The "expert" is wrong.

    Large vector advantages are not necessarily advantageous, because each vector instruction typically requires a moderately large number of scalar instructions to prepare base registers, strides, and other parameters. If these scalar instructions are relatively slow, they become the computational bottleneck. If they are fast, the vector advantage begins to disappear.

    Similarly, machines with many pipeline stages tend to require many operand delay slots and branch delay slots. Efficient use of such a machine requires elaborate interleaving of computationally independent instructions, and as the number of delay slots that must be filled grows, the complexity of the code generation problem quickly becomes unmanageable. As a result, the payoff for longer pipelines can be marginal or nonexistent.
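    The diminishing payoff can be illustrated with a toy model. Everything in it is an assumption made for illustration and not part of the original solution: the function name, the logic and latch delays, the rule that the number of delay slots is one less than the number of stages, and the fixed fraction of slots the compiler fails to fill. Holding that fraction constant is, if anything, generous to deep pipelines, since the argument above is that slots get harder to fill as their number grows.

```python
def time_per_instruction(stages, logic_delay=40.0, latch_overhead=1.0,
                         branch_fraction=0.2, unfilled_fraction=0.5):
    """Average time per instruction in a toy pipeline model.

    All parameters are hypothetical, chosen only to show the trend:
      logic_delay       total combinational delay, split evenly across stages
      latch_overhead    fixed delay added to every stage by interstage registers
      branch_fraction   fraction of instructions that expose delay slots
      unfilled_fraction fraction of those delay slots left as no-ops
    """
    # A deeper pipeline shortens the clock period, but never below latch_overhead.
    cycle = logic_delay / stages + latch_overhead
    # Assumption: the number of delay slots grows linearly with pipeline depth.
    delay_slots = stages - 1
    # Each unfilled slot costs one wasted cycle.
    wasted_cycles = branch_fraction * unfilled_fraction * delay_slots
    return cycle * (1.0 + wasted_cycles)


if __name__ == "__main__":
    for stages in (5, 10, 20, 40):
        print(stages, time_per_instruction(stages))
```

    Under these assumed numbers, the gain from each doubling of the pipeline depth shrinks and eventually reverses, which is the sense in which the payoff for longer pipelines can be marginal or nonexistent.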

  2. Problem: Do a top-level design for an operate stage that is able to handle not only simple one-cycle operations such as and, or, not, add and subtract, but also complex microcoded operations such as multiply and divide.

    What follows is a proposed data part for this pipeline stage. CU is the control unit. The basic idea is that, during normal pipelined operation, data flows through the ALU between interstage registers, while in microcoded operation, some data is recirculated from the ALU output back to an ALU input, while other data (for example, the divisor or the multiplicand) remains constant and is held in a stalled interstage register. One of the output interstage registers is used to hold the recirculated data -- from the point of view of the later pipeline stages, the contents of this register are invalid so long as the pipeline is stalled.
            stall   __|__   __|__    __|__ 
              ^    |>_IR_| |>_A__|  |>_B__|
              |       |       |        |     
              |    ---o       |  ------|-----
              |  _|_  |      _|_|_     |     |
              | |   |-|------\0_1/     |     |
              | | CU| |       _|_______|_    |
              o-|___|-|------| ALU shift |   |
              |       |      |___________|   |
              |     __|__        __|__       |      
              v    |>_IR_|      |>_R__|      |      
             in-      |            |         |
            validate  |            o---------
                      |            |
    
    The control unit is a sequential machine with an internal state register. The input is the current instruction, and the outputs are ALU and shifter controls, multiplexor controls, and an indication of whether the pipe should stall during the next clock cycle.
            IR      ---------
             |  ___|___      |
             | |>_state|     |
            _|_____|_        |
                |            |
                |    -----   |
                |   |     |  |
                 ---|  F  |  |
                    |_____|  |
                   ____|____ |
            stall   | | | |  |
              |     | | |  --
              o-----  | |
              |       |  -------- mux control
            in-       |
            validate   ---------- function select
    
    So long as the control unit is in its initial state, the IR determines the next state and the outputs. If the IR encodes a simple instruction, the next state is the initial state, no stall/invalidate is requested, the mux control prevents recirculation, and the function is selected for this pipelined instruction. So long as this situation persists, the finite state machine will process one pipelined instruction per microcycle and an observer will see no evidence of microcoded execution.

    If the IR holds a valid microcoded instruction, the stall/invalidate output goes high and the next state leads into a microcoded sequence of instructions, with the mux control and function select set accordingly for each step in the sequence. The final microinstruction in the sequence computes the valid result, sets the stall/invalidate output low, and has the initial state as its successor.
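    The behavior described above can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the design itself: the names control_unit and run, the opcode tables, the four-step microcode sequences, and the rule that the mux recirculates on every step after the first are all hypothetical. State 0 stands for the initial state, and the stalled IR continues to identify the microcoded instruction while the sequence runs.

```python
# Hypothetical opcode sets; the source names these operations but gives no encoding.
SIMPLE = {"and", "or", "not", "add", "sub"}
# Hypothetical microcode: one ALU/shifter function per step. A real multiply or
# divide would typically need one step per operand bit; 4 steps keeps the trace short.
MICROCODE = {"mul": ["shift_add"] * 4, "div": ["shift_sub"] * 4}


def control_unit(ir, state):
    """Combinational function F: (IR, state) -> (next_state, stall, recirculate, function)."""
    if ir in SIMPLE:
        # Simple instruction: stay in the initial state, no stall, no recirculation.
        return 0, False, False, ir
    steps = MICROCODE[ir]       # raises KeyError for an invalid opcode
    recirculate = state > 0     # assumption: the first step reads register A, later steps read R
    if state == len(steps) - 1:
        # Final microinstruction: the result is valid, the stall is released,
        # and the successor is the initial state.
        return 0, False, recirculate, steps[state]
    return state + 1, True, recirculate, steps[state]


def run(program):
    """Clock the control unit through a list of opcodes, returning one tuple per cycle."""
    trace = []
    state = 0
    pc = 0
    while pc < len(program):
        ir = program[pc]
        next_state, stall, recirculate, function = control_unit(ir, state)
        trace.append((ir, state, stall, recirculate, function))
        state = next_state
        if not stall:
            pc += 1             # the IR advances only when the pipe is not stalled
    return trace


if __name__ == "__main__":
    for cycle in run(["add", "mul", "or"]):
        print(cycle)
```

    With these assumed tables, the program add, mul, or takes six cycles: one each for add and or, and four for mul, during the first three of which the stall/invalidate output is high.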