Homework 5

22C:122, Fall 1998

Due Monday Sept 28, 1998, in class

Figure 3.4 in the text omits some components. Figure 2.21 gives the fields of IR needed to work out what is missing:

The IF/ID interstage register:
- IR, one instruction.
- PC', the value of PC after fetching IR, needed later for computing branch target addresses.

The ID/EX interstage register:
- IR', the instruction, from IR. All fields are needed, so unlike the figure in the book, we just pass the whole thing forward.
- S1, S2, the two operands from registers.
- PC", the value of PC after this instruction's fetch, passed on from PC'.

The EX/MEM interstage register
- OP, the opcode from IR'.
- RD or RS1, the register to store the result into, depending on whether the opcode is I-type or R-type. The figure is missing a mux to select between these two.
- S1, the register value to be written to memory.
- RESULT, the result from the ALU.
- TAKE, the branch-taken bit.

The MEM/WB interstage register:
- OP', the opcode, passed on from OP.
- RD' or RS1', the register to store the result into, passed on from RD or RS1.
- MEMDATA, the data from memory.
- RESULT', the result passed from RESULT.
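A minimal Python sketch of these four latches may make the bookkeeping easier to follow. It assumes each interstage register is simply a record of the fields listed above. The class names, the `is_r_type` select input, and the helpers `select_dest` and `mem_stage` are hypothetical names chosen for illustration, and no particular bit-field layout of IR is assumed.

```python
from dataclasses import dataclass


@dataclass
class IF_ID:
    ir: int        # IR: the fetched instruction
    pc: int        # PC': value of PC after fetching IR


@dataclass
class ID_EX:
    ir: int        # IR': the whole instruction, passed forward from IR
    s1: int        # S1: first register operand
    s2: int        # S2: second register operand
    pc: int        # PC": passed forward from PC'


@dataclass
class EX_MEM:
    op: int        # OP: opcode taken from IR'
    dest: int      # RD or RS1: destination register number (mux output)
    s1: int        # S1: register value to be written to memory
    result: int    # RESULT: ALU output
    take: bool     # TAKE: branch-taken bit


@dataclass
class MEM_WB:
    op: int        # opcode, passed on from EX/MEM
    dest: int      # RD' or RS1': destination register number
    memdata: int   # MEMDATA: data read from memory
    result: int    # RESULT': passed on from RESULT


def select_dest(is_r_type: bool, rd: int, rs1: int) -> int:
    """The mux missing from the figure (hypothetical select signal
    is_r_type): picks which field, named as in the text, holds the
    destination register number."""
    return rd if is_r_type else rs1


def mem_stage(ex_mem: EX_MEM, memdata: int) -> MEM_WB:
    """Load the MEM/WB latch: every field except MEMDATA is a plain copy."""
    return MEM_WB(op=ex_mem.op, dest=ex_mem.dest,
                  memdata=memdata, result=ex_mem.result)
```

For example, `mem_stage(EX_MEM(op=0x23, dest=5, s1=9, result=40, take=False), memdata=123)` yields a `MEM_WB` with `op`, `dest` and `result` copied through and `memdata=123` taken from memory; S1 and TAKE are not carried into MEM/WB, matching the list above.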

Part A: Here is a diagram of the Pipelined Ultimate RISC instruction execution unit showing all the comparators, multiplexors, and other parts required to implement the maximum possible forwarding logic within the IEU:

            ____________
           |  ________  |
        __\|_|/__   __|_|__
      -|>___PC___| |__+2___|
     |     | |       /| |\
     |     | |________| |_____________
     |     |____________   _________  |
     |                  | |         | |
     |               __\| |/__      | |
     |              |___+1____|     | |
     |                  | |_________| |_______\ read
     |                  |  _________| |_______  address
     |    ______________| |_________| |_______/
     |   |  ____________| |_________| |_______   data
     |   | |  __________| |______   | |       \
     |   | | |  ________| |____  |  | |
     |   |_|_|_|        |_|    | |  | |
CLK -o   \0   1/       |   |___| |__| |____
     |    \___/--------| = |___| |__| |__  |
     |     | |         |___|   | |  | |__| |__\ read
     |     | |                 | |  |  __| |__  address
     |     | |        _________| |__| |__| |__/
     |     | |       |  _______| |__| |__| |__   data
     |     | |       | |  _____| |  | |  | |  \
     |     | |       | | |  ___  |  | |  | |
     |     | |       |_|_|_|   | |  |_|  | |
     |     | |       \0   1/   | | |   |_| |
     |     | |        \___/----| |-| = |_  |
     |  __\|_|/__   __\|_|/__  | | |___| | |
     o-|>__DST___|-|>__SRC___| | |       | |
     |     | |         | |_____| |_______| |__\ read
     |     | |         |_______| |__   __| |__  address
     |     | |        _________| |__| |__| |__/
     |     | |       |  _______| |__| |__| |__   data
     |     | |       | |  _____| |  | |  | |  \
     |     | |       | | |  ___  |  | |  | |
     |     | |       |_|_|_|   | |  |_|  | |
     |     | |       \0   1/   | | |   |_| |
     |     | |        \___/----| |-| = |_  |
     |  __\|_|/__   __\| |/__  | | |___| | |
     o-|>__DST'__|-|>__TMP___| | |       | |
     |     | |_________| |_____| |_______| |__\ write
     |     |___________| |_____| |____________  address
     |                 | |_____| |____________<
     |                 |_______________________  data
     |                                        /
      ----------------------------------------> write strobe
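The comparator-and-multiplexor pattern in the diagram can be illustrated with a small cycle-level model. This is a simplified sketch under assumptions not taken from the diagram: three stages (fetch of the source and destination address words at PC and PC+1, operand read from SRC into TMP, and write of TMP to DST'), PC held as an internal register rather than a memory-mapped location (so there are no branches), and forwarding only from the write stage. The function `run` and its `forwarding` flag are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Pending:
    dst: int   # DST': address to be written at the end of this cycle
    tmp: int   # TMP: value to be written there


def run(mem: Dict[int, int], pc: int, cycles: int,
        forwarding: bool = True) -> Dict[int, int]:
    """Simplified 3-stage Ultimate RISC pipeline; each instruction is the
    word pair M[pc] = source address, M[pc+1] = destination address, and
    performs M[dst] = M[src]."""
    mem = dict(mem)
    decoded: Optional[Tuple[int, int]] = None   # SRC, DST latches
    pending: Optional[Pending] = None           # DST', TMP latches

    def read(addr: int) -> int:
        # Comparator + mux: if the write stage is about to store to this
        # address, take its TMP value instead of the stale memory word.
        if forwarding and pending is not None and pending.dst == addr:
            return pending.tmp
        return mem.get(addr, 0)

    for _ in range(cycles):
        # Operand-read stage: SRC, DST -> TMP, DST'
        next_pending = None
        if decoded is not None:
            src, dst = decoded
            next_pending = Pending(dst=dst, tmp=read(src))
        # Fetch stage: both instruction words (also forwarded), PC += 2
        next_decoded = (read(pc), read(pc + 1))
        pc += 2
        # Write stage completes at the end of the cycle, after all reads
        if pending is not None:
            mem[pending.dst] = pending.tmp
        decoded, pending = next_decoded, next_pending
    return mem


program = {10: 7, 11: 0, 12: 0,
           100: 10, 101: 11,   # move M[10] -> M[11]
           102: 11, 103: 12}   # move M[11] -> M[12]
print(run(program, pc=100, cycles=4)[12])                    # 7
print(run(program, pc=100, cycles=4, forwarding=False)[12])  # 0
```

With forwarding, the second move picks up the value 7 that the first move has not yet written to memory; with the comparators disabled it reads the stale 0 from location 11, which is the data hazard the forwarding logic is there to remove.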

Part B: Of the delay slots required for the original A[I] = 5 example, one is eliminated, but one delay slot remains.

The operand delay slots are clearly the result of data hazards, while the branch delay slots are the result of control hazards. Our assumption of 4-port memory eliminates any structural hazards.
We know that an add in the ALU will take about as long as the PC = PC + 2 operation in the fetch stage of the pipeline. Because there is only one accumulator, no RAM cycle is needed to read a value from the accumulator or to store a result into it, and the instruction cycle time must already be long enough to do a RAM cycle for normal memory operations. As a result, there appears to be no need to pipeline the ALU.