Homework 4 Solutions

22C:122, Fall 1998

Douglas W. Jones
  1. Background: Consider a pipelined implementation of the Minimal Ultimate RISC where each instruction execution cycle has the following semantics:
    repeat
       M[dst] = tmp;
       tmp = M[src];
       src = M[pc];
       dst = M[pc + 1];
       pc = pc + 2
    forever
    

    Part A: This machine's instruction set differs from that of the Minimal Ultimate RISC discussed in class because there is no dst' register. As a result, if we look at an instruction sequence using a pipeline diagram, we find that the src and dst addresses for a single move are not fetched from the same instruction:

                   ----|-------------|-------------|-------------|-
             src1/dst1 | src=src1    | tmp=M[src1] | M[dst2]=tmp |
                       | dst=dst1    |             |             |
                   ----|-------------|-------------|-------------|-
             src2/dst2 |             | src=src2    | tmp=M[src2] |
                       |             | dst=dst2    |             |
                   ----|-------------|-------------|-------------|-
             src3/dst3 |             |             | src=src3    |
                       |             |             | dst=dst3    |
                   ----|-------------|-------------|-------------|-
                           cycle 1       cycle 2       cycle 3
    
    Examining the above, it is clear that, to move A to B, one must write the following code:
             1: word A
             2: word ... store previous operand
             3: word ... fetch next operand
             4: word B
    
    So, we can write code to compute A = B - C as follows, using XXXX, YYYY and ZZZZ as filler memory addresses for the strangeness introduced by this instruction set.
    	    word  B
                 word   XXXX
                word  C
    	     word acc
                word    YYYY
                 word sub
                word  acc
                 word   YYYY
                word    ZZZZ
                 word A
    

    Part B: Here is a register-transfer implementation of this instruction execution unit, ignoring the problem of branch instructions.

                  _____________
                 |  _________  |
              __\|_|/__    __|_|__
           --|>___PC___|  |__+2___| 
          |      | |        /| |\                  
          |      | |_________| |__________________\ read
          |      |______________________   _______  address
          |                   __________| |_______/
    CLK --o                  |  ________| |_______  data
          |                  | |      _\| |/_     \
          |                  | |     |__+1___|
          |                  | |        | |_______\ read
          |                  | |        |_________  address
          |       ___________| |__________________/
          |      |  _________| |__________________  data
          |      | |         | |                  \
          |   __\|_|/__   __\|_|/__
          o--|>__DST___|-|>__SRC___| 
          |      | |         | |__________________\ read
          |      | |         |____________________  address
          |      | |          ____________________/
          |      | |         |  __________________  data
          |      | |      __\| |/__               \
          o------| |-----|>__TMP___|
          |      | |_________| |__________________\ write
          |      |___________| |__________________  address
          |                  | |__________________<
          |                  |____________________  data
          |                                       /
           ---------------------------------------> write strobe
    

    Part C: Here is the above machine, with branch logic added:

                _______________________________ 
               |  ___________________________  |
               | |  ___________              | |
               | | |  _______  |             | |
              _|_|_|_|_      | |             | |
             _\ 1   0 /      | |             | |
            |  \_MUX_/       | |             | |
            |    | |         | |             | |
            | __\|_|/__    __|_|__           | |
          ---|>___PC___|  |__+2___|          | | 
         |  |    | |        /| |\            | | 
         |  |    | |_________| |_____________| |__\ read
         |  |    |___________________   _____| |__  address
         |  |                 _______| |_____| |__/
    CLK -o  |                |  _____| |_____| |__  data
         |  |                | |   _\| |/_   | |  \
         |  |                | |  |__+1___|  | |
         |  |                | |     | |_____| |__\ read
         |  |                | |     |_______| |__  address
         |  |     ___________| |_____________| |__/
         |  |    |  _________| |_____________| |__  data
         |  |    | |         | |             | |  \
         |  | __\|_|/__   __\|_|/__          | |
         o---|>__DST___|-|>__SRC___|         | |
         |  |    | |         | |_____________| |__\ read
         |  |    | |         |_______________| |__  address
         |  |    | |          _______________| |__/
         |  |    | |         |  _____________| |__  data
         |  |    | |      __\| |/__          | |  \
         o-------| |-----|>__TMP___|         | |
         |  |    | |_________| |_____________| |__\ write
         |  |    |___________| |______   ____| |__  address
         |  |                | |______| |____| |__<
         |  |                |________| |_________  data
         |  |                       __|_|__       /
         |  |                      |_=FFFF_|
         |  |                          |    ___
         |   --------------------------o--o|and|
          ---------------------------------|___|--> write strobe
    
    Here is the pseudocode for the above machine:
    repeat
       if dst <> FFFF then M[dst] = tmp else pc = tmp;
       tmp = M[src];
       src = M[pc];
       dst = M[pc + 1];
       pc = pc + 2
    forever
    
    Note that this logic does nothing to eliminate visible delay slots! Programming this machine will definitely be strange!

    Part D: This pipelined IEU has no operand delay slots, other than the strange relationship between source and destination addresses being encoded in different instructions.

    Part E: This pipelined IEU has one branch delay slots in addition to the strange semantics of the move instruction. Therefore, a branch would look like this:

    	    word  DST    -- start the branch
                 word   XXXX
                word    YYYY -- finish the branch
    	     word pc
                word    ZZZZ -- delay slot filler
                 word   YYYY