Homework 6 Solutions

22C:122, Fall 1998

Douglas W. Jones
  1.      a: MOVE X,sub
            MOVE ccN,Y
    
    Our result forwarding logic fails in the above because ccN (FFF1 as a source) and sub (FFF1 as a destination) share the same address, so the value of X is forwarded to Y in place of the condition code; this is the wrong result!
         b: MOVE X,sub
            MOVE acc,Y
    
    Our forwarding logic fails in the above because the forwarding logic does not know about the ALU; as a result, it cannot forward the result from the subtract operation to Y.
         c: MOVE X,FFF8₁₆
            MOVE FFF8₁₆,Y
    
    Our forwarding logic fails in the above because the memory address FFF8₁₆ is unimplemented memory, so the correct semantics is that Y should get an undefined value. Instead, it gets the forwarded value of X. This may not be formally incorrect, but is unexpected behavior.
         d: MOVE X,(d+2)
            MOVE .-.,Y
    
    Our forwarding logic fails in the above because the code is self-modifying and our forwarding logic does not forward to the instruction fetch path.
    
         e: MOVE X,Y
            MOVE Y,pc
    
    Result forwarding fails in the above because, while our forwarding tries to forward to the PC in order to eliminate branch delay slots, it does not account for an operand move immediately prior to a branch.
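    The failure in part a can be reproduced with a small Python model. This is only a sketch under stated assumptions: the class ToyMachine, the function run_case_a, the addresses chosen for X and Y, and the direction of the subtraction (acc = acc - operand) are all hypothetical; only the shared address FFF1 for ccN and sub comes from the text above.

```python
ALU_ADDR = 0xFFF1  # per the text: ccN when read, sub when written


class ToyMachine:
    """Hypothetical model: plain RAM plus one memory-mapped ALU location."""

    def __init__(self):
        self.ram = {}
        self.acc = 0

    def write(self, addr, value):
        if addr == ALU_ADDR:
            # Assumed 'sub' semantics: subtract the operand from the accumulator.
            self.acc = (self.acc - value) & 0xFFFF
        else:
            self.ram[addr] = value & 0xFFFF

    def read(self, addr):
        if addr == ALU_ADDR:
            # Assumed 'ccN' semantics: the sign bit of the accumulator.
            return (self.acc >> 15) & 1
        return self.ram.get(addr, 0)


def run_case_a(x_value, forwarding):
    """Run MOVE X,sub ; MOVE ccN,Y and return the value that ends up in Y."""
    X, Y = 0x0100, 0x0101          # hypothetical operand addresses
    machine = ToyMachine()
    machine.write(X, x_value)

    tmp = machine.read(X)          # operand of the first instruction
    dst_prime = ALU_ADDR           # its destination: sub
    src = ALU_ADDR                 # source of the second instruction: ccN

    machine.write(dst_prime, tmp)  # the first instruction completes
    if forwarding and src == dst_prime:
        value = tmp                # naive forwarding: reuse the value of X
    else:
        value = machine.read(src)  # correct: read the condition code
    machine.write(Y, value)
    return machine.read(Y)
```

    With x_value = 5 and a zero accumulator, the model stores the sign bit (1) in Y when forwarding is off, and the value of X (5) when naive forwarding is on.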

  2.            ______   __________
              |  __  | |  ______  |
              | |  |_|_|_|      | |
      clk     |_|  \0mux1/------| |-----
       |     |+2_|  _|_|_       | |     |
       o------| |--|>pc__|      | |     |
       |      | |____| |________| |_____|_____
       |      |___________   ___| |___________ read address
       |         _________| |___| |_____|_____
       |        |  _______| |___| |___________ read data
       |       _|_|_      |_|   | |     |
       o------|>src_|    |+1_|  | |     |
       |        | |       | |___| |_____|_____
       |        | |       |_____| |___________ read address
       |        | |        _____| |_____|_____
       |        | |       |  ___| |___________ read data
       |       _|_|_     _|_|_  | |     |
       o------|>src_|---|>dst_| | |     |
       |        | |_______| |___| |_____|_____
       |        |_________| |___| |_______   _ read address
       |   _____   _______| |___| |_____|_| |_
       |  |  _  | |  _____| |_____________| |_ read data
       |  | | | | | |     | |_ _______  | | |
       |  | | |_|_|_|     |  _|_=FFFF?|-  | |
       |  | | \1mux0/-----| |------------|_=_|
       |  | |   | |       | |             | | 
       |  | |  _| |_     _|_|_            | |
       o--| |-|>tmp_|---|>dst'|           | |
       |  | |___| |       | |_____________| |_
       |  |_____  |       |  _________________ write address
       |        | |_______| |_________________
       |        |_________| |_________________ write data
       |                __|_|__          
       |               |_=FFFF?|
       |                   |           ___
       |                    ------not-|and|___ write memory
        ------------------------------|___|
    

  3. To eliminate the worst of the difficulty with problem 1 parts a, b and c, we must turn off result forwarding for operands that are outside the normal part of memory. Thus, we could replace the test
         src = dst'
    
    with the more complex test
         (src = dst') and (src < F000₁₆)
    
    This merely prevents the forwarding logic from producing anomalous behavior; it does not forward the correct results!
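    As a quick check of the guarded test, here is a minimal Python sketch. The function name may_forward and the constant FORWARD_LIMIT are hypothetical; the limit F000 (hex) is taken from the test above.

```python
FORWARD_LIMIT = 0xF000  # addresses at or above this are outside normal memory


def may_forward(src, dst_prime):
    """Return True only when forwarding tmp to the next instruction is safe."""
    return src == dst_prime and src < FORWARD_LIMIT
```

    Under this test, ordinary memory still benefits from forwarding, while the memory-mapped addresses in parts a, b and c always fall through to a real read.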

    To solve the problem with parts a and b, we must provide forwarding paths from the output of the ALU (prior to feeding into the accumulator or condition code register) to tmp. This is fairly complex, but it can be added to the register transfer notation with a one-line change, from

               tmp = (if src = dst' then tmp else m[src])
    
    to
               tmp = (if (src = dst') and (dst' < F000₁₆)
                         then tmp
                      elseif src = FFF0₁₆ and (FFF0₁₆ < dst' < FFF8₁₆)
                         then alu-data-output
                      elseif src = FFF1₁₆ and (FFF0₁₆ < dst' < FFF8₁₆)
                         then alu-sign-bit-output
                         else m[src])
    
    This is ugly, but it works!
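    The revised selection for tmp can be transcribed almost directly into Python. This is a sketch, not a hardware description: the function name next_tmp and its parameter names are hypothetical, and mem_read stands in for the memory read m[src].

```python
def next_tmp(src, dst_prime, tmp, mem_read, alu_data_output, alu_sign_bit_output):
    """Choose the value loaded into tmp, following the register transfer above."""
    # True when the previous instruction wrote to one of the ALU operate addresses.
    alu_written = 0xFFF0 < dst_prime < 0xFFF8

    if src == dst_prime and dst_prime < 0xF000:
        return tmp                    # ordinary result forwarding
    elif src == 0xFFF0 and alu_written:
        return alu_data_output        # forward the new accumulator value
    elif src == 0xFFF1 and alu_written:
        return alu_sign_bit_output    # forward the new condition code
    else:
        return mem_read(src)          # no hazard: read the operand normally
```

    For the pair in part a (src and dst' both FFF1), the second branch is skipped and the third returns the ALU sign bit rather than the stale value of X.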

  4. The logic given in the assignment eliminates one branch delay slot because assignments to the PC are handled by checking dst and not dst', and by assigning from m[src] instead of assigning from tmp.

    To introduce a bubble in the pipe, we can convert the instruction following a branch into a no-op. Consider the following version of the architecture:

            repeat the following assignments in parallel
    
          --   if (dst' < FFFE₁₆)
                  then m[dst'] = tmp
    
          *    tmp = (if src = dst' then tmp else m[src])
               dst' = dst
    
               src = m[pc]
          --   dst = (if dst = FFFF₁₆ then FFFE₁₆ else m[pc + 1])
               pc = (if dst = FFFF₁₆ then m[src] else pc + 2)
    
            forever
    
    The changed lines have been marked with dashes. The first change causes stores to location FFFE₁₆ to be interpreted as no-ops; the second change is to convert dst to FFFE₁₆ in the event that the previous instruction was a branch. This converts the instruction following a branch into a no-op, although it still wastes effort fetching its operand. This wasted fetch could be avoided by changing the line marked with a star.
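    The effect of the bubble can be checked with a small cycle-level simulation in Python. This is a sketch under assumptions: the names step and run_demo, the dictionary model of memory, the initial pipeline state and the test program are all hypothetical, and the dashed assignment is taken as the only assignment to dst. All right-hand sides are evaluated from the old state before anything is updated, to mimic the parallel assignment.

```python
NOP_DST = 0xFFFE  # stores to this address are discarded
PC_DST = 0xFFFF   # a move to this address is a branch


def step(state, m):
    """One clock cycle; every right-hand side uses the old register values."""
    pc, src, dst, dst_p, tmp = state
    operand = m.get(src, 0)                      # m[src], read before the write

    new_tmp = tmp if src == dst_p else operand   # starred line
    new_src = m.get(pc, 0)
    # Dashed line: squash the instruction fetched right after a branch.
    new_dst = NOP_DST if dst == PC_DST else m.get((pc + 1) & 0xFFFF, 0)
    new_pc = operand if dst == PC_DST else (pc + 2) & 0xFFFF

    if dst_p < NOP_DST:                          # dashed line: guarded store
        m[dst_p] = tmp
    return (new_pc, new_src, new_dst, dst, new_tmp)


def run_demo(cycles=6):
    """Run a hypothetical program: a move, a branch, a squashed move, a move."""
    m = {
        0: 0x100, 1: 0x101,      # MOVE 0x100,0x101
        2: 0x102, 3: PC_DST,     # MOVE 0x102,pc   (branch to address 8)
        4: 0x103, 5: 0x104,      # MOVE 0x103,0x104  (follows the branch)
        8: 0x105, 9: 0x106,      # MOVE 0x105,0x106  (branch target)
        10: 0, 11: NOP_DST,      # padding no-ops
        12: 0, 13: NOP_DST,
        0x100: 11, 0x102: 8, 0x103: 33, 0x105: 55,
    }
    state = (0, 0, NOP_DST, NOP_DST, 0)          # pipeline starts full of no-ops
    for _ in range(cycles):
        state = step(state, m)
    return m
```

    After six cycles of run_demo, location 0x101 holds 11 and location 0x106 holds 55, while location 0x104 is never written: the instruction after the branch has become a bubble.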