Homework 7

22C:122, Fall 1998

Due Monday Oct 12, 1998, in class

Background: Consider a generic modern RISC processor with a load-store architecture and a 5-stage pipeline. The pipeline stages are:

Fetch into IR
Operand gather (from registers and IR)
ALU (computes result or effective address)
Memory operand (Load or Store)
Result save (to registers)

Load operations use stages 1-5, Store operations are completed in stage 4 and do not use stage 5, and arithmetic operations use only stages 1-3 and 5, making no use of stage 4.

Assume that the program counter is treated as a normal register, and assume that there is a load effective address (LEA) instruction that has exactly the same format as a normal memory reference instruction, but that makes no memory reference. Instead, the effective address computed is stored directly in the destination register.

Assume that this architecture is implemented using bubbles to fill all delay slots, so there is no result forwarding.

Identify all of the points where the pipeline must be stalled in the following little example, and for each, note the number of clock cycles that the stall must skip (equivalently, how many one-cycle bubbles must be injected)?
```
        LEA   c,X       R[c] = X
	LOAD  a,Y       R[a] = M[Y]
	LOAD  b,Z(c)    R[b] = M[Z+R[c]]
	ADD   c,a,b     R[c] = R[a] + R[b]
	STORE c,Z       M[Z] = R[c]
	LEA   pc,D      R[pc] = D
```
Note in the above that the assembly language notation used is merely typical. Here, we assume a, b, c, pc, X, Y and Z are constants. Of these, X, Y and Z are address constants used in memory reference and related instructions, while a, b and c are the numbers of general purpose registers and pc is the number of the register used as a program counter.
Background Consider the problem of implementing a scoreboard to interlock register use on our generic pipelined architecture. The scoreboard presented in class had one bit used to reserve each register (including the program counter). If an instruction intends to modify a register, the operand gather stage of the pipeline reserves that register by setting the corresponding scoreboard bit. The bit is reset by the result save pipeline step when the modification is complete, and the operand gather stage will stall until the scoreboard bit is reset for each operand it gathers.
Consider the following small code fragment in the light of this scoreboard implementation:
```
	LEA   a,X	R[a] = X
	LEA   a,Y	R[a] = Y
	STORE c,Z	M[Z] = R[a]
```
Part A Why would the above implementation of the scoreboard fail to correctly synchronize the above fragment!
Part B Propose an alternative scoreboard implementation that correctly synchronizes examples such as the above.
Part C What if there is a condition code register, and all LOAD, STORE and arithmetic operations use the condition codes to report on the value loaded, stored or computed, and that there is a full set of conditional branches that allow the condition codes to be tested.
Explain how the concept a scoreboard can be used to prevent conflicts over the use of condition codes? Does the one-bit-per-register scoreboard discussed in class suffice for this? If not, does the scoreboard implementation you proposed in part B suffice?
BACKGROUND Consider the problem of adding interrupts to Minimal Ultimate RISC. Assume the following design, where PC is addressable as M[FFFF], IR (the interrupt register) is addressable as M[FFFE], and IE (the interrupt enable flipflop) is addressable as M[FFFD]. What is
```
	repeat
	    SRC = M[PC]
	    PC = PC+1
	    DST = M[PC]
	    PC = PC+1
	    TMP = M[SRC]
	    M[DST] = TMP
	    if interrupt requested and IE' nonzero, then
	        exchange( PC, IR )
                IE = 0, IE' = 0
            else
                IE' = IE
	    endif
	forever
```
The interrupt-requested condition is true if any device has its interrupt request flipflop set.
Part A: Why is there a copy of the IE bit in IE'? Hint: The answer has something to do with the problem of returning from interrupt.
Part B: Consider the problem of adding a compatable interrupt mechanism to the Pipelined Ultimate RISC. We have discussed versions of the Pipelined Ultimate RISC with delay slots exposed to the programmer, with result forwarding, and with bubbles injected in the pipe to hide delay slots from the programmer. Which of these versions of the pipelined ultimate RISC are compatable with the suggested interrupt architecture, and if there are incompatabilities, explain them!