Midterm

22C:122, Fall 1998

Please write your answers on the paper provided. Make sure your name is on the front page, in the same form as it appears in the official class list! Illegible and overly long answers are difficult to grade. Please leave margins around your answers!

This exam is worth 1/5 of the final grade (20 points; allocate 2 minutes per point).

Background: Consider a coprocessor for performing computations on blocks of operands in memory. This coprocessor executes the following algorithm with a memory interface that allows a maximum of one memory access per clock cycle:
```
    variables
        src1, src2, dst:  addresses initialized by CPU
        count: operand count initialized by CPU
	op: operation specifier set by CPU (add, subtract, etc.)

    repeat
        if count > 0
	    M[dst] = f(op, M[src1], M[src2])
	    src1 = src1 + 1;
	    src2 = src2 + 1;
	    dst = dst + 1;
            count = count - 1
	endif
    forever
```
Part A: Rewrite the above so it accomplishes one register transfer per assignment while conforming to the restriction on the number of memory accesses per clock cycle. (2.0 points)
Part B: Rewrite the code from part A to indicate which register transfers can be accomplished in parallel (group parallel transfers using brackets), and then indicate how many clock cycles the code requires per iteration. (1.5 points)
Part C: Sketch the register transfer logic diagram that implements your solution to part B. (2.5 points)
Part D: Given the constraint forbidding parallel memory accesses, can pipelining be applied to this coprocessor to speed its operation? Why or why not? (1.0 points)
Background: Consider a RISC machine with the following (very typical) 5-stage pipeline:
1. Instruction fetch
2. Operand gather (from registers)
3. ALU
4. Memory, operand load or store
5. Result save (to registers)
Assume, initially, that this machine has no interlocks, no result forwarding, and a pure load-store instruction set, so all arithmetic operations are performed register to register, the only memory reference operations are load and store, and the user is responsible for filling all delay slots. The machine has a no-op instruction you may use to fill delay slots.
Part A: Write instruction-level pseudocode to correctly compute the following on this machine, assuming that the values of the variables I, X and Y are all stored in registers at the start of the computation; I holds an integer, X and Y are memory addresses, and memory is word-addressable:
```
	I = I + 1;
	X[I] = Y[I] + 2
```
Do not attempt to optimize your code! Write it out with no-ops filling all delay slots! (3.0 points)
Part B: Rewrite your answer to part A overlapping as many computations as possible so that use of no-ops is minimized. (2.0 points)
Part C: Which of the following instructions could be implemented using this pipeline model:
1. Branch Indirect (PC = M[ea])
2. Load Indirect (R[dst] = M[M[ea]])
3. Old-style function call (M[ea] = PC; PC = ea + 1)
4. Autoincrement store (M[R[x]] = R[src]; R[x] = R[x] + 1)
5. Register to memory arithmetic (M[R[x]] = R[s1] op R[s2])
(2.0 points)
Background: Consider a machine with the following classes of instructions:
1. Memory to register (R[reg] = R[reg] op M[ea])
2. Register to memory (M[ea] = R[reg] op M[ea])
3. Immediate to register (R[reg] = R[reg] op ea)
In the above, op is a function selected by a field of the instruction. Some of the typical functions include add, sub, and, or, left-operand-only, and right-operand-only. On this machine, effective address computation operates as follows:
- ea = R[x] mode immediate_constant
In the above, mode is a function selected by a field of the instruction; available functions include add (traditional indexed addressing), left operand only (register indirect mode), and right operand only (traditional direct addressing).
Part A: How many memory references could one instruction require in order to execute to completion? From this, can you conclude how many memory ports a pipelined CPU for this instruction set would require? (1.0 point)
Part B: Propose a pipeline structure for this machine, indicating, for each stage, the purpose of that stage. For example, what computations it performs, what memory references it performs, and what registers it accesses. (4.0 points)
Part C: Given that this machine has 16 registers and a 32 bit word, propose a reasonable instruction format for this instruction set. (1.0 points)