Assignment 9, due Apr 4

Part of the homework for 22C:122/55:132, Spring 2003
by Douglas W. Jones
The University of Iowa, Department of Computer Science

Always, on every assignment, please write your name legibly as it appears on your University ID and on the class list!

Background: Consider the final pipeline diagrammed in the notes for lecture 23. This is largely a register transfer diagram, with the combinational circuitry in the align, add and normalize pipeline stages abstracted down to the bare minimum of a box. The operand fetch and result store boxes, however, are excessively simplified, since they contain no representation of data paths to and from memory, and no representation of address registers, strides, or the adders necessary to increment the address registers by the stride.
Problem: Assume that the memory is off to the left, that the memory has 3 available ports, and that each memory port has the usual data bus and address bus. Show the details of the operand fetch and result store stages, including the registers, adders, and whatnot missing from the diagram in the notes.
Please emphasize issues of data flow during pipelined operation, ignore data flow during setup and takedown, and ignore the control lines needed to evoke the various register transfers.
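The following Python sketch is a purely behavioral illustration of the data flow. It is not the register transfer diagram the problem asks for. Each port is modeled with one address register and one stride register, and `+= stride` stands in for that port's adder. Several things are assumptions made only for this sketch:

- the names `Port`, `vector_add`, and `PIPE_DEPTH`;
- the three-cycle pipeline depth;
- the use of integer addition in place of the floating-point align/add/normalize stages.

```python
from collections import deque

# Hypothetical parameter, not taken from the lecture notes:
# align, add, normalize are assumed to take one cycle each.
PIPE_DEPTH = 3


class Port:
    """One memory port: an address register plus a stride register.
    step() stands for the adder that bumps the address by the stride."""

    def __init__(self, addr, stride):
        self.addr = addr      # drives this port's address bus
        self.stride = stride  # stride register

    def step(self):
        self.addr += self.stride


def vector_add(mem, pa, pb, pc, length):
    """Model c[i] = a[i] + b[i] flowing through a PIPE_DEPTH-stage pipe.

    Ports A and B read operands; port C writes results. The sum is
    formed on entry and the stages only delay it (a simplification).
    Returns the number of clock cycles used.
    """
    pipe = deque([None] * PIPE_DEPTH)  # pipe[0] first stage, pipe[-1] last
    fetched = stored = cycle = 0
    while stored < length:
        # Result store: the value leaving the last stage goes out on
        # port C's data bus, then C's address register advances.
        if pipe[-1] is not None:
            mem[pc.addr] = pipe[-1]
            pc.step()
            stored += 1
        pipe.pop()  # every value moves down one stage
        # Operand fetch: A and B read in the same cycle, then both
        # address registers advance by their own strides.
        if fetched < length:
            pipe.appendleft(mem[pa.addr] + mem[pb.addr])
            pa.step()
            pb.step()
            fetched += 1
        else:
            pipe.appendleft(None)  # bubble once the operands run out
        cycle += 1
    return cycle


# Usage: A at address 0 with stride 1, B at 16 with stride 2, C at 40.
mem = [0] * 64
for i in range(4):
    mem[i] = i + 1
    mem[16 + 2 * i] = 10 * (i + 1)
cycles = vector_add(mem, Port(0, 1), Port(16, 2), Port(40, 1), 4)
print(mem[40:44], cycles)  # [11, 22, 33, 44] 7
```

In this model, port C's address register trails those of ports A and B by `PIPE_DEPTH` cycles, so it needs its own stride adder. An alternative is to carry each result's address down the pipeline alongside its data, at the cost of extra pipeline registers.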

Background: Consider the VLIW architecture in the notes for lecture 22. Specifically, look at the example worked out in the section titled Pipelining at the Program Level. That example is not quite correct for the inner loop of a matrix product; the inner loop for a matrix product might look more like this:
```
t = 0;
ap = &a;
bp = &b;
for i = 10 to 1 do
    t = t + ap* * bp*;
    ap += 1;
    bp += 10;
endloop
```
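A runnable rendering of the loop may help when checking hand-scheduled code. This Python sketch rests on a few assumptions:

- `ap*` means the word that `ap` points to;
- `a` and `b` are 10 by 10 matrices stored row-major in one flat memory;
- pointers are plain indices into that memory.

The function `inner_product` and its parameters are hypothetical names.

```python
def inner_product(mem, a_base, b_base, n=10):
    """Inner loop of a matrix product on a flat, row-major memory.

    ap walks along a row of a (stride 1); bp walks down a column of b
    (stride n; the assignment fixes n = 10)."""
    t = 0
    ap = a_base  # ap = &a
    bp = b_base  # bp = &b
    for _ in range(n, 0, -1):      # for i = 10 to 1
        t = t + mem[ap] * mem[bp]  # t = t + ap* * bp*
        ap += 1                    # next element of the row of a
        bp += n                    # next element of the column of b
    return t


# Usage: a[i][j] = i + j stored at 0, b[i][j] = i * j stored at 100.
n = 10
mem = [0] * (2 * n * n)
for i in range(n):
    for j in range(n):
        mem[i * n + j] = i + j
        mem[n * n + i * n + j] = i * j
# Row 0 of a times column 1 of b: the sum of k * k for k = 0..9.
print(inner_product(mem, 0, n * n + 1))  # 285
```

With `n = 10`, `bp += n` is exactly the `bp += 10` of the loop: it steps down one column of `b`.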
Part A: Work out the code for this loop, with no attempt at optimization, in the style of the first effort listed after the loop source code example. Assume that the only pipelining is what you code explicitly, as in the referenced example from the notes, and not the pipelining developed later in that lecture.
Part B: Optimize your code in the style of the second solution given in the notes, where iterations are combined and overlapped in order to minimize the number of no-ops. Pipeline diagram notation may be helpful even though we have not yet assumed a pipelined architecture.
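One way to compare a Part A schedule with a Part B schedule is to count cycles under assumed latencies. The latencies and unit counts below are hypothetical placeholders, not the figures for the lecture 22 machine; substitute the real ones. The sketch also assumes that:

- pointer increments take one cycle and fit in otherwise unused instruction slots;
- the loop branch costs nothing.

```python
import math

# Hypothetical latencies and functional-unit counts.
LAT = {"load": 2, "mul": 3, "add": 2}
UNITS = {"load": 2, "mul": 1, "add": 1}
# Operations issued by one iteration of the loop body.
OPS = {"load": 2, "mul": 1, "add": 1}


def unoptimized_cycles(n):
    """Part A style: each iteration runs load -> multiply -> add to
    completion before the next iteration begins."""
    per_iter = LAT["load"] + LAT["mul"] + LAT["add"]
    return n * per_iter


def overlapped_cycles(n):
    """Part B style: start a new iteration every II cycles.

    II is bounded below by the recurrence t = t + ... (each add must
    wait for the previous add) and by the number of units of each kind."""
    rec_bound = LAT["add"]
    res_bound = max(math.ceil(OPS[k] / UNITS[k]) for k in OPS)
    ii = max(rec_bound, res_bound)
    # The last iteration starts at (n - 1) * II, then needs the full
    # load -> multiply -> add latency to finish.
    return (n - 1) * ii + LAT["load"] + LAT["mul"] + LAT["add"]


print(unoptimized_cycles(10), overlapped_cycles(10))  # 70 25
```

Under these assumptions, the accumulation `t = t + ...` is the limiting recurrence. However the iterations are overlapped, each add must wait for the previous add's result, so iterations cannot start closer together than the add latency.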