Homework 3

22C:122, Spring 1998

Due Friday Feb 20, 1998, in class

Consider the following annoyingly simple instruction set for a simple machine:

   ____________________
  | Opcode |  Operand  |
  |________|___________|
    LOAD      address     AC = M[address]
    LOADI     immediate   AC = immediate
    ADD       address     AC = AC + M[address]
    ADDI      immediate   AC = AC + immediate
    SUB       address     AC = AC - M[address]
    SUBI      immediate   AC = AC - immediate
    STORE     address     M[address] = AC
    JMP       address     PC = address
    JMPN      address     if AC<0, PC = address
    JMPP      address     if AC>0, PC = address
    JMPZ      address     if AC=0, PC = address
    CALL      address     AC = PC; PC = address
    JMPX      address     PC = AC + address
    NOP                   do nothing!
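
The semantics in the table can be sketched as a small non-pipelined reference interpreter in C, which may be useful for checking what a program should compute before any delay slots are considered. This is only a sketch under assumptions the assignment does not state: instructions and data live in separate arrays, words are C ints, PC is incremented before an instruction executes (so CALL saves the address of the following instruction), and, since there is no HALT, execution stops when PC runs off the end of the program or a step limit is reached. The names Instr, Machine, DATA_WORDS, cell, step and run are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Opcodes, in the order they appear in the table. */
typedef enum {
    LOAD, LOADI, ADD, ADDI, SUB, SUBI, STORE,
    JMP, JMPN, JMPP, JMPZ, CALL, JMPX, NOP
} Opcode;

typedef struct {
    Opcode op;
    int operand;            /* address or immediate; ignored by NOP */
} Instr;

#define DATA_WORDS 256      /* assumed size of data memory */

typedef struct {
    int ac;                 /* accumulator */
    size_t pc;              /* index of the next instruction to fetch */
    int m[DATA_WORDS];      /* data memory M[] */
} Machine;

/* Return a pointer to M[addr], checking that the address is in range. */
int *cell(Machine *s, int addr)
{
    assert(addr >= 0 && addr < DATA_WORDS);
    return &s->m[addr];
}

/* Execute one instruction strictly sequentially (no pipeline, no delay slots).
   Assumption: PC already points past the instruction while it executes. */
void step(Machine *s, const Instr *prog)
{
    Instr i = prog[s->pc++];
    switch (i.op) {
    case LOAD:  s->ac = *cell(s, i.operand);            break;
    case LOADI: s->ac = i.operand;                      break;
    case ADD:   s->ac += *cell(s, i.operand);           break;
    case ADDI:  s->ac += i.operand;                     break;
    case SUB:   s->ac -= *cell(s, i.operand);           break;
    case SUBI:  s->ac -= i.operand;                     break;
    case STORE: *cell(s, i.operand) = s->ac;            break;
    case JMP:   s->pc = (size_t)i.operand;              break;
    case JMPN:  if (s->ac < 0)  s->pc = (size_t)i.operand; break;
    case JMPP:  if (s->ac > 0)  s->pc = (size_t)i.operand; break;
    case JMPZ:  if (s->ac == 0) s->pc = (size_t)i.operand; break;
    case CALL:  s->ac = (int)s->pc;                     /* save return address */
                s->pc = (size_t)i.operand;              break;
    case JMPX:  s->pc = (size_t)(s->ac + i.operand);    break;
    case NOP:                                           break;
    }
}

/* Run until PC leaves the program or max_steps instructions have executed.
   Both stopping rules are assumptions; the instruction set has no HALT.
   Returns the number of instructions executed. */
size_t run(Machine *s, const Instr *prog, size_t len, size_t max_steps)
{
    size_t n = 0;
    while (s->pc < len && n < max_steps) {
        step(s, prog);
        n++;
    }
    return n;
}
```

With a zero-initialized Machine, running the three instructions LOADI 5, ADDI 3, STORE 0 through run leaves 8 in both AC and M[0]. A pipelined machine without interlocks would not necessarily behave this way for the same instruction sequence, which is the point of the questions that follow.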

First, propose how you would pipeline this machine. A rough block diagram showing what is in each interstage register and where the functional units and other registers go is sufficient.
Second, assuming that the pipeline is not interlocked, how many branch delay slots, operand delay slots, and so forth must the programmer account for?
Third, write assembly code for the following program fragment, taking into account all the delay slots you've identified.
```
  for (i=1; i<10; i++) x = x + i;
```
Fourth, note what fraction of the instructions in your code for the above fragment are NOPs. This gives you a rough idea of how close to the theoretical efficiency you can get with this architecture. Then, suggest what (mis)features of this architecture would have to be corrected in order to get better performance.