Assignment 13, due April 30

Part of the homework for 22C:122/55:132, Spring 2004
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Always, on every assignment, please write your name legibly as it appears on your University ID and on the class list! All assignments will be due on Fridays at the start of class, and unless there is what insurance companies call "an act of God", the only exceptions to this rule will be by advance arrangement.

Consider again (you may be getting tired of this) the example architecture from the midterm. This machine has two pipeline stages that commit the results of instructions, the ALU/MEM stage commits the results of store instructions, while the OS stage commits the results of LOAD, LOAD-ADDRESS and OPERATE instructions.
The trap conditions that might concern us on this machine include traps raised because of unimplemented physical memory addresses, write operations to read-only memory, and any unimplemented ALU operations.
a) What pipeline stage(s) cause(s) traps.
b) When a trap condition is detected:
- The contents of some pipeline stages are invalidated
- The contents of some pipeline stages are unchanged
- The contents of one stage become a special instruction that saves some version of the PC in the trap-save register.
- The PC in the INSTRUCTION-FETCH stage is changed to the first location in the trap service routine.
What stages have their contents invalidated, what stages are allowed to finish this pipeline cycle, what stage injects this new instruction, and what version of the PC is saved.
Hint: PC relative addressing also required that we have multiple versions of the PC
c) Are traps implemented in this way precise or imprecise?
Consider a pipelined machine with a 1 nanosecond pipeline clock cycle, where the memory word is large enough to contain two instructions, so every other instruction fetch involves one memory cycle, on the average, and where 1 instruction in 4 is a load instruction and 1 instruction in 8 is a store instruction. Assume an I-cache that achieves a 10% miss rate, and a write-through D-cache with a 10% miss rate. Both caches share one memory port through memory arbitration logic.
a) How many memory cycles per second does this processor generate? (you might want to also express your answer in nanoseconds per memory cycle.)
b) What fraction of the instruction fetches will be blocked by memory conflicts.
In the notes for lecture 36, there is an algorithm given for write-through-cached-ram. Rewrite this so, that, instead of using linear search, it uses a 256-word small and very fast RAM, as described in the next section of the notes. Assume that the system word size for data and addresses is 32 bits, and you may use two RAMs to build the cache, one for data, one for tags, if you want.