Final Exam

22C:122, Fall 1999

Please write your answers on the paper provided. Make sure your name is on the front page, in the same form as it appears in the official class list! Illegible and overly long answers are difficult to grade. Please leave margins around your answers! This exam is worth 30% of your final grade; allocate about 4 minutes per point!

Background Information: Consider the computer with the following characteristics, reprinted with minor corrections from the study questions:

     Word size:          32 bits
     General Registers:  64 (includes PC)
     Instructions:
        _______________________________________________________________
       |___|___________|___________|___________|___________|___________|
        OP      R1          R2           op         sh          R3

        OP = 00 operate
		  R1 = shf(R2 op R3)
                        ___________
		  op = |_____|_____|
                         alu   sr2

			alu = 3 bit alu function select
			sr2 = 3 bit shift select for R2 operand

		  sh = 6 bit signed shift count for result
		   
        _______________________________________________________________
       |___|___________|___________|___________________________________|
        OP      R1          R2                 const (18 bits)

        OP = 01 immediate constant to register
		   R1 = R2 + sign-extend(const)

        OP = 10 load      memory to register
		   R1 = M[R2 + sign-extend(const)]

        OP = 11 store     register to memory
		   M[R2 - sign-extend(const)] = R1

A Question: Given an arbitrary 32 bit destination address, how would you write code to jump to or goto that address using this instruction set? (2 points)
Hint: A 2 instruction sequence is necessary and sufficient.
A Question: Given a 2 instruction solution to the problem of jumping to a location, suggest an instruction sequence that will be sufficient to accomplish a call to a subroutine using a register to store the return address. (1 points)
Hint: A 3 instruction sequence will suffice, including the 2 from question 1.
A Question: How many pipeline stages would be reasonable for a straightforward* pipelined version of this machine. Justify your answer by clearly describing the functions of each pipeline stage with perhaps a paragraph per stage. (3 points)
* straightforward: No superscalar execution, no branch anticipation, nothing else bizarre or exciting, just a basic pipeline for this machine.
Hint: You might want to at least briefly consider both the "canonical 5-stage pipe" of the class notes and the pipeline described for the DLX machine in Figure 3.1 on page 130 of the text before you start composing an answer to this problem.

Here are some alternatives:

  ______                        ______
 |  IF  |---o----+             |  IF  |---o----+
 |______| __|__  |             |______| __|__  |
  |____| |cache| |              |____| |cache| |
 |      ||_____| |             |      ||_____| |
 |______|        |             |______|        |
  |____|         |              |____|         |
 |  MA  |---o----+   ______    |  MA  |--------+      ______
 |______| __|__  |  |      |   |______|        |     |      |
  |____| |cache|  --|Memory|    |____|          -o---|Memory|
 |      ||_____|    |______|   |      |        __|__ |______|
 |______|                      |______|       |cache|
                                              |_____|

Here, the MA stage represents a memory access stage in the pipe; the pipe may have multiple memory access states that perform read or write operations.

Part A: Explain why the left alternative above could not be easily incorporated into the architecture used for the running example in class, while the alternative presented on the right was considered appropriate. (2 points)

Part B: Why might you prefer the alternative on the left for the example architecture presented on this exam? (2 points)

Background: The example architecture has no conditional branch instruction (a deliberate oversight). Consider the following two alternatives for inclusion of such a feature:
We could steal 3 or 4 bits from the op and sh fields to make every operate instruction into a conditional skip instruction. 000 would mean never skip, other patterns would mean skip if result zero, skip if result positive, skip if negative, etc.
We could add condition codes to the system and steal 3 or 4 bits from the const field to allow each non-operate instruction to be conditional on the result of the most recent operate instruction.
Part A: Evaluated solely from the perspective of ease of implementation in a pipelined processor, which of these would be most desirable, and why. Your answer must both state the upside the desirable option and the downside of the undesirable option. (2 points)
Part B: Evaluated solely from the perspective of efficient instruction coding, which is likely to make more efficient use of program stream bandwidth. Your answer should be in the form of a persuasive qualitative argument. (2 points)
A Question: Explain why the concept of "the program counter register" gets somewhat fuzzy in the context of pipelined processors. (3 points)

Background: Consider the problem of byte addressable memory on a machine with a 16 bit word, an 8-bit byte, a 16-bit pathway between CPU and memory, and the convention that byte zero of a word is the MSB. One way to do this is as follows:

  From CPU                                  To Memory
                             word address
           byte address   |===============>
          ===============>| 
                          |---  byte number
                              |
           mode             __|__
          ---------------->|  f  |
      0=byte/1=word        |_____|
                            | | |
          <-----------------  | |
            trouble       ----  |  
                       __|_     |  
                      |   1|<===|==0
               msb|<==|MUX |    |     
  To CPU          |   |___0|<===|=o==|msb
            data  |             | |  |      From memory
          <=======|       ------  |  |  data
                  |    __|_       |  |<======
                  |   |   1|<=====   |
               lsb|<==|MUX |         |
                      |___0|<========|lsb

This kind of alignment network can be generalized for byte, halfword and word access on a 32 bit machine, and reverse alignment networks can be constructed for store operations.

Part A: What memory access condition must this alignment network report as a trouble condition? (1 points)

Part B: Give a truth table for the combinational function f. Note that, for some combinations of inputs, some outputs may not be fully defined. (3 points)

Background: Consider, a machine with a 16-bit word and an 8-bit byte. The memory interface of this machine supports only two operations, read word and write word, and the machine handles byte access using an assortment of alignment networks such as the one outlined in the previous problem.
The instruction set allows access to a non-aligned word operand in memory. On most machines where this is allowed, the pipeline stage that attempts a non-aligned access must stall in order to allow multiple memory cycles. Here, we propose an alternative design for a memory access pipeline stage:
```
  |____|      _____     |
 | MA1  |----|align|----+
 |______|    |_____|    |
  |____|      _____     |
 | MA2  |----|align|----+
 |______|    |_____|    |memory
    ...                 |arbitration
  |____|      _____     |bus
 | MAn  |----|align|----+
 |______|    |_____|    |
```
To access memory, we have pipeline stages MA1 through MAn, each of which has, if needed, a memory port, and each of which has, as required, an alignment network. Simple memory accesses, for example, read byte from memory or write aligned word to memory, are handled by MA1. More complex accesses require the additional pipeline stages to perform memory cycles.
Part A: For a non-aligned word read from memory, how many pipeline stages are needed, what do they do, and how must the alignment net for each stage be set. (2 points)
Part B: For a single byte write to memory, how many pipeline stages are needed, what do they do, and how must the alignment net for each stage be set. (2 points)
Part C: For a non-aligned word write to memory, how many pipeline stages are needed, what do they do, and how must the alignment net for each stage be set. (2 points)
Part D: Given that the naive pipeline, with no consideration for non-aligned operands, had a single memory access stage that could read or write aligned or non-aligned bytes or words, how many stages do we really need? (1 points)
Background: The example architecture has no provisions for byte or halfword manipulation. One way to add this would be to steal 2-bit field from the const field of memory reference instructions and use this to specify the operand size (word, halfword or byte). An alternative is to make sure that some combinations of operation and shift fields in the operate instruction can be used to efficiently extract and edit arbitrary fields of operand words briefly outline the costs and benefets of these two alternatives. (2 points)