Homework 11 Solved

22C:122/55:132, Spring 2001

Problem: If the entire first byte is the opcode, how many opcodes does this machine machine allocate for each instruction size.
We have 64 opcodes for length 1, 128 for length 2, and 64 for length 3.
Part A: Give a diagram showing the relevant registers and data paths used to control the routing of instruction bytes from the instruction register to the next stages of a 2-way superscalar pipelined machine using this opcode format.
```
         _________________________________________________
	|___I1__|___I2__|___I3__|___I4__|___I5__|___I6__|_...
            |       |       |       |       |       |
            o---    |       |       |       |        -- 
            |  _|_  |       |       |        --o-----  |
            | |   | |       |        --o-----o-|---  | |
            | | F | |       o--------o-|---  | |   | | |
            | |___| o-------|------  | |   | | |   | | |
            |   |   |       |     _|_|_|_ _|_|_|_ _|_|_|_
            |    ---|-------|-----\0_1_2/-\0_1_2/-\0_1_2/
         ___|_______|_______|___  ___|_______|_______|_____
	|__P1B1_|__P1B2_|__P1B3_||__P2B1_|__P2B2_|__P2B3_|_...
```
Part B, C and D: Give the column labels on the truth-table ... How big is this ROM? ... Give the contents of 2 rows of this ROM ...
```
	F
	  I1B1 I1B2 | Mux Control
	------------+--------------
	   0     0  |      0
	   0     1  |      1
	   1     0  |      1
	   1     1  |      2
```
The stupid design, requiring no optimization, requires a 4 word ROM with 2 bits per word. A better design eliminates this entirely and simply uses I1B1 and I1B2 (the two most significant bits of I1) to direclty control custom-designed 3-input multiplexors.
This solution is predicated on the fact that later stages in the pipeline don't mind receiving nonsense in unneeded bytes of the instructions. If they require that unneeded bytes be cleared or filled with special default values, a more complex solution will be required.
Notice that the valid bits play no part in this! They are extremely important in the instruction register shift logic, and they determine whether the instruction presented to the pipeline can be executed, but they play no necessary role in data routing.

A Problem: Modify the calling sequence, receiving sequence and return sequence given at the end of part 1 of the notes for Lecture 32 so that it uses a return address instead of a continuation point.

Boldface indicates changes in the following.

	-- assume the following named index registers:
	--    AR - pointer to the current activation record
	--    NR - points to new activation record (temporarily)
	--    RA - register holding return address (temporarily)

	-- if P points to an activation record
	--    P->RA holds the return address
	--    P->DL holds the saved dynamic link
	--    P->SL holds the saved static link

	-- we assume that ARS, the size of each activation
	--    record is known statically; it may vary from
	--    place to place depending on the number of
	--    temporaries currently needed, but each time
	--    any particular instruciton is executed, the
	--    size is fixed.

	--------------------
	-- calling sequence

	      ADD NR = AR + ARS

	      -- copy parameters into fields of *NR
	      -- save registers into fields of *AR

	      MOVE NR->SL = DST_CONTEXT
	      CALL RA,DST

	      -- restore registers from *AR

	--------------------
	-- receiving sequence

	DST
	      MOVE NR->RA = RA
	      MOVE NR->DL = AR
	      MOVE AR = NR

	--------------------
	-- return sequence   
	      MOVE RA = AR->RA
	      MOVE AR = AR->DL
	      JUMP *RA

Background: An alternative model has several pipeline stages that can complete instructions.
Part A: Does this alternative model offer a potential performance advantage? Why or why not?
Removing instructions from the pipeline as soon as possible eliminates potential operand conflicts by making the operands available sooner. Furthermore, once the instruction is completed, the resulting bubble in the pipeline will insulate the upper level in the pipe from stalls in lower levels.

Part B: Does this alternative model pose potential problems for interrupt and trap handlers? Explain!
If the instruction sequence A B C is executed in a conventional pipeline, A will be completed first and C will be completed last. If, however, instruction A is completed in stage 5 of the pipe, while B is completed in stage 3 and C is completed in stage 1, They will be completed in the order B C A. In sum, the alternative leads to out-of-order execution.
Out-of-order execution requires new interlock logic, but it also poses problems with interrupts. Consider the sequence A B C D where A enters the pipeline first but B is completed first. If an interrupt is requested after B is completed but before A is completed, we must find a way to execute A and then C, but not B, on return. Either that, or we must find a way to inhibit out-of-order execution before the interrupt is handled.