CS:2630, Homework 13, Summer 2018

On every assignment, write your name legibly as it appears on your University ID card! Homework is due on paper at the start of class on the day indicated (Tuesday or Thursday). Exceptions will be made only by advance arrangement (excepting "acts of God"). Late work must be turned in to the TA's mailbox (ask the CS receptionist in 14 MLH for help). Never push homework under someone's door!

Background: In homework 12, problem 3b, it was noted that before running the following sequence of instructions to turn on the Hawk MMU, it is necessary to load the MMU with at least one entry that maps the virtual address for the current page to somewhere sensible.

a) Write a sequence of Hawk instructions that maps the first page of the Hawk ROM (containing the entire Hawk monitor) to frame 0 of the virtual address space, and sets the access rights appropriately. (0.5 points)

	LIS	R3,0
	CPUSET	R3,TMA
	LIS	R3,#1B
	CPUSET	R3,MMUDATA

The above answer is short but not easy to read. Perhaps we should give symbolic definitions of the bits in the MMUDATA register, something like this:

; bits in the MMUDATA register
MMUVALID=	#01
MMUEXEC =	#02
MMUWRITE=	#04
MMUREAD =	#08
MMUCACHE=	#10
MMUGLOBL=	#20

Because the TMA and MMUDATA registers use identical formats for the page number, and because the low bits of the virtual address are ignored when setting the mapping registers in the MMU, we can make it more obvious that we're setting the virtual and physical addresses identical with the following code:

	LIS	R3,#00000+MMUVALID+MMUEXEC+MMUREAD+MMUCACHE
	CPUSET	R3,TMA
	CPUSET	R3,MMUDATA
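
As a cross-check on the constant #1B, the following Python sketch decodes a register value into its page number and access bits. The bit values are copied from the symbolic definitions above; the 12-bit page offset (4K-byte pages) is an assumption not stated in this section, and the names `MMU_BITS`, `PAGE_SHIFT` and `decode` are made up for illustration.

```python
# Bit values copied from the symbolic MMUDATA definitions in this solution.
MMU_BITS = {
    "MMUVALID": 0x01, "MMUEXEC": 0x02, "MMUWRITE": 0x04,
    "MMUREAD": 0x08, "MMUCACHE": 0x10, "MMUGLOBL": 0x20,
}
PAGE_SHIFT = 12  # assumption: 4K-byte pages, so the low 12 bits hold no page number


def decode(value):
    """Split a register value into (page number, names of the access bits set)."""
    page = value >> PAGE_SHIFT
    names = [name for name, bit in MMU_BITS.items() if value & bit]
    return page, names


page, names = decode(0x1B)
print(page, names)  # → 0 ['MMUVALID', 'MMUEXEC', 'MMUREAD', 'MMUCACHE']
```

Under that assumption the page-number field of #1B is 0, which is why the same register value can serve as both the virtual address loaded into TMA and the frame-plus-rights word loaded into MMUDATA.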

b) Write a sequence of Hawk instructions that maps the first page of the Hawk RAM to frame 10₁₆ of the virtual address space and sets the access rights appropriately. (0.5 points)

	LIL	R3,#10000	; LIL, since the constant does not fit in the 8 bits LIS allows
	CPUSET	R3,TMA
	LIL	R3,#1001F
	CPUSET	R3,MMUDATA

Or, in more symbolic form:

	LIL	R3,#10000+MMUVALID+MMUEXEC+MMUREAD+MMUWRITE+MMUCACHE
	CPUSET	R3,TMA
	CPUSET	R3,MMUDATA
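
The numeric constants in both parts can be rebuilt from a frame number and a set of access bits. The Python sketch below does that arithmetic; it assumes a 12-bit page offset, which is consistent with frame 10₁₆ corresponding to address #10000, and the helper name `mmu_entry` is hypothetical.

```python
# Access-rights bits, copied from the symbolic definitions in part a).
MMUVALID, MMUEXEC, MMUWRITE, MMUREAD, MMUCACHE = 0x01, 0x02, 0x04, 0x08, 0x10
PAGE_SHIFT = 12  # assumption: frame 0x10 corresponds to address 0x10000


def mmu_entry(frame, flags):
    """Combine a frame number and access bits into one register value."""
    return (frame << PAGE_SHIFT) | flags


# Part a): ROM page, read and execute only.
rom = mmu_entry(0x00, MMUVALID | MMUEXEC | MMUREAD | MMUCACHE)
# Part b): RAM page, with write access as well.
ram = mmu_entry(0x10, MMUVALID | MMUEXEC | MMUREAD | MMUWRITE | MMUCACHE)
print(f"{rom:X} {ram:X}")  # → 1B 1001F
```

The only difference in access rights between the two entries is MMUWRITE, which is set for RAM and left clear for ROM.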

Background: Here are two ways of computing the same thing, based on code lifted from the SAVEREGS section of the trap handler outlined in Chapter 13 in the Hawk manual:

FIRST:  CPUGET  R1,TSV
        CPUGET  R3,TPC
        STORE   R1,R2,svR2
        STORE   R3,R2,svPC
        CPUGET  R1,TMA
        CPUGET  R3,PSW
        STORE   R1,R2,svMA
        STORE   R3,R2,svPSW

SECOND: CPUGET  R1,TSV
        STORE   R1,R2,svR2
        CPUGET  R1,TPC
        STORE   R1,R2,svPC
        CPUGET  R1,TMA
        STORE   R1,R2,svMA
        CPUGET  R1,PSW
        STORE   R1,R2,svPSW

These two pieces of code do exactly the same thing. The first solution uses two registers, the second uses only one. If we did not have pipelined computers, the two alternatives would be equally fast, but on a pipelined machine, one alternative is distinctly faster than the other.

a) Which alternative is faster on a pipelined computer, and why? (0.5 points)

The first one is faster on a pipelined computer because there is one extra instruction between each CPUGET and the STORE instruction that saves the corresponding result, so each STORE spends less time waiting for the register value it needs.

b) How much faster? You can answer this by careful use of a pipeline diagram. The answer will be a number, a count of the number of clock cycles saved by the faster solution when compared to the slower solution. (0.5 points)

If we assume the 4-stage pipeline from the text, and if we assume that CPUGET updates the registers in the RS (result-save) pipeline stage, while STORE gets the value of registers in the OF (operand-fetch) stage, then there are 2 delay slots from CPUGET to STORE. We have only filled one of these slots with the above code, so the run-time will be comparable to this, using NOP instructions to fill each delay slot:

FIRST:  CPUGET  R1,TSV
        CPUGET  R3,TPC
        NOP
        STORE   R1,R2,svR2
        STORE   R3,R2,svPC
        CPUGET  R1,TMA
        CPUGET  R3,PSW
        NOP
        STORE   R1,R2,svMA
        STORE   R3,R2,svPSW

SECOND: CPUGET  R1,TSV
        NOP
        NOP
        STORE   R1,R2,svR2
        CPUGET  R1,TPC
        NOP
        NOP
        STORE   R1,R2,svPC
        CPUGET  R1,TMA
        NOP
        NOP
        STORE   R1,R2,svMA
        CPUGET  R1,PSW
        NOP
        NOP
        STORE   R1,R2,svPSW

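In sum, the padded FIRST sequence occupies 10 instruction slots and the padded SECOND sequence occupies 16, so the faster version of the code requires 6 fewer clock cycles than the slower version.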
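
The count can be cross-checked with a small Python model. This is a sketch under the same assumption as the answer above, namely that a STORE cannot issue until 2 slots after the CPUGET that loads its register. The tuple encoding of instructions and the function name `issue_slots` are illustrative only and are not part of any Hawk tool.

```python
# Each instruction: (mnemonic, register written or None, registers read).
FIRST = [
    ("CPUGET", "R1", []), ("CPUGET", "R3", []),
    ("STORE", None, ["R1", "R2"]), ("STORE", None, ["R3", "R2"]),
] * 2
SECOND = [
    ("CPUGET", "R1", []), ("STORE", None, ["R1", "R2"]),
] * 4

DELAY_SLOTS = 2  # assumed gap between a register write and a dependent read


def issue_slots(program, delay_slots=DELAY_SLOTS):
    """Return the number of issue slots used, counting stalls as empty slots."""
    last_write = {}  # register -> slot in which its most recent writer issued
    slot = 0
    for _, dest, sources in program:
        # Delay this instruction until every source register is available.
        for reg in sources:
            if reg in last_write:
                slot = max(slot, last_write[reg] + delay_slots + 1)
        if dest is not None:
            last_write[dest] = slot
        slot += 1
    return slot


print(issue_slots(FIRST), issue_slots(SECOND))  # → 10 16
```

With `delay_slots=0` both sequences take 8 slots, which matches the observation that the two alternatives would be equally fast without pipelining.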

Background: The simplified instruction fetch stage discussed near the middle of Chapter 15 omits two important features of the Hawk computer and one important feature of the Sparrowhawk:

a) What does this diagram imply about the word size of the data fetched from RAM for each instruction, and how does this differ from the Sparrowhawk? (0.5 points)

It implies a 16-bit data path from memory, since the IR is 16 bits wide and the program counter is incremented by 2 on each cycle.

b) What additional issue does the Hawk raise that would add even more complexity to this pipeline stage? (0.5 points)

The Hawk has two instruction lengths, 16-bit and 32-bit, so some instruction fetches would need to increment the PC by 2 and some would need to increment it by 4.

Assignment 13, due Jul 31