Assignment 13, due Jul 31

Part of the homework for CS:2630, Summer 2018
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

On every assignment, write your name legibly as it appears on your University ID card! Homework is due on paper at the start of class on the day indicated (Tuesday or Thursday). Exceptions will be made only by advance arrangement (excepting "acts of God"). Late work must be turned in to the TA's mailbox (ask the CS receptionist in 14 MLH for help). Never push homework under someone's door!

  1. Background: In homework 12, problem 3b, it was noted that before running the following sequence of instructions to turn on the Hawk MMU, it is necessary to load the MMU with at least one entry that maps the virtual address for the current page to somewhere sensible.

    a) Write a sequence of Hawk instructions that maps the first page of the Hawk ROM (containing the entire Hawk monitor) to frame 0 of the virtual address space, and sets the access rights appropriately. (0.5 points)

    	LIS	R3,0
    	CPUSET	R3,TMA
    	LIS	R3,#1B
    	CPUSET	R3,MMUDATA
    

    The above answer is short but not easy to read. Perhaps we should give symbolic definitions of the bits in the MMUDATA register, something like this:

    ; bits in the MMUDATA register
    MMUVALID=	#01
    MMUEXEC =	#02
    MMUWRITE=	#04
    MMUREAD =	#08
    MMUCACHE=	#10
    MMUGLOBL=	#20
    

    Because the TMA and MMUDATA registers use identical formats for the page number, and because the low bits of the virtual address are ignored when setting the mapping registers in the MMU, we can make it more obvious that the virtual and physical addresses are being set identically with the following code:

    	LIS	R3,#00000+MMUVALID+MMUEXEC+MMUREAD+MMUCACHE
    	CPUSET	R3,TMA
    	CPUSET	R3,MMUDATA
    

    b) Write a sequence of Hawk instructions that maps the first page of the Hawk RAM to frame 10 (base 16) of the virtual address space and sets the access rights appropriately. (0.5 points)

    	LIL	R3,#10000	; LIL, not LIS: the constant exceeds 8 bits
    	CPUSET	R3,TMA
    	LIL	R3,#1001F
    	CPUSET	R3,MMUDATA
    

    Or, in more symbolic form:

    	LIL	R3,#10000+MMUVALID+MMUEXEC+MMUREAD+MMUWRITE+MMUCACHE
    	CPUSET	R3,TMA
    	CPUSET	R3,MMUDATA
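
    As a cross-check on the hexadecimal constants used in parts a and b, the short Python sketch below recomputes them from the symbolic bit definitions. It only models the arithmetic. The helper name mmu_entry and the variables rom_entry and ram_entry are hypothetical names introduced for this illustration and are not part of the Hawk tool chain.

```python
# Bits in the MMUDATA register, copied from the symbolic definitions above.
MMUVALID = 0x01
MMUEXEC = 0x02
MMUWRITE = 0x04
MMUREAD = 0x08
MMUCACHE = 0x10
MMUGLOBL = 0x20


def mmu_entry(page_address, rights):
    """Combine a page-aligned address with access-rights bits.

    Hypothetical helper for illustration only: it mirrors the '+' used
    in the assembly-time expressions, nothing more.
    """
    return page_address | rights


# Part a: first page of ROM, valid, executable, readable, cacheable.
rom_entry = mmu_entry(0x00000, MMUVALID | MMUEXEC | MMUREAD | MMUCACHE)

# Part b: first page of RAM, with write access added.
ram_entry = mmu_entry(
    0x10000, MMUVALID | MMUEXEC | MMUREAD | MMUWRITE | MMUCACHE
)

print(f"{rom_entry:X}")  # 1B
print(f"{ram_entry:X}")  # 1001F
```

    The two printed values match the literal constants #1B and #1001F in the non-symbolic answers.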
    

  2. Background: Here are two ways of computing the same thing, based on code lifted from the SAVEREGS section of the trap handler outlined in Chapter 13 in the Hawk manual:
    FIRST:  CPUGET  R1,TSV
            CPUGET  R3,TPC
            STORE   R1,R2,svR2
            STORE   R3,R2,svPC
            CPUGET  R1,TMA
            CPUGET  R3,PSW
            STORE   R1,R2,svMA
            STORE   R3,R2,svPSW
    
    SECOND: CPUGET  R1,TSV
            STORE   R1,R2,svR2
            CPUGET  R1,TPC
            STORE   R1,R2,svPC
            CPUGET  R1,TMA
            STORE   R1,R2,svMA
            CPUGET  R1,PSW
            STORE   R1,R2,svPSW
    

    These two pieces of code do exactly the same thing. The first solution uses two registers, the second uses only one. If we did not have pipelined computers, the two alternatives would be equally fast, but on a pipelined machine, one alternative is distinctly faster than the other.

    a) Which alternative is faster on a pipelined computer, and why? (0.5 points)

    The first one is faster on a pipelined computer because there is one extra instruction between each CPUGET and the STORE instruction that saves the corresponding result, so each STORE spends fewer cycles waiting for its operand register to be written.

    b) How much faster? You can answer this by careful use of a pipeline diagram. The answer will be a number, a count of the number of clock cycles saved by the faster solution when compared to the slower solution. (0.5 points)

    If we assume the 4-stage pipeline from the text, and if we assume that CPUGET updates the registers in the RS (result-save) pipeline stage, while STORE gets the value of registers in the OF (operand-fetch) stage, then there are 2 delay slots from CPUGET to STORE. The FIRST version fills one of these slots with useful work, while the SECOND version fills neither, so the run-time will be comparable to the following, using NOP instructions to fill each remaining delay slot:

    FIRST:  CPUGET  R1,TSV
            CPUGET  R3,TPC
            NOP
            STORE   R1,R2,svR2
            STORE   R3,R2,svPC
            CPUGET  R1,TMA
            CPUGET  R3,PSW
            NOP
            STORE   R1,R2,svMA
            STORE   R3,R2,svPSW
    
    SECOND: CPUGET  R1,TSV
            NOP
            NOP
            STORE   R1,R2,svR2
            CPUGET  R1,TPC
            NOP
            NOP
            STORE   R1,R2,svPC
            CPUGET  R1,TMA
            NOP
            NOP
            STORE   R1,R2,svMA
            CPUGET  R1,PSW
            NOP
            NOP
            STORE   R1,R2,svPSW
    

    In sum, FIRST needs 2 filler cycles and SECOND needs 8, so the faster version (FIRST) requires 6 fewer clock cycles than the slower version (SECOND).
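
    The NOP counts above can be checked mechanically. The following Python sketch is a simplified model, not a Hawk simulator: it assumes in-order issue of one instruction per clock and exactly 2 delay slots between a CPUGET and any later instruction that reads its destination register, as assumed in the answer. The function and variable names (count_stalls, cpuget, store, FIRST, SECOND) are hypothetical and exist only for this illustration.

```python
DELAY_SLOTS = 2  # assumed gap required between CPUGET and a dependent STORE


def count_stalls(program, delay_slots=DELAY_SLOTS):
    """Count the filler cycles an in-order pipeline needs for a program.

    Each instruction is a tuple (opcode, destination or None, sources).
    """
    ready = {}  # register name -> earliest issue slot where it may be read
    slot = 0    # next free issue slot
    stalls = 0
    for _opcode, dest, sources in program:
        # Wait until every source register has been written back.
        earliest = max([slot] + [ready.get(reg, 0) for reg in sources])
        stalls += earliest - slot
        if dest is not None:
            ready[dest] = earliest + 1 + delay_slots
        slot = earliest + 1
    return stalls


def cpuget(reg):
    return ("CPUGET", reg, ())


def store(src, base):
    return ("STORE", None, (src, base))


FIRST = [
    cpuget("R1"), cpuget("R3"), store("R1", "R2"), store("R3", "R2"),
    cpuget("R1"), cpuget("R3"), store("R1", "R2"), store("R3", "R2"),
]
SECOND = [cpuget("R1"), store("R1", "R2")] * 4

print(count_stalls(FIRST))                         # 2
print(count_stalls(SECOND))                        # 8
print(count_stalls(SECOND) - count_stalls(FIRST))  # 6
```

    Changing DELAY_SLOTS shows how the savings depend on the pipeline assumptions: with a single delay slot, for example, FIRST would need no filler cycles at all.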

  3. Background: The simplified instruction fetch stage discussed near the middle of Chapter 15 omits two important features of the Hawk computer and one important feature of the Sparrowhawk:

    a) What does this diagram imply about the word size of the data fetched from RAM for each instruction, and how does this differ from the Sparrowhawk? (0.5 points)

    It implies a 16-bit data path from memory, since the IR is 16 bits wide and the fetch stage increments the program counter by 2 on each cycle.

    b) What additional issue does the Hawk raise that would add even more complexity to this pipeline stage? (0.5 points)

    The Hawk has two instruction lengths, 16-bit and 32-bit, so some instruction fetches would need to increment the PC by 2 and others by 4.