Assignment 13, due Jul 31
Part of the homework for CS:2630, Summer 2018
On every assignment, write your name legibly as it appears on your University ID card! Homework is due on paper at the start of class on the day indicated (Tuesday or Thursday). Exceptions will be made only by advance arrangement (excepting "acts of God"). Late work must be turned in to the TA's mailbox (ask the CS receptionist in 14 MLH for help). Never push homework under someone's door!
a) Write a sequence of Hawk instructions that maps the first page of the Hawk ROM (containing the entire Hawk monitor) to frame 0 of the virtual address space, and sets the access rights appropriately. (0.5 points)
        LIS     R3,0
        CPUSET  R3,TMA
        LIS     R3,#1B
        CPUSET  R3,MMUDATA
The above answer is short but not easy to read. Perhaps we should give symbolic definitions of the bits in the MMUDATA register, something like this:
; bits in the MMUDATA register
MMUVALID= #01
MMUEXEC = #02
MMUWRITE= #04
MMUREAD = #08
MMUCACHE= #10
MMUGLOBL= #20
Because the TMA and MMUDATA registers use identical formats for the page number, and because the low bits of the virtual address are ignored when setting the mapping registers in the MMU, we can make it more obvious that we're setting the virtual and physical addresses identical with the following code:
        LIS     R3,#00000+MMUVALID+MMUEXEC+MMUREAD+MMUCACHE
        CPUSET  R3,TMA
        CPUSET  R3,MMUDATA
b) Write a sequence of Hawk instructions that maps the first page of the Hawk RAM to frame 10₁₆ (hexadecimal 10) of the virtual address space and sets the access rights appropriately. (0.5 points)
        LIL     R3,#10000       ; 24-bit immediate (too big for LIS)
        CPUSET  R3,TMA
        LIL     R3,#1001F       ; same page number plus the access-rights bits
        CPUSET  R3,MMUDATA
Or, in more symbolic form:
        LIL     R3,#10000+MMUVALID+MMUEXEC+MMUREAD+MMUWRITE+MMUCACHE  ; 24-bit immediate (too big for LIS)
        CPUSET  R3,TMA
        CPUSET  R3,MMUDATA
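As a cross-check on the hexadecimal constants in parts a) and b), the following Python sketch rebuilds them from the symbolic bit definitions given above. The helper name `mmudata` and the 12-bit page shift are assumptions made for illustration; the shift is inferred from page 10₁₆ appearing as #10000 in the answers, not taken from the Hawk manual.

```python
# Access-rights bits in the low end of the MMUDATA register, as defined above.
MMUVALID = 0x01
MMUEXEC  = 0x02
MMUWRITE = 0x04
MMUREAD  = 0x08
MMUCACHE = 0x10
MMUGLOBL = 0x20

# Assumption: the page number sits above a 12-bit field,
# inferred from page #10 being written as #10000.
PAGE_SHIFT = 12

def mmudata(page, flags):
    """Combine a page (or frame) number with access-rights bits (hypothetical helper)."""
    return (page << PAGE_SHIFT) | flags

# Part a: ROM page 0, valid, executable, readable, cacheable (no write access).
rom = mmudata(0x00, MMUVALID | MMUEXEC | MMUREAD | MMUCACHE)

# Part b: RAM page #10, same rights plus write access.
ram = mmudata(0x10, MMUVALID | MMUEXEC | MMUREAD | MMUWRITE | MMUCACHE)

print(hex(rom), hex(ram))  # 0x1b 0x1001f
```

The two printed values match the literal constants #1B and #1001F used in the shorter answers.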
FIRST:
        CPUGET  R1,TSV
        CPUGET  R3,TPC
        STORE   R1,R2,svR2
        STORE   R3,R2,svPC
        CPUGET  R1,TMA
        CPUGET  R3,PSW
        STORE   R1,R2,svMA
        STORE   R3,R2,svPSW

SECOND:
        CPUGET  R1,TSV
        STORE   R1,R2,svR2
        CPUGET  R1,TPC
        STORE   R1,R2,svPC
        CPUGET  R1,TMA
        STORE   R1,R2,svMA
        CPUGET  R1,PSW
        STORE   R1,R2,svPSW
These two pieces of code do exactly the same thing. The first solution uses two registers, the second uses only one. If we did not have pipelined computers, the two alternatives would be equally fast, but on a pipelined machine, one alternative is distinctly faster than the other.
a) Which alternative is faster on a pipelined computer, and why? (0.5 points)
The first one is faster on a pipelined computer because there is one extra instruction between each CPUGET and the STORE instruction that saves the corresponding result.
b) How much faster? You can answer this by careful use of a pipeline diagram. The answer will be a number, a count of the number of clock cycles saved by the faster solution when compared to the slower solution. (0.5 points)
If we assume the 4-stage pipeline from the text, and if we assume that CPUGET updates the registers in the RS (result-save) pipeline stage, while STORE gets the value of registers in the OF (operand-fetch) stage, then there are 2 delay slots from CPUGET to STORE. We have only filled one of these slots with the above code, so the run-time will be comparable to this, using NOP instructions to fill each delay slot:
FIRST:
        CPUGET  R1,TSV
        CPUGET  R3,TPC
        NOP
        STORE   R1,R2,svR2
        STORE   R3,R2,svPC
        CPUGET  R1,TMA
        CPUGET  R3,PSW
        NOP
        STORE   R1,R2,svMA
        STORE   R3,R2,svPSW

SECOND:
        CPUGET  R1,TSV
        NOP
        NOP
        STORE   R1,R2,svR2
        CPUGET  R1,TPC
        NOP
        NOP
        STORE   R1,R2,svPC
        CPUGET  R1,TMA
        NOP
        NOP
        STORE   R1,R2,svMA
        CPUGET  R1,PSW
        NOP
        NOP
        STORE   R1,R2,svPSW
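In sum, the first version occupies 10 instruction slots once its delay slots are filled, while the second occupies 16, so the improved version of the code requires 6 fewer clock cycles than the slower version.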
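The cycle count can also be checked mechanically. The Python sketch below is a simplified timing model, not a simulation of the real Hawk pipeline: it assumes one instruction issues per cycle and that a STORE must issue at least 2 delay slots after the CPUGET that produced its source register, as stated in the answer above. The function name `issue_slots` and the tuple encoding of the instructions are illustrative choices.

```python
DELAY_SLOTS = 2  # assumption from the answer above: slots needed between CPUGET and STORE

def issue_slots(program, delay_slots=DELAY_SLOTS):
    """Count issue slots (instructions plus stalls) for a list of (op, reg) pairs."""
    last_write = {}   # register -> cycle in which the CPUGET writing it was issued
    cycle = 0
    for op, reg in program:
        if op == "STORE" and reg in last_write:
            # Stall until the producing CPUGET is far enough ahead.
            cycle = max(cycle, last_write[reg] + delay_slots + 1)
        if op == "CPUGET":
            last_write[reg] = cycle
        cycle += 1
    return cycle

# Only the register carrying the saved value matters for the hazard;
# the base register R2 is never written, so it is omitted here.
FIRST = [("CPUGET", "R1"), ("CPUGET", "R3"), ("STORE", "R1"), ("STORE", "R3"),
         ("CPUGET", "R1"), ("CPUGET", "R3"), ("STORE", "R1"), ("STORE", "R3")]
SECOND = [("CPUGET", "R1"), ("STORE", "R1")] * 4

print(issue_slots(FIRST), issue_slots(SECOND))   # 10 16
print(issue_slots(SECOND) - issue_slots(FIRST))  # 6
```

Under this model the stalls land exactly where the NOP instructions appear in the listings above.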
a) What does this diagram imply about the word size of the data fetched from RAM for each instruction, and how does this differ from the Sparrowhawk? (0.5 points)
It implies a 16-bit data path from memory, since the IR is 16 bits and it increments the program counter by 2 for each cycle.
b) What additional issue does the Hawk raise that would add even more complexity to this pipeline stage? (0.5 points)
The Hawk has two instruction lengths, 16-bit and 32-bit, so some instruction fetches would need to increment the PC by 2 and some would need to increment it by 4.
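To illustrate why mixed instruction lengths complicate the fetch stage, here is a small Python sketch that walks a stream of 16-bit halfwords and advances the PC by 2 or 4. The function `fetch_addresses` and the predicate `toy_is_long` are hypothetical: the toy rule (top bit set means a 32-bit instruction) is made up for the example and is not the real Hawk encoding.

```python
def fetch_addresses(halfwords, is_long, start_pc=0):
    """Return (pc, length_in_bytes) for each instruction in a list of halfwords."""
    result = []
    pc = start_pc
    i = 0
    while i < len(halfwords):
        # The fetch stage must inspect the first halfword before it knows
        # how far to advance the PC.
        length = 4 if is_long(halfwords[i]) else 2
        result.append((pc, length))
        pc += length
        i += length // 2
    return result

def toy_is_long(halfword):
    """Toy rule, NOT the real Hawk encoding: top bit set marks a 32-bit instruction."""
    return bool(halfword & 0x8000)

program = [0x1234, 0x8001, 0x0042, 0x0F00]
print(fetch_addresses(program, toy_is_long))
# [(0, 2), (2, 4), (6, 2)]
```

The point of the sketch is the data dependency: the PC increment depends on the instruction just fetched, which a fixed "add 2 every cycle" fetch stage does not have to handle.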