Assignment 11, Solutions

Part of the homework for 22C:60 (CS:2630), Spring 2013
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

  1. Background: In the final machine problem, the prologue for the trap handler saves all of the registers in a trap save area, and the epilogue restores all of them. Within the handler, the saved registers represent the state of the user's computation. Within the handler, R2 points to the trap save area, so the instruction LOAD R3,R2,svPC, for example, will load the saved program counter into R3.

    a) Write code to fetch the instruction halfword that caused the trap into R4, assuming that the saved program counter has been loaded into R3 using the instruction given above. (0.4 points)

            LOADS   R4,R3           ; get the word containing the instruction
            EXTH    R4,R4,R3        ; extract the halfword the saved PC points to
    
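    The effect of this LOADS/EXTH pair can be sketched in Python (a sketch, not Hawk code). It assumes, as on the little-endian Hawk, that bit 1 of the address selects which halfword of the 32-bit word is wanted; the memory dictionary and addresses below are made up for illustration.

```python
# Sketch of LOADS + EXTH: fetch the instruction halfword the saved PC points to.
# Assumption: little-endian halfword selection, with bit 1 of the address
# choosing the low or high halfword of the aligned 32-bit word.

def fetch_instruction_halfword(memory, pc):
    """memory maps word-aligned addresses to 32-bit words."""
    word = memory[pc & ~3]       # LOADS: load the word the PC points into
    shift = (pc & 2) * 8         # EXTH: shift by 0 or 16 to select a halfword
    return (word >> shift) & 0xFFFF

mem = {0x1000: 0x00F1ABCD}       # one word holding two halfwords (made-up data)
print(hex(fetch_instruction_halfword(mem, 0x1000)))   # low halfword
print(hex(fetch_instruction_halfword(mem, 0x1002)))   # high halfword
```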

    b) Write code to check to see if the instruction that caused the trap (already in R4) was the LIW instruction. Note that this would be as easy as a simple CMPI instruction except that there is a 4-bit destination register field in the instruction halfword. (0.4 points)

    First, here is an easy solution, leaving R4 holding the instruction (if it is the LIW instruction, we will still need its destination register field). This solution simply uses the AND instruction to turn off all the bits other than the opcode bits.

            LIL     R5,#FFF0        ; R5 = mask
            AND     R5,R4           ; R5 = instruction halfword with Rd zeroed
            CMPI    R5,#00F0
    

    Second, here is a clever solution that also extracts the destination register field, leaving that in R4 and then using it to clear out the corresponding bits of the instruction halfword in R5.

            MOVE    R5,R4           ; R5 = copy of the instruction halfword
            TRUNC   R4,4            ; R4 = Rd field of instruction
            SUB     R5,R5,R4        ; R5 = instruction halfword with Rd zeroed
            CMPI    R5,#00F0
    
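    Both solutions do the same thing: zero out the 4-bit Rd field and compare the remainder against the LIW opcode pattern #00F0. A Python sketch of the two approaches (the opcode value is taken from the solutions above; everything else is illustration):

```python
LIW_OPCODE = 0x00F0   # LIW instruction halfword with the Rd field zeroed

def is_liw_mask(inst):
    # first solution: AND away the low 4 Rd bits, then compare
    return (inst & 0xFFF0) == LIW_OPCODE

def is_liw_subtract(inst):
    # second solution: extract the Rd field (TRUNC), subtract it out, compare
    rd = inst & 0xF
    return (inst - rd) == LIW_OPCODE

print(is_liw_mask(0x00F3), is_liw_subtract(0x00F3))   # LIW with Rd = 3
print(is_liw_mask(0x12F3))                            # not LIW: wrong opcode
```

    The two predicates agree on every 16-bit halfword, since subtracting the low 4 bits of a value is the same as masking them off.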

    c) Suppose you wanted to call PUTHEX from within the trap service routine. What is the correct value of ARSIZE to use in the calling sequence? (0.4 points)

    ARSIZE = svSIZE
    This applies for the first call within the handler. From there on, ARSIZE is determined the same way it is for any other subroutine.

    d) Knowing that your trap service routine might call PUTAT, PUTHEX and PUTS, how can you go about figuring out how much extra space to allocate in the register save area beyond the minimum required to save the registers? (0.4 points)

    Look at the source code in the monitor for the activation record sizes required by all of the subroutines you might call, and if one subroutine calls another, add up the activation record sizes along the call chains and take the maximum.

    The narrowest possible reading of this question does not require that you actually do the work, but doing so eliminates any doubt about whether you actually understood the methodology you described for finding the size. So, looking in the copy of monitor.a that is posted on the web,

    PUTAT
    ARSIZE = 4 and it calls TIMES.

    TIMES
    Does not use an activation record.

    PUTHEX
    ARSIZE = 4 and it calls PUTCHAR.

    PUTCHAR
    Does not use an activation record.

    PUTS
    Does not use an activation record, and it calls PUTCHAR.

    Conclusion: To call these routines, the stack needs one extra word beyond what it might otherwise need.
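    The bookkeeping described above can be sketched as a small program. The ARSIZE values and call edges below are the ones just listed for this copy of monitor.a; treat them as assumptions copied from that listing, not as a general description of the monitor.

```python
# Worst-case stack space along the call chains listed above (ARSIZE in bytes).
arsize = {'PUTAT': 4, 'TIMES': 0, 'PUTHEX': 4, 'PUTCHAR': 0, 'PUTS': 0}
calls  = {'PUTAT': ['TIMES'], 'PUTHEX': ['PUTCHAR'], 'PUTS': ['PUTCHAR'],
          'TIMES': [], 'PUTCHAR': []}

def chain_cost(routine):
    """Stack needed by this routine plus the deepest chain it can call."""
    deeper = max((chain_cost(c) for c in calls[routine]), default=0)
    return arsize[routine] + deeper

extra = max(chain_cost(r) for r in ('PUTAT', 'PUTHEX', 'PUTS'))
print(extra)   # 4 bytes = one word, matching the conclusion above
```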

    e) The built-in trap handler in the Hawk monitor outputs one more register than the assignment requires for your own instruction trap handler. What is this extra register and why don't you need to output it for the instruction trap handler? (Hint: See Chapter 14 of the notes, which you are expected to have read for this week.) (0.4 points)

    TMA, the trap memory address register. This register records the memory address that caused a bus trap, but since we are writing code for the instruction trap handler, it is irrelevant.

  2. Background: Computer science theory says that sorting an array of n elements takes O(n log n) time. As a result, if you plot the time taken to sort versus the array size on semi-log graph paper (that is, on graph paper with a linear scale for time and a logarithmic scale for the array size), you should get a straight line.

    a) Sketch an appropriate semi-log graph layout for showing the time taken to sort n elements, with time units (of unknown size) on the appropriate axis and memory size in steps from 8 to 1024 kilobytes, with each step being twice the size of the previous step (8, 16, 32, 64, etc). (0.3 point)

    b) Assume your machine has just one level of cache (an oversimplification on most modern machines) with a capacity of 64 kilobytes. Imagine the effect of this on an experimental study of sorting algorithms as you vary the size of the array being sorted from 8 kilobytes to 1024 kilobytes. Draw the approximate curve you would expect to see on the semi-log graph you drew for part a. (0.3 point)

    Note: No matter what the hardware, it will always take longer to sort a larger array. Therefore, in all cases, run-time will always increase monotonically with the size of the set being sorted.

    Note that there is a conflict in the problem statement: It should have said one of the following:

    • searching an array of n elements takes O(log n) time.
    • you get a hard-to-draw, upward-sweeping curve.

    Here, we assume the former because the graphing problem is a bit messy if we assume the latter.

    The graph shown for part a) is just a straight line, and a narrow reading of part a) did not require plotting this graph. (The question asked for the graph layout, not the graph).

    The graph shown for part b) has a "knee" at the cache size. So long as the data structure is smaller than the cache, the sorting is faster, so the slope of the line is shallower. Once the data structures no longer fit in the cache, the slope is steeper because part of the data structure spills over into main memory.

    Neither graph is shown as going through the origin. This is realistic for two reasons. First, there is no zero point on the logarithmic scale of the graph; each step to the left is half the step before it. Second, when we talk about "big O notation" or say "the run time is on the order of the log of the array size," we are ignoring constant costs and constant factors.
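    The straight-line claim for part a) can be checked numerically: if the run time is a constant plus a cost per level times log2(n), then each doubling of the array size adds the same amount of time, which is exactly a straight line on semi-log paper. A sketch with made-up constants:

```python
import math

# Made-up constants for illustration: t(n) = overhead + per_level * log2(n).
def search_time(n, overhead=3.0, per_level=2.0):
    return overhead + per_level * math.log2(n)

sizes = [8 * 2**k for k in range(8)]         # 8, 16, ..., 1024 (kilobytes)
times = [search_time(n) for n in sizes]
steps = [round(b - a, 9) for a, b in zip(times, times[1:])]
print(steps)   # every doubling adds the same per_level amount of time
```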

  3. A Question: What is the difference between a write-through cache and a write-back cache? (0.4 point)

    A write-through cache only speeds up reads from memory and has no impact on write operations. A write-back cache also tries to speed up write operations by holding copies of the data written to memory and only doing the actual writes later. Obviously, write-back caches are harder to build.
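    The distinction can be illustrated with a toy model (a sketch, not real hardware): each cache holds a single line, the write-through cache performs a memory write on every store, and the write-back cache just marks the line dirty and writes memory only when the line is evicted.

```python
class ToyCache:
    """One-line toy cache; counts the writes that actually reach memory."""
    def __init__(self, write_back):
        self.write_back = write_back
        self.tag = None            # address currently cached
        self.value = None
        self.dirty = False
        self.memory_writes = 0

    def store(self, addr, value):
        if self.tag != addr:       # miss: evict the old line first
            if self.write_back and self.dirty:
                self.memory_writes += 1    # flush the dirty line to memory
            self.tag, self.dirty = addr, False
        self.value = value
        if self.write_back:
            self.dirty = True      # defer the memory write
        else:
            self.memory_writes += 1        # write-through: write memory now

wt, wb = ToyCache(write_back=False), ToyCache(write_back=True)
for cache in (wt, wb):
    for _ in range(100):
        cache.store(0x40, 7)       # 100 stores to the same address
print(wt.memory_writes, wb.memory_writes)
```

    Repeated stores to the same address cost the write-through cache one memory write each, while the write-back cache absorbs all of them until the line is eventually evicted.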