Exam 2: Final

Solutions and Commentary

Part of the homework for 22C:112, Fall 2012
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Grade Distributions

Note: These distributions omit two students who did not take the final but remained registered in the class. Those students did very little work all semester. Why they registered for the course is a mystery.

Exam 2

Mean   = 8.38
Median = 8.3

              X               X           X                   X
  ________X_X_X_X_X_______X_X_X_X___X_X_X_X___X_________X_____X_______
   1 . 2 . 3 . 4 . 5 . 6 . 7 . 8 . 9 . 10. 11. 12. 13. 14. 15. 16. 17

Exams 1 plus 2

Mean   = 15.65
Median = 15.1
                        X           X
            X           X           X     X         X
  __________X_X___X_X_X_X_____X_X___X___X_X_X_______X_________
   2 . 4 . 6 . 8 . 10. 12. 14. 16. 18. 20. 22. 24. 26. 28. 30

Programming Assignments 1 to 6

Mean   = 21.24                                   X
Median = 21.6                                    X
                                                 X
                 X   X             X             X X
  _________X___X_X___X___X_____X_X_X_____X_____X_X_X______
   6 . 8 . 10. 12. 14. 16. 18. 20. 22. 24. 26. 28. 30. 32

Homework Assignments, top 10 of 12

Mean   = 19.98                          X
Median = 22.4                           X
                            X           X X   X
  __X___X___X_X___X___X_____X_X_____X_X_X_X_X_X___________
   6 . 8 . 10. 12. 14. 16. 18. 20. 22. 24. 26. 28. 30. 32

Total Scores

Mean   = 56.86
Median = 55.6                                               X
                                                            X
              X                                         X   X
  ________X___X_____X_X_X___X_____X_X_X_X___X___________X_X_X_X_______
   20. 24. 28. 32. 36. 40. 44. 48. 52. 56. 60. 64. 68. 72. 76. 80. 84
      D D   + + - -   C C   + + - -   B B   + + - -   A A   + +

Exam Solutions and Commentary

  1. Framework: For each of the following shell command lines, how many Unix/Linux processes are involved, at minimum, including the process running the shell itself? (1 point each of 3)

    a) setenv x $y

    1 -- because setenv must be a built-in shell command, so no fork operation can be involved in executing it. (For more commentary on this issue, see Machine Problem III and Exam I problem 3.)

    4 gave good answers, 7 earned no credit by giving answers involving 3 or more processes. What these processes are supposed to have been doing is a total mystery, as is how any student who completed the first few machine problems in the class could have had such difficulty.

    b) ./a.out "test string" < infile > outfile

    2 -- the original shell process plus the process forked off in order to execve() the file ./a.out. I/O redirection with < and > does not require creating any additional processes.

    6 did well, while 7 earned no credit by giving answers involving 4 or more processes.
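    The two-process answer can be sketched in C. This is an illustration, not the shell's actual source: run_redirected() is an invented helper name, but the fork()/dup2()/execve() sequence is the standard Unix mechanism, and the redirections are set up by the child itself, with no extra process.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Run "cmd arg < infile > outfile" the way a shell would: one fork(),
 * one execve(), and no additional processes for the redirections.
 * The caller plays the role of the shell, so two processes in all. */
int run_redirected(const char *cmd, char *arg,
                   const char *infile, const char *outfile) {
    pid_t pid = fork();
    if (pid == 0) {                            /* child: becomes cmd */
        int in  = open(infile,  O_RDONLY);
        int out = open(outfile, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        dup2(in, 0);  close(in);               /* < infile  */
        dup2(out, 1); close(out);              /* > outfile */
        char *argv[] = { (char *) cmd, arg, NULL };
        execve(cmd, argv, environ);            /* replaces the child's image */
        _exit(127);                            /* only reached if execve fails */
    }
    int status;
    waitpid(pid, &status, 0);                  /* the shell itself just waits */
    return WEXITSTATUS(status);
}
```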

    c) ./a.out "test" < infile | ./a.out "string" > outfile

    3 is the bare minimum; 4 is quite likely. One process runs the original shell, one is forked to launch ./a.out "test", one is forked to handle the right side of the pipe, and possibly a 4th is forked to launch ./a.out "string".

    6 earned full credit saying 3 processes, 4 earned full credit saying 4 processes. 5 earned no credit by giving answers involving 6 or more processes.
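    The 3-process minimum can likewise be sketched. run_pipeline() is an invented name; the pipe()/fork()/dup2() sequence is the standard mechanism. The caller plays the shell and each side of the pipe gets exactly one forked child; a shell that forks a subshell for the right side, which then forks again, would account for the 4th process.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Run "argv1 | argv2 > outfile" with the minimum of 3 processes:
 * the caller (the shell) plus one child per side of the pipe. */
void run_pipeline(char *argv1[], char *argv2[], const char *outfile) {
    int fd[2];
    pipe(fd);
    if (fork() == 0) {                     /* child 1: left side */
        dup2(fd[1], 1);                    /* stdout -> pipe writer */
        close(fd[0]); close(fd[1]);
        execve(argv1[0], argv1, environ);
        _exit(127);                        /* only reached if execve fails */
    }
    if (fork() == 0) {                     /* child 2: right side */
        int out = open(outfile, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        dup2(fd[0], 0);                    /* stdin <- pipe reader */
        dup2(out, 1); close(out);          /* > outfile */
        close(fd[0]); close(fd[1]);
        execve(argv2[0], argv2, environ);
        _exit(127);
    }
    close(fd[0]); close(fd[1]);            /* the shell keeps no pipe ends */
    wait(NULL); wait(NULL);
}
```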

  2. Background: Consider a character-sequential input device dev that has the usual interface registers: dev.data, the memory address of the device data register; dev.stat, ditto for the device status register, with a READY bit; and dev.cont, with an ENABLE bit that permits (if set) the ready bit to request an interrupt.

    The device driver includes an interrupt service routine and a user-side routine, communicating through the usual bounded-buffer FIFO queue. But there are multiple user processes, with interprocess synchronization done using semaphores where appropriate. Assume a multicore processor where interrupts are handled, at random, by either core, and assume memory-mapped I/O. If you assume the existence of additional fields of the device data structure, briefly document them (type and initial abstract value will suffice).

    Deep in the user side of the I/O driver, the following code dequeues one character from the device's input queue:

    a)       dev.queue.data.wait()            -- wait for data to be available
            _______________________________
            dev.queue.mutex.wait();           -- enter critical section
            ch = dev.queue.buffer[ dev.queue.head ];
            dev.queue.head = (dev.queue.head + 1) % dev.queue.size;
    b)       *dev.cont |= ENABLE              -- signal space is available
            _______________________________
            dev.queue.mutex.signal();         -- exit critical section
    

    For parts a) and b), write in your exam book what goes in the corresponding blank. Hint: One of the blanks above cannot be a semaphore operation. (1 point each of 2)

    See above, but note the material in part e) that also applies to part b).

    4 got part a) right, 5 earned no credit. Common errors earning partial credit included very informal statements of the right idea, or vague statements of some kind of waiting on some kind of semaphore.

    None gave really good answers to part b), and only 6 earned partial credit. Again, informal statements of the right idea were typically granted partial credit. Among wrong answers, many suggested using a semaphore, but interrupt handlers cannot block awaiting a semaphore.
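    For concreteness, here is a user-space analogue of the dequeue routine, a sketch only: a POSIX semaphore and mutex stand in for the kernel's primitives, a producer thread would play the interrupt handler, and there is no real device, so nothing stands in for blank b)'s ENABLE bit. The field names follow the exam listing.

```c
#include <pthread.h>
#include <semaphore.h>

#define QSIZE 8

/* user-space analogue of the driver's bounded buffer; field names
 * (buffer, head, tail, data, mutex) follow the exam listing */
struct queue {
    char buffer[QSIZE];
    int head, tail;
    sem_t data;              /* counts characters available in the queue */
    pthread_mutex_t mutex;   /* guards head among competing readers */
} q;

char dequeue(void) {
    sem_wait(&q.data);                /* blank a): wait for data available */
    pthread_mutex_lock(&q.mutex);     /* enter critical section */
    char ch = q.buffer[q.head];
    q.head = (q.head + 1) % QSIZE;
    /* blank b) would set the device's ENABLE bit here; there is no
     * device in this sketch, so nothing stands in for it */
    pthread_mutex_unlock(&q.mutex);   /* exit critical section */
    return ch;
}

void enqueue(char ch) {               /* the ISR side, simplified: assumes space */
    q.buffer[q.tail] = ch;
    q.tail = (q.tail + 1) % QSIZE;
    sem_post(&q.data);                /* signal that data is available */
}
```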

    c) Who are we competing with in the critical section above? Other processes in general? Others trying to read from the same queue? Interrupts in general? The interrupt handler for the same device? Explain. (1 point)

    Other processes trying to read from the same queue. Note that dequeueing only involves the head pointer, so, implicitly, enqueueing only involves the tail pointer.

    1 earned full credit, 5 earned no credit. Many who earned partial credit gave reasonable answers and then added odd things to them, others added interactions with the interrupt handler that are not supported by the code that was given.

    d) What would be the adverse consequences of entering the critical section first and then waiting for data to be available? (1 point)

    None, so long as the answer to part c) remains unchanged.

    There were no good answers, and only 5 earned partial credit. Wrong answers were clearly linked to the difficulties that students had on part c), but there was also considerable inventiveness and a strong bias toward inventing an adverse consequence instead of admitting the possibility that there wasn't one.

    e) Suppose all I really want to do is this, somewhere in the user-side code:

            *(dev.cont) = *(dev.cont) & ~ENABLE;
    

    But I know that the device control register has bits in it that are manipulated by both user-side code and interrupt-handler code. What must I do to protect this critical section, in the context of the architecture specified? (1 point)

    This code is a critical section that must be guarded. We need to disable interrupts in order to prevent the interrupt service routine from running on this processor core, and we need a spin-lock to guard against other processor cores that might have either user-side or interrupt side code that manipulates the same variable.

    None did well. 1 got partial credit for suggesting use of a spin-lock alone, while 5 got partial credit for suggesting disabling interrupts.
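    A sketch of the part e) answer, under loud assumptions: disable_interrupts()/enable_interrupts() are hypothetical privileged operations, stubbed out here because user code cannot perform them, while the spin lock is real C11 atomics. The variable cont stands in for *dev.cont.

```c
#include <stdatomic.h>

#define ENABLE 0x1

/* Guarding a read-modify-write of a shared "control register" against both
 * the local ISR (interrupts off) and other cores (spin lock).  The
 * interrupt operations are stubs: privileged instructions on real hardware. */
static atomic_flag lock = ATOMIC_FLAG_INIT;
static unsigned cont = ENABLE;            /* stands in for *dev.cont */

static void disable_interrupts(void) {}   /* stub: privileged on real hardware */
static void enable_interrupts(void)  {}

void clear_enable(void) {
    disable_interrupts();                 /* keep the ISR off this core */
    while (atomic_flag_test_and_set(&lock))
        ;                                 /* spin: keep the other cores out */
    cont = cont & ~ENABLE;                /* the protected read-modify-write */
    atomic_flag_clear(&lock);             /* release the spin lock */
    enable_interrupts();
}
```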

  3. Background: Consider a system with an MMU but no paging to disk. The MMU is used for memory protection and relocation. Each process has its own memory map, with pages marked no-access, read-only and read-write. In addition, pages may be marked as shared or private. System calls are done by traps, and the MMU is turned off whenever any trap service routine runs.

    a) The fork() system call obviously begins by creating a new process description. How does it initialize the new process's page table, which pages must it copy, and what changes (if any) does it make to the calling process's page table. (1 point)

    First, all non-shared read-only pages are marked as being shared. This is the only change required to the original page table.

    Second, for each page marked as shared in the original page table, the new page table has a verbatim copy of the original page-table entry. The same applies to all pages marked as no-access. For non-shared read-write pages, a copy of the original page must be made and then the new page table entry must reference that copy.

    8 did well. 5 earned half credit by talking about copying the page table without discussing copying of any pages, while 4 more earned partial credit by confused discussion of copying something.
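    The copying rules can be restated as code. Everything structural here is invented for illustration (the struct pte layout, its field names, and malloc() standing in for frame allocation); only the three rules themselves come from the answer above.

```c
#include <stdlib.h>
#include <string.h>

enum { NO_ACCESS, READ_ONLY, READ_WRITE };

#define PAGESIZE 64

/* hypothetical page-table entry for this sketch; field names are invented */
struct pte {
    int access;          /* NO_ACCESS, READ_ONLY or READ_WRITE */
    int shared;          /* 1 if the frame is shared between processes */
    char *frame;         /* the physical page, malloc'd in this sketch */
};

/* The fork() rules from part a): mark non-shared read-only pages as shared
 * (the only change to the parent), copy entries verbatim for shared and
 * no-access pages, and physically copy non-shared read-write pages. */
void fork_page_table(struct pte *parent, struct pte *child, int npages) {
    for (int i = 0; i < npages; i++) {
        if (parent[i].access == READ_ONLY)
            parent[i].shared = 1;             /* share read-only pages */
        child[i] = parent[i];                 /* verbatim entry copy ... */
        if (parent[i].access == READ_WRITE && !parent[i].shared) {
            child[i].frame = malloc(PAGESIZE);    /* ... except R/W private: */
            memcpy(child[i].frame, parent[i].frame, PAGESIZE);
        }
    }
}
```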

    b) Eliminating paging is restrictive, so now assume this system has demand-paged virtual memory. How does this complicate the implementation of fork()? (1 point)

    Now, there is a problem for pages that were on disk at the time of the fork operation and need to be copied. The obvious solution is to force a page fault for each of these pages in order to bring them into RAM and then duplicate the pages. This could involve a large number of page faults and therefore a large number of disk accesses.

    6 earned full credit, while 3 more earned partial credit.

    c) Is what you identified in part b) the root of a serious problem? If so, what? Hint: Calls to fork() are frequently followed, in the child process, by a call to what kernel primitive? (1 point)

    The typical call to fork() on a unix system is followed shortly by a call to execve(). While some variables may be modified between these two calls, the number of changes is rarely large, so most of the duplication suggested in part b) is wasted.

    4 did well here, while 2 earned partial credit. Some students, having read about the lazy fork mechanism of BSD Unix and its descendants, attempted to vastly complicate their answers by describing that solution. Where these descriptions were clear, credit was granted, but in several cases, the explanations were garbled enough that it was difficult to recognize what the answers were trying to do.

  4. Background: Recall that in C and C++, backslash before end of line allows newlines to be embedded in the text of a define. You find the following declarations in a C++ program:
    thing * thinglist = NULL;
    #define newthing(t) {                                \
            if (thinglist == NULL) {                     \
                    t = (thing *) malloc(sizeof(thing)); \
            } else {                                     \
                    t = thinglist;                       \
                    thinglist = thinglist->next;         \
            }                                            \
    }
    #define freething(t) {                       \
                    t->next = thinglist;         \
                    thinglist = t;               \
    }
    

    a) What is the potential performance advantage of using these defines instead of malloc() and free()? (1 point)

    If the underlying heap manager does prompt merger of free blocks, this code will eliminate much splitting and combination of blocks. If the underlying heap manager is lazy, this code can still prevent block splitting and speed allocation if it is used for the most common block size, while other block sizes use malloc() and free() directly.

    3 did well, 9 earned partial credit by giving either vague or confused answers.

    b) What is potentially the worst performance disadvantage of these defines compared to malloc() and free()? (1 point)

    If the space in the heap is exhausted when there is a call to malloc(), the heap manager has no way to check thinglist to see if there are any blocks there it can reclaim.

    3 did well, while 5, for partial credit, discussed memory leaks in a general way without solidly connecting them to the problem. Among those earning no credit, several concluded that finding this code somewhere in the program implied that it was the only code that would ever call malloc(), and that all dynamically allocated objects would somehow have to be forced into the thing data type.

    c) Why use a #define construct instead of defining newthing() and freething() as functions? (1 point)

    This avoids the cost of a function call and parameter passing.

    2 did well, while 2 identified minor advantages for partial credit. Again, wrong answers were hard to classify. Some claimed as an advantage something that was not present in the example code.
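    For comparison, here is a sketch of the same free list written as functions rather than defines. The payload field is invented, and the functions return the pointer instead of assigning through a macro argument; note that with static inline, a modern compiler can usually eliminate the call overhead as well.

```c
#include <stdlib.h>

/* function versions of the two defines, for comparison */
struct thing { struct thing *next; int payload; };
static struct thing *thinglist = NULL;

static inline struct thing *newthing(void) {
    if (thinglist == NULL)
        return malloc(sizeof(struct thing));  /* free list empty: go to heap */
    struct thing *t = thinglist;              /* otherwise pop the free list */
    thinglist = thinglist->next;
    return t;
}

static inline void freething(struct thing *t) {
    t->next = thinglist;                      /* push onto the free list; */
    thinglist = t;                            /* never returned to the heap */
}
```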

    d) Give advice to future programmers: When should you use something like these defines, versus using plain malloc() and free()? (1 point)

    If running out of memory is not a problem, using a separate free list for every commonly-used data type can speed up the program.

    4 did well. 4 more gave vague advice for partial credit. Wrong answers ranged from "if it works better", with no suggestion of when that might be, to advice about programming style or aesthetics.

    e) Suppose the underlying heap manager used to implement malloc() uses a first fit algorithm, and the user uses these defines or something like them for all popular data types. What impact would you expect this to have on fragmentation when compared to code that simply uses malloc() and free()? (1 point)

    Using a separate free list for every popular type has the result that there will be much less fragmentation because blocks, once allocated to one particular free list, will not be deallocated and potentially split. Of course, there is a cost (see part b).

    One did well, 3 gave vague answers, and the remainder seemed confused, frequently as the result of assuming that the type thing was the only dynamically allocated type on the heap, with all of the consequences of that (horrible but unjustified) assumption.

  5. A short question: Your heap runs from address 25 to address 50 in a word addressable memory (there are no bytes, and pointers occupy exactly one word). For the binary buddy system, give the starting addresses and sizes of all the free blocks making up the heap, after initialization. (2 points)

    Address  Size
    25        1
    26        2
    28        4
    32       16
    48        2

    5 did well. There was no partial credit. Wrong answers typically involved apparently random block sizes, block sizes that were not a power of two, or blocks allocated at addresses that were not multiples of the block size. Apparently, many students didn't bother reading the notes on the buddy system or didn't examine the solution that was distributed to machine problem 5.
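    The free-block list can be checked mechanically. A greedy sketch, assuming the heap occupies the half-open range of words [25, 50), which matches the 25 words the table above accounts for: at each address, take the largest power-of-two block that is aligned to a multiple of its own size and still fits.

```c
/* Reconstruct the initial buddy-system free list for a heap occupying
 * words [lo, hi).  Returns the number of free blocks found. */
int buddy_init(int lo, int hi, int addrs[], int sizes[]) {
    int n = 0, addr = lo;
    while (addr < hi) {
        int size = 1;       /* grow size while doubling stays aligned & fits */
        while (size * 2 <= hi - addr && addr % (size * 2) == 0)
            size *= 2;
        addrs[n] = addr;
        sizes[n] = size;
        n++;
        addr += size;
    }
    return n;
}
```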

  6. Background: In a conventional device driver, interrupt service routines may be very complex and may contain a variety of device dependent code. Conceptually, however, interrupts are merely signals from a device that the device is ready. Consider this alternative software architecture for a device driver: Each device has its own semaphore. The interrupt service routine does exactly the following two things: signal the device's semaphore, and turn off the interrupt enable bit for that device. Each device driver contains a process, the driver process, that iterates once for each signal sent to the device's semaphore.

    a) The I/O driver for a sequential device would still include a FIFO queue. What other top-level software components would be included in the I/O driver? (1 point)

    The user side code of the driver would remain largely unchanged.

    4 got this, 2 earned partial credit for vague answers. Many earned no credit by enumerating components that are implied by the presence of the queue (enqueue and dequeue methods, for example).

    b) From the programmer's perspective, what is the benefit of this new architecture for device drivers? (1 point)

    It allows semaphores and other conventional synchronization and interprocess communication mechanisms to be used in most of the I/O driver while limiting the need for the kind of special code required in interrupt handlers.

    2 got this, 5 earned partial credit for vague answers.
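    A user-space analogue of the proposed architecture, as a sketch only: a thread plays the driver process, a POSIX semaphore plays the device's semaphore, and cont stands in for the device control register. The interrupt service routine does only the two things the problem statement permits.

```c
#include <pthread.h>
#include <semaphore.h>

#define ENABLE 0x1
static unsigned cont = ENABLE;       /* stands in for the control register */
static sem_t devsem;                 /* the device's semaphore */
static int serviced;                 /* count of interrupts handled */

/* the ISR does exactly the two permitted things, nothing more */
void interrupt_service_routine(void) {
    sem_post(&devsem);               /* 1: signal the device's semaphore */
    cont &= ~ENABLE;                 /* 2: turn off the interrupt enable bit */
}

/* the driver process iterates once per signal to the semaphore;
 * all device-dependent work moves here, out of the ISR */
void *driver_process(void *arg) {
    int iterations = *(int *) arg;   /* bounded so the sketch terminates */
    for (int i = 0; i < iterations; i++) {
        sem_wait(&devsem);
        serviced++;                  /* device-dependent work goes here */
        cont |= ENABLE;              /* re-enable the device when done */
    }
    return NULL;
}
```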

    c) What potential performance problems does this new architecture for device drivers cause? (1 point)

    Most of the code formerly in the interrupt handler must now run in the driver process. That code runs only when the scheduling policy permits, with interrupts from other devices enabled, so the time between the interrupt and the software response to that interrupt cannot be guaranteed.

    Nobody earned full credit. 2 earned partial credit. Most of the answers paid no attention to performance and focused instead on difficulties the programmer might face in writing code in this new framework.

    d) Suppose you were asked to write a Demos driver for a simple sequential device. Interprocess communication in Demos involves messages to server processes, so the I/O driver centers on a server process. If your goal (following the above outline) is to make the interrupt handler do the absolute minimum possible work, what should it do? (1 point)

    It should send a Demos-style message to the driver process and disable the device's interrupt enable bit.

    1 got full credit, 2 earned partial credit for vague answers. It was apparent that several students must not have read the assigned readings about Demos.

  7. Background: Modern flash memory has a serious problem: Each flash-erase cycle (required to clear a block before writing it) slightly wears out that block of memory. Some flash chips are still on the market that are warrantied for only 100,000 flash cycles per block.

    Obviously, in a flash file system, each "disk" write should be to a new block, so if I change one block of a file, that block should be indexed as free space and a different block allocated to hold the new data. If we organize the free space on the memory as a FIFO queue, this policy levels out the wear, so it is conventionally referred to as a wear-leveling policy.

    a) Suppose /tmp/x is a file with 2 data sectors, and our file system stores each file's i-node as a separate disk block. The user does a write operation that changes one sector of x. How many blocks on disk must be flash-erased to record this change? Identify each block involved. (1 point)

    Changing a sector of /tmp/x requires changing one of the addresses in the i-node for /tmp/x. This requires changing the i-number of the file, which is stored in the directory /tmp. Since we just changed a directory entry, we must also change the i-node for /tmp. This in turn requires a change to the root directory / and its i-node. The net result is a change to 6 disk blocks: the new data block, the i-node for /tmp/x, the directory /tmp, the i-node for /tmp, the root directory /, and the i-node for /.

    2 got full credit, 2 earned partial credit for vague answers, typically not distinguishing between i-nodes and the files they describe.
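    The count generalizes: beyond the data block and the file's own i-node, each directory on the path adds two more blocks (its data block and its i-node). A one-line arithmetic restatement, with blocks_changed() an invented name:

```c
/* Blocks flash-erased by one sector write, under the part a) cascade:
 * the data block moves, so the file's i-node changes; the i-node moves,
 * so each enclosing directory and its i-node change, up to the root. */
int blocks_changed(int ndirs) {    /* ndirs = directories on the path */
    return 2                       /* the new data block + the file's i-node */
         + 2 * ndirs;              /* each directory's data block + its i-node */
}
```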

    b) Is there some block in the file system that causes big problems with wear leveling, bigger than the problems encountered with all other blocks? (1 point)

    If any change to any sector on disk changes the i-number of the root of the file system, then how do we find the root to begin with? Whatever block anchors the root must sit at a fixed, well-known address, so it cannot participate in wear leveling and is rewritten far more often than any other block.

    6 earned full credit, 4 earned partial credit, typically by failing to distinguish between i-nodes and the files they describe.

    c) Does the use of a software disk cache potentially reduce the severity of the wear leveling problem? (1 point)

    Yes. The cache can reduce the number of write operations performed on the underlying flash memory by combining multiple changes to user files into single writes. It can also combine changes to i-nodes and directories in the same way.

    11 earned full credit, 3 earned partial credit. In many cases, this must have been guesswork, since many students evidently didn't understand the underlying question.