Homework 9 Solutions

22C:116, Spring 1999

Due Friday Apr 9, 1999, in class

Douglas W. Jones

Problem 5 on page 505: If the coordinator crashes and is then restarted, this will not cause problems so long as no user has an outstanding request for the resource at the time of the crash, and so long as no user makes a request between the time of the crash and the time of the restart.
If the server broadcasts a message on crash recovery (perhaps "I just recovered from a crash"), and if the response of a user that had been awaiting an acknowledgement is to re-pose their request, while the response of a user to which the resource had been granted is to assert that they have not yet relinquished, the system can be made fault tolerant. In the event that no user replies "I have the resource", the server learns that the resource is free and available to the first process posing a request.
Another solution would be to have requesting processes periodically re-pose their requests, and to have processes holding resources periodically re-assert their claim on the resource. This way, no broadcast is required.
Problem 6 on page 505: The requirement that each process immediately acknowledge requests to enter critical regions (answering "NO" to requests that cannot be granted immediate entry, "YES" to those that can) allows Ricart and Agrawala's algorithm to tolerate crashes of any process that is not in the critical section and not awaiting entry. (No response of "YES" or "NO" is interpreted as "YES"). If, however, a process crashes between the time it answers "NO" and the time it sends a deferred answer of "YES" -- that is, the process crashes while it is blocked awaiting entry to the critical section, or while it is in the critical section, the contents of that process's queue will be lost and waiting processes may be blocked indefinitely.
Problem 7 on page 505: If we have two independent resources being managed by Ricart and Argawala's algorithm, the algorithm operates completely independently with regard to the two resources. So, each request and reply must indicate what request is being requested, and each process must maintain separate queues for each resource. As a result, the inherent structure of the algorithm itself does not create any special risk of deadlock.
However! If two processes are competing for two resources, no matter what mutual exclusion mechanism we use, deadlock is possible. The question in the book specifically asks about independent critical regions. If we interpret the word independent to mean that the regions are not nested or allowed to overlap, then no process can simultaneously enter more than one region, which is equivalent to saying that no process can simultaneously hold more than one resource. This reading eliminates the possibility of deadlock.
It is more likely, however, that Tannenbaum intended the word independent to mean that entry and exit from the different critical regions may each be done at any time. This is the typical case with real parallel programs, where processes may indeed hold multiple resources simultaneously and where nested critical sections are common. In this case, deadlock is certainly possible, but this does not depend on the specific use of Ricart and Argawala's algorithm.
Problem 10 on page 505: If each election message contains the ID of the process that initiated it, and if each process participating in an election remembers the ID of the initiator from the time it learns of the electon until the time the results are distributed, we can kill duplicate election messages as follows: On receipt of a call for election initiated by A, if a process is currently participating in an election initiated by B, the process IDs A and B are compared. If the new ID A is less than B, the call for election by A is discarded. If the new ID A is greater than B, the process participates in the election called by A and forgets about the election previously called by B.
Problem 11 on page 506: While a disk can simulate a tape, for most purposes, the failure dynamics of disk and tape are quite different! Consider the impact of simple power failures on the two technologies: With tape, the data that is stored on that part of the tape that is wound up on the reels cannot be damaged. Only that part of the tape under or very near the tape head is subject to corruption. On disks, however, a power loss during a write operation can result in head motion, with the real possibility that a spiral track of nonsense will overwrite data on a considerable area of the disk. On disk, the threat of loss is not only on recently written data, but on data written at arbitrary times in the past.
Problem 12 on page 506: Stable storage can easily be extended to an arbitrary number n of disk drives. To write, the data, time stamp and checksum is written to each disk drive in turn. To read, the data from all disk drives is read, copies with bad checksums are discarded, and among the copies remaining, one of the copies with the most recent time stamp is used. If any copies with recent time stamps and valid checksums differ, a majority vote can be used to pick the single value of the data to be used.
Problem 16 on page 506: The atomic transaction is actually committed only when the coordinator writes the commit to the coordinator's log. If Lamport's stable storage is used for this log, the actual commit occurs as soon as one good copy of the updated log, with checksum and timestamp, is written to one of the redundant log storage devices.
Problem 21 on page 506: Many things in the world can be reset or undone, but many other operations are difficult or impossible to undo. For example, with financial transactions, a transfer of money between two bank accounts is easy to undo if the system must roll back to a checkpoint, but it is very hard for software to reclaim cash that an ATM has dispensed. Similarly, it is hard to un-print hardcopy output, so once a report is printed, there is no way for software to guarantee that the report is destroyed and not acted on by someone somewhere.