22C:116, Lecture Notes, Mar. 1, 1995

Douglas W. Jones
University of Iowa Department of Computer Science

  1. Deadlock

    Deadlock occurs when two or more processes are blocked, where each process is blocked awaiting some action by another blocked process.

    A process might be blocked awaiting a message from another process, or it might be blocked awaiting a resource currently held by another process. If all the processes in a deadlocked group of processes are blocked awaiting messages, the deadlock is described as a communication deadlock; if all processes in a deadlocked group are awaiting resources, it is a resource deadlock. Mixed cases where some processes in the group are awaiting resources and others are awaiting messages are quite possible.

    In either case, whether resources or messages (or both) are involved, the key is that there is a cyclic wait.

    In some older texts, the possibility of message passing is completely ignored, and deadlocks are defined entirely in terms of resource contention.

  2. Resource Deadlocks

    Deadlocks resulting from resource allocation have been widely studied since the late 1960's. Three distinct issues have been raised in this context:

        1) Deadlock Detection
        2) Deadlock Resolution
        3) Deadlock Avoidance
    
    Once a deadlock is detected, it must somehow be resolved. Deadlock resolution is frequently destructive -- only fault tolerant applications can survive it. Therefore, where it is possible, deadlock avoidance is preferable.

  3. Deadlock Detection

    Since a deadlock involves a cycle of processes waiting for something, the standard algorithms for deadlock detection convert the problem into graph theoretic terms and then use various cycle finding algorithms on the resulting graph.

    The following notation is traditionally used in graph theoretic formulations of the deadlock detection problem:

       Allocated        Process that
       resource         has the resource
          _                _
         |_|------------->(_)
    
       Waiting          Resource the
       process          process waits for
          _                _
         (_)------------->|_|
    
       Waiting          Process that
       process          should send message
          _                _
         (_)------------->(_)
    
    The graph has vertices for each process (round) and resource (square). The graph has edges representing the "waits for" relation. A process can wait for a resource, a resource can wait to be released by a process, and (in message passing systems) a process can wait for a message from some other process. The latter is a modern extension to this old notation.
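
    As a concrete illustration, one possible in-memory representation of such a wait-for graph is sketched below in C; the type and field names (vertex, waits_for, and so on) are hypothetical and not part of the traditional notation.

      /* Hypothetical wait-for graph representation.  A vertex is either a
         process (round) or a resource (square); each outgoing edge means
         "waits for" -- for an allocated resource, "will be released by". */

      #define MAX_VERTICES 100
      #define MAX_EDGES      4

      enum vertex_kind { PROCESS, RESOURCE };

      struct vertex {
          enum vertex_kind kind;       /* process or resource             */
          int out_degree;              /* number of outgoing edges        */
          int waits_for[MAX_EDGES];    /* indices of vertices waited on   */
      };

      struct vertex graph[MAX_VERTICES];   /* the whole wait-for graph    */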

    Here is an example "wait for graph" describing 4 processes, A, B, C and D, and 4 resources, X, Y, Z and W. A is waiting for X, but B holds both X and Z. C holds W and is waiting for Y, and D holds Y and is waiting for W. Does a deadlock exist?

                Processes     Resources
                    _            _
                  A(_)--------->|_|X
                            ___/
                    _      /     _
                  B(_)<===    ->|_|Y
                           \/ __/
                    _ _____/\/   _
                  C(_)<--   /\__|_|Z
                          \/
                    _     /\____ _
                  D(_)<--    -->|_|W
                     \_____/
    
    There is a deadlock involving processes C and D and resources W and Y.

    Detecting a cycle in a graph involves traversing the cycle! Thus, the cost of deadlock detection is related to the number of vertices in the cycle, which could be as large as the number of vertices in the graph. As a result, deadlock detection algorithms scale poorly, growing in expense as the system grows in size.

    As long as the only wait operations involve processes waiting on resources, the deadlock graph is formally a bipartite graph, that is, a graph with two sets of vertices (round and square, here), where every edge connects a vertex in one set with a vertex in the other. If processes may wait for messages, the graph is no longer bipartite, but this has no effect on the relevant algorithms.

    Basic graph algorithms are a standard part of undergraduate courses on data structures! In short, to find cycles in a graph, you run a modified recursive tree-traversal algorithm, starting at any vertex. At each uncolored vertex visited by the traversal, the vertex is first colored red, then all vertices reachable from that vertex along edges of the graph are visited, and then the vertex is colored green. When a green vertex is visited, nothing is done. When a visit reaches a vertex that is already red, the graph contains a cycle. If, after the traversal is completed, some vertices are still uncolored, they were not reachable from the starting vertex, and a not-yet-colored vertex must be used as the root of a new traversal of the remainder of the graph. On a connected graph of n vertices, this algorithm runs in O(n) time.
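
    Below is a minimal C sketch of this coloring traversal, assuming the hypothetical graph[] representation sketched earlier; the colors uncolored, red and green correspond directly to those in the description.

      enum color { UNCOLORED, RED, GREEN };
      enum color mark[MAX_VERTICES];   /* one color per vertex            */

      /* Return 1 if a cycle is reachable from vertex v, 0 otherwise. */
      int visit(int v)
      {
          int i;
          if (mark[v] == GREEN) return 0;  /* already fully explored      */
          if (mark[v] == RED)   return 1;  /* revisiting a vertex on the
                                              current path: a cycle exists */
          mark[v] = RED;                   /* v is on the path being explored */
          for (i = 0; i < graph[v].out_degree; i++) {
              if (visit(graph[v].waits_for[i])) return 1;
          }
          mark[v] = GREEN;                 /* nothing reachable from v is
                                              part of a cycle             */
          return 0;
      }

      /* Check the whole graph, restarting from any still-uncolored vertex
         so that vertices unreachable from earlier roots are also examined. */
      int deadlock_exists(int n_vertices)
      {
          int v;
          for (v = 0; v < n_vertices; v++) mark[v] = UNCOLORED;
          for (v = 0; v < n_vertices; v++)
              if (mark[v] == UNCOLORED && visit(v)) return 1;
          return 0;
      }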

  4. Deadlock Resolution

    Once a deadlock is detected, what should be done? The classic solution is to break the cycle by aborting a process in order to free its resources.

    What process should be aborted? The obvious answer is to abort the least important process, but this requires a ranking of processes by importance. Process priority, for those operating systems that support such a notion, may or may not relate to importance.

    An alternative is to return an error code on the resource allocation request that would have led to a deadlock. If the application is unprepared to deal with this error, it may be aborted, but note that this may mean that the wrong process is aborted, since the last process to join a deadlock cycle need not be the least critical process in the cycle. If a process can be coded to anticipate the possibility of allocation failures, this alternative can be quite useful.

    The classic solution is clearly not oriented towards communication deadlocks! Furthermore, aborting a process is hard on critical applications, where less drastic solutions are desirable.

    In deciding which process to abort, it is common to abort the process that was the last one added to the cyclic wait -- that is, the one that closed the cycle. This is not always the right choice, since the order in which processes join a deadlock is unrelated to the amount of damage that would be caused by aborting each of them.

    Ideally, the process to be aborted should be the one whose loss will cause the least damage, that is, the least critical process. Few systems provide enough information to determine how critical a process is, however. The common alternative of aborting the process with the lowest priority approximates this, since high priority processes are frequently more critical than low priority processes, but this isn't necessarily so.

    Communications deadlock cannot be resolved by aborting a process! Aborting all processes in the deadlocked group is possible, of course, but it is hardly desirable, especially if fault tolerance is one of the system goals.

    In a fault tolerant system, when a communications deadlock is detected, an appropriate action to take would be to break the deadlock by sending an error message to one of the waiting processes or by causing the call to Receive to return an error code or raise an exception instead of blocking forever awaiting a message that will never arrive.

  5. Deadlock Avoidance

    There is a trivial solution to the problem of deadlock avoidance in a pure resource allocation context: All you need to do is prohibit processes from incrementally allocating resources.

    This trivial solution has been widely used. You simply require each process to declare, in advance, all the resources it might want -- for example, as a field of the job card. It is clumsy, and it requires that the process hold all the resources it might need for the duration of its run even if it only needs some of them briefly.

    Another less trivial solution is to require that all resources be acquired in some fixed order. For example, if a process needs both the modem and the laser printer, it must acquire the modem first and the printer second. As long as this rule is enforced, the system is guaranteed to be deadlock free with regard to these two resources, because it is impossible for one process to claim the modem and then request the printer while another claims the printer and then requests the modem. Total orderings over all resources are possible, and many applications can be rewritten to comply with such an ordering constraint.
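
    A short C sketch of this ordering rule for the modem and printer example follows; the acquire() and release() calls are hypothetical stand-ins for whatever allocation primitives the system actually provides.

      /* Give every resource a fixed rank and always acquire in rank order.
         The resource names and the acquire()/release() calls are assumed,
         not part of any particular system's interface. */

      #define MODEM    1        /* lower rank: must be claimed first      */
      #define PRINTER  2        /* higher rank: claimed second            */

      void fax_and_print(void)
      {
          acquire(MODEM);       /* always the modem before the printer    */
          acquire(PRINTER);

          /* ... use both devices ... */

          release(PRINTER);     /* release order does not affect deadlock */
          release(MODEM);
      }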

  6. The Banker's Deadlock Avoidance Algorithm

    In the mid 1960's, Dijkstra described a general deadlock avoidance algorithm, applicable in any resource allocation context. He called this the banker's algorithm, and he formulated it in terms of a bank with many customers taking out loans.

      Each customer has
        Ai, the amount of money currently borrowed
        Li, the credit limit (the maximum the bank will loan)
    
      The bank begins with a total of
        C cash on hand.
    
      The bank is not in immediate danger if:
        C - (sum Ai) >= (min Li - Ai)
               i            i
    
    The left side of the above relation represents the cash the bank still has on hand. The right side is the smallest additional amount any one borrower would need in order to reach his or her credit limit. As long as the above inequality holds, some bank customer will be able to borrow enough money to reach his or her credit limit.

    For many customers, it may be necessary to borrow more money before they can begin to repay any, but no customer will need to borrow more than their limit before being able to start repaying. Thus, as long as some customer can borrow to the limit, that customer will be able, eventually, to repay their loan.

    Dijkstra showed that a system is in a safe state, that is, in a state guaranteed not to lead to deadlock, if the state that results when the customer with the minimum Li-Ai borrows up to that limit, finishes, and repays the entire loan is also safe. This inductive definition of safe system states rests on the observation that the initial state, when the bank has all of the cash, is inherently safe as long as no customer has Li greater than the total available money supply.
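
    The safety test itself can be coded as a simple loop: repeatedly find a customer whose remaining entitlement Li-Ai fits within the cash on hand, assume that customer borrows to the limit, finishes, and repays everything, and continue until either every customer is accounted for or no remaining customer fits. A minimal C sketch follows, with hypothetical names; A[], L[] and C_total correspond to Ai, Li and C above.

      #define N_CUSTOMERS 10

      int A[N_CUSTOMERS];   /* Ai: amount currently borrowed by customer i */
      int L[N_CUSTOMERS];   /* Li: credit limit of customer i              */
      int C_total;          /* C:  total cash the bank started with        */

      /* Return 1 if the state described by A[], L[] and C_total is safe. */
      int state_is_safe(void)
      {
          int done[N_CUSTOMERS] = {0};  /* customers assumed to have repaid */
          int cash = C_total;
          int i, progress;

          for (i = 0; i < N_CUSTOMERS; i++)
              cash -= A[i];             /* cash actually on hand            */

          do {
              progress = 0;
              for (i = 0; i < N_CUSTOMERS; i++) {
                  if (!done[i] && L[i] - A[i] <= cash) {
                      cash += A[i];     /* i borrows to the limit, finishes, */
                      done[i] = 1;      /* and repays the whole loan         */
                      progress = 1;
                  }
              }
          } while (progress);

          for (i = 0; i < N_CUSTOMERS; i++)
              if (!done[i]) return 0;   /* someone could be left stranded   */
          return 1;
      }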

  7. Use of The Banker's Algorithm

    Given an algorithm to determine if the system is safe, we can simply deny any request to adjust the credit limit of a user that would put the system into an unsafe state, and we can simply block any allocation request that puts the system into an unsafe state.

    As long as processes have a finite lifetime, after which they deallocate everything they hold, Dijkstra proved that this scheme guarantees that such blocking will never lead to deadlock.

    Effectively, before granting any loan request, the Banker must ask not only "Do I have this money on hand?" but also, "Will I have enough left to allow someone to borrow to their limit, and if they did so and repaid their entire loan, would I have enough to let some other customer do the same, and so on?" If not, the banker must tell the customer to wait.
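
    Putting the pieces together, a loan (or resource allocation) request can be handled by tentatively granting it and then applying the safety test, as in the hypothetical sketch below; if the resulting state would be unsafe, the grant is undone and the customer is made to wait.

      /* Tentatively grant the request, test the resulting state with the
         state_is_safe() sketch above, and undo the grant if unsafe. */
      int try_to_grant(int customer, int amount)
      {
          if (A[customer] + amount > L[customer])
              return 0;                 /* request exceeds the declared limit */

          A[customer] += amount;        /* pretend the loan has been made     */
          if (state_is_safe())
              return 1;                 /* safe: let the grant stand          */

          A[customer] -= amount;        /* unsafe: undo; the caller must wait */
          return 0;
      }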