Homework 10 Solutions
22C:116, Spring 1997
Ioana M. Lungeanu
PROBLEM 1 PART A
Each process has a QueueTable from which it may receive
messages on links it has created, and a LinkTable which it
may use to transmit messages. Each link in the LinkTable
designates some queue in the QueueTable of some process.
Multiple processes may send to a queue, so this is an
or-model problem. Therefore, the deadlock detection
algorithm tries to find a knot in the wait-for graph.
A kernel that detects that one of the processes on its
machine has been blocked for some time initiates the
deadlock detection algorithm. The kernel gathers the links
for each queue in that process's QueueTable on which it is
waiting and broadcasts these links to all the other kernels
in the system, with a message saying "These links may be
involved in a deadlock." This message carries a unique ID
that identifies this round of deadlock detection.
On receiving such a broadcast message, a kernel checks
the LinkTables of each process it manages. If one of the
links from the message is in that table, it checks the state
of the process. If the process is running (not blocked) it
sends a reply message saying "no deadlock". If the process
is blocked and has not been queried in this round of
deadlock detection, it broadcasts a deadlock detection
inquiry for all of the queues that process is blocked on.
In this inquiry, it uses the unique ID previously
established.
If a kernel receives a deadlock inquiry concerning a
process for which it has already begun deadlock detection
for this round, it records the origin of the inquiry (in
order to send an eventual reply) and does nothing.
Once a deadlock check is initiated for a process, the
kernel on that machine awaits a reply of "no deadlock" from
other machines. If such a reply is received, that kernel
forwards it to all kernels that had sent it an inquiry.
This guarantees that all processes not involved in a
deadlock will learn this fact.
It is harder to guarantee that all processes involved in
a deadlock are informed of this in a finite time! If no
"no deadlock" message is received within some time limit,
a deadlock could be declared, but this isn't formally
adequate.
If we require that every kernel reply to each broadcast
inquiry involved in deadlock detection, we can get a formally
correct solution. The replies are either "not involved",
which means that this kernel has no processes satisfying the
inquiry, or "no deadlock", as above, or "blocked" if, for
every blocked process running under this kernel, all replies
have been received and none say "no deadlock".
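The knot condition the protocol above approximates can be stated directly on the wait-for graph. The following is an illustrative sketch (not part of the distributed protocol itself, and the graph representation is an assumption): under the or-model, a blocked process is deadlocked iff every process reachable from it can reach it back.

```python
# Sketch of the or-model deadlock condition: a blocked process is
# deadlocked iff it lies in a knot of the wait-for graph.

def reachable(graph, start):
    """All vertices reachable from start along wait-for edges."""
    seen, stack = set(), [start]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(graph.get(v, ()))
    return seen

def deadlocked(graph, proc):
    """True if proc is blocked and sits in a knot of the graph."""
    if not graph.get(proc):
        return False    # not waiting on anyone, so not deadlocked
    return all(proc in reachable(graph, u)
               for u in reachable(graph, proc))
```

For example, a process A waiting on B or C is not deadlocked when C is running (C has no outgoing edges), while a pure cycle of blocked processes is a knot.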
PROBLEM 1 PART B
If the kernels maintain, for each incoming message
queue, a table of which processes have links to that queue,
then instead of broadcasting inquiries to every other kernel,
messages will be transmitted only to those kernels that are
relevant, and those messages will identify what processes are
involved.
The solution given in part A scales poorly because the
amount of broadcast traffic grows with the system size. The
solution involving inverted link tables scales far better
because each process will typically have to pass on deadlock
inquiries to a limited number of other processes and await
replies from only those processes.
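With such inverted tables, the set of parties to query follows directly from the queues the blocked process is waiting on. A minimal sketch (the table layout and identifiers are assumptions, not from the solution above):

```python
# Hypothetical inverted table: queue id -> processes holding links to it.
holders = {
    "q1": {"p2", "p3"},
    "q2": {"p1"},
}

def inquiry_targets(blocked_on_queues):
    """Processes to query when a process blocks on these queues;
    only their kernels receive the deadlock inquiry."""
    targets = set()
    for q in blocked_on_queues:
        targets |= holders.get(q, set())
    return targets
```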
PROBLEM 1 PART C
When a link is moved to a new process's link table (by
being included in a message), or when a link is deleted
from the link table of some process, a message must be sent
to the kernel managing the process that is the target of the
link telling it about the change.
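The bookkeeping this implies at the queue's kernel can be sketched as follows. The class and method names are hypothetical, and the sketch assumes a link included in a message moves to the new process rather than being copied:

```python
class QueueOwner:
    """Inverted table kept by the kernel that manages a queue's
    target process."""

    def __init__(self):
        self.holders = {}    # queue id -> set of processes with links

    def link_created(self, queue, proc):
        self.holders.setdefault(queue, set()).add(proc)

    def link_moved(self, queue, old_proc, new_proc):
        # reported when the link travels to new_proc inside a message
        self.holders[queue].discard(old_proc)
        self.holders[queue].add(new_proc)

    def link_deleted(self, queue, proc):
        # reported when proc deletes the link from its LinkTable
        self.holders[queue].discard(proc)
```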
PROBLEM 2
If scatter-gather primitives in the DMA hardware are
available, the actual data need not be copied several times,
with different headers and/or trailers. Trailers, headers
and data can be picked from different memory addresses and
assembled/disassembled as necessary, without multiple copies
of common parts. This is possible only for layers that do
not change the actual data; in the ISO/OSI protocol stack,
for example, the presentation layer might perform data
conversion, and such a layer cannot share the data by
reference.
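The idea can be illustrated with a gather list of buffer references; here memoryview stands in for the no-copy descriptors real DMA hardware would follow, and the fragment contents are invented for the example:

```python
# Header, payload and trailer live at different addresses; the gather
# list references them in order, and only at transmission time are
# they concatenated onto the wire -- no intermediate copy is made.

header  = memoryview(b"HDR:")
payload = memoryview(b"application data")
trailer = memoryview(b":CRC")

gather_list = [header, payload, trailer]   # one descriptor per fragment

def transmit(descriptors):
    """The byte stream the hardware would emit, fragment by fragment."""
    return b"".join(descriptors)
```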
PROBLEM 3
In a system in which every machine is expected to suffer
many transient failures, stable storage could be implemented
as follows:
In addition to duplicating the stable storage on machines
that are unlikely to have correlated failures, the stable
storage must be checked periodically. If, on checking, one
of the copies is found to have been corrupted (as evidenced
by a parity error or invalid checksum) a correct copy is used
to fix the error.
There must be at least two processes prepared to make
periodic checks (since the transient failures could kill one
of these processes), and if one of these processes detects
that the other is dead, it must restart it.
Given that failures to any memory cell occur randomly, on
average once every f seconds, the copies should be checked
somewhat more frequently than once every f seconds.
With a sufficiently high rate of checking or a sufficiently
large number of copies, overall system reliability may be
made arbitrarily high.
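A periodic check of this kind might look as follows. The CRC-32 checksum and the copy layout are assumptions made for the sketch, not details from the problem:

```python
import zlib

def make_copy(data):
    """One replica of the stable data, carrying its own checksum."""
    return {"data": bytearray(data), "crc": zlib.crc32(data)}

def scrub(copies):
    """Rewrite any copy whose checksum fails, using one that passes."""
    good = next(c for c in copies
                if zlib.crc32(c["data"]) == c["crc"])
    for c in copies:
        if zlib.crc32(c["data"]) != c["crc"]:
            c["data"][:] = good["data"]
            c["crc"] = good["crc"]
```

In the full scheme, at least two processes on failure-independent machines would each run scrub periodically and restart the other if it dies.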
PROBLEM 4 PART A
The pseudo-code for a nonblocking send using a blocking
send and threads:
nbsend(i,r) {
    spawn(proc, i, r);    /* return immediately */
}

proc(i,r) {               /* runs as a separate thread */
    send(i,r);            /* the blocking send */
    die();
}
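The spawn/die pseudocode maps directly onto real threads. Here is a sketch using Python's threading module, with the blocking send primitive stubbed out by a queue so the example is self-contained:

```python
import threading
import queue

channel = queue.Queue()    # stands in for the kernel's message queue

def send(i, r):
    """Stub for the blocking send primitive."""
    channel.put((i, r))

def nbsend(i, r):
    """Nonblocking send: spawn a thread to do the blocking send,
    then return at once."""
    t = threading.Thread(target=send, args=(i, r))
    t.start()
    return t               # the thread dies when send returns
```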
PROBLEM 4 PART B
The underlying system must schedule the threads to run
in the order of their creation, for the messages to be sent
in the intended order. The communication links have to be
FIFO links, for the messages to actually be received in the
order of transmission.
PROBLEM 4 PART C
Consider the following pseudocode for blocking send and
receive on message exchange ports:

Blocking send( msg, queue )
    transmit [msg, return-address] to queue
    await reply

Blocking receive( msg, queue )
    await [msg, return-address] from queue
    send reply to return-address

A timeout-acknowledge scheme can be added to the send; this
requires adding serial numbers to the messages so that
duplicate retransmissions can be recognized.
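The serial-number bookkeeping such a timeout-acknowledge scheme needs can be sketched as follows (all names are hypothetical, and the timers themselves are omitted): the sender stamps each message and retransmits until the matching acknowledgement arrives, while the receiver uses the serial number to drop duplicates.

```python
class Sender:
    """Stamps outgoing messages; would retransmit on timeout until
    an acknowledgement carrying the matching serial arrives."""

    def __init__(self):
        self.serial = 0

    def next_message(self, msg):
        self.serial += 1
        return (self.serial, msg)

class Receiver:
    """Delivers each serial number once, acknowledging every arrival."""

    def __init__(self):
        self.seen = set()
        self.delivered = []

    def accept(self, packet):
        serial, msg = packet
        if serial not in self.seen:    # duplicate retransmissions dropped
            self.seen.add(serial)
            self.delivered.append(msg)
        return serial                  # acknowledgement carries the serial
```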