22C:116, Lecture Notes, Lecture 6, Spring 2002

Douglas W. Jones
University of Iowa Department of Computer Science

  1. Classical Interprocess Communication

    First, consider the notational problem: How do you describe the creation of new parallel programs?

    Conway, in his classic 1963 paper, was thinking about one of the first multiprocessor systems, and came up with the following general notation:

        | A |
         \_/
          |
        __v__ fork
       _/   \_
      / \   / \
     | B | | C |
      \_/   \_/
        \   /
        _\_/_
          |   join
          v
         / \
        | D |
    
    This notation implies that statement A is executed before B and C, both of which complete before D begins. Each of A, B, C and D is potentially a compound statement, and statements B and C are executed in parallel.

    A later notation for this, introduced after flowchart notations fell out of favor, was the following:

    A;
    cobegin
      B;
      C;
    coend
    D;
    

    The flowchart notation used above to illustrate Conway's ideas is not his original notation, but rather, it is an adaptation of Petri net notation. Note that Conway proposed these ideas in the context of his proposal for a multiprocessor computer system, at the same time that he was developing a clear idea of the distinction between the concepts of process (a software entity) and processor (a hardware entity).

  2. Implementing Parallelism

    Conway's notation is symmetrical, but he recognized that, in real systems, symmetrical implementations would be unlikely. Instead, using the figure from the previous section as an example, one sequence of statements, perhaps A-B-D, must be in one process, and the remaining statement, perhaps C, must be in a second process; we will call these the parent process and the child process. Conway proposed something like the following:

    Parent process:          Child process:

          A;
          fork(L);           L: C;
          B;                    join(V);
          await-join(V);
          D;
    
    Note that fork(L) sends control both to label L (the new process) and to the next successive statement (the parent process). In operation, the fork(L) primitive creates a new process description with the PC field of the new process description containing the address of the label L. This process description is put in the ready list and may be run on a different CPU, while the original process may continue to run.

    Await-join(V) looks quite a bit like a semaphore wait operation on the semaphore V, used to coordinate the join operation, and join(V) looks like a signal operation on that semaphore, followed by process termination. In Conway's proposal, V was a simple integer initialized to the number of processes that were expected to join using that integer. The join(V) primitive simply decremented V before terminating the process, while the await-join(V) primitive looped as long as V was nonzero.
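
    As a concrete illustration, here is a minimal sketch of these counter semantics in C11, assuming a shared atomic counter and a single joining child; the busy-waiting await_join is for illustration only:

      #include <stdatomic.h>

      atomic_int V = 1;                /* one process is expected to join */

      void join(atomic_int *v) {       /* called by the child, which    */
          atomic_fetch_sub(v, 1);      /* then terminates               */
      }

      void await_join(atomic_int *v) { /* called by the parent before D */
          while (atomic_load(v) != 0)
              ;                        /* spin until all joiners arrive */
      }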

    In UNIX, this same asymmetry is present, although the structured programming facilities of languages like C do quite a bit to mask it:

    A;
    if (fork() == 0) {
       B;                /* child */
       exit(0);          /* terminate child, signal parent */
    } else {
       C;                /* parent */
       wait(NULL);       /* await signal from child */
    }
    D;
    
    In the above, statements A, C and D are executed by the parent process, while the fork() primitive creates a child process that executes B. The fork primitive duplicates the entire parent process, so that the parent and child run identically the same code, with copies of the same variables and copies of the same stack; at the instant fork returns in each of the two processes, even the program counters are equal. The only difference is that fork() returns zero in the child process, while it returns the process ID of the child to the parent.
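
    To make this concrete, here is a minimal compilable version of the above, with the statements A through D reduced to print statements; the headers and the error check are ordinary POSIX boilerplate, not part of Conway's model:

      #include <stdio.h>
      #include <stdlib.h>
      #include <sys/wait.h>
      #include <unistd.h>

      int main(void) {
          printf("A\n");              /* A: executed by the parent    */
          pid_t pid = fork();
          if (pid < 0) { perror("fork"); exit(1); }
          if (pid == 0) {
              printf("B\n");          /* B: executed by the child     */
              exit(0);                /* terminate, signal the parent */
          } else {
              printf("C\n");          /* C: executed by the parent    */
              wait(NULL);             /* await the child's exit       */
          }
          printf("D\n");              /* D: only the parent gets here */
          return 0;
      }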

    Exercise: Look up the fork(), exit() and wait() primitives in the (on-line) Unix Programmer's Reference Manual. They are in section 2 of the manual (system calls); the UNIX man command under the UNIX shell searches this manual; the command man 2 x will display the entry for x in section 2 of the manual.

  3. Introduction to Mutual Exclusion

    Consider the problem of counting the nonzero elements of a 1,000,000 element array. Here is a parallel solution, expressed informally using two processes:

    cobegin
       -- process A
       for i := 1 to 500,000 do
         if a[i] <> 0
           then count := count + 1;
    
       -- process B
       for j := 500,001 to 1,000,000 do
         if a[j] <> 0
           then count := count + 1;
    coend
    
    After processes A and B have completed, you'd expect that count would correctly reflect the desired quantity, but there is a significant chance that it won't, because the operation of incrementing count will most likely be accomplished by some sequence of operations such as the following:
      load register with count
      increment register
      store register to count
    
    Unfortunately, even on a uniprocessor, it is possible that both processes A and B will attempt to execute this same instruction sequence at the same time, as illustrated below:
           process A           process B
    
      load rA with count
                           load rB with count
                           increment rB
                           store rB in count
      increment rA
      store rA to count
    
    It is not hard to see that each time something like this occurs, one or more increment operations will be miscounted. On a uniprocessor with preemptive scheduling once per time slice, once the CPU is preempted from process A and given to B as shown above, a full time slice of increments performed by process B can be lost when process A regains control and stores its stale copy of count!

    To prevent this, it is essential that only one process at a time attempt to increment count. Therefore, we say that this sequence of instructions is a critical section, or that access by processors to count must be mutually exclusive.
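
    The lost update is easy to demonstrate. The following sketch uses POSIX threads rather than processes, so that count is trivially shared; the loop bounds are arbitrary choices. Compiled without optimization, it usually prints a total well below 2,000,000:

      #include <pthread.h>
      #include <stdio.h>

      long count = 0;                 /* shared and unprotected */

      void *worker(void *arg) {
          for (int i = 0; i < 1000000; i++)
              count = count + 1;      /* load, increment, store: a race */
          return NULL;
      }

      int main(void) {
          pthread_t a, b;
          pthread_create(&a, NULL, worker, NULL);
          pthread_create(&b, NULL, worker, NULL);
          pthread_join(a, NULL);
          pthread_join(b, NULL);
          printf("count = %ld (expected 2000000)\n", count);
          return 0;
      }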

  4. Simple Mutual Exclusion on Uniprocessors

    On a uniprocessor, mutual exclusion may be provided by disabling interrupts on entry to a critical section and reenabling them on exit.

      disable interrupts
      load register with count
      increment register
      store register to count
      enable interrupts
    
    The disadvantages of this are:

    It requires that all critical sections be in privileged code (user code is typically unable to disable interrupts because doing so risks crashing the machine).

    It is not selective. If one process is incrementing some variable, it only needs to exclude other processes that operate on the same variable!

    Nonetheless, the code given in the previous lecture for implementing a scheduler relied on this method. We disable interrupts on entry to each scheduler operation such as relinquish or wait or signal, and we reenable interrupts after we have completed any changes to the scheduler data structures.

  5. Simple Mutual Exclusion on Multiprocessors

    Many multiprocessors include instructions such as exchange(r,m) or test-and-set(m). These are atomic (if one CPU is executing one of these operations, no other CPU may intervene). Exchange swaps the contents of a register with a location in memory, while test-and-set sets a memory location to one while testing its previous value.

    Given one or another of these operations, the variable count used in our running example may be protected with a new variable called a spin lock. As a convention, the spin lock protecting a variable will be named after it; for example, the lock protecting count will be called count-lock. Given this, we can implement our critical section as:

      lock( count-lock )
      load register with count
      increment register
      store register to count
      unlock( count-lock )
    
    The two states of a spin lock are locked and unlocked; any two values may be used to represent these states, however, we will use zero to represent locked and one to represent unlocked because of the way these values generalize to the use of semaphores. Initially, the lock should be one; lock sets it to zero, and unlock resets it to one. Lock and unlock can be implemented as follows on a machine with an atomic primitive that can exchange the contents of memory with the contents of a register:
      lock( v )
        register := 0;
        repeat
           exchange( register, v );
        until register = 1;
    
      unlock( v )
        v := 1;
    
    Spin locks have the disadvantage that they involve busy waiting. The term spin lock derives from this! While the process is waiting, it is usual to describe the process as spinning around the polling loop or spinning on the lock.
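
    On a modern machine, the same spin lock can be written with the C11 atomic operations; this is a minimal sketch, keeping the convention above that one means unlocked and zero means locked:

      #include <stdatomic.h>

      typedef atomic_int spinlock;    /* 1 = unlocked, 0 = locked */

      void lock(spinlock *v) {
          while (atomic_exchange(v, 0) != 1)
              ;                       /* spin until we swap out a one */
      }

      void unlock(spinlock *v) {
          atomic_store(v, 1);
      }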

    Nonetheless, spin locks are the preferred mutual exclusion mechanism for very short critical sections on shared-memory multiprocessors. Specifically, if the expected waiting time for entry to the critical section in the event of contention is shorter than the time it would take to do two context switches (switching the waiting process out of the CPU and back in again, under the control of the scheduler), this is the preferred method.

    On some uniprocessor operating systems, including much of the older code in AT&T UNIX, the poor design of the scheduler or compatibility with older versions of the system forces the use of spin-locks. Typically, in high-level low-priority code, a relinquish operation is inserted in the busy waiting loop in order to let other ready processes use the CPU while one process waits for the lock. This only works if either a round-robin scheduler is being used or if there is a "relinquish to lower priority" operation -- one that causes headaches when it is implemented by a priority scheduler, but is nonetheless included in many real-time systems in order to allow this approach to mutual exclusion.

    The exchange() primitive used above may be easily replaced by many other primitives. For example, some machines, notably the IBM 360 family of mainframes and enterprise servers, have test_and_set(m) as a primitive. This tests memory location m to see if it is zero or nonzero, returning the result of this test in the condition codes. At the same time, it sets location m nonzero.

    A more interesting hardware primitive is common on many modern RISC processors. This is called optimistic synchronization, and it was first widely used on the DEC Alpha processor (later made by Compaq, and phased out after HP bought Compaq). Optimistic synchronization rests on snooping hardware, hardware in the CPU that passively monitors traffic on the system bus that was initiated by other processors in the system. The two operations that use this hardware are called load_locked(r,m) and store_conditional(r,m). These have the normal semantics for load and store, except that, in addition to loading register r from memory location m, load_locked makes the snooping hardware monitor all bus traffic for references to memory location m. The store_conditional instruction will store the contents of register r in location m only if the snooping hardware is monitoring address m and has not seen any references to that address since the preceding load_locked instruction; store_conditional reports its success or failure, for example, in a condition code, so that the program can take appropriate action. These two instructions can be used to implement the exchange operation as follows:

      exchange( m, r )
         loop
            load_locked( r', m )
            store_conditional( r, m )
         until success;
         r := r'
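
    C11 offers a portable analogue of this loop: the compare-and-swap operation atomic_compare_exchange_weak which, like store_conditional, is allowed to fail when another processor has touched the location. This is a minimal sketch of exchange built on it, not the LL/SC code itself:

      #include <stdatomic.h>

      int exchange(atomic_int *m, int r) {
          int old = atomic_load(m);   /* plays the role of load_locked */
          while (!atomic_compare_exchange_weak(m, &old, r))
              ;                       /* on failure, old is reloaded   */
          return old;                 /* the previous contents of m    */
      }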
    
    Exercise: Write code using load_locked and store_conditional to directly implement an increment operation that will correctly increment a shared variable in a multiple processor environment without need for any other mutual exclusion mechanism.

    Exercise: Write code using load_locked and store_conditional to directly implement the spin_lock(k) and spin_unlock(k) primitives.

  6. A General Mutual Exclusion Mechanism

    Semaphores can be used to provide mutual exclusion! The variable count used in our running example may be protected with a semaphore; we'll call this count-sem. Given this, we can implement our example critical section as:

      P( count-sem )    -- the wait operation
      load register with count
      increment register
      store register to count
      V( count-sem )    -- the signal operation
    
    Initially, the semaphore should have a count of one; the P operation sets it to zero, and the V operation resets it to one.

    The parallel between this and the spin-lock solution should be obvious! In fact, spin-locks are considered to be one implementation of a special case of semaphores called binary semaphores. Binary semaphores are those that never take on values other than zero and one.

    Semaphores implemented using the process scheduler are the preferred implementation of mutual exclusion on uniprocessors, and they are the preferred implementation on multiprocessors for those critical sections where the expected waiting time for entry to the section in the event of contention is greater than the time taken for two context switches.

    Implementations of semaphores on multiprocessors, however, rest on spin locks to prevent two different processors from manipulating the same semaphore at the same time.
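
    As a concrete sketch, the critical section above can be written with POSIX semaphores, which are implemented using the process scheduler; the middle argument of sem_init being 0 means the semaphore is shared among the threads of one process:

      #include <semaphore.h>

      long count = 0;
      sem_t count_sem;                /* guards count */

      void init(void) {
          sem_init(&count_sem, 0, 1); /* initially one */
      }

      void increment(void) {
          sem_wait(&count_sem);       /* P: enter the critical section */
          count = count + 1;
          sem_post(&count_sem);       /* V: leave the critical section */
      }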

    Exercise: Fix the implementation of semaphores given in Lecture 5 so that it will operate on a multiprocessor. Assume you have primitives called spin_lock(k) and spin_unlock(k) operating on spin locks that you can use to add additional protection to the critical sections in the code.

    Exercise: Fix the implementation of semaphores given in Lecture 5 so that it will operate on a multiprocessor. Assume you have the load_locked and store_conditional primitives given above.

    Exercise: In your solutions to the two exercises above, is there still a need to disable interrupts? This question can be evaluated with regard to absolute correctness, and it can also be evaluated with regard to efficiency.

  7. Other Mutual Exclusion Mechanisms

    The problem of mutual exclusion on multiprocessors where there are no atomic exchange or test-and-set operations has been the subject of years of study, during which many very ingenious solutions to the critical section problem have been found. Dekker's solution is the oldest of these. In Dekker's solution, each spin-lock is represented by three variables:

    state[0..1] -- process i sets state[i] to enter section
    turn        -- indicates whose turn it is to try next
    
    Given this, and that all variables are initially zero, the code for Dekker's lock and unlock operations is as follows, assuming that each process passes its process ID to the lock and unlock operations:
    other(pid)
       return( 1 - pid );
    
    lock(pid):
       state[pid] = 1;
       while state[other(pid)] = 1 do
          state[pid] = 0;
          while turn <> pid do nothing;
          state[pid] = 1;
       end loop;
       return;
    
    unlock(pid):
       state[pid] = 0;
       turn = other(pid);
       return;
    
    When A is inside the critical section, state[A] is one and state[B] is zero. If both try to enter at the same time, so that both state variables are set to one, they both enter the loop, each waiting for the other to reset its state to zero. One will immediately reassert its claim, while the other is blocked awaiting its turn. Furthermore, on exit from the critical section, each process always sets turn to give priority to the other process, preventing either one from hogging the critical section.
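
    Dekker's algorithm depends on every read and write of state and turn becoming visible to the other process in program order; on modern hardware this requires sequentially consistent atomic operations, which is what C11 atomics provide by default. A minimal sketch:

      #include <stdatomic.h>

      atomic_int state[2];            /* initially zero */
      atomic_int turn;                /* initially zero */

      int other(int pid) { return 1 - pid; }

      void lock(int pid) {
          atomic_store(&state[pid], 1);
          while (atomic_load(&state[other(pid)]) == 1) {
              atomic_store(&state[pid], 0);
              while (atomic_load(&turn) != pid)
                  ;                   /* wait for our turn */
              atomic_store(&state[pid], 1);
          }
      }

      void unlock(int pid) {
          atomic_store(&state[pid], 0);
          atomic_store(&turn, other(pid));
      }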

    This code is difficult to understand! Furthermore, it is difficult to generalize to more than two processes (consider, for example, building a binary tree of processes to generalize it!). There are many more solutions to this problem (consult your textbook for some!)

  8. Lamport's Bakery Algorithm

    In a distributed system, we frequently want to think in terms of decentralized algorithms. One of the oldest decentralized mutual exclusion algorithms is Lamport's Bakery Algorithm. This is modeled on the classical algorithm used in bakeries and other retail establishments. In such establishments, sequential numbers are issued to each customer as the customers enter the shop. When the customers want to access the scarce resource (the clerk behind the counter), they compare their numbers, and the customer with the lowest number wins the right to use the resource first.

    The problem with this is that there must be some way to distribute numbers. A trivial solution is to introduce a very small server to distribute numbers, for example, the reel of numbered tickets by the door where each customer, on entry to the shop, tears off a numbered tag. This solution offers little aid to the programmer!

    Before going on to more interesting implementations for distributing numbers, note that clients of such a protocol may make extensive use of their numbers! For example, if the bakery contains multiple clerks, the clients could use their number to select a clerk (number modulo number of clerks). Similarly, in a FIFO queue implemented with a bounded buffer, the number modulo the queue size could indicate the slot in the buffer to be used, allowing multiple processes to simultaneously place values in the queue.

    There is a large literature of synchronization algorithms based on the "take a number" scheme. Much of this originates from work by Gottlieb at New York University, where a machine was built, the Ultracomputer, that used a hardware solution to the problem of distributing unique numbers to multiple processes with minimal contention.

    Lamport's Bakery Algorithm provides a decentralized implementation of the "take a number" idea. As originally formulated, this requires that each competing process share access to an array, but later distributed algorithms have eliminated this shared data structure. Here is the original formulation:

    For each process i, there is an entry in each of two shared arrays, C[i] and N[i]:

            _ _ _ _ _ _ _ _ _ _ _
         C |_|_|_|_|_|_|_|_|_|_|_|
         N |_|_|_|_|_|_|_|_|_|_|_|
        
         N[i] = 0 --> Process i is not waiting for the baker.
    
         C[i] = 0 --> Process i is not trying to pick a number.
    
         when
             N[i] = min( for all j, N[j] where N[j] > 0 )
         Process i is allowed into the critical section.
    
    
    In Lamport's bakery algorithm, each process has two associated variables per mutual exclusion semaphore. C is used to help break ties if two processes try to pick numbers at the same time, and N is the number that process has chosen.

    Instead of having a central source of numbers, processes pick numbers by examining the numbers held by all other processes, and instead of having the baker post the number of the process allowed to go next, processes decide among themselves who gets to go next.

    Here is the algorithm used to take a number:

              C[i] := 1;
              N[i] := max( for all j, N[j] ) + 1;
              C[i] := 0;
    

    In effect, the customer walks into the bakery, checks the numbers of all the waiting customers, and then picks a number one larger than the number of any waiting customer.

    If two customers each walk in at the same time, they are each likely to pick the same number. Lamport's solution is to let them pick the same number, but to make sure that they notice that this has happened and break the tie in a sensible way.

    To help the customers detect ties, each customer who is currently in the process of picking a number holds his hand up (by setting C[i] to 1). He pulls down his hand when he is done selecting a number -- note that selecting a number may take time, since it involves inspecting the numbers of everyone else in the waiting room.

    A process does the following to wait for the baker:

     Step 1:
   while (for some j, C[j] = 1) do nothing;
      
       First, wait until any process which tied with you
       has finished selecting a number.
    
     Step 2:
   repeat
      W := (the set of j such that N[j] > 0)
      -- W is the set of indices of waiting processes
      M := (the set of j in W
               such that N[j] <= N[k]
               for all k in W)
      -- M is the set of process indices with minimum numbers
      j := min(M)
      -- j is in M and the tie is broken.
   until i = j;
    
       Second, wait until your ticket number is the minimum of all
       tickets in the room.
    
    Note that if someone else tied with you when you picked a number, he may still be trying to pick his number when you are ready to wait your turn. Therefore, the first thing you do after picking your number is look around to see if anyone has their hand up. As soon as nobody has their hand up, you know that anyone who picked the same number as you picked has now determined what number that is.

    This is inefficient, because you might wait a bit too long while some other process picks a number after the number you picked, but for now, we'll accept this cost.

    Once you know that any numbers that tied with your number are picked, you begin checking all the numbers in the room to see if you find any numbers smaller than your number, or alternately, you check all the numbers in the room and determine the identity of someone who holds the smallest number.

    If you are not the person holding the smallest number, you start checking again. If you hold the smallest number, it is also possible that someone else holds the smallest number. Therefore, what you've got to do is agree with everyone else on how to break ties.

    The solution shown above is simple. Instead of computing the value of the smallest number, compute the process ID of a process holding the smallest value. As long as all processes use the same deterministic algorithms to do this, they will arrive at the same conclusion -- at least, they arrive at the same conclusion if they happen to be in the set that are tied for the smallest.

    The process that computes its own index as the holder of the smallest ticket number is the one that enters the critical section.

    To return its ticket, a process simply executes the following:

              N[i] := 0;
    
    When you return your ticket, if any other process was waiting, then on its next scan of the set of processes, some waiting process will find that it is holding the winning ticket.
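
    Here is a minimal sketch of the whole algorithm in C11, for a fixed set of NPROC threads sharing memory rather than a distributed system; it follows the common formulation in which each process compares itself against every other process in turn, breaking ties by process index, instead of computing the sets W and M explicitly:

      #include <stdatomic.h>

      #define NPROC 4

      atomic_int C[NPROC];            /* C[i] = 1: i is choosing   */
      atomic_int N[NPROC];            /* N[i] = 0: i not waiting   */

      void bakery_lock(int i) {
          C[i] = 1;                   /* hand up: picking a number */
          int max = 0;
          for (int j = 0; j < NPROC; j++)
              if (N[j] > max) max = N[j];
          N[i] = max + 1;             /* one more than any waiter  */
          C[i] = 0;                   /* hand down                 */
          for (int j = 0; j < NPROC; j++) {
              while (C[j])
                  ;                   /* wait while j is choosing  */
              while (N[j] != 0 &&
                     (N[j] < N[i] || (N[j] == N[i] && j < i)))
                  ;                   /* wait while j goes first   */
          }
      }

      void bakery_unlock(int i) {
          N[i] = 0;                   /* return the ticket         */
      }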

    The useful property of this algorithm is that process i only modifies its own N[i] and C[i], although it must read the entries of all other processes.

    A distributed implementation of this algorithm can be produced directly by storing N[i] and C[i] locally with process i, and using message passing when any process wants to examine the values of N and C for any process other than itself.

    This demonstrates that mutual exclusion can be done without any central authority! Each process must read the variables of all other processes a minimum of 3 times -- once to select a ticket number, once to see if anyone else is in the process of selecting a number, and once to see if it holds the minimum ticket.

    It is possible to improve on these minima, but the demonstration that a centralized resource is not needed to assure mutual exclusion is important.

    For each process contending for entry to the critical section, there are about 6N messages exchanged, which is clearly not very good. Much better algorithms have been devised, but even this algorithm can be improved by taking advantage of knowledge of the network structure. On an ethernet or on a tree-structured network, a broadcast can be done in parallel, sending one message to N recipients in only a few time units. On a tree-structured network, the reply messages can be merged on the way to the root (the originator of the request) so that sorting and searching for the maximum N or the minimum nonzero N can be distributed efficiently.

    On a ring structured network, a token can be circulated indicating the state of the critical section. If a process wants to enter the section, it claims the token the next time the token comes by. If a process doesn't want to enter, it forwards the token every time it sees it. There are many variations on this kind of approach.

  9. Producers and Consumers

    Mutual exclusion is not enough! Low level control over the sharing of data is essential but insufficient to allow productive parallel programming.

    Consider a situation where one process produces data to be consumed by another:

        Producer --> Consumer
    
    The producer can easily fill a shared variable, then let the consumer take the data, but controlled sharing is not enough. There must be some way for the producer to signal to the consumer that data is available.

  10. Signalling as an Abstraction

    One way to view this signalling problem is to think of the fork and join primitives.

       Process A             Process B
           |                     |
        Produce                  |
       some data                 |
           |                     |
           V                     |
         ----- fork = signal(s)  |
           | \                   |
           |   ------(s)------   |
           |                   \ |
           |  s is a semaphore  ||
           |                  __VV__ join = wait(s)
           |                     |
           |                  Consume
           |                  the data
           |                     |
    
    The control path between the two processes can be considered to be a short-lived process that serves to synchronize the producer and consumer, or it can be considered to be a synchronizing signal of some type. (This understanding of synchronization as being theoretically equivalent to control flow is a relatively new insight!)

    In operation, if process A produces data before process B is ready to consume it, the signal reaches the join and waits for process B to arrive at that point. If Process B is ready to consume before process A has produced any data, Process B waits at the join for the signal to arrive.

    It is important to note that the semaphore abstraction exactly solves this signalling problem. Each signal operation on a semaphore sends a signal that can be awaited by a process using the wait operation on that semaphore. If the signaler sends the signal first, the waiting process is not blocked. If the signaler sends the signal second, the waiting process blocks until the signal is sent. If many signals are sent before any process waits, the semaphore counts the signals and does not block until all of the signals that have been sent are received.
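
    A minimal sketch of this signalling use of semaphores, again with POSIX semaphores between two threads; the shared variable and the thread bodies are placeholders:

      #include <semaphore.h>

      int shared_data;
      sem_t s;                        /* initialize: sem_init(&s, 0, 0) */

      void producer(void) {
          shared_data = 42;           /* produce some data              */
          sem_post(&s);               /* fork = signal(s)               */
      }

      void consumer(void) {
          sem_wait(&s);               /* join = wait(s)                 */
          /* ... consume shared_data ... */
      }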

  11. Queues

    What kind of data structure should be used to communicate between the producer and consumer? There are many options, but all boil down to variations on the theme of FIFO queues. The most elementary queue implementation is the bounded buffer.

    A bounded buffer has a data capacity that may vary from 1 item (where it becomes equivalent to a simple variable) to some large number. If more than one item is allowed, the buffer may have either a linked list or an array implementation. Here, we'll look at the array implementation:

       ________________________________
      |  |  |\\|\\|\\|\\|\\|\\|  |  |  |
      |__|__|\\|\\|\\|\\|\\|\\|__|__|__|
       Empty |     Data        | Empty
             |                 |
           Head               Tail
    
    The operations on this structure are simple, if we ignore problems with preventing enqueueing data in a full queue or dequeueing data from an empty queue:
      enqueue(d)
         buffer[tail] = d
         tail = tail + 1 (modulo queue size)

      dequeue(d)
         d = buffer[head]
         head = head + 1 (modulo queue size)
    
    We can detect empty and full queues fairly easily:
      empty := (head = tail)
      full := (head = tail)
    
    But, these conditions are the same, which poses problems! Despite these problems, this approach to detecting the buffer full and buffer empty conditions is actually used, on occasion, primarily in low-level applications such as communication between processes and interrupt handlers.

    The usual solution to the ambiguity is to redefine the full-queue condition as:

      full := (head + 1 = tail) (modulo queue size)
    

    Exercise: Propose other solutions to detecting the full and empty conditions in a circular buffer, perhaps by counting the data elements in the buffer. Compare your solution with the one proposed above! Does it take more or less space? Does its use imply a run-time cost? Do your answers depend on the size of the enqueued data items, and if so, how?

  12. Shared Queues

    When a queue is shared between multiple processes, there are two different problems. One is assurance of mutual exclusion, so that, for instance, no two processes try to enqueue an item of data in the same queue slot, and the other is keeping track of the number of available data items and free spaces. All of these can be accomplished by semaphores:

      Semaphores used to provide mutual exclusion:
         headlock -- protects the head pointer
         taillock -- protects the tail pointer
    
      Semaphores used to count available resources:
         free -- counts number of free spaces
         data -- counts number of enqueued data items
    
    Given these semaphores, with all mutual exclusion semaphores initialized to one and the counting semaphores initialized to reflect the initial state of the queue, the operations on a shared queue can be implemented as follows:
      enqueue(d)
         P(free)     -- wait for free space to put data
         P(taillock) -- wait for exclusive use of tail
         buffer[tail] = d
         tail = tail + 1 (modulo queue size)
         V(taillock) -- release the tail
         V(data)     -- signal that data is available

      dequeue(d)
         P(data)     -- wait for data to be available
         P(headlock) -- wait for exclusive use of head
         d = buffer[head]
         head = head + 1 (modulo queue size)
         V(headlock) -- release the head
         V(free)     -- signal free space available
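
    Here is one way this shared queue might look in C with POSIX semaphores; the queue size and item type are arbitrary choices for the sketch, and the free semaphore is renamed freespace to avoid colliding with the C library function free:

      #include <semaphore.h>

      #define QSIZE 16

      int buffer[QSIZE];
      int head = 0, tail = 0;

      sem_t headlock, taillock;       /* mutual exclusion, initially 1 */
      sem_t freespace, data;          /* counting, initially QSIZE, 0  */

      void init(void) {
          sem_init(&headlock, 0, 1);
          sem_init(&taillock, 0, 1);
          sem_init(&freespace, 0, QSIZE);
          sem_init(&data, 0, 0);
      }

      void enqueue(int d) {
          sem_wait(&freespace);       /* wait for free space           */
          sem_wait(&taillock);        /* exclusive use of tail         */
          buffer[tail] = d;
          tail = (tail + 1) % QSIZE;
          sem_post(&taillock);        /* release the tail              */
          sem_post(&data);            /* signal that data is available */
      }

      int dequeue(void) {
          sem_wait(&data);            /* wait for data                 */
          sem_wait(&headlock);        /* exclusive use of head         */
          int d = buffer[head];
          head = (head + 1) % QSIZE;
          sem_post(&headlock);        /* release the head              */
          sem_post(&freespace);       /* signal free space available   */
          return d;
      }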
    
    This implementation of shared FIFO queues is universally applicable, but there are some special characteristics worth noting:

    First, the mutual exclusion locks are only needed if there are multiple producers or consumers. When there is only one producer process, that is the only process that will inspect and manipulate the tail pointer, so no mutual exclusion semaphore is needed to protect the tail pointer.

    Second, the mutual exclusion operations shown above protect very brief critical sections, so it is unlikely that any process will wait long for entry. As a result, the best implementations of headlock and taillock on a multiprocessor involve busy waiting at a spin-lock, not semaphores implemented in the process manager. Spin-locks are rarely appropriate on a uniprocessor!

    Third, if the queue is very large, there is a high probability that it will be neither full nor empty; this means that the counting semaphores will rarely block the producer or consumer, so context switching will be very rare and both producer and consumer may be able to run at full speed for long periods of time.

  13. Variations on Queues

    If the buffer has a capacity of one, the head and tail pointers are superfluous, as are the locks protecting them. In this case, the producer and consumer processes would be expected to operate in lock-step, producing one data item, then consuming it before the next is produced.

    Exercise: Write the code for this limiting case for a bounded buffer queue. How many semaphores are required? What purposes do the remaining semaphores serve, and what range of values do they take on?

    Lock step operation of the producer and consumer is frequently inefficient because, unless the computational requirements of the two processes are identical, either the producer will spend time waiting for the consumer, or vice versa.

    From elementary queueing theory, we can predict the following: If the average time taken to produce an item is the same as the average time taken to consume an item, and the times vary randomly, the performance of the system will be maximized with an infinite queue capacity and an infinite average number of items in the queue. Furthermore, as the size of the bounded buffer is decreased, the percentage of the time that the producer spends waiting for space or the consumer spends waiting for data will increase smoothly. The conclusion is that, where memory allows, large buffers are preferable.

    The bounded buffer with a capacity of two is a commonly used special case, particularly in 1960's vintage input/output subsystems. In that context, it is frequently described as "Double Buffered I/O". At any time, one buffer is used for communication with the device while the application program operates on the other buffer, and then, when both the device and application are done, the buffers are exchanged and both the device and application begin the next cycle.

    Exercise: Write the code for this common case for a bounded buffer queue. How many semaphores are required if you assume a single consumer and a single producer? What purposes do the remaining semaphores serve, and what range of values do they take on?

    When multiple FIFO queues are implemented using linked lists instead of arrays, it is common to have a single shared pool of free list items instead of a bounded supply of list items for each queue. For a given amount of memory, this can markedly improve performance, but it adds slightly to the complexity of the code. Specifically, instead of having a separate space semaphore to count the amount of free space in each queue, there is a single free space semaphore shared by all queues that use this pool of free space.

    Exercise: Write the code for linked list queues in the general case where there are multiple queues where each queue may serve multiple producers and multiple consumers. Assume that there is only one type of object enqueued in any of the many queues -- call it a buffer, and assume that each buffer has a data field and however many links are required for the queue management scheme you use. How many semaphores are required per queue, and how many additional semaphores are shared by all queues?

  14. Other problems

    Do not be misled into thinking that the mutual exclusion and producer-consumer problems are the only problems in parallel programming. They are the most important and most common problems, but there are many others:

    The readers-writers problem is a common problem in file-system design. A file may either be closed, open for read access by any number of users, or open for write access by exactly one user. Implementing the read-open, write-open and close operations using semaphores and shared variables is an interesting exercise.

    Deadlock, the situation that arises when multiple processes are blocked, each awaiting some action by another blocked process, is a common failing in parallel programs. There are complex algorithms for both deadlock avoidance and deadlock detection. We will talk about these later.