5. Interprocess Communication

Part of the 22C:116 Lecture Notes for Fall 2002
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Producers and Consumers

Mutual exclusion is not enough! All it provides is the illusion that entire critical sections are atomic, but effective parallel programming rests on more than this.

Consider a situation where one process produces data to be consumed by another:

      Producer --> Consumer

The producer can easily fill a shared variable, then let the consumer empty the variable, and obviously, both the filling and emptying of the shared variable should appear atomic from the perspective of other processes. Mutual exclusion is sufficient for this, but it is not sufficient by itself: there must be some way for the producer to signal to the consumer that data is available!

Signalling as an Abstraction

One way to view this signalling problem is to think purely in terms of the sequence of computations. We want the producer to produce data, and only when the producer has produced do we want to allow the consumer to consume. If the producer produces, but the consumer is not yet ready, we do not want to make the producer wait, but if the consumer is ready and the producer has not yet produced, we must block the consumer. Using Petri-net notation, we can express these constraints as follows:

   Process A             Process B
       |                     |
  _____V_____                |
 |  Produce  |               |
 | some data |               |
 |___________|               |
       |          X          |
    ___V______________       |
       |          |          |
       |       ___V___       |
       |      |       |      |
       |      |   S   |      |
       |      |_______|      |
       |          |          |
       |       ___V__________V___
       |                     |
       |          Y     _____V_____
       |               |  Consume  |
       |               |  the data |
       |               |___________|
       |                     |

This is just a Petri-net fragment because it shows nothing of the producer or consumer control flow other than the actual produce and consume steps. If the two transitions in this net fragment are viewed as fork and join transitions, then we have not only a producer and consumer process, but also a short process from X to Y with only one place, S, and no internal transitions. This is not a productive view of the place S!

For a better understanding of S, notice that, after each time the producer performs the produce step, a token is added to the place S, and before each time the consumer performs the consume step, a token is removed from the place S. The count of tokens in S is therefore a count of the number of items that have been produced but not yet consumed. Therefore, we can model S in a program by a semaphore, modeling the transition X as a signal operation on this semaphore and the transition Y as a wait operation on the same semaphore.

In fact, this model generalizes. All semaphores can be modeled as places in a Petri net, with each signal operation modeled by a fork that deposits a token in that place, and each wait operation modeled by a join that consumes a token from that place. This understanding of synchronization as being theoretically equivalent to control flow is a relatively new insight!
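
To make this mapping concrete, here is a minimal sketch in C using POSIX semaphores. The routines produce_data and consume_data are hypothetical stand-ins for the boxes in the net fragment above, and the semaphore S plays the role of the place S:

      #include <semaphore.h>

      extern void produce_data(void);   /* stand-in for the produce box */
      extern void consume_data(void);   /* stand-in for the consume box */

      sem_t S;   /* the place S; its count is the number of items
                    produced but not yet consumed */

      void process_A(void)              /* the producer */
      {
          produce_data();
          sem_post(&S);                 /* transition X: deposit a token */
      }

      void process_B(void)              /* the consumer */
      {
          sem_wait(&S);                 /* transition Y: remove a token,
                                           blocking while S is empty */
          consume_data();
      }

Before either process runs, sem_init(&S, 0, 0) marks the place as initially empty.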

Bounded Buffers

The simplest general implementation of a FIFO queue between a producer and a consumer is called a bounded buffer. A bounded buffer has a fixed maximum data capacity, but at any time, it may be empty, it may contain fewer items than its capacity, or it may be full. The items in the queue are stored in an array, the buffer, between the head and tail pointers, as follows:

   ________________________________
  |  |  |\\|\\|\\|\\|\\|\\|  |  |  |
  |__|__|\\|\\|\\|\\|\\|\\|__|__|__|
   Empty |     Data        | Empty
         |                 |
       Head               Tail

In the illustration above, the head pointer refers to the first item in the queue, the item that will be returned by the next call to the dequeue operation, and the tail pointer refers to the free space that will be filled by the next call to enqueue. This arrangement assumes that the pointers are incremented after each use. A second possibility is to increment before use, in which case, the head pointer would reference the free location from which data was most recently dequeued, and the tail pointer would reference the data most recently enqueued.

The operations on this structure are simple; if we ignore the problem of preventing the enqueueing of data in a full queue or the dequeueing of data from an empty queue, they are:

      enqueue(d)
          buffer[tail] = d
          tail = tail + 1 (modulo queue size)

      dequeue(d)
          d = buffer[head]
          head = head + 1 (modulo queue size)

We can detect that the queue is empty or full fairly easily:

      empty := (head = tail)
      full := (head = tail)

Unfortunately, these conditions are the same, and we can only distinguish them by noting the history. If, after enqueue, the head and tail pointers are the same, the queue is full. If, after dequeue, this happens, the queue is empty.

In applications where busy waiting is appropriate, the full and empty conditions are sometimes distinguished by the simple expedient of wasting one slot in the buffer; in effect, we declare the queue to be full when there is actually one free slot remaining:

      full := (tail + 1 = head) (modulo queue size)
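
As a concrete illustration, here is a sketch of the wasted-slot tests in C; the names QSIZE, queue_empty, and queue_full are illustrative, not part of the notes:

      #define QSIZE 256        /* buffer capacity; one slot is wasted */

      int head = 0, tail = 0;  /* queue pointers, as described above */

      int queue_empty(void)
      {
          return head == tail;                /* no data between pointers */
      }

      int queue_full(void)
      {
          return (tail + 1) % QSIZE == head;  /* only one free slot left */
      }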

Exercise: Propose other solutions to detecting the full and empty conditions in a circular buffer, perhaps by counting the data elements in the buffer. Compare your solution with the one proposed above! Does it take more or less space for a bounded buffer of 256 characters? How about a bounded buffer of 256 floating point numbers? Does your alternative have a higher or lower run-time cost than the approach outlined above for detecting the queue full and empty conditions?

Shared Queues

When a queue implemented as a bounded buffer is shared between multiple processes, there are two different problems. One is assurance of mutual exclusion, so that, for instance, no two processes try to enqueue an item of data in the same queue slot; the other is keeping track of the number of available data items and free spaces. Both can be accomplished using semaphores:

      Binary semaphores used to provide mutual exclusion:
          headlock -- protects the head pointer
          taillock -- protects the tail pointer

      General semaphores used to count available resources:
          free -- counts number of free spaces in the buffer
          data -- counts number of data items in the buffer

      Initially,
          headlock.count = 1
          taillock.count = 1
          free.count = buffer capacity
          data.count = 0

Given these semaphores, the operations on a shared bounded buffer queue can be implemented as follows:

      enqueue(d)
          P(free)     -- wait for free space to put data
          P(taillock) -- wait for exclusive use of tail
          buffer[tail] = d
          tail = tail + 1 (modulo queue size)
          V(taillock) -- release the tail
          V(data)     -- signal that data is available
          return

      dequeue(d)
          P(data)     -- wait for data to be available
          P(headlock) -- wait for exclusive use of head
          d = buffer[head]
          head = head + 1 (modulo queue size)
          V(headlock) -- release the head
          V(free)     -- signal free space available
          return
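
For reference, here is one way the operations above might be rendered in C with POSIX semaphores. This is only a sketch: QSIZE is an arbitrary capacity, and the semaphore free is renamed freespace to avoid colliding with the C library's free:

      #include <semaphore.h>

      #define QSIZE 16                 /* buffer capacity */

      char buffer[QSIZE];              /* the bounded buffer */
      int head = 0, tail = 0;          /* queue pointers */

      sem_t headlock, taillock;   /* binary semaphores: mutual exclusion */
      sem_t freespace, data;      /* general semaphores: resource counts */

      void queue_init(void)
      {
          sem_init(&headlock, 0, 1);       /* mutex semaphores start at 1 */
          sem_init(&taillock, 0, 1);
          sem_init(&freespace, 0, QSIZE);  /* initially, every slot is free */
          sem_init(&data, 0, 0);           /* initially, no data available */
      }

      void enqueue(char d)
      {
          sem_wait(&freespace);        /* P(free) */
          sem_wait(&taillock);         /* P(taillock) */
          buffer[tail] = d;
          tail = (tail + 1) % QSIZE;
          sem_post(&taillock);         /* V(taillock) */
          sem_post(&data);             /* V(data) */
      }

      char dequeue(void)
      {
          char d;
          sem_wait(&data);             /* P(data) */
          sem_wait(&headlock);         /* P(headlock) */
          d = buffer[head];
          head = (head + 1) % QSIZE;
          sem_post(&headlock);         /* V(headlock) */
          sem_post(&freespace);        /* V(free) */
          return d;
      }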

This implementation of shared FIFO queues applies whenever queues are used to communicate between processes that run under the process manager. There are some variations worth noting:

First, the mutual exclusion locks are only needed if there are multiple producers or consumers. When there is only one producer process, the producer is the only process that will inspect and manipulate the tail pointer, so no mutual exclusion semaphore is needed to protect the tail; the same holds for the head pointer when there is only one consumer, as sketched below.
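
For instance, with a single producer, the enqueue operation from the C sketch above reduces to the following (reusing the same declarations):

      void enqueue_single_producer(char d)
      {
          sem_wait(&freespace);        /* P(free) */
          buffer[tail] = d;            /* only this process touches tail */
          tail = (tail + 1) % QSIZE;
          sem_post(&data);             /* V(data) */
      }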

Second, the critical sections in this code are very brief, so it is unlikely that any process will wait long for entry. As a result, in system-level code on a uniprocessor, these can be protected by disabling interrupts, and in a multiprocessor setting, spin-locks are appropriate.

It is also worth noting that, if queue items are very large, we do not need to read or write the item itself inside the critical section; all we need to do is sample and increment the head or tail pointer in the section, and then do the data copy outside, as follows:

      enqueue(d)
          local variable t
          P(free)     -- wait for free space to put data
          P(taillock) -- wait for exclusive use of tail
          t = tail
          tail = tail + 1 (modulo queue size)
          V(taillock) -- release the tail

          buffer[t] = d  -- copy data here!

          V(data)     -- signal that data is available
          return
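
The dequeue counterpart is symmetrical: sample and increment the head pointer inside the critical section, do the copy outside it, and delay V(free) until the copy is complete, so that no producer can reuse the slot while it is still being read. Continuing the C sketch above (reusing its declarations):

      char dequeue_large(void)
      {
          int h;                       /* local copy of the head index */
          char d;
          sem_wait(&data);             /* P(data) */
          sem_wait(&headlock);         /* P(headlock) */
          h = head;
          head = (head + 1) % QSIZE;
          sem_post(&headlock);         /* V(headlock) */

          d = buffer[h];               /* copy data here! */

          sem_post(&freespace);        /* V(free), only after the copy */
          return d;
      }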

Variations on Queues

If the buffer has a capacity of one, the head and tail pointers are superfluous, as are the locks protecting them. In this case, the producer and consumer processes would be expected to operate in lock-step, producing one data item, then consuming it before the next is produced.

Exercise: Write the code for this limiting case for a bounded buffer queue. How many semaphores are required? What purposes do the remaining semaphores serve, and what range of values do they take on?

Lock-step operation of the producer and consumer is frequently inefficient because, unless the computational requirements of the two processes are identical, either the producer will spend time waiting for the consumer, or vice versa. Nonetheless, this one-item queue allows the actual computations of the producer and consumer to overlap, thus allowing parallelism that would not have been possible in a purely sequential setting.

From elementary queueing theory, we can predict the following: If the average time taken to produce an item is the same as the average time taken to consume an item, and the times vary randomly, the performance of the system will be maximized with an infinite queue capacity and an infinite average number of items in the queue. Furthermore, as the size of the bounded buffer is decreased, the percentage of the time that the producer spends waiting for space or the consumer spends waiting for data will increase smoothly. Put briefly, so long as there is any available memory, performance can be increased by using this memory to increase the queue capacity.
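
For a concrete instance of this prediction, consider the classic M/M/1 queue (random arrivals, random service, one server): the expected number of items in the system is rho / (1 - rho), where rho is the ratio of the production rate to the consumption rate. As rho approaches 1, that is, as the two rates approach equality, this expectation grows without bound, so no finite buffer capacity can eliminate all waiting.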

The bounded buffer with a capacity of two is a commonly used special case, particularly in 1960s-vintage input/output subsystems. In that context, it is frequently described as "Double Buffered I/O".

Exercise: Write the code for this common case for a bounded buffer queue. How many semaphores are required if you assume a single consumer and a single producer? What purposes do the remaining semaphores serve, and what range of values do they take on?

When multiple FIFO queues are implemented using linked lists instead of arrays, it is common to have a single shared pool of free list items instead of a bounded supply of list items for each queue. For a given amount of memory, this can markedly improve performance, but it adds slightly to the complexity of the code. Specifically, instead of having a separate space semaphore to count the amount of free space in each queue, there is a single free space semaphore.

Exercise: Write the code for linked list queues in the general case where there are multiple queues, where each queue may serve multiple producers and multiple consumers. Assume that there is only one type of object enqueued in any of the many queues -- call it a buffer, and assume that each buffer has a data field and however many links are required for the list structuring scheme you use. How many semaphores are required per queue, and how many additional semaphores are shared by all queues? Identify the purpose of each!

Other Problems

Do not be misled into thinking that the mutual exclusion and producer-consumer problems are the only problems in parallel programming. They are the most important and most common problems, but there are many others:

The readers-writers problem arises frequently in file-system design. A file may either be closed, open for read access by any number of users, or open for read-write access by exactly one user.

Problem: Implement the synchronization required for the open-for-read, open-for-write and close operations under the constraints of the readers-writers problem, using semaphores and shared variables.

Deadlock, the situation that arises when multiple processes are blocked, each awaiting some action by another blocked process, is a common failing in parallel programs. There are complex algorithms for both deadlock avoidance and deadlock detection. We will talk about these later.