22C:116, Lecture Notes, Aug. 28, 1995

Douglas W. Jones
University of Iowa Department of Computer Science

  1. Classical Interprocess Communication

    First, consider the notational problem: How do you describe the creation of new parallel programs?

    Conway, in his classic 1963 paper, was thinking about one of the first multiprocessor systems, and came up with the following general notation:

        | A |
          |
         / \     fork
        /   \
      | B | | C |
        \   /
         \ /     join
          |
        | D |
    This notation implies that statement A is executed before B and C, both of which are completed before D. Each of A, B, C and D is potentially a compound statement, and statements B and C are executed in parallel.

    A later notation for this, introduced after flowchart notations fell out of favor, was the following:

       A;
       cobegin
          B;
          C;
       coend;
       D;

    The flowchart notation used above is not Conway's original notation; rather, it is an adaptation of Petri net notation.

  2. Implementing Parallelism

    Conway's notation is symmetrical, but he recognized that in real systems, symmetrical implementations would be unlikely. Instead, using the figure from the previous section as an example, one sequence of statements must be in one process, perhaps A-B-D, while the remaining statement, perhaps C, must be in a child process. Conway proposed something like the following:

    Parent process:          Child process:

          A;
          fork(L);           L: C;
          B;                    join(V);
          await-join(V);
          D;
    Note that fork(L) sends control both to label L (the new process) and to the next successive statement (the parent process).

    Await-join(V) looks quite a bit like a semaphore wait operation on the semaphore V, used to coordinate the join operation, and join(V) looks like a signal operation on that semaphore, followed by process termination.
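    This correspondence can be sketched with POSIX semaphores, using threads as a stand-in for Conway's processes. The names (child_body, parent_body, c_done) are illustrative, not part of Conway's proposal; the semaphore V starts at zero so that await-join blocks until the child has signaled.

    ```c
    /* Sketch: join(V) as a semaphore signal followed by termination,
     * and await-join(V) as a semaphore wait.  Assumes a POSIX system;
     * threads stand in for processes. */
    #include <pthread.h>
    #include <semaphore.h>

    static sem_t V;                  /* the join semaphore, initially 0 */
    static int c_done = 0;

    static void *child_body(void *arg)
    {
        (void)arg;
        c_done = 1;                  /* the statement C */
        sem_post(&V);                /* join(V): signal, then terminate */
        return 0;
    }

    int parent_body(void)
    {
        pthread_t child;
        sem_init(&V, 0, 0);
        pthread_create(&child, 0, child_body, 0);  /* fork(L) */
        /* ... statement B would run here, in parallel with C ... */
        sem_wait(&V);                /* await-join(V): wait for the child */
        pthread_join(child, 0);      /* reclaim the thread */
        return c_done;               /* 1 once the child has finished C */
    }
    ```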

    In UNIX, this same asymmetry is present, although the structured programming facilities of languages like C do quite a bit to mask it:

    A;
    if (fork() == 0) {
       B;               /* child */
       exit(0);         /* kill child, signal parent */
    } else {
       C;               /* parent */
       wait(NULL);      /* await signal from child */
    }
    D;
    In the above, statements A, C and D are executed by the parent process, while the fork() primitive creates a child process that executes B. The fork primitive duplicates the entire parent process, so that the parent and child have the same code, copies of the same variables, and copies of the same stack. The only difference is that fork() returns zero in the child process, while it returns the process ID of the child to the parent.
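    The pattern above can be packaged as a small self-contained function: the child exits with a known status and the parent recovers it through wait(). The status value 7 is arbitrary, chosen only to make the round trip visible.

    ```c
    /* Sketch of the fork()/wait() pattern: fork() returns 0 in the
     * child and the child's pid in the parent. */
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <stdlib.h>

    int fork_demo(void)
    {
        pid_t pid = fork();
        if (pid == 0) {          /* fork() returned 0: this is the child */
            exit(7);             /* terminate, passing 7 to the parent */
        } else {                 /* nonzero: the child's pid; the parent */
            int status;
            wait(&status);       /* block until the child terminates */
            return WEXITSTATUS(status);  /* the child's exit code */
        }
    }
    ```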

  3. Introduction to Mutual Exclusion

    Consider the problem of counting the nonzero elements of a 1,000,000 element array. Here is a parallel solution, expressed informally using two processes:

       -- process A
       for i := 1 to 500,000 do
         if a[i] <> 0
           then count := count + 1;
       -- process B
       for j := 500,001 to 1,000,000 do
         if a[j] <> 0
           then count := count + 1;
    After processes A and B have completed, you'd expect that count would correctly reflect the desired quantity, but there is a significant chance that it won't, because the operation of incrementing count will most likely be accomplished by some sequence of operations such as the following:
      load register with count
      increment register
      store register to count
    Unfortunately, even on a uniprocessor, it is possible that both processes A and B will attempt to execute this same instruction sequence at the same time, as illustrated below:
           process A           process B
      load rA with count
                           load rB with count
                           increment rB
                           store rB in count
      increment rA
      store rA to count
    It is not hard to see that each time something like this occurs, one or more increment operations will be miscounted. On a uniprocessor, where the CPU is preempted from process A and given to B as shown above, one full time-slice of computation by process B will be lost!
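    The interleaving above can be replayed deterministically: two "registers" rA and rB both load count before either stores it back, so one of the two increments is lost. This simulates the race step by step rather than relying on real concurrency, so the lost update happens every time.

    ```c
    /* Deterministic replay of the interleaving shown above. */
    static int count;

    int lost_update(void)
    {
        int rA, rB;
        count = 0;
        rA = count;              /* A: load rA with count  */
        rB = count;              /* B: load rB with count  */
        rB = rB + 1;             /* B: increment rB        */
        count = rB;              /* B: store rB in count   */
        rA = rA + 1;             /* A: increment rA        */
        count = rA;              /* A: store rA to count -- overwrites B's */
        return count;            /* 1, although two increments were made */
    }
    ```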

    To prevent this, it is essential that only one process at a time attempt to increment count. Therefore, we say that this sequence of instructions is a critical section, or that access by processors to count must be mutually exclusive.

  4. Simple Mutual Exclusion on Uniprocessors

    On a uniprocessor, mutual exclusion may be provided by disabling interrupts on entry to a critical section and reenabling them on exit.

      disable interrupts
      load register with count
      increment register
      store register to count
      enable interrupts
    The disadvantages of this are:

    It requires that all critical sections be in privileged code (user code is typically unable to disable interrupts because doing so risks crashing the machine).

    It is not selective. If one process is incrementing some variable, it only needs to exclude other processes that operate on the same variable!

    Nonetheless, the code given in the previous lecture for implementing a scheduler relied on this method.

  5. Simple Mutual Exclusion on Multiprocessors

    Many multiprocessors include instructions such as exchange(r,m) or test-and-set(m). These are atomic (if one CPU is executing one of these operations, no other CPU may intervene). Exchange swaps the contents of a register with a location in memory, while test-and-set sets a memory location to one while testing its previous value.

    Given one or another of these operations, the variable count used in our running example may be protected with a new variable called a spin-lock. We'll call this count-lock. Given this, we can implement our critical section as:

      lock( count-lock )
      load register with count
      increment register
      store register to count
      unlock( count-lock )
    Initially, the lock should be one; lock sets it to zero, and unlock resets it to one. Lock and unlock can be implemented as follows:
      lock( v )
        register := 0;
        repeat
           exchange( register, v );
        until register = 1;

      unlock( v )
        v := 1;
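    This pseudocode can be sketched in C using C11's atomic_exchange in place of the machine-level exchange instruction. As in the pseudocode, a lock value of one means "free" and zero means "held"; the worker threads and iteration count are illustrative.

    ```c
    /* Spin lock built on atomic exchange, protecting the shared count.
     * Assumes a POSIX system with C11 atomics. */
    #include <stdatomic.h>
    #include <pthread.h>

    static atomic_int count_lock = 1;      /* initially one: unlocked */
    static long count = 0;

    static void lock(atomic_int *v)
    {
        int reg;
        do {
            reg = atomic_exchange(v, 0);   /* swap 0 in, old value out */
        } while (reg != 1);                /* spin until we capture the 1 */
    }

    static void unlock(atomic_int *v)
    {
        atomic_store(v, 1);
    }

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            lock(&count_lock);
            count = count + 1;             /* the critical section */
            unlock(&count_lock);
        }
        return 0;
    }

    /* Two contending threads; with the lock, no increments are lost. */
    long spin_lock_demo(void)
    {
        pthread_t a, b;
        count = 0;
        pthread_create(&a, 0, worker, 0);
        pthread_create(&b, 0, worker, 0);
        pthread_join(a, 0);
        pthread_join(b, 0);
        return count;
    }
    ```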
    Spin locks have the disadvantage that they involve busy waiting. Nonetheless, they are the preferred mutual exclusion mechanism for very short critical sections on shared-memory multiprocessors. Specifically, if the expected waiting time for entry to the critical section in the event of contention is shorter than the time it would take to do two context switches (switching the waiting process out of the CPU and back in again, under the control of the scheduler), this is the preferred method.

    On some uniprocessors with poorly designed schedulers, spin-locks are still used. Typically, a relinquish operation is inserted in the busy waiting loop in order to let other ready processes use the CPU. This only works if either a round-robin scheduler is being used or if there is a "relinquish to lower priority" operation -- one that causes headaches when it is implemented by a priority scheduler, but is nonetheless included in many real-time systems in order to allow this approach to mutual exclusion.

  6. A General Mutual Exclusion Mechanism

    Semaphores can be used to provide mutual exclusion! The variable count used in our running example may be protected with a semaphore; we'll call this count-sem. Given this, we can implement our example critical section as:

      P( count-sem )
      load register with count
      increment register
      store register to count
      V( count-sem )
    Initially, the semaphore should have a count of one; the P operation sets it to zero, and the V operation resets it to one.
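    The P and V operations above map directly onto POSIX semaphores, where sem_wait plays the role of P and sem_post the role of V. The sketch below protects the shared count with a semaphore initialized to one; the thread bodies and iteration count are illustrative.

    ```c
    /* count-sem as a POSIX semaphore with an initial count of one.
     * Assumes a POSIX system with unnamed semaphores (e.g. Linux). */
    #include <semaphore.h>
    #include <pthread.h>

    static sem_t count_sem;
    static long count = 0;

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            sem_wait(&count_sem);     /* P( count-sem ) */
            count = count + 1;        /* the critical section */
            sem_post(&count_sem);     /* V( count-sem ) */
        }
        return 0;
    }

    /* Two contending threads; the semaphore prevents lost updates. */
    long semaphore_demo(void)
    {
        pthread_t a, b;
        sem_init(&count_sem, 0, 1);   /* initial count of one */
        count = 0;
        pthread_create(&a, 0, worker, 0);
        pthread_create(&b, 0, worker, 0);
        pthread_join(a, 0);
        pthread_join(b, 0);
        return count;
    }
    ```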

    The parallel between this and the spin-lock solution should be obvious! In fact, spin-locks can be considered to be one implementation of a special case of semaphores, the binary semaphores. Binary semaphores are those that never take on values other than zero and one.

    Semaphores implemented using the process scheduler are the preferred implementation of mutual exclusion on uniprocessors, and they are the preferred implementation on multiprocessors for those critical sections where the expected waiting time for entry to the section in the event of contention is greater than the time taken for two context switches.