First, consider the notational problem: How do you describe the creation of new parallel programs?
Conway, in his classic 1961 paper, was thinking about one of the first multiprocessor systems, and came up with the following general notation:
| A | \_/ | __v__ fork _/ \_ / \ / \ | B | | C | \_/ \_/ \ / _\_/_ | join v / \ | D |This notation implies that statement A is executed before B and C, both of which are completed before D. Each of A, B, C and D are potentially compound statements, and statements B and C are executed in parallel.
A later notation for this, introduced after flowchart notations fell out of favor, was the following:
A; cobegin B; C; coend D;
This flowchart notation used above is not Conway's original notation, rather it is an adaptation of Petri net notation.
Conway's notation is symmetrical, but in real systems, he recognized that symmetrical implementations would be unlikely. Instead, using the figure from the previous section as an example, one sequence of statements must be in one process, perhaps A-B-D, and the other statement, perhaps C, must be in a child process. Conway proposed something like the following:
Parent process: Child process: A; fork(L); L: C B; join(V); await-join(V); D;Note that fork(L) sends control both to label L (the new process) and to the next successive statement (the parent process).
Await-join(V) looks quite a bit like a semaphore wait operation on the semaphore V, used to coordinate the join operation, and join(V) looks like a signal operation on that semaphore, followed by process termination.
In UNIX, this same assymetry is present, although the structured programming facilities of languages like C do quite a bit to mask this:
A; if (fork() == 0) { B; /* child */ exit(); /* terminate child, signal parent */ } else { C; /* parent */ wait(); /* await signal from child */ } D;In the above, statements A, C and D are executed by the parent process, while the fork() primitive creates a child process that executed B. The fork primitive duplicates the entire parent process, so that the parent and child have the same code, copies of the same variables, and copies of the same stack. The only difference is that fork() returns zero in the child process, while it returns the process ID of the child to the parent.
Consider the problem of counting the nonzero elements of a 1,000,000 element array. Here is a parallel solution, expressed informally using two processes:
cobegin -- process A for i := 1 to 500,000 do if a[i] <> 0 then count := count + 1; -- process B for j := 500,001 to 1,000,000 do if a[j] <> 0 then count := count + 1; coendAfter processes A and B have completed, you'd expect that count would correctly reflect the desired quantity, but there is a significant chance that it won't, because the operation of incrementing count will most likely be accomplished by some sequence of operations such as the following:
load register with count increment register store register to countUnfortunately, even on a uniprocessor, it is possible that both processes A and B will attempt to execute the this same instruction sequence at the same time, as illustrated below:
process A process B load registerA with count load registerB with count increment registerB store registerB in count increment registerA store registerA to countIt is not hard to see that each time something like this occurs, one or more increment operations will be miscounted. To prevent this, it is essential that only one process at a time attempt to increment count. Therefore, we say that this sequence of instructions is a critical section, or that access by processors to count must be mutually exclusive.
On a uniprocessor, mutual exclusion may be provided by disabling interrupts on entry to a critical section and reenabling them on exit.
disable interrupts load register with count increment register store register to count enable interruptsThe disadvantages of this are:
It requires that all critical sections be in privileged code (user code is typically unable to disable interrupts because doing so risks crashing the machine).
It is not selective. If one process is incrementing some variable, it only needs to exclude other processes that operate on the same variable!
Nonetheless, the code given in the previous lecture for implementing a scheduler relied on this method.
Many multiprocessors include instructions such as exchange(r,m) or test-and-set(m). These are atomic (if one CPU is executing one of these operations, no other CPU may intervene). Exchange swaps the contents of a register with a location in memory, while test-and-set sets a memory location to one while testing its previous value.
Given one or another of these operations, the variable count used in our running example may be protected with a new variable called a spin-lock. We'll call this count-lock. Given this, we can implement our critical section as:
lock( count-lock ) load register with count increment register store register to count unlock( count-lock )Initially, the lock should be one; lock sets it to zero, and unlock resets it to one. Lock and unlock can be implemented as follows:
lock( v ) register := 0; repeat exchange( register, v ); until register = 1; unlock( v ) register := 1;Spin locks have the disadvantage that they involve busy waiting. Nonetheless, they are the preferred mutual exclusion mechanism for very short critical sections on shared-memory multiprocessors. Specifically, if the expected waiting time for entry to the critical section in the event of contention is shorter than the time it would take to do two context switches (switching the waiting process out of the CPU and back in again, under the control of the scheduler), this is the preferred method.
On some uniprocessors with poorly designed schedulers, spin-locks are still used. Typically, a relinquish operation is inserted in the busy waiting loop in order to let other ready processes use the CPU. This only works if either a round-robin scheduler is being used or if there is a "relinquish to lower priority" operation -- one that causes headaches when it is implemented by a priority scheduler, but is nonetheless included in many real-time systems in order to allow this approach to mutual exclusion.
Semaphores can be used to provide mutual exclusion! The variable count used in our running example may be protected with a semaphor; we'll call this count-sem. Given this, we can implement our example critical section as:
P( count-sem ) load register with count increment register store register to count V( count-sem )Initially, the semaphore should have a count of one; the P operation sets it to zero, and the V operation resets it to one.
The parallel between this and the spin-lock solution should be obvious! In fact, spin-locks are considered to one implementation of a special case of semaphores, the binary semaphores. Binary semaphores are those that never take on values other than zero and one.
Semaphores implemented using the process scheduler are the preferred implementation of mutual exclusion on uniprocessors, and they are the preferred implementation on multiprocessors for those critical sections where the expected waiting time for entry to the section in the event of contention is greater than the time taken for two context switches.
The problem of mutual exclusion on multiprocessors where there are no atomic exchange or test-and-set operations has been the subject of years of study, during which many of the most ingeneous solutions to the critical section problem have been found. Dekker's solution is oldest of these. Each spin-lock is represented by three variables:
state[0..1] -- process i sets state[i] to enter section turn -- indicates who's turn it is to try nextGiven this, and that all variables are initially zero, the code for the lock and unlock operations is as follows, assuming that each process passes it's process ID to the lock and unlock operations:
other(pid) return( 1 - pid ); lock(pid): state[pid] = 1; while state[other(pid)] = 1 do state[pid] = 0; while turn <> pid do nothing; state[pid] = 1; end loop; return; unlock(pid): state[pid] = 0; turn = other(pid); return;When A is inside the critical section, state[A] is one and state[B] is zero. If both try to enter at the same time, so both state variables are set to one, they both enter the loop waiting for the other to reset their state to zero. One will immediately reassert their claim, while the other is blocked a waiting its turn. Furthermore, on exit from the critical section, each process always sets turn to give priority to the other process, preventing one or the other from hogging the critical section.
This code is difficult to understand! Furthermore, it is difficult to generalize to more than two processes (consider, for example, building a binary tree of processes to generalize it!). There are many more solutions to this problem (consult your textbook for some!)