First, consider the notational problem: How do you describe the creation of new parallel programs?
Conway, in his classic 1961 paper, was thinking about one of the first multiprocessor systems, and came up with the following general notation:
| A | \_/ | __v__ fork _/ \_ / \ / \ | B | | C | \_/ \_/ \ / _\_/_ | join v / \ | D |This notation implies that statement A is executed before B and C, both of which are completed before D. Each of A, B, C and D are potentially compound statements, and statements B and C are executed in parallel.
A later notation for this, introduced after flowchart notations fell out of favor, was the following:
A; cobegin B; C; coend D;
The flowchart notation used above to illustrate Conway's ideas is not his original notation, but rather, it is an adaptation of Petri net notation. Note that Conway proposed these ideas in the context of his proposal for a multiprocessor computer system, at the same time that he was developing a clear idea of the distinction between the concepts of process (a software entity) and processor (a hardware entity).
Conway's notation is symmetrical, but in real systems, he recognized that symmetrical implementations would be unlikely. Instead, using the figure from the previous section as an example, one sequence of statements must be in one process, perhaps A-B-D, and the other statement, perhaps C, must be in a second process; we will call these the parent process and the child process. Conway proposed something like the following:
Parent process: Child process: A; fork(L); L: C B; join(V); await-join(V); D;Note that fork(L) sends control both to label L (the new process) and to the next successive statement (the parent process). In operation, the fork(L) primitive creates a new process description with the PC field of the new process description containing the address of the label L. This process description is put in the ready list and may be run on a different CPU, while the original process may continue to run.
Await-join(V) looks quite a bit like a semaphore wait operation on the semaphore V, used to coordinate the join operation, and join(V) looks like a signal operation on that semaphore, followed by process termination. In Conway's proposal, V was a simple integer initialized to the number of processes that were expected to join using that integer. The join(V) primitive simply decremented V, while the await-join(V) primitive first incremented V and then looped as long as V was nonzero.
In UNIX, this same assymetry is present, although the structured programming facilities of languages like C do quite a bit to mask this:
A; if (fork() == 0) { B; /* child */ exit(); /* kill child, signal parent */ } else { C; /* parent */ wait(); /* await signal from child */ } D;In the above, statements A, C and D are executed by the parent process, while the fork() primitive creates a child process that executed B. The fork primitive duplicates the entire parent process, so that the parent and child are running identically the same code, with copies of the same variables, and copies of the same stack; at the instant fork returns in each of the two processes, the program counters are even equal. The only difference is that fork() returns zero in the child process, while it returns the process ID of the child to the parent.
Exercise: Look up the fork(), exit() and wait() primitives in the (on-line) Unix Programmer's Reference Manual. They are in section 2 of the manual (system calls); the UNIX man command under the UNIX shell searches this manual; the command man 2 x will search for the keyword x in section 2 of the manual.
Consider the problem of counting the nonzero elements of a 1,000,000 element array. Here is a parallel solution, expressed informally using two processes:
cobegin -- process A for i := 1 to 500,000 do if a[i] <> 0 then count := count + 1; -- process B for j := 500,001 to 1,000,000 do if a[j] <> 0 then count := count + 1; coendAfter processes A and B have completed, you'd expect that count would correctly reflect the desired quantity, but there is a significant chance that it won't, because the operation of incrementing count will most likely be accomplished by some sequence of operations such as the following:
load register with count increment register store register to countUnfortunately, even on a uniprocessor, it is possible that both processes A and B will attempt to execute the this same instruction sequence at the same time, as illustrated below:
process A process B load rA with count load rB with count increment rB store rB in count increment rA store rA to countIt is not hard to see that each time something like this occurs, one or more increment operations will be miscounted. On a uniprocessor using preemptive scheduling once per timeslice, once the CPU is preempted from process A and given to B as shown above, one full time-slice of computation by process B will be lost when process A regains control!
To prevent this, it is essential that only one process at a time attempt to increment count. Therefore, we say that this sequence of instructions is a critical section, or that access by processors to count must be mutually exclusive.
On a uniprocessor, mutual exclusion may be provided by disabling interrupts on entry to a critical section and reenabling them on exit.
disable interrupts load register with count increment register store register to count enable interruptsThe disadvantages of this are:
It requires that all critical sections be in privileged code (user code is typically unable to disable interrupts because doing so risks crashing the machine).
It is not selective. If one process is incrementing some variable, it only needs to exclude other processes that operate on the same variable!
Nonetheless, the code given in the previous lecture for implementing a scheduler relied on this method.
Many multiprocessors include instructions such as exchange(r,m) or test-and-set(m). These are atomic (if one CPU is executing one of these operations, no other CPU may intervene). Exchange swaps the contents of a register with a location in memory, while test-and-set sets a memory location to one while testing its previous value.
Given one or another of these operations, the variable count used in our running example may be protected with a new variable called a spin lock. As a convention, the spin lock protecting a variable, for example, count will be called count-lock. Given this, we can implement our critical section as:
lock( count-lock ) load register with count increment register store register to count unlock( count-lock )The two states of a spin lock are locked and unlocked; any two values may be used to represent these states, however, we will use zero to represent locked and one to represent unlocked because of the way these values generalize to the use of semaphores. Initially, the lock should be one; lock sets it to zero, and unlock resets it to one. Lock and unlock can be implemented as follows:
lock( v ) register := 0; repeat exchange( register, v ); until register = 1; unlock( v ) v := 1;Spin locks have the disadvantage that they involve busy waiting. Nonetheless, they are the preferred mutual exclusion mechanism for very short critical sections on shared-memory multiprocessors. Specifically, if the expected waiting time for entry to the critical section in the event of contention is shorter than the time it would take to do two context switches (switching the waiting process out of the CPU and back in again, under the control of the scheduler), this is the preferred method.
On some uniprocessor operating systems, including much of the older code in UNIX, the poor design of the scheduler or compatability with older versions of the system forces the use of spin-locks. Typically, in high level low-priority code, a relinquish operation is inserted in the busy waiting loop in order to let other ready processes use the CPU while one process waits for the lock. This only works if either a round-robin scheduler is being used or if there is a "relinquish to lower priority" operation -- one that causes headaches when it is implemented by a priority scheduler, but is nonetheless included in many real-time systems in order to allow this approach to mutual exclusion.
Semaphores can be used to provide mutual exclusion! The variable count used in our running example may be protected with a semaphore; we'll call this count-sem. Given this, we can implement our example critical section as:
P( count-sem ) -- the wait operation load register with count increment register store register to count V( count-sem ) -- the signal operationInitially, the semaphore should have a count of one; the P operation sets it to zero, and the V operation resets it to one.
The parallel between this and the spin-lock solution should be obvious! In fact, spin-locks are considered to one implementation of a special case of semaphores called binary semaphores. Binary semaphores are those that never take on values other than zero and one.
Semaphores implemented using the process scheduler are the preferred implementation of mutual exclusion on uniprocessors, and they are the preferred implementation on multiprocessors for those critical sections where the expected waiting time for entry to the section in the event of contention is greater than the time taken for two context switches.