Thread and Process States

Part of 22C:112, Operating Systems Notes
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science


The typical computer system supports at least as many processes as it has processors, but of course, it must have at least one processor. Each process runs on what we can describe as its own virtual processor, and whenever that process is executing some instruction, some physical processor must be involved. The part of an operating system which takes the physical processor resources and multiplexes them among the virtual processors needed to run the processes on that system is called the central processing unit scheduler. If the term scheduler is used without qualification, it is usually taken to refer to the CPU scheduler.

At any instant, each processor in a system is executing the code of some process. A context switch occurs whenever a processor stops executing the code of one process and starts executing the code of another. Context switching may be initiated by the process which was running, or it may be initiated by the scheduler. The latter are called preemptive or involuntary context switches.

The States of a Process

The scheduler decides, at each instant, which processor will be used to execute instructions on behalf of which process. Because all user code and most system code can be thought of as running as part of some process, the scheduler can be viewed as the bottommost component in the hierarchy of software components making up an operating system.

From the point of view of processes running under the control of the scheduler, each process is either running, waiting for some event, or dead, as shown below:

              create                       wait
        ---------------->           ---------------->
   dead                    running                     waiting
        <----------------           <----------------
                exit                      signal

The basic states of a process

If you try to imagine a process as a class, you can imagine create as the class initializer, and wait, signal, exit and kill as methods applicable to instances of the class. In fact, in the case of wait and signal, this is definitely wrong, but it is still a productive way to imagine the situation, at least at the start.

From the scheduler's point of view, an additional state is necessary: a process may be able to run, but not currently be supported by any real processor. Such processes are said to be ready because they are ready to run as soon as a processor becomes available. Ready processes exist because the number of processors may be smaller than the number of processes that are able to run. Thus, the states of a process shown must be extended as shown in the following figure.

                      |                 |
                      |      ready      |
                      |         ^       |
            create    |       | | relin-|    wait
         ------------>| sched-| | quish |------------>
    dead              |  ule  | |       |              waiting
         <------------|       | | pre-  |<------------
              exit    |       | | empt  |   signal
              kill    |       V         |
                      |     running     |

More process states

The list of system services supported by a typical scheduler is implied by the names of the transitions between the states shown above. These services are described in more detail below. Note that we slip in a new data type here, the semaphore.

p = create(c)
        create a process which is an activation of the code c; the created process changes state from dead to ready or running; a process identifier, p, is returned. The Unix fork() operation is a variant of this.

kill(p)
        kill process p, changing its state from ready, running, or waiting to dead. In Unix, kill(pid,SIGKILL) does this.

exit
        used by a process to terminate itself when its work is done; the state changes from running to dead. The Unix exit() operation is exactly this.

relinquish
        used by a process to voluntarily change state from running to ready so that some other process may be run on the processor which was being used.

wait(s)
        used by a process to wait on the semaphore s, decrementing the count in s and causing the process to change state from running to waiting if the result is negative.

signal(s)
        used by a process to signal using the semaphore s, incrementing the count in s and, if there is a process waiting on s, changing the state of one waiting process to ready or running.

init(s, i)
        initialize the semaphore s with the initial count i.

Note that most of the state transitions in the state diagram above correspond to operations in this list, with two exceptions. First, there is no preempt operation. This is because preemption is always a side effect of something else. A process cannot explicitly preempt some other process, because to do so would require that it already be running. Instead, preemption occurs because of activities of other running processes or because of events outside the process manager such as interrupts.

Second, there is no explicit schedule operation. This transition is also a side effect. Whenever a physical processor becomes idle, it must move on to executing some other process or the processor itself must be turned off. Therefore, whatever scheduler operations take running processes to any other state must either schedule new processes or turn off processors, and whenever a processor is turned on, some process must be scheduled on it.

The list of process scheduling operations given above is not universal. For example, processes on the UNIX system are created by an operation called fork() which splits the parent process into two identical copies which run the same code but have separate copies of the stack and local variables. One of the resulting processes is called the parent, the other is called the child; a call to fork() returns the process identifier of the child to the parent, and it returns zero to the child; aside from this difference in return value, the parent and child continue running from the point of return. The child has a copy of the parent's stack and heap, while the parent has the originals, but because these are identical, it really doesn't matter which process gets the copy and which keeps the originals.


The particular services which a system provides for synchronizing user processes vary considerably from system to system, but the wait and signal operations on semaphores are perhaps the most basic alternatives, and they are sufficient to implement all of the alternatives. In Unix, semaphore services were added as an afterthought, but the implementation of input/output operations implicitly involves semaphores or something equivalent in order to make processes wait until the input or output they requested has been completed.

The semaphore abstract data type is perhaps the simplest universally useful implementation of the variables used as arguments to the wait and signal operations. Although there are many implementations of semaphores, they can be intuitively thought of as integers, where wait(x) or the object-oriented x.wait waits until x is greater than zero and then decrements it, and signal(x) or x.signal increments x. An incorrect but intuitively useful implementation of these operations using polling loops is shown below:

type semaphore = integer;

procedure wait( var s: semaphore ); { P }
begin
     while s <= 0 do { nothing };
     s := s - 1;
end { wait };

procedure signal( var s: semaphore ); { V }
begin
     s := s + 1;
end { signal };

The basic operations on semaphores.

It should be noted that the operations on semaphores are frequently called P and V instead of wait and signal. These abbreviations were originally used in the THE operating system, developed at the Technische Hogeschool Eindhoven in the Netherlands by E. W. Dijkstra in the late 1960s; P stands for proberen (to test or probe) and V stands for verhogen (to increment). The use of the names P and V explicitly indicates the use of semaphores, while terms such as wait and signal have been used to refer to related process synchronization primitives that differ in many details.

Semaphores can be used for many different purposes, but two typical uses are mutual exclusion, the problem of locking other processes out of a critical section of code, and counting available (fixed size) resources. Consider what happens when a single FIFO queue is shared between multiple producers and multiple consumers of data. If we assume that we have naive enqueue and dequeue routines designed with no understanding of data sharing, we can protect these for use in the context of parallel processes as follows:

struct shared_queue {
        queue * q;       /* initially empty, with capacity c */
        semaphore data;  /* initially zero, counts data in q */
        semaphore free;  /* initially set to c */
        semaphore mutex; /* initially set to 1 */
};

void shared_enqueue( struct shared_queue * q, item i )
{
        wait( q->free );     /* wait for free space in the queue */
        wait( q->mutex );    /* enter the critical section */
        enqueue( q->q, i );
        signal( q->mutex );  /* leave the critical section */
        signal( q->data );   /* signal that one more item is available */
}

item shared_dequeue( struct shared_queue * q )
{
        item i;
        wait( q->data );     /* wait for an item to be available */
        wait( q->mutex );    /* enter the critical section */
        i = dequeue( q->q ); /* returning here would skip the signals */
        signal( q->mutex );  /* leave the critical section */
        signal( q->free );   /* signal that one more slot is free */
        return i;
}

Conceptually, the shared enqueue routine above begins by waiting for there to be available free space in the queue. It will wait until this is the case, and then it will wait until no other routine is trying to enqueue or dequeue data in this particular queue, using the mutual exclusion semaphore. After it finishes the naive enqueue routine, it signals that it is done on the mutual exclusion semaphore, and then it signals that it has produced one item of data.

In general, semaphores with an initial value of one are used for mutual exclusion. One such semaphore is typically associated with each resource that might be shared, and all users are responsible for waiting on this semaphore (decrementing it to zero) before use of the resource and signalling it (incrementing it) when done. We say that the code using the shared resource is a critical section, and we say that the semaphore is used to guard that critical section.

Good programming technique, of course, suggests that sharable objects should be packaged with the semaphores that guard them, so that users need not be aware of the semaphores. The example above illustrates such a packaging.
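Such a packaging can also be written against the POSIX semaphore interface. The following sketch substitutes a fixed-size circular buffer of integers for the abstract queue type used above; the type and function names are illustrative, and error checking is omitted:

```c
#include <semaphore.h>

#define CAP 8  /* the queue's capacity, corresponding to c above */

struct shared_queue {
        int buf[CAP];     /* circular buffer holding the data */
        int head, tail;   /* next slot to dequeue from / enqueue to */
        sem_t data;       /* counts items in buf, initially 0 */
        sem_t free;       /* counts free slots, initially CAP */
        sem_t mutex;      /* guards head and tail, initially 1 */
};

void shared_queue_init( struct shared_queue * q )
{
        q->head = q->tail = 0;
        sem_init( &q->data, 0, 0 );
        sem_init( &q->free, 0, CAP );
        sem_init( &q->mutex, 0, 1 );
}

void shared_enqueue( struct shared_queue * q, int i )
{
        sem_wait( &q->free );           /* wait for a free slot */
        sem_wait( &q->mutex );          /* enter the critical section */
        q->buf[q->tail] = i;
        q->tail = (q->tail + 1) % CAP;
        sem_post( &q->mutex );          /* leave the critical section */
        sem_post( &q->data );           /* announce one more item */
}

int shared_dequeue( struct shared_queue * q )
{
        int i;
        sem_wait( &q->data );           /* wait for an item */
        sem_wait( &q->mutex );          /* enter the critical section */
        i = q->buf[q->head];
        q->head = (q->head + 1) % CAP;
        sem_post( &q->mutex );          /* leave the critical section */
        sem_post( &q->free );           /* announce one more free slot */
        return i;
}
```

A caller never touches the semaphores directly; the guard is entirely hidden inside shared_enqueue and shared_dequeue.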

Note again that the idea that a semaphore is just an integer, and that the wait and signal operations just decrement and increment that integer, is a vast oversimplification. Semaphore implementations are almost always more complex.

Unix Primitives

Semaphores were an afterthought in Unix; this should not be surprising, considering that Unix was originally developed in parallel with Dijkstra's work on the THE system. As a result, a considerable amount of old Unix code still exists that uses very ad-hoc synchronization. A common trick in old Unix code is to use files as mutual exclusion locks. This was (and remains) so common that most Unix system administrators routinely talk about lockfiles.

Lockfiles work under Unix and its variants because, if two processes call open(filename,O_CREAT|O_EXCL), the file will be created by one of them and not the other. The process that creates the file will succeed with this call, while the call will fail in the other process. Thus, we can implement a mutual exclusion lock as follows:

void lock( char * name )
{
        int fd;
        /* poll until this process is the one that creates the file */
        while ((fd = open( name, O_CREAT | O_EXCL | O_WRONLY, 0600 )) < 0)
                sleep( 1 ); /* relinquish for one second */
        close( fd ); /* the lock is the file's existence, not the descriptor */
}

void unlock( char * name )
{
        unlink( name );
}

The polling loop in the lock code could become very expensive, so a call is made to the sleep() service to wait one second before each try; this effectively relinquishes the CPU to let other processes run. If we did not do this, other processes would still run when the calling process is preempted, but the caller would still fight for access to the CPU.

The use of lock files has good and bad features. It is dangerous, because a careless user could explore the directory containing the lock files and delete one of them, allowing the application to violate mutual exclusion constraints. On the other hand, if an application is ill behaved, a system administrator can kill the offending process and then manually delete the lock files. To simplify the latter and warn against the former, lock files are usually named with names that suggest their use.

Creating and deleting files in the file system is a clumsy way to handle mutual exclusion, and it is slow. Furthermore, the overhead of waiting largely depends on the time delay in the polling loop. A short delay means many extra and futile checks of the lock file, but it means that the waiting process will resume sooner. A long delay leads to lower overhead and slower response.

Later in the history of Unix, after major utilities had already come into widespread use, numerous other synchronization primitives were introduced. Among them:

flock( file_descriptor, LOCK_EX )
flock( file_descriptor, LOCK_UN )
        The LOCK_EX operation applies an exclusive (mutual exclusion) lock to an open file, and LOCK_UN unlocks it.

s = sem_open( name, options )
sem_wait( s )
sem_post( s )
        Create or connect to a named semaphore, returning a pointer s, and then wait on it or signal it. The word signal was taken, early in Unix development, for an exception handling mechanism, so that word is not used with Unix semaphores.

These new mechanisms are important, and essentially all modern Unix variants support them, but whenever you find yourself working with classical Unix utilities, you are likely to find yourself fighting with lock files.