22C:116, Lecture Notes, Lecture 5, Fall 2000

Douglas W. Jones
University of Iowa Department of Computer Science

  1. Sharing CPU time is essential on any system with more processes than processors. It is common to talk about the contending processes as if each belongs to a different end user of the system. While multiprocessors have been in use since the 1960's, the number of logical processes is almost always larger than the number of CPUs, even on the few systems that have been built with thousands of CPUs.

    Since 1966, with the Berkeley Timesharing System, many systems have allowed users to run more than one process in parallel, so not only do users compete with each other for CPU time, but the processes of one user also compete with each other.

    Aside: Surveys of UNIX users done in the 1970's showed that a single user at a remote terminal typically had 4 processes: one interactive, one waiting for the interactive process to terminate, and two either waiting or running in the background.

    On window-oriented systems, it is common to see one or more processes behind each window on the display screen.

  2. Process Management

    The traditional picture of the states of a process, as found in many elementary operating systems textbooks, is as follows:

                          Dead
                          ^ |  ^
                          | |   \
                      kill| |    \ terminate
                          | |     \
                          | |      \
             Waiting <----|-|-------- Running -----+
                    \     | |  wait   ^            |
                     \    | |create  /             | preempt,
               signal \   | |       / schedule     | relinquish
                       \  | |      /               |
                        v | v     /                |
                         Ready <-------------------+

  3. Process States

    This set of process states implies the following process manager services; the names and details of these services, of course, vary considerably from one system to the next:

    create(x)
    creates a new process. There must be some kind of parameter to the create call to distinguish the created process from other processes. It might be a starting address, but systems differ considerably in the details of how this is accomplished.

    terminate()
    terminates the calling process.

    relinquish()
    forces the caller to the ready state.

    kill(x)
    kills another process, where x is some kind of process handle or ID.
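
    As a point of reference, here is a minimal sketch of this interface in C. The names and signatures are hypothetical, since real systems differ widely in these details:

      /* a hypothetical process manager interface -- a sketch, not any real system */
      typedef int process_id;

      process_id create( void (*start)(void) );  /* new process, initially ready, begins at start */
      void terminate( void );                    /* running to dead, for the caller */
      void relinquish( void );                   /* running to ready, for the caller */
      void kill( process_id x );                 /* any state to dead, for process x */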

    If no process in the system needs a particular CPU, it is usual to leave a running process on that CPU! This is because each CPU must have some value in its program counter, which is to say, it must be prepared to fetch a next instruction at all times, even if the CPU has a low-power idle state where the fetch operation is deferred until the next interrupt request.

    For this reason, operating systems usually include idle processes to soak up the otherwise unused CPU cycles, one per CPU. On uniprocessors, this is sometimes called the background process (as opposed to other more productive background processes). A typical idle process runs the following code:

    		loop
    		   relinquish();
    		forever
    	
    If the CPU includes an instruction forcing it into low-power idle mode until the next interrupt, this should be included in the loop so that idle CPUs will draw less power (this feature was first introduced on minicomputers of the early to mid 1970's).
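
    As a sketch, assuming the relinquish() service described above and an x86 CPU, whose hlt instruction stops instruction fetch until the next interrupt, the idle process might read:

      extern void relinquish( void );      /* assumed process manager service */

      /* idle process: offer the CPU to any ready process, then sleep until
         an interrupt; a real kernel must also close the race between these
         two steps, which this sketch ignores */
      void idle( void )
      {
          for (;;) {
              relinquish();                /* let any ready process run */
              __asm__ volatile ( "hlt" );  /* x86: low-power wait for interrupt */
          }
      }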

    On an n processor symmetrical multiprocessor, there will be n idle processes. On an asymmetrical multiprocessor with n processors in some equivalence class, the ready queue serving that equivalence class will contain n idle processes.

    On early UNIX systems, one well-known use for the background process was computing more digits of pi. This is silly; there are more useful housekeeping jobs the background process can do. For example, it can run CPU and memory diagnostics, or it can manage garbage collection. The SETI screen-savers that you can download for PCs are examples of (potentially) productive idle processes, although the code for this application is itself occasionally blocked by I/O operations, so there must be an additional genuine idle process that does nothing at all.

    The ready queue may be implemented with any of a number of interesting data structures, but it always supports two basic operations, enqueue and dequeue.

    A FIFO queue gives a system where all processes have equal priority. A priority queue allows high priority processes to have more access to the CPU. If a priority queue is used, the idle processes must have the lowest priority, so that they run only when nothing else is ready.

    Fixed priority processes are used in many real-time applications. UNIX introduced an interesting variable priority scheme, in which the priority of a process rises when it voluntarily relinquishes the CPU, for example, to wait for I/O completion, and falls when it is preempted at the end of a time-slice. This favors interactive users over compute bound processes.
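
    Here is a sketch of the idea in C; the bounds and one-step adjustments are hypothetical, and real UNIX schedulers compute priority from measured CPU usage, but the spirit is the same:

      enum { MIN_PRIORITY = 0, MAX_PRIORITY = 31 };

      struct process {
          int priority;                       /* larger means higher priority here */
          /* saved registers, queue links, and so on */
      };

      /* called when a process voluntarily relinquishes or blocks for I/O */
      void reward( struct process *p )
      {
          if (p->priority < MAX_PRIORITY)
              p->priority = p->priority + 1;  /* favor interactive behavior */
      }

      /* called when a process is preempted at the end of its time-slice */
      void penalize( struct process *p )
      {
          if (p->priority > MIN_PRIORITY)
              p->priority = p->priority - 1;  /* disfavor compute bound behavior */
      }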

  4. Preemption

    A running process is preempted when it is forced into the ready state in order to give the CPU to some other process. This might be done, for example, by the real-time-clock interrupt service routine at the end of a time-slice in a timesharing system. This is illustrated in the following pseudocode:

    Clock interrupt service routine:
       begin
          Disable Interrupts (if not already disabled)
    
          -- Save process state of interrupted process
          current.pc := interrupt-return.pc
          current.registers := interrupt-return.registers
    
          -- Change interrupted process state to ready
          Enqueue( current, ready queue )
    
          -- Make new process into the current process
          current := Dequeue( ready queue )
    
          -- Restore process state of new process
          interrupt-return.pc := current.pc
          interrupt-return.registers := current.registers
    
          Enable Interrupts (if necessary)
          return
       end
    

    Note that, in addition to involuntary preemption, many systems include provisions for a running process to voluntarily move itself from running to ready. This is sometimes called voluntarily relinquishing the CPU, and the code for this is essentially the same as that given above for preemption, except for differences in the calling sequence.

    In old systems, clock interrupts were periodic, for example, every 1/60 of a second. Modern hardware usually has a programmable timer, so it is possible for the scheduler to vary the length of the time-slice (interval between interrupts) depending on the behavior of the process being scheduled.

    The Berkeley Timesharing System did this by the late 1960's, giving long time slices to jobs that appeared compute bound, in an effort to minimize context switching, while giving short time slices to jobs that appeared I/O bound.
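
    A sketch of how such a scheduler might choose each time-slice, assuming a hypothetical flag recording how each process last gave up the CPU; the constants are arbitrary:

      struct process {
          int used_whole_slice;  /* nonzero if the last slice expired */
          /* other scheduling state */
      };

      /* length, in milliseconds, of the next time-slice for p */
      int next_quantum( struct process *p )
      {
          if (p->used_whole_slice)
              return 100;        /* appears compute bound: long slice, fewer switches */
          else
              return 10;         /* appears I/O bound: short slice, quick response */
      }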

    Interrupts must be disabled during the interrupt service routine so that some other interrupt doesn't cause scheduling activity when the ready queue is halfway through being updated or when the process state is only partially saved or restored.

  5. More Process States

    Waiting processes are typically enqueued on a semaphore. Given this, the following process manager services are typically provided:

    wait(s)
    the calling process changes its state to waiting, specifically, waiting for a signal operation to be applied to the semaphore s. If the semaphore has already been signalled, the process will not enter the waiting state but will continue without stopping. If there are other waiting processes, each signal operation will release only one waiting process.

    signal(s)
    the calling process undergoes no state change. If there are any waiting processes on the semaphore s, one of them has its state changed to ready. If no processes are waiting, the signal is remembered so that the next process to try to wait will not block.

    We can think of a semaphore as an abstract object with two methods. It is common to describe the semantics of semaphores in terms of a simple nonnegative integer, where signal increments the integer and wait decrements it. The constraint that the integer remain nonnegative implies that a wait on a semaphore with a value of zero will block until a signal is done, allowing the decrement to proceed.

    The abstract description of a semaphore given above is of little use to an implementor! The usual implementation is more complex; we implement a semaphore as a two-component structure: a queue of waiting processes and an integer count.
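
    In C, this representation might be sketched as follows, where the queue is whatever linked structure the process manager already uses:

      struct queue {
          struct process *head, *tail;  /* linked list of waiting processes */
      };

      struct semaphore {
          int          count;           /* signals not yet consumed by a wait */
          struct queue waiting;         /* processes blocked in wait on this semaphore */
      };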

    One semaphore is needed for each condition on which some process might wait.

    Semaphores were invented by E. W. Dijkstra in the late 1960's, but they were not widely accepted until the early 1970's.

    Prior to this, most operating systems had more ad-hoc and frequently incorrect implementations of the waiting state.

    UNIX, which was developed in the late 1960's, does not entirely reflect the post-Dijkstra view of processes. With the exception of some later versions (for example, System V UNIX), UNIX does not provide user-level semaphore services, so UNIX programmers who wish to write portable programs must avoid any such services that are provided.

  6. Semaphore Operations

    The name P is Dijkstra's original name for the wait operation; it comes from the Dutch word which is the cognate of Pause in English. Here is some pseudocode that implements this on a uniprocessor:

    Wait( sem ) or P( sem ):
       begin
          Disable Interrupts (on a multiprocessor, we must do more)
          if sem.count > 0
             sem.count := sem.count - 1
          else
             -- Save process state
             current.pc := proc-return.pc
             current.regs := proc-return.regs

             -- Change state to waiting
             enqueue( current, sem.queue )

             -- Make new process current process
             current := dequeue( ready queue )

             -- Restore process state
             proc-return.pc := current.pc
             proc-return.regs := current.regs
          endif
          Enable Interrupts (on a multiprocessor, we must do more)
          return
       end
    

    Signal( sem ) or V( sem ):
       begin
          Disable Interrupts (on a multiprocessor, we must do more)
          if empty( sem.queue )
             sem.count := sem.count + 1
          else
          temp := dequeue( sem.queue )
             enqueue( temp, ready queue )
          endif
          Enable Interrupts (on a multiprocessor, we must do more)
          return
       end
    

    The above implementations are correct only if all processes have equal priority. If some have a higher priority, and if a low priority process issues a signal on a semaphore for which a high priority process is waiting, then the above code lets the low priority process continue running while the high priority process waits in the ready queue. This is wrong.
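
    One common repair is sketched below in C: after moving a waiter to the ready queue, compare priorities and preempt the caller if the awakened process outranks it. The queue helpers, the global current, the ready queue, and preempt() are all assumed to be supplied by the process manager:

      void signal( struct semaphore *s )
      {
          disable_interrupts();
          if (queue_empty( &s->waiting )) {
              s->count = s->count + 1;
          } else {
              struct process *p = dequeue( &s->waiting );
              enqueue( p, &ready_queue );
              if (p->priority > current->priority)
                  preempt();  /* move the caller to ready and dispatch p */
          }
          enable_interrupts();
      }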

  7. Threads

    In many modern systems, it is common to find a distinction between the terms process and thread. This distinction can be confusing, because from the point of view of the theory of concurrent processes, a thread is a process!

    The problem is that the word process, while originally referring to the sequential execution of a program by a processor, has come to mean much more. In many systems, each process has its own protection domain, including a virtual address space, set of open files, and other associated information. As a result, process creation and destruction have become heavyweight operations.

    It is worth noting that, as parallel and distributed computing have advanced, our ideas of what is an acceptable amount of overhead for process creation have changed. Under OS-360, the classical operating system for IBM's mainframe computers, introduced in the mid 1960's, process creation was so expensive that processes were rarely dynamically created to run each user's program, but rather, they were statically created and "cleaned and reused" for many programs.

    [As an aside, OS-360 came in many versions. OS-MFT (Multiprogramming with a Fixed number of Tasks) had no dynamic process creation. All processes were created at sysgen time -- when the system was "generated". OS-MVT (Multiprogramming with a Variable number of Tasks) allowed expensive dynamic task creation and is still widely run on today's IBM mainframes (now called enterprise servers), usually on a virtual machine running under a descendant of the VM operating system.]

    One of the advantages of UNIX, when it first came out, was that processes were so lightweight that not only could a new process be created every time a user logged into the machine, but new processes could be dynamically created to run each command typed by a user. Rates of a few process creations per minute would have been considered excessive under prior operating systems, while under UNIX, creating a few processes per second was considered perfectly normal.

    When parallel programmers look at UNIX, however, they see processes that still appear heavy and difficult to create. This is because many parallel algorithms can naturally be expressed in terms of programs that spawn large numbers of very short-lived processes; such programs attempt to create thousands of processes per second, and when this is done under UNIX, the overhead of process creation becomes the limiting factor in the performance of the program.

    This has led to the development of what have come to be known as lightweight threads -- processes trimmed down to the minimum, freed of all but the most essential attributes: a distinct program counter and a private set of registers. Some of the thread systems now available allow processes to be created at a cost not much greater than that of a procedure call that saves and restores registers.

  8. Kernel Threads

    A natural way to implement threads is to implement them as part of the operating system kernel. In this case, the kernel must distinguish between an addressing and protection domain, consisting of an address space with an associated set of open files, and a thread, consisting of a minimal process state, just registers and the identity of the domain being used.

    The terminology that has emerged with such systems is somewhat confusing. A conventional UNIX process can be modeled by a domain in which a single thread is executing, but it is common to refer to the domain itself as a process, despite the fact that it is not a process, in the theoretical sense, and despite the fact that a thread is in fact a process in the theoretical sense! Thus, one hears talk of a process in which one or more threads are executing.

    Typical kernel thread systems based on UNIX have a fork operation that replicates the protection domain of the caller and then launches a single thread in the new domain. In addition, these systems include a "thread create" operation that launches a new thread in the current protection domain.
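
    POSIX threads, a widely available descendant of this tradition, follow exactly this pattern: fork() replicates the caller's protection domain, while pthread_create() launches a new thread in the current domain:

      #include <pthread.h>
      #include <stdio.h>

      void *worker( void *arg )
      {
          printf( "running in the creator's protection domain\n" );
          return NULL;
      }

      int main( void )
      {
          pthread_t t;
          pthread_create( &t, NULL, worker, NULL );  /* new thread, same domain */
          pthread_join( t, NULL );                   /* wait for it to terminate */
          return 0;
      }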

    If the UNIX tradition were ignored, it would probably be more sensible to elevate protection domains to the status of first-class objects, so that the operations of domain and thread creation would be completely separated. When launching a thread, the creator would designate the domain in which it is to be launched, chosen from among the domains in which the creator has permission to launch threads.

    With kernel threads, the primitives that normally block a process (semaphores, blocking I/O operations, and so forth) block only the thread that requested the operation. Thus, while one thread of a group of threads that share a domain waits for I/O, the others may continue.

  9. Threads Outside the Kernel

    When an operating system kernel directly supports the notion of threads, thread creation involves a kernel call, and this involves the overhead of a transfer of control from the user's domain into the system and back again. Frequently, this overhead is comparable to the computation actually involved in creating the thread! As a result, an alternative approach to threads has been developed, in which the threads are implemented at the user level instead of being implemented by the system.

    User threads are represented by "thread state records" stored in the user's address space; these records typically hold just the program counter and registers of the thread, plus links used by the thread manager (scheduler). On a uniprocessor, threads implemented in this way are not very different from coroutines.
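
    A sketch of such a record in C; the field names here are hypothetical, and the register set is machine dependent:

      struct thread {
          void          *pc;        /* saved program counter */
          unsigned long  regs[16];  /* saved general registers */
          struct thread *next;      /* link used by the thread ready list */
      };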

    On multiprocessors such as those once sold by Encore and now widely made by Silicon Graphics, the thread package first forks a number of UNIX processes that share a large area of memory (the shared heap). These processes each have "main programs" that wait until a thread is available to run, then remove it from the thread ready list and run it until it blocks. To the UNIX operating system, this is just a fixed group of processes, but to the user of the thread package, the fixed set of processes is viewed almost as a fixed set of processors, and the threads themselves are viewed as processes that are dynamically created and destroyed.

    User threads pose some problems! First, each UNIX process used to support the thread package is an independent protection domain. If a thread opens a file, that file is only available to the process in which the thread opened it, and if that thread is later run by a different process, the file will not be available! As a result, the manuals for user-level thread packages always state that all files needed by the application must be opened before the thread system is initialized.

    The second problem with user-level thread packages is that when a thread performs a blocking operation such as an I/O operation, not only is the thread blocked, but so is the process that was running it at the time it blocked. The problem is that the kernel call (for example, read) is unaware of the existence of the thread package and of the fact that there might be other threads in the thread ready list. As a result, if the thread package was started with N processes, and then N threads block to perform I/O operations, no threads will remain running, despite the fact that there may be ready threads!

  10. Composite Thread Systems

    If kernel threads are too heavy, but user-level threads have semantic problems, it is natural to ask if there is some compromise. The answer is yes! The solution is to leave threads at the user level but design thread support into the kernel calls that would normally block.

    The minimum necessary support is a nonblocking alternative to each normally blocking kernel call. Instead of blocking, this variant would return an error code indicating that the operation would have blocked. Consider the example:

          P(semaphore)
            -- may block
    
    A nonblocking variant might be:
          NonBlockingP(semaphore)
            -- if P would have blocked
               sets errno to WOULDBLOCK
    
    Using this, a version of P callable from the thread package can be written:
          ThreadP(semaphore)
            loop
              NonBlockingP(semaphore)
            exit loop when errno <> WOULDBLOCK
              ThreadRelinquish
            end loop
    
    Effectively, this repeatedly tries the semaphore until it doesn't block; every time the try fails, it allows other threads to run (using the thread package's relinquish operation); when the operation finally succeeds, it falls through.
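
    UNIX I/O offers this same pattern: once the O_NONBLOCK flag is set on a descriptor, read() returns -1 with errno set to EAGAIN (historically EWOULDBLOCK) instead of blocking:

      #include <fcntl.h>
      #include <unistd.h>
      #include <errno.h>

      /* make later operations on fd fail with EAGAIN rather than block */
      void make_nonblocking( int fd )
      {
          int flags = fcntl( fd, F_GETFL, 0 );
          fcntl( fd, F_SETFL, flags | O_NONBLOCK );
      }

      /* returns -1 with errno == EAGAIN if the read would have blocked */
      ssize_t try_read( int fd, void *buf, size_t len )
      {
          return read( fd, buf, len );
      }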

    Of course, in addition to the semaphores provided by the kernel for synchronization of heavyweight processes, typical thread packages provide lightweight semaphores for synchronization of threads that share the same protection domain. In addition, on multiprocessors, thread packages typically also provide spin locks so that the overhead of a context switch between threads can be avoided when the expected wait for entry to a critical section is shorter than the time taken to switch between threads.

    The scheme illustrated above has the disadvantage that it introduces busy waiting. More complex kernel support mechanisms for user-level threads allow busy waiting to be eliminated. Consider the following version:

      ThreadP(semaphore)
            if empty(ThreadReadyList)
              P(semaphore)
            else
              loop
                NonBlockingP(semaphore)
              exit loop when errno <> WOULDBLOCK
                ThreadRelinquish
              end loop
            end if
    
    This, at least, avoids busy waiting if the calling thread is currently the only ready thread. So long as ThreadP is only used with semaphores that are used for interprocess synchronization, while some internal thread package semaphore implementation is used for semaphores local to the threads of a process, this will probably perform acceptably.

    This discussion of kernel support for user-level threads did not cover questions of how a file opened by a user-level thread can remain open after that thread has been suspended and then picked up by a different thread support process. One response to this problem is to question the entire approach to open files used by operating systems like UNIX, but this is not strictly necessary. If a record of what files are open is maintained somewhere in the shared heap, the different heavyweight processes acting as thread servers can all find out what files should be open and can therefore cooperate to maintain a consistent worldview. The details of how this is done make an excellent homework assignment!

  11. Other Uses of Threads

    It is important to note that threads have uses other than the exploitation of parallel processors! Perhaps the most important of these arises in the area of supporting nonblocking I/O on systems that provide only blocking primitives. Consider, for example, the following code to perform a nonblocking write using a blocking one:

         NonBlockingWrite(text)
           create thread to run this code
             {
               write(text)
               terminate thread
             }
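
    With kernel threads, this idea can be written directly. Here is a sketch using POSIX threads, in which the spawned thread absorbs the blocking write() while the caller continues at once:

      #include <pthread.h>
      #include <unistd.h>
      #include <stdlib.h>
      #include <string.h>

      struct request { int fd; char *text; size_t len; };

      static void *writer( void *arg )
      {
          struct request *r = arg;
          write( r->fd, r->text, r->len );  /* this thread may block here */
          free( r->text );
          free( r );
          return NULL;
      }

      void nonblocking_write( int fd, const char *text )
      {
          struct request *r = malloc( sizeof *r );
          pthread_t t;
          r->fd   = fd;
          r->len  = strlen( text );
          r->text = strdup( text );  /* private copy; the caller may reuse text */
          pthread_create( &t, NULL, writer, r );
          pthread_detach( t );       /* the thread cleans up after itself */
      }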
    

    The most important application of this kind of coding is in client server applications. In a file server, for example, a typical protocol involves sending the server's "open file port" a message asking it to open a file. The open operation returns the network address to which read and write requests should be sent. One way to implement this is as follows:

         FileServer
           loop
             Await message from the open file port
             create thread to run this code
               {
                 create a new port ID
                 send the new port ID as a reply
                 loop
                   Await message from the new port
                   process read and write messages
                 until message is "close"
                 delete the new port
                 terminate thread
               }
           end loop
    
    This scheme avoids the need for a "multiwait" primitive in the server that waits for an incoming message on any open file port; it allows the server to utilize multiple processors, if they are available, and it allows the logic of the server to be highly simplified compared to servers that use a single conventional process to handle many different operations.

    This scheme requires either kernel threads or kernel support for user-level threads! (Why? This is a very good homework problem or exam question, because it exercises your understanding of the limits of user-level threads.)

    Similar schemes can probably be used to greatly simplify the horribly convoluted event-driven logic of applications that run in windowing environments, although I know of no efforts in this direction. The basic idea is to create a thread for each active window.