22C:116, Lecture Notes, Nov. 11, 1995

Douglas W. Jones
University of Iowa Department of Computer Science

  1. Threads

    In many modern systems, it is common to find a distinction between the terms process and thread. This distinction can be confusing, because from the point of view of the theory of concurrent processes, a thread is a process!

    The problem is that the word process, while originally referring to the sequential execution of a program by a processor, has come to mean much more. In many systems, each process has its own protection domain, including a virtual address space, set of open files, and other associated information. As a result, process creation and destruction have become heavyweight operations.

    It is worth noting that, as parallel and distributed computing have advanced, our ideas of what is an acceptable amount of overhead for process creation have changed. Under OS-360, process creation was so expensive that processes were rarely dynamically created to run each user's program, but rather, they were statically created and "cleaned and reused" for many programs.

    One of advantages of UNIX, when it first came out, was that processes were so light weight that not only could a new process be easily created every time a user logs into the machine, but new processes could be dynamically created to run each command typed by a user. Rates of a few process creations per minute would have been considered excessive under prior operating systems, while under UNIX, creating a few processes per second was considered perfectly normal.

    When parallel programmers look at UNIX, however, they see processes that are too heavy and difficult to create. This is because many parallel algorithms can naturally be expressed in terms of programs that spawn large numbers of very short-lived processes; such programs attempt to creat thousands of processes per second, and when this is done under UNIX, the overhead of process creation becomes the limiting factor in the performance of the program.

    This has lead to the development of what have come to be known as lightweight threads -- processes trimmed down to the minimum, freed of all but the most essential attributes, a distinct program counter and a set of registers. The various thread systems now available allow processes to be created at a cost not much greater than the cost of a procedure call that involves saving and restoring registers.

  2. Kernel Threads

    A natural way to implement threads is to implement them as part of the operating system kernel. In this case, the kernel must distinguish between an addressing and protection domain, consisting of a set of open files and a virtual address map, and a thread, consisting of a minimal process state, just registers and the identity of the domain being used.

    The terminology that has emerged with such systems is somewhat confusing. A conventional UNIX process can be modeled by a domain in which a single thread is executing, but it is common to refer to the domain itself as a process, despite the fact that it is not a process, in the theoretical sense, and that a thread is in fact a process! Thus, one hears talk of one process in which one or more threads are executing.

    Typical kernel thread systems based on UNIX have a fork operation that replicates the protection domain of the caller and then launches a single thread in the new domain. In addition, these systems include a "thread create" operation that launches a new thread in the current protection domain.

    If the UNIX tradition is ignored, it would probably be more sensible to elevate protection domains to the status of first class objects, so that the operations of domain and thread creation would be completely separated. When launching a thread, the creator of that thread would designate the domain in which it is to be launched from among any of the domains in which the creator has permission to launch threads.

    With kernel threads, the primitives that normally block a process (semaphores, blocking I/O operatons, and so forth) block only the thread that requested the operation. Thus, while one thread of a group of threads that share a domain waits for I/O, others may continue.

  3. Threads Outside the Kernel

    When an operating system kernel directly supports the notion of threads, thread creation involves a kernel call, and this involves the overhead of a transfer of control from the user's domain into the system and back again. Frequently, this overhead is comparable to the computation actually involved in creating the thread! As a result, an alternative approach to threads has been developed, in which the threads are implemented at the user level instead of being implemented by the system.

    User threads are represented by "thread state records" stored in the user's address space; these records typically hold just the program counter and registers of the thread, plus links used by the thread manager (scheduler). On a uniprocessor, threads implemented in this way are not very different from coroutines.

    On multiprocessors such as those sold by Encore and Silicon Graphics, the thread package first forks a number of UNIX processes that share a large area of memory (the shared heap). These processes each have "main programs" that wait until a thread is available to run, then remove it from the thread ready list and run it until it blocks. To the UNIX operating system, this is just a fixed group of processes, but to the user of the thread package, the fixed set of processes is viewed almost as a fixed set of processors, and the threads themselves are viewed as processes that are dynamically created and destroyed.

    User threads pose some problems! First, each UNIX process used to support the thread package is an independant protection domain. If a thread opens a file, that file is only available to the process in which the thread opened it, and if that thread is later run by a different process, the file will not be available! As a result, the manuals for user-level thread packages always state that all files needed by the application must be opened before the thread system is initialized.

    The second problem with user-level thread packages is that when a thread performs a blocking operation such as an I/O operation, not only is the thread blocked, but so is the process that was running it at the time it blocked. The problem is that the kernel call (for example, read) is unaware of the existance of the thread package and the fact that there might be other threads in the thread ready list. As a result of this, if the thread package was started with N processes, and then N threads block to perform I/O operations, no threads will remain running, despite the fact that there may be ready threads!

  4. Composite Thread Systems

    If kernel threads are too heavy, but user-level threads have semantic problems, it is natural to ask if there is some compromize. The answer is yes! The solution is to leave threads at the user level but design thread support into the kernal calls that would normally block.

    The minimum necessary support is a nonblocking alternative to each normally blocking kernel call. Instead of blocking, this variant would return an error code indicating that the operation would have blocked. Consider the example:

          P(semaphore)
            -- may block
    
    A nonblocking variant might be:
          NonBlockingP(semaphore)
            -- if P would have blocked
               sets errno to WOULDBLOCK
    
    Using this, a version of read callable from the thread package can be written:
          ThreadP(unit,buffer,length)
            loop
              NonBlockingP(semaphore)
            exit loop when errno <> WOULDBLOCK
              ThreadRelinquish
            end loop
    
    Effectively, this repeatedly tries the semaphore until it doesn't block; every time the try fails, it allows other threads to run (using the thread package's relinquish operation); when the operation finally succeedes, it falls through.

    Of course, in addition to the semaphores provided by the kernel for synchronization of heavyweight processes, typical thread packages provide lightweight semaphores for synchronization of threads that share the same protection domain. In addition, on multiprocessors, thread packages typically also provide spin locks so that the overhead of a context switch between threads can be avoided when the expected wait for entry to a critical section is shorter than the time taken to switch between threads.

    The scheme illustrated above has the disadvantage that it introduces busy waiting. More complex kernel support mechanisms for user-level threads allow busy waiting to be eliminated.

  5. Other Uses of Threads

    It is important to note that threads have uses other than the exploitation of parallel processors! Perhaps the most important of these arises in the area of supporting nonblocking I/O on systems that provide only blocking primitives. Consider, for example, the following code to perform a nonblocking write using a blocking one:

         NonBlockingWrite(text)
           create thread to run this code
             {
               write(text)
               terminate thread
             }
    

    The most important application of this kind of coding is in client server applications. In a file server, for example, a typical protocol involves sending the server's "open file port" a message asking it to open a file. The open operation returns the network address to which read and write requests should be sent. One way to implement this is as follows:

         FileServer
           loop
             Await message from the open file port
             create thread to run this code
               {
                 create a new port ID
                 send the new port ID as a reply
                 loop
                   Await message from the new port
                   process read and write messages
                 until message is "close"
                 delete the new port
                 terminate thread
               }
           end loop
    
    This scheme avoids the need for a "multiwait" primitive in the server that waits for an incoming message on any open file port; it allows the server to utilize multiple processors, if they are available, and it allows the logic of the server to be highly simplified compared to servers that use a single conventional process to handle many different operations.

    This scheme requres either kernel threads or kernel support for user-level threads!

    Similar schemes can probably be used to greatly simplify the horribly convoluted logic of applications that run in windowing environments, although I know of no efforts in this direction.