Threads vs Processes

Part of CS:3620, Operating Systems Notes
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

How did Unix decide on fork()?

The decision to use fork() as the only process-creation primitive in Unix is something of a puzzle. It seems very expensive to create a new process by making a complete copy of the parent, particularly when the most common act of the child is to immediately call some version of exec() to launch a new application -- abandoning the copied data in the process.

This did not seem expensive in the original Unix, because the first implementation of Unix ran on a surplus PDP-7 with no memory management unit. As a result, only one process could be in memory at a time; all other processes were stored on disk (or possibly drum). In this setting, the normal cost of a context switch was the cost of swapping the current process out to disk and swapping the next process in. To fork a process, you simply swap it out to disk and let the image left behind in RAM continue to run as the child process. This is inexpensive, since the swap would have been paid for at the next context switch anyway.

The cost rises, however, once you use a memory management unit with demand-paged memory. When Unix ran on 16-bit PDP-11 computers with simple memory managers, swapping remained in use, although RAM was big enough to hold one running process while several others waited to be swapped out or in. With the migration from the 16-bit PDP-11 to the 32-bit VAX, however, the cost changed immensely. The VAX presented a pure paged view of the address space with a full 32-bit virtual address, and replicating all of the memory of a process when it forked would have been very expensive.

The solution to this problem was lazy copying, now usually called copy-on-write. The idea is simple: first, there is never any reason to copy read-only or read-execute pages; they can be shared by all processes that use them. So, when a process forks, why not mark all of its pages read-only and share them, with a side note saying that the pages are shared only for the purpose of a lazy fork?

This works, after the fork, until the moment that either the parent or the child tries to change some variable in RAM. When this happens, the page-fault service routine checks the side note on the page in question. If it says "this page is really read-only," the service routine does the usual thing for a process that tries to alter a read-only page -- it signals an exception in that process. On the other hand, if the page is marked "read-only only for the purpose of a lazy fork," the page-fault service routine duplicates the page, marks the duplicate read-write, and lets the faulting process continue with the duplicate.

As a result, after a fork, the cost of copying is limited to those pages that are actually changed by either the parent or the child process, and the cost of copying is spread out instead of being paid up front at the time of the fork.

Threads versus Processes

In a simple system with no memory management unit, where all processes share the same address space, there is no necessary difference between threads and processes. Once there are multiple address spaces, however, these terms diverge.

The term process is generally used for heavyweight objects that combine a thread of execution with its own address space. Context switching between two processes therefore involves not only changing the contents of the CPU registers, but also changing the address map used by the memory management unit. This means we must either replace the page table (if it is stored in the MMU) or flush the MMU's address-translation cache (the TLB) and change the pointer to the page table. In either case, the cost is high compared to just changing registers.

Typically, open files are associated with processes as well, and in many systems processes own other resources, such as timers. In Unix, for example, each process has a programmable interval timer that can be set to signal the process after any desired delay.

The term thread is generally used for threads of execution within one process. User threads are entirely implemented by user code, while kernel threads are implemented by thread management services provided by the host operating system.

User threads have the fastest context switching, since switching between them involves no transfer of control to the operating system, with the attendant cost of a system call. On the other hand, when a user thread makes a blocking system call, the kernel blocks the entire process, so every thread in it stops running.

Kernel threads impose the overhead of a system call with each thread operation, but they have the advantage that, when a system call blocks, other threads in the same process may continue running.

In operating systems that support kernel threads, each process consists of an address space, a set of open files, and one or more threads of execution.