The Unix Memory Model

Part of 22C:112, Operating Systems Notes
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Memory Protection

Consider the following horribly misbehaved program:

/* misbehaved */
#include <stdlib.h>
int main() {
        for (;;) {
                *(int *)random() = random();
        }
}

This program uses the standard pseudo-random number generator from the C library to write random integers to random locations in memory. The attack this program performs on itself is comparable to a horrible flood of hardware failures, or to the least subtle attacks on the security of an operating system. It is also typical of the consequences of the pointer errors commonly made by C or C++ programmers.

The question is, why doesn't this program crash the system when you run it on a modern Unix, Linux, MacOS or Windows machine? Indeed, if you run it on an old Mac with a 68000 processor, as opposed to the later 68030, Power PC or Intel processors, it has a high likelihood of crashing the machine. The same is true if you run it under Microsoft DOS or versions of Windows from prior to Windows NT and Windows 95.

These old operating systems used computer architectures that offered a very simple memory addressing model. All of memory was seen as a simple array of integers, addressed starting at zero and running up through a limit that depended on the particular chip being used. This model dates back to the seminal paper by Burks, Goldstine and von Neumann written in 1946.

By the 1960's, the designers of large computer systems realized that the lack of memory protection was a big problem, so they began exploring mechanisms that would do two things:

- protect the memory used by the system and by each program from damage by other programs, and

- translate the memory addresses used by each program, so that programs could be placed anywhere in physical memory.

Initially, many vendors' solutions involved a different mechanism for each of these problems, but by the mid 1960's, it was obvious that a single basic mechanism could accomplish both purposes. This device has come to be known as the memory management unit.

It took a while for memory management units to mature, but by 1970, several vendors were selling them in essentially modern form. The memory management units of the Multics system, the IBM System/360 Model 67, and others did not differ greatly from those of the Power PC or Pentium -- as seen from the perspective of the programmer. Of course, the implementation technology was quaint, and of course, lots of little details of the implementation differed from modern memory management units.

Key Elements of Memory Management

Memory Address:
A bit pattern that selects one item from the array of items that make up the memory of the computer. Typically, the addressed items are bytes, words, or small multiples thereof.

Address Space:
The abstract space of all possible memory addresses. For a 32-bit address, the address space runs from 00000000 to FFFFFFFF, in hexadecimal.

Virtual Memory Address (sometimes called the name):
A memory address issued by the user program. Load, store, and branch instructions use some addressing mode to compute a memory address before using that address for some purpose. Pointer variables contain memory addresses. Object handles are pointer variables that refer to the representation, in memory, of some object. In the presence of a memory management unit, all of these memory addresses are virtual addresses.

Physical Memory Address:
When the CPU issues a virtual memory address, the memory management unit checks the validity of that address, and if valid, it translates it to a physical memory address.

Address Map:
The address map of the memory management unit maps each address in the virtual address space to either some address in the physical address space or to invalid. Generally, the address map of any address space describes, for each address in that space, what operations are valid on it. Some addresses may be invalid, some may be read-only, and some may be read-write. Some may be persistent memory (for example, flash memory) while others are transient (classical RAM or DRAM).

Page Table:
An implementation of the virtual address map in which consecutive virtual addresses are gathered into fixed size blocks, called pages. All addresses in each page are mapped to consecutive physical addresses, and all addresses in each page share the same attributes (read-only, read-write, invalid, etc). If the page is valid, the page table maps it to a page frame in physical memory.

Page:
A fixed size block of virtual memory. The size of a page is generally a power of two -- 512 bytes was once a common page size, 4K bytes remains common today. As a result, the least significant bits in each virtual memory address give the word or byte within a page (depending on whether the machine in question allows byte addressing), while the more significant bits of the virtual address specify which page within the virtual address space is being addressed. The memory management unit only looks at the most significant bits.

Page Frame:
A fixed size block of physical memory, sized to hold one page of data. Pages may be moved from one page frame to another without the application program knowing of the move so long as the address map is updated to reflect the move.

Segment:
A variable sized block of virtual memory. In some memory management units, segments were supported directly, but the dominant model today is to create segments from consecutive pages. User programs rarely concern themselves with pages, but they are frequently concerned with segments. Typically, for example, the program will have a stack segment, a code segment, and a segment containing the statically allocated global variables.

The Unix process model

Unix (and its many clones, Linux, Solaris, FreeBSD, and many more) organizes memory by processes. Each process represents one program being executed. The operating system is able to execute multiple processes in parallel by multiplexing one or more CPUs between the set of processes that are currently ready to execute. There is no need to allocate CPU power, for example, to processes that are blocked awaiting input. If there are more ready processes than there are CPUs, the operating system will typically dole out time to the different processes in increments of something like 1/100 of a second.

The operating system allocates a number of resources to each process:

Memory Resources

The Code Segment
This is typically read-only or read-execute-only, and all processes that are running the same program usually share the code segment.

The Stack Segment
This must be read-write, as this is where the activation records for the running program are stored.

The Static Segment
This segment holds statically allocated global variables, and it also typically holds the heap in which dynamically allocated objects are stored. When you build linked lists or binary trees, these are in the heap.

Other Segments
Most modern Unix variants allow programs to allocate additional segments.

Open File Resources

Each Unix process has a list of files it has opened. In the abstract, each open file is just an object of class file, with methods such as read, write, seek and close, but file objects, as a security critical part of the system interface, are outside of the Unix memory model. The actual objects are stored in the operating system's private memory space, and they are addressed indirectly (and clumsily) through file handles in the open file table.

Standard Input
Open file zero is the standard input stream of a Unix process. By default, this is the keyboard, but it is easy to run programs with input from other sources.

Standard Output
Open file one is the standard output stream of a Unix process. By default, this is the current terminal window, but it is easy to run programs with output to other sources.

Standard Error
Open file two is the standard error stream of a Unix process. By default, this is the current terminal window, and it is used only for error messages. Redirecting standard output leaves the error messages on the terminal, but this stream too can be redirected.

Other Open Files
Any other files, up to a system defined limit, may be opened. Typically, the limit is 16 on older 16-bit systems, and 32 on newer 32-bit systems, although this strange coupling of the number of permitted files to the word size is essentially irrational.

System Calls

Note that the operating system is not necessarily visible in the address space of a process. The memory model does not require that the operating system be invisible, but if any parts of the operating system are visible in the process's address space, they are read-only. Furthermore, no operating system data structures that reveal private information about other running processes can be readable, and the only legal control transfers into the operating system are through calls to operating system services.

If the operating system is not visible to the application program, how does an application program call a system service? To the programmer, the system services are typically just subroutines in the standard library, routines like open() and read().

The answer depends on the architecture. Some memory management units permit certain addresses in the address space to be designated as "gate crossing" addresses or system entry points. Others do not. Here, we describe the one solution that works on every modern machine, regardless of the nature of its memory management unit:

In general, all interrupts and traps cause a transfer to a system service routine (either an interrupt or trap service routine). All such service routines are part of the operating system. With the simplest memory management units, the MMU is simply turned off on entry to such a service routine; with more complex MMUs, the MMU switches from the memory address space of the user to the memory address space of the system. In either case, return from trap switches back to the address space of the program that caused the trap.

What the operating system designers do is pick some trap and designate it as a call to the operating system. Some hardware designers reserve an opcode for this, giving it a name such as TRAP. Others do not, but any way of forcing a trap will suffice. For example, we could declare that the last page in the address space, containing the address 0xFFFFFFFF, will never be executable. Having declared this, we can use a jump into that page as a system call. Since there are many different addresses in that page, we can allocate one address to each system call, so a jump to 0xFFFFFFE5 might be our code for read(). Given this, a user's call to read() would call a stub of the form:

int read( int fd, char * buf, int len )
{
	static int (*readit)( int fd, char * buf, int len )
		= (int (*)( int, char *, int )) 0xFFFFFFE5;
	return (*readit)( fd, buf, len );
}

One such stub would need to be created for each system call. The system code for the trap handler would then need to examine the virtual address that caused the trap. On finding the address 0xFFFFFFE5, it would know that this trap should be interpreted as a call to read(). The code would then need to get the parameters, which requires close cooperation between the compiler and the system, since how a compiler passes parameters is determined by the compiler writer. If the compiler passes parameters in registers, then the saved parameter values are in the memory locations used to save registers on entry to the trap service routine. If parameters are passed on the user's stack, then the parameters are on top of the stack in the user's address space.

Parameter Validation

If parameters are passed on the caller's stack, the easiest way to view this, from the system perspective, is that the caller's stack pointer is passed as a parameter in a register. Typically, the system establishes a new stack, in the system address space, for its own return addresses and local variables.

If we turn off the memory management unit while the system is running, or if user and system code have different virtual address maps, pointers passed by the user, and even the user's stack pointer, cannot be used as memory addresses because the memory they reference will not be at the same place in the operating system's address space as it is in the user's address space.

Therefore, each pointer passed by the user to the system must be validated and translated to an address in the system's address space. For the sake of example, consider a system where the memory management unit is simply turned off while the system is responding to a trap. In this case, the system must take each pointer provided by the user program and perform virtual to physical address translation on that pointer before using it. This translation might have any of the following outcomes:

- The virtual address may be invalid, in which case the system must return an error code or abort the offending process.

- The virtual address may be valid but forbid the intended operation, for example, a write through a pointer into a read-only page; again, the system must report an error.

- The virtual address may be valid and permit the intended operation, in which case the translation yields a physical address the system can safely use.

Having completed the translation of the user's pointer to a system pointer, the system can then use that pointer directly to access one byte of memory.

If the system needs to access many bytes of memory, there are two options: First, the system could access them one byte at a time, repeating the above validation and translation step for each byte. Second, the system could have a pointer validator and translator that operates on a range of addresses. In this case, the translated range of addresses will not necessarily be the same size as the desired size. If I ask the address translator-validator to translate address X for 200 bytes, it may return t(X) for 120 bytes. The system must then translate address X+120 for 80 bytes as a separate transaction. This is because pages that appear consecutive to the user won't necessarily be consecutive to the system.

References

The modern idea of a digital computer dates to 1946. Earlier conceptions of programmable calculators date to the 19th century in the work of Charles Babbage, and these ideas reappeared in the early 20th century in such machines as the Harvard Mark series of relay-based machines, as well as the first electronic calculator, ENIAC. Everything changed, though, after this paper:

Preliminary Discussion of the Logical Design of an Electronic Computing Instrument, by Arthur W. Burks, Herman H. Goldstine and John von Neumann; report to the U. S. Army Ordnance Department, 1946.

Memory management references:

The Wikipedia writeup on memory management units is a good place to start.

The Wikipedia writeup on address space supports this.

One-Level Storage System by T. Kilburn, D. B. C. Edwards, M. I. Lanigan and F. H. Sumner, first published in 1962, describes the Atlas computer's virtual memory system. The Atlas computer was the first computer that could claim to have anything that resembled a modern memory management unit.

The PDP-8 Memory Extension and Time-Share Option, sold starting in 1965, was an extremely simple yet effective memory management unit for a very small computer, even by the standards of that era.