Chapter 13, Virtual Memory

When main memory isn't big enough

Part of the notes for 22C:50, Introduction to System Software
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Memory Hierarchy

Most computer users take the main memory resources of computers for granted, assuming that they are completely implemented by hardware and thus, aside from the problem of storage allocation, not an object dealt with by system software. This is indeed the case for small computer systems, but since the early 1990's, desktop computers have been large systems and system software has played a large part in implementing the memory resources seen by applications programs.

There are two primary problems with using the raw physical memory resoruces directly in application programs. First, high performance main memory is comparatively scarce and expensive, at least when compared with disk, and second, there are security and reliability problems when multiple applications share the same memory.

The problem of the cost of high performance memory has troubled system designers since the early days of computing. In early 2003, 256 megabytes of SDRAM cost about $50, or about 20 cents per megabyte; in contrast, 200 gigabytes of disk, spinning at 7200 RPM, cost $340, or about 0.2 cents per megabyte, and the price fell to 0.1 cent per megabyte for disks spinning at $5400 RPM. In 1965, the costs and memory capacity figures were very different: A 6K byte core memory module (4K times 12 bits) cost $10,000, or about 1.6 dollars per byte, while a 384 kilobyte drum (256K times 12 bits) cost $43,600, or about 11 cents per byte.

Whether in 1965 or 2003, main memory costs much more per byte than rotating magnetic memory, and the ratio between main memory and secondary memory costs has increased by a factor of ten from then to now! As a result, there is great economic pressure on system designers to try to combine small high speed memories with large low speed memories in order to build something with a cost similar to that of disk but a speed comparable to main memory. This leads to hierarchic memory structures, and frequently, these have more than two levels.

Today's high performance microprocessors frequently have an L1 cache on the same chip as the processor; for example, the Power PC 601 has 32K. This memory is very fast and because it is on the same chip as the processor, there is very little delay in accessing it. L2 cache is not on the same chip as the processor, and it is typically both larger and slower. For the Power PC family, 256K of L2 cache is common. The same machine might have several hundred megabytes of main memory and several gigabytes of disk.

The interface between a cache memory and main memory is almost always managed by hardware, but the interface between main memory and backing storage is usually largely in software. On early systems, this interface entirely was under user program control; when a program needed data or code, that program was responsible for bringing it into main memory. The problems of overlay management which result from this have consumed a tremendous amount of programmer effort and have motivated the development of complex software tools such as the overlay linkers discussed in Chapter 7. None of these solutions has been entirely successful, and in the early 1960's, an alternative was developed: virtual memory.

It is worth noting that early British and American terminology for some issues in the design of virtual memory systems differ, and both sets of terminology persist to this day.

store - British

memory - American

After the 1970's, the term RAM has become dominant; in the 1960's and 1970's, people called it core, but these older terms avoid reference to any underlying technology.

Of course, in a memory hierarchy, we should qualifiy our terms. Today's RAM is used as primary memory or main memory (or the primary store or main store).

backing store - British

secondary memory - American

swap sapce - an archaic term, still used by Unix users!

After the 1960's, people have assumed that this was disk, while before this, the assumption was that it was a fast magnetic drum or tape; these older terms avoid reference to the technology, and the term secondary memory, in this context, dates back to 1946.

Virtual Memory

In a virtual memory system, the burden of overlay management is moved from the programmer to the operating system and the hardware, working in cooperation. When a program requests the use of a memory block which is not currently in main memory, the system is automatically informed by the hardware so that it can bring the needed data into main memory. The user program sees no difference between an access to a word of data which was previously in main memory and one which had to be moved from secondary memory except a difference in access time. Since most programs exhibit good locality of reference, most memory accesses will be fast, and the average speed of execution of a program will be near that which the program would attain if all of the memory were fast memory.

In addition to the sizes of the primary and secondary memory devices, three key questions must be answered to obtain a complete description of any virtual memory system. The first of these is: What are the units of transfer between main memory and the backing storage device? This has two common answers, pages, or fixed size blocks of information, and segments, or variable sized blocks. Typically, the page size used in a virtual memory system is a small multiple of the sector size of the backing storage device. When segments are used, each segment is typically a logical unit such as the code for a procedure or function, on the one hand, or the representation of a particular data object on the other hand.

The second question is: What causes a transfer from backing storage to main memory? Demand transfers are those which are initiated by an attempt by a program to access a page or segment which is not currently in main memory, while anticipatory transfers are those made by the system in anticipation of a user demand. Some systems also allow a user program to request a transfer in anticipation of a specific later need.

The final question is: What policy is used to select objects in main memory which are to be moved back to the backing store when the space they occupy is needed for other pages or segments? There are a number of replacement policies, ranging from random replacement to least-recently used replacement. The choice of an appropriate replacement policy is one of the key variables determining the performance of a virtual memory system, since the premature replacement of a page or segment which is still being used can significantly slow the system.

Virtual memory was originally developed by two groups, working independently. Paged virtual memory was developed by R. M. Kilburn at the University of Manchester, and commercialized by Feranti Corporation. Segmented virtual memory was developed by Burroughs corporation in the United States. Both groups came to market with working systems around 1960.
Virtual memory was generally seen as obscure technology throughout the 1960's, although by the late 1960's, several computers were being made with the memory management unit hardware needed to implement this technology and several commercially available systems used it: Aside from Burroughs and Feranti, the Scientific Data Systems 940 and DEC PDP-10 computer systems were the best known commercial offerings. IBM and GE (later Honeywell) experimented with virtual memory, and with the commercialization of IBM's VM 370 and Honeywell's Multics operating systems in the early 1970's, virtual memory came to dominate mainframe operating systems.
The Unix system was modified to support virtual memory at the University of California at Berkeley in the early 1980's, and by the mid 1980's, as Unix based workstations began to arrive on desktops, essentially all of them used virtual memory. Early microprocessors did not support the necessary memory management units, but with the advent of the Intel 80386 and the Motorola M68030, virtual memory became a possiblity in the personal computer marketplace. It took some time for Microsoft and Apple to exploit this, though; it was only with Windows NT and MacOS X that these systems made extensive use of virtual memory.

Paging versus Segmenting

The hardware being used largely determines whether a virtual memory system uses paging or segmenting (or some combination), and indeed, whether a particular computer system can be used to implement virtual memory at all. If virtual memory is to be used, the hardware must provide some facilities for translating virtual addresses, as issued by a user program, to physical addresses, as used to access actual words or bytes in the main memory. This address translation hardware is responsible for detecting attempts by a program to access objects which are currently on secondary memory, and it isolates programs from any knowledge of the physical addresses actually being used to store objects in main memory. In this latter role, virtual memory hardware can be thought of as performing a run-time relocation function. Again, because of the independent American and British development of this technology, we have two sets of terms for many issues:

name - British
virtual address - American: The name or virtual address of a variable is the way that variable is referenced in a program; in British usage, this applies even to references to a variable in the binary representation of a program.
address - British
physical address - American: The address or physical address of a variable is where that variable is actually stored. British usage is less common here because from an American viewpoint, it sounds ambiguous.
name space - British
virtual address space - American: The name space or virtual address space is the set of all names or virtual addresses that might be used in a program. At any instant, some of these will be valid, some will be invalid, some will refer to items in the backing store, and others will refer to items in main memory.

When segmented memory is used, each memory address issued by the user program must consist of the identity of a segment and the offset of the desired item within that segment, as shown in Figure 13.1.

segment number      address within segment
 ____________        ____________________
|____________|      |____________________|

Figure 13.1. A segmented virtual address.

For example, on the Intel 80286 (and to a lesser extent, the Intel 8086), each memory reference is identified by the hardware as being addressed to either the current code segment, the current stack segment, or one of two additional segments known as the extra segments. Each segment may be from 0 to 64k bytes long, and each is described by a base register and a limit register. All later processors in the Intel 80x86 family support this addressing model, but this addressing model is rarely used on more recent family members.

On the 80286, all instruction fetches are from the instruction segment, and all operand references are, by default, from the stack segment. There is a special form of every memory reference instruction to override this default and explicitly control what segment is used for the operand. There are additional instructions for loading these base and limit registers; in the processor's protected mode of operation, these instructions load the base and limit registers from a segment holding descriptions of other segments, called the processor base segment, which lists all of the segments the current program is allowed to access. A virtual memory system on this machine would quite naturally transfer data between primary and secondary memory in units of one segment.

When paged memory is used, each memory address issued by the user program is typically interpreted as a single integer by that program, but the hardware breaks the address into two fields. The most significant field is the page number, while the least significant field is the word (or byte) number within that page, as is shown in Figure 13.2.

            address                 --- user view
 _______________________________
|___________|___________________|

page number   address within page   --- system view

Figure 13.2. A paged virtual address.

For example, a 16 bit virtual address may be broken into an 8 bit page number, selecting one of 256 addressable pages, and an 8 bit specification of the byte desired from that page. Unlike segments, pages are all the same size, and user data and program structures are usually placed in memory without regard to how those structures are broken up into pages (although paying a little attention to this can improve performance).

Independently of whether paged or segmented memory is used, it is conventional to refer to pages and segments as components of the user's address space, as opposed to referring to them as units of storage in memory or on disk. Thus, a particular page or segment may reside in main memory at some times and on disk at other times, while it is always in the user's address space. The place where the segment is stored does not change its identity, and the segment may be moved from one place to another many times during the execution of a program.

When the virtual address space is broken up into pages, the term page frames or just frames is conventionally used to refer to the areas in main memory which may hold pages, and the term may also be used to refer to locations in secondary memory (disk) that hold pages, particularly if the page size is not one sector, and if pages may move from one disk address to another during the life of a program.

When a user attempts to address a page or segment which is not currently in some frame in main memory, that event is called a page fault. The term segment fault can be defined similarly, but there is nothing like a 'segment frame' because there is no standard segment size.

Most modern systems combine paging and segmenting; thus, each segment consists of one or more pages. This is a particularly useful when the maximum size of a segment is such that bringing an entire segment into main memory would be difficult. A second advantage of this is that it eliminates the need to allocate variable sized blocks of main memory to hold entire segments. When paging and segmenting are combined, the virtual address issued by a user program consists of a segment number and an integer offset within that segment, and the system interprets the offset as consisting of two fields, the first giving the page number within that segment, and the second giving the desired word in that page, as shown in Figure 13.3.

segment number            address within segment        --- user view
 ____________        _______________________________
|____________|      |___________|___________________|

segment number      page number   address within page   --- system view

                         or

                      address                           --- user view
 ___________________________________________________
|_______________|_______________|___________________|

segment number     page number    address within page   --- system view

Figure 13.3. Paged, segmented virtual addresses.

The Intel 80386 and higher numbered processors (including all Pentiums) take this view, as do the Motorola 68030 and up, and most RISC processors, including the IBM/Apple Power PC, the HP PA-RISC and the SGI MIPS families of processors. In all these machines, addresses used by the user program are 32 or more bits, and are broken into 3 fields, as illustrated in the second example above. Some operating systems on these machines use segments to hold logical objects such as major parts of the program, while others allow users to ignore the hardware segment boundaries and treat the entire address as an integer.

Paged Virtual Address Translation Hardware

Operating systems that support virtual addressing rely on the services of a hardware component called the memory management unit or MMU. The memory management unit sits logically between the processor and the main memory. For some computers, the memory management unit is an optional component, while for others, including most modern microprocessors, the memory management unit is an integral part of the processor chip. However it it is packaged, the memory management unit takes, as input, the virtual addresses issues by the program and produces, as output, the physical addresses used by the memory. This is illustrated in Figure 13.4.

 ____________                                         ________
|            |                                       |        |
|   Central  |<=====================================>|        |
| processing |             ____________              | Memory |
|    Unit    |===========>|            |============>|        |
|____________|  virtual   |   Memory   |  physical   |________|
                address   | Management |  address
                          |    Unit    |
                          |____________|

Figure 13.4. The memory management unit, in context.

The addresses seen by the application programmer programmer, that is, the values of pointer variables, object handles and index registers, are all virtual addresses. When a program issues a virtual address, the memory management unit hardware must signal a page fault if the desired page is not in main memory, and the memory management unit must find the appropriate physical address if the page is in main memory. There are two approaches to this, one centered on the use of a page table, and one centered on the use of a frame table.

A page table is an array, indexed by the page number field of a virtual address, where each array entry indicating where the corresponding page is currently stored. A frame table is an array, indexed by frame number, indicating which page is currently stored in that frame. In fact, most operating systems that use paged virtual memory keep both tables, but the memory management unit hardware only needs to know about one or the other to do the address translation job.

The use of a page table is illustrated in Figure 13.5.

      virtual address          -- from CPU
 ___________________________
|___________|_______________|
page number   address in page
      |             |___________________ 
      |                                 |
      |     __________________          |
      |    |_____|____________|         |
index |    |_____|__        __|         |
 into |    |_____|__  page  __|         |
table  --->|_____|__  table __|         |
           |_____|____________|         |
           |_____|____________|         |
            valid    address            |
              |         |               |
              |     ____________________________
              |    |____________|_______________|
To CPU <------     frame number   address in page
(invalid leads 
 to page fault)           physical address  -- to memory

Figure 13.5. The use of a page table.

As shown, the page number field of the virtual address is used as an index into the page table. Each page table entry has at least one bit indicating whether that entry refers to a page in main memory or to one in secondary memory; if the page is in main memory, the hardware considers a reference to that page to be valid; if not, it is an invalid address and the operating system must be informed. The remainder of the page table entry for a valid page is the frame number where that page is stored; for invalid pages, the software may use this field for something else, perhaps the disk address of the page.

If the virtual address space of a user program is small, for example, under a thousand pages, the entire page table may be held in special registers within the central processor; in the days of 16-bit address spaces, 256 pages of 256 bytes each made excellent sense. If the virtual address space is large, the page table must be held in main memory, in which case, there is typically a page-table base-register in memory management unit. When the page table is stored in main memory, it would appear that each memory access made by a user program would require an extra access to fetch the required page table entry, but most memory management unit hardware avoid as many as possible of these extra accesses by using a special cache memory inside the memory management unit to store the most recently used page table entries.

On many modern systems the page table itself can grow too large to fit conveniently in main memory. Some systems have overcome this problem by using combined paging and segmenting, so that only the page tables for recently used segments need be in memory at one time. Another approach is possible! The Digital Equipment Corproation (now part of Compaq, which was later bought by HP) VAX family of computers, built between the mid 1970's and the early 1990's, stored the page table in the virtual address space; as a result, only the recently used pages of the page table would typically appear in main memory, whilt the remainder of the table would be stored on disk. Of course, this means that the VAX memory management unit may have to report an error in translating the address of the page holding the address of the page requested by the central processor, which complicates the picture.

The use of a frame table is illustrated in Figure 13.6.

      virtual address         -- from CPU
 ___________________________
|___________|_______________|
page number   address in page
      |             |______________
      |                            |
 page number                       |
 ___________  ___                  |
|___________|  |                   |
|__       __|  |_____  search      |
|__ frame __|  |     |             |
|__ table __| _|_    |             |
|___________|        |             |
|___________|        |             |
    miss       ______|_____________|_______
      |       |____________|_______________|
      |          frame number   address in page
  <--- 
  to CPU             physical address  -- to memory

Figure 13.6. The use of a frame table.

If a frame table is used, the page number field of the virtual address must be compared with each entry in the frame table to determine which frame holds the desired page. If there is no match, the address is invalid and the operating system must be informed. If there is a match, the index of the frame table entry that matched is used as the frame number in constructing the physical address. The requirement that the frame table be searched with each memory reference may sound very expensive, but the frame table is usually stored in an associative memory, a special high speed memory that allows this search to be carried out in parallel, and as a result, it can be very fast!

When a valid physical address is found as a result of virtual address translation, the instruction execution cycle continues as it would for a simple machine with no virtual addressing; ideally, the memory management unit is so fast that the CPU operates at the same speed it would have had there been no virtual memory mechanism. If the virtual address is determined to be invalid, on the other hand, the execution of the current instruction must be terminated and the operating system must be informed. This is called a page fault.

The transfer of control from the user program to the system when a page fault is called a page fault trap. In general, traps are similar to interrupts. When either occurs, control is transferred from the program that was running to an appropriate service routine. With either, once the service routine is done, it may return to the program that was running. There are two primary differences between traps and interrupts. First, interrupts are caused asynchronous events such as the completion of an input output operation, while traps are a direct consequence of the execution of instructions on the processor. Second, if an interrupt is requested while the processor is in the middle of executing an instruction, the processor completes that instruction before it services the interrupt. On the other hand, if a trap is requested while the processor is in the middle of executing an instruction, the processor aborts that instruction and services the trap immediately; a return from the trap would therefore restart the instruction that caused the trap.

Demand Paging

If a system is to support demand paging, the invalid address trap service routine must update the page and frame tables after moving the desired page into main memory. After doing so, it must return to the program which caused the page fault in such a way that that program is able to continue the execution of the instruction which caused the page fault. The above description requires that the hardware possess two key properties: First, the service routine must be able to find the address referenced by the user that caused the page fault, and second, it must be possible to continue the execution of the user program after a return from the service routine. Not all central processor designs meet these criteria!

Ideally, the hardware should load the virtual address which caused a page fault into a special register or memory location so that the trap service routine has immediate access to it. Many virtual memory systems have been constructed with much less convenient hardware, for example, some provide only values of all of the user's registers at the time of the trap. On such systems, the trap service routine must simulate the execution of the instruction which caused the trap in order to determine which memory addresses were issued by that instruction, and then test each of them against the page table or against the frame table to see if it was the one that caused the page fault.

There are a number of ways that hardware can be built so that the execution of the user program can be continued after the cause of the page fault has been corrected. The approach most convenient for the software would have the CPU store all of its internal state information when a trap occurs; this would allow each instruction to be interrupted at any point and then resumed in mid execution. It is easier, from the hardware design point of view, to design instruction sets where the trap aborts the instruction, discarding any partial results and rolling the system state back to the state prior to that instruction. As long as the instructions have no side effects prior to the occurence of a page fault, this approach leads to both simple hardware and simple software.

On most architectures designed for the support of virtual memory, the instruction set is carefully designed so that the validity of all addresses issued by an instruction can be tested before the instruction has any side effects. Unfortunately, many architectures, including the 80x86 and Motorola 68000, were designed for small processors, and the original designers never imagined that they would be used in a virtual memory context. When the designers of the Digital Equipment Corporation PDP-11/45 encounterd this problem, they came up with an interesting solution. The PDP-11/45 was a segmented machine introduced in the mid 1970's, and many features of the Motorola 68000 instruction set were first developed on the PDP-11 family. On this machine, as on the 68000, many instructions could increment or decrement up to three registers, the program counter plus two index registers, prior to the time a segment fault was detected. On this machine, special auxiliary registers were added to the processor to record which registers in the CPU had been incremented and by how much since the start of the current instruction. When there was a fault, the fault service routine could use these registers to undo any side-effects of the instruction that caused the fault prior to restarting that instruction.

A significant number of machines have been constructed which have virtual address translation hardware but which do not have instruction sets which allow demand paging. On these machines, an invalid address trap cannot be interpreted as an indication of a page fault because there is no way to restart the instruction which caused the trap. The virtual address translation hardware is still useful for two reasons: First, although demand paging has been ruled out, the operating system may still allow programs to explicitly request that pages of their address space be moved in and out of main memory. Second, the use of virtual address translation can allow each application program to run in its own virtual address space, thus making unintended communication between user programs impossible and greatly enhancing system security. This is why Unix, MacOS X and Windows NT are so much more reliable than classic DOS, and earlier versions of Windows or MacOS; in those older systems, failures in an application could cause arbitrary damage to other applications or to the operating system itself.

An Example Page Fault Service Routine

In order to illustrate the construction of a page fault trap service routine, some assumptions must be made about the hardware. These involve such things as how virtual addresses are translated, what needs to be done to restart the instruction which caused the trap, and how disk input-output is done when the contents of a frame must be copied to or from disk. Additional assumptions must be made about how the software uses the disk for backing storage. These issues are discussed in the following paragraphs.

The example presented here will assume that the hardware for virtual address translation is based on a page table, as shown in Figure 13.5, where the page table is stored in main memory. We will assume that communication between the central processor and the memory management unit is through input-output interface registers using the inp and outp functions; this is typical of machines where the memory management unit is an optional device that was designed independently of the central processor. The interface to this memory management unit is given in Figure 13.7.

MMUADDR Memory Management Unit Address Register
This read-only register holds the virtual address that was used by the processor to cause the most recent trap. This is set by the memory management unit hardware when an invalid address is detected and remains constant until the next invalid address is detected.

MMUTABLE Memory Management Page Table Address Register
This write-only register holds the physical address of the page table in main memory. The memory management unit adds the page number field of each memory address to this (as a word displacement) in order to get the memory address of the page table entry for that page. Page table entries are 32 bits each, with the following format:
      _______________________________________________________________
     |_______________________________________|_____________________|_|
                         |                       unused, 11 bits    |
                       FRAME (20 bits)                            VALID
The virtual address and physical address handled by this memory management unit have identical formats:
      _______________________________________________________________
     |_______________________________________|_______________________|
                         |                               |
virtual address:       PAGE                            BYTE (in page)
physical address:      FRAME                           BYTE (in frame)
Figure 13.7. An example memory management unit interface.

The memory management unit interface described in Figure 13.7 raises an immediate question: If the virtual and physical addresses have identical 32-bit formats, what good is the memory management unit? To answer this, keep in mind that these are address formats. If you have a machine with a 32-bit physical address, this does not mean that you purchased 4 gigabytes of RAM. You may well have only 64 megabytes; this only requires a 26 bit physical address, so 6 bits of your physical address are actually unused.

This example will assume that all instructions are restartable; that is, that no instruction has any side effects until all virtual addresses issued by that instruction have been translated to physical addresses. As a result, the page fault service routine need not try to undo any side effects before returning to retry the instruction which caused the trap. It will be assumed that a simple return from trap will restart the instruction in question; that is, the program counter passed to the trap service routine points to the instruction which caused the trap, and has not yet been incremented.

As mentioned above, some assumption must be made about how virtual addresses are associated with disk addresses. The assumption which will be used here is that the virtual address space is mapped to a single disk file; as a result, each page of the virtual address space can be stored on one sector of that disk file, and the DISKREAD and DISKWRITE interface given in Figures 12.3 can be used for moving pages to and from disk. We will assume here that these operate using the physical address of the memory buffer to be read or written to disk.

One final issue must be addressed before a complete example can be presented: Where is the page fault trap service routine stored if all of memory is virtual? Certainly, the page fault trap service routine cannot be allowed to be copied out to disk! The most common solution to this is to introduce two modes of central processor operation, one mode in which virtual address translation is turned on, and one mode in which it is turned off so that programs may directly access main memory; in this case, user programs run with virtual address translation turned on, but system programs such as the page fault service routine, input-output drivers, and other service routines run with virtual address translation turned off. The occurance of a trap or interrupt, in this model, turns off the memory management unit, and the return-from-trap instructioin turns the memory management unit back on.

The data structures supporting the memory management unit given in Figure 13.7 can be described in a high level language as shown in Figure 13.8.

const bytesperpage = 4096 { a very common but not universal value };
      sectorsperpage = ... { depends on the disk sector size };
      maxpage = 1048575 { 2**20 - 1 because the page number is 20 bits };
      maxframe = 1048575 { 2**20 - 1 because the frame number is 20 bits };
      minframe = ... ; { frames below minframe belong to the system };
      topframe = ... ; { frames above topframe have no physical memory };
      MMUTABLE = ... ;
      MMUADDR = ... ;

type pagenumber = 0..maxpage;
     framenumber = 0..maxframe;

var  pagetable: array [pagenumber] of packed record
          frame: framenumber { what frame is this page in };
          unused: 0..2047 { 2**11 - 1 because there are 11 unused bits }; 
          valid: boolean { true if page in main memory };
     end;

     frametable: array [framenumber] of packed record
          page: pagenumber { what page is in this frame };
          used: boolean { true if a page is in this frame };
     end;

     pagefile: diskvarpointer;

procedure mmuinit;
var  p: pagenumber;
     f: framenumber;
begin
     for p = 0 to maxpage do pagetable[p].valid = false;
     for f = minframe to topframe do frametable[f].used = false;
     outp( MMUTABLE, address( pagetable ) );
end;

Figure 13.8. Data structures for page fault management, in Pascal.

In the Pascal programming language, the keyword packed indicates that the fields of the following type should be stored efficiently, for example, by squeezing them into a single word as multiple fields. Thus each entry in an object of type pagetable has the format of a page table entry given in Figure 13.7.

The initialization routine given in Figure 13.8 sets every entry in the page table to invalid, indicating that no pages of the virtual address space are currently in main memory, and sets every usable entry in the frame table to indicate that the frame is currently unused. Note that the frame table entries below minframe will never be used by the virtual memory algorithm because they are reserved for the operating system, and frame table entries above topframe will never be used because no physical memory has been purchased to fill those addresses. Finally, the initialization sets the memory management unit's page table base register to the address of the page table data. The nonstandard (but common) Pascal function address returns a pointer to the data object that was given as a parameter.

The frame table is, in a sense, redundant, since it can be reconstructed from the page table, but as we will see, it is very convenient to keep both data structures. In a real system, the value of minframe is frequently obtained from the loader, for example, using codetop, the address of the highest location the loader has loaded, as used in Chapter 8. Many systems computer maxframe by running a memory test program to empirically search through the main memory looking at each page to see if it works. A more flexible implementation should also compute sectors per page by examining the sector size of the disk being used instead of using a compile-time constant.

The page fault trap service routine that operates with the data structures from Figure 13.8 will typically be structured as shown in Figure 13.9:

procedure pagefaultservice;
var
     trapaddress: integer;
     page: pagenumber;
     frame: framenumber;
     done: integer;
begin
     { first, get the address of the instruction that caused the trap }
     trapaddress = inp(MMUADDR);
     page := trapaddress div bytesperpage;

     { second, find a free frame }
     frame := findframe;

     { third, adjust the page table and frame table }
     pagetable[ page ].frame := frame;
     pagetable[ page ].valid := true;
     frametable[ frame ].page := page;
     frametable[ frame ].used := true;

     { copy the page from disk }
     DISKREAD(
          pagefile,
          makepointer(frame * bytesperpage), bytesperpage,
          (page * sectorsperpage), done
     );
     waitdiskdone( done );

     { return to the user }
end { pagefaultservice };

Figure 13.9. A page fault service routine.

The findframe function in the code in Figure 13.9 searches the frame table for an idle frame; if such a frame is unavailable, it forces some page out to disk in order to create an idle frame. This code assumes that the makepointer function takes an integer representing a memory address and converts it to a pointer to that address, overriding the strict type checking of Pascal.

Page Replacement Policies

The findframe function used in the page fault service routine shown in Figure 13.9 may occasionally find an idle frame, particularly when the system has just started, but it will usually have to force some other page out of main memory in order to make an idle frame. The particular policy used to decide which page to force out is called the page replacement policy. The selection of an appropriate page replacement policy can have a tremendous impact on the performance of a virtual memory system. For example, if the page which is forced out is one of the pages referenced by the instruction which caused the fault, there will be an immediate page fault when that instruction is restarted.

The optimal page replacement policy, developed by Laszlo Belady in the late 1960's, is to order the pages according to when they will be needed in the future, and replace that page which will be needed at the most distant future time. Unfortunately, Belady's optimal page replacement policy cannot be implemented without knowing the complete future history of the program which caused the fault. Since the only way to accurately predict the future is to wait until it is the present, the only way to compute the future history of a program is to run it! As a result, the optimum sequence of page replacements can only be determined in retrospect.

The fact that there is an optimal but impractical policy is useful because it allows more realistic policies to be judged in terms of how close to the optimum they come. We can run a program and collect the sequence of virtual addresses it references, compute how long the program would have taken to run using the optimal page replacement policy, and then compare this with actual run times under more practical policies.

An eminently practical but somewhat stupid page replacement policy is to simply choose the page to be replaced at random. The random replacement policy makes no use of any knowledge of program performance, and can therefore be thought of as the worst policy that can be used without resorting to deliberately malicious policies. As such, the random policy also provides a useful benchmark for the judgement of other policies. Experimentally, random replacement has been shown to result in roughly three times as many page faults as Belady's optimal policy over a reasonable range of main memory sizes.

The first-in first-out page replacement policy is relatively easy to implement and has some intuitive appeal. In this policy, the page which has been in main memory the longest is selected for replacement. The intuitive appeal of this is that it seems fair, allowing all pages to stay in main memory for about the same amount of time, assuming page faults come at an approximately constant rate. Unfortunately, empirical tests of this policy have shown that it is no better than random replacement. Apparently, the fact that a page has been in memory a long time is not a useful predictor of the future need for that page.

The least recently used page replacement policy is much better, being perhaps the best policy which does not rely on actual prediction of future program behavior. This policy is based on the observation that most programs exhibit significant temporal locality. That is, if a page has been recently used, it is highly likely that it will be used again, and if it has been a long time since a page has been used, it is unlikely that it will be needed again. Least recently used replacement requires that the hardware record, for each reference to a page, the time of that reference (or something equivalent to the time). When it comes time to replace a page, a search of all page frames is conducted in order to find the frame holding the least recently used page.

Although systems have been built which used least-recently-used replacemen, the hardware required to record the time of last use is expensive, so most practical systems use some approximation. Empirical results suggest that least recently used replacement leads to no more than twice the number of page faults which would result from Belady's optimal policy over a wide range of practical memory sizes.

The clock replacement policy is a widely used example of an approximation of the least recently used policy. In this case, hardware is required to record references to each page, but only one bit is needed per frame, the mark bit. When a frame is used, the memory management unit sets the marked bit for that frame in the page table; it is up to software to reset this bit. If the software periodically resets the bit, then those pages with the marked bit set were used more recently than those with the bit reset. As a result, it is easy to select one of the less recently used pages for replacement, but the particular page which is least recently used cannot be determined.

There is a problem with the clock policy, as given in the previous paragraph. It requires the memory management unit to know about the frame table! If the only table known to the memory management unit is the page table, as in our example, we can put the mark bit in the page table entry. Then, instead of looking at frametable[i].mark, we will look the mark bit of the page table entry that references entry i of the frame table. This will be pagetable[frametable[i].page].mark.

The name of the clock policy comes from a natural pictoral representation of the scheme used to reset the mark bit and search for a replacible page. As shown in Figure 13.10, the clock policy is frequently described in terms of a circular arrangement of page frames with a 'clock hand' representing the search around the circle of frames looking for the page to replace.

             12          frame numbers
       11        _  1
                / |
    10         / /    (2)
              / /<------------ clock hand
   9          o/       (3)     (moves clockwise)

    (8)                4

        7          (5)         Marked frames are
              6                  parenthesized

Figure 13.10. An abstract view of clock replacement on 12 frames.

When a page fault occurs, the clock hand sweeps around the list of frames until it encounters a frame that is not marked. This is one of the less recently used pages, so it is the one selected for replacement. As the clock hand passes over frames during its search for an unmarked page, it erases the marks on the frames it skips. Given the initial state shown in Figure 13.10, after one page fault, the clock hand will point to frame number 4 and the marks on frames numbered 2 and 3 will be reset.

The rate at which the clock algorithm resets mark bits depends on the frequency of page faults, since the clock hand only moves when there is a page fault. If there are few faults, the clock hand moves slowly and pages must remain unreferenced for a long time before they are considered to be candidates for replacement. If there are many faults, the clock hand moves quickly and pages need only remain unreferenced for a short time before they are considered for replacement. Empirically, the clock policy performs only slightly worse than the basic least recently used policy, so it is an excellent choice in real systems. Code to implement the clock policy on the example memory management unit from figure 13.7 (augmented with a mark bit) is shown in Figure 13.11.

var  pagetable: array [pagenumber] of packed record
          frame: framenumber { what frame is this page in };
          unused: 0..1023 { 2**10 - 1 because there are 10 unused bits }; 
          valid: boolean { true if page in main memory };
          mark: boolean { set by hardware if page used };
     end;

     clockhand: framenumber { the clock hand, initialize to minframe };

function findframe: framenumber;
begin
     { rotate clock hand to candidate frame }
     clockhand := clockhand + 1;
     if clockhand > topframe then clockhand := minframe;
     while frametable[ clockhand ].used
       and pagetable[ frametable[ clockhand ].page ].mark
     do begin
          pagetable[ frametable[ clockhand ].page ].mark := false;
          clockhand := clockhand + 1;
          if clockhand > topframe then clockhand := minframe;
     end { while };

     { hand now points to the frame which will be used }
     if frametable[ clockhand ].used then begin
          DISKWRITE(
               pagefile,
               makepointer( frametable[ clockhand ].page * bytesperpage ),
               bytesperpage, (page * sectorsperpage), done
          );
          pagetable[ frametable[ clockhand ].valid := false;
          frametable[ clockhand ].used := false;
     end;

     { return the frame }
     findframe := clockhand;
end { findframe };

Figure 13.11. Code for the clock page replacement policy.

Note that the declaration of pagetable in Figure 13.11 is a modification of that given in Figure 13.8, and that the findframe function shown here is called from Figure 13.9. This implementation of the clock policy is typical of modern virtual memory implementations.

The Working Set Concept

Empirical analysis of the behavior of programs in a virtual memory environment has shown that most programs have, at any time, what is called a working set of pages. Pages in the working set are frequently referenced, while pages outside the working set are not referenced at all. Intuitively, the working set holds the currently relevant procedures, the local data for those procedures, and any global data structures being manipulated by those procedures. Over time, the working set of a program may change, either gradually or abruptly. For example, the working set of a program which is slowly computing its way through an array will gradually change as the program advances from one page to the next of the array, but when the program finishes the computation there is likely to be an abrupt change in the working set as the program begins the next phase of its computation, perhaps something that has nothing to do with that array.

If the number of frames available to a program is smaller than the working set of that program, there will be a very high number of page faults and not much computation will be done. When this happens, the system is said to be thrashing. Where thrashing is a problem, it is quite common for immense performance improvements to result from the addition of very small amounts of main memory. If the number of frames available to a program is larger than its working set, the number of page faults depends primarily on the rate at which the working set changes and has little to do with the number of available frames. The number of page faults only falls to zero when the entire memory used by the program fits in main memory. In general, adding more memory to a system which is not thrashing will not have much impact on performance, and the price/performance ratio for a virtual memory system is typically minimized by buying just enough memory to avoid thrashing, but no more.

It should be noted that when a program is thrashing, some computation is still being done; the program is making progress, but memory behaves as if it is much slower than main memory (but never slower than the disk being used as backing store). If the number of frames is below a certain threshold, however, no progress can be made at all! This absolute lower limit on the number of frames is determined by the number of different pages that a single instruction can reference. Traditional single operand machine instructions and modern RISC machine instructions require at least two page frames, one to hold the page from which the instruction is fetched, and one to hold the page to or from which the operand is transferred.

For complex instruction set machines, where instructions may span word boundaries and one instruction may address n distinct memory addresses, each of which may span a word boundary, single instructions may reference 2(n+1) pages. On the old DEC PDP-11 or the very similar Motorola 680x0 families, this means one instruction can touch as many as 10 pages; in both cases, the worst case is an indirect memory to memory move where the instruction itself, each of the indirect pointers, and each of the operands span page boundaries. Machines like the Intel Pentium may require even larger minimum numbers of frames for certain rarely used instructions such as memory to memory block transfer.

The graph shown in Figure 13.12 summarizes the above observations about the performance of a virtual memory system as a function of the number of pages available for a typical program.

             |     infinite
             |      |
 number of   |      |
page faults  |       \
to complete  |        \_
  program    |          \__
             |             \_______
             |                     ---------_________
             +------+-------+------------------------+-
                    a       b
                   number of available frames

       a = maximum number of pages referenced by some instruction
       b = approximate working set size
       c = total memory used by program

Figure 13.12. Performance of a virtual memory system.

The concept of the working set of a program is strongly related to the concept of locality of reference. Programs with a great degree of locality have well defined working sets, while those which exhibit poor locality have no well defined working sets. An ideal page replacement policy should identify the working set of a program and keep that in main memory.

One approach to actually using the concept of working set in designing a page replacement policy is to (somewhat arbitrarily) decide that any page referenced fewer than t time units before a page fault is in the working set, while any page referenced before that is a candidate for replacement. As with pure LRU, this would require that there be a record of the time of reference of each page, but it can be approximated using the same hardware used for clock replacement. For example, if a 'frame tracking' procedure is run every t/a this can manage an age field associated with each frame. The frame tracking procedure can do this by inspecting and clearing the used field on each frame. If the frame was used, the tracking procedure sets the age field to zero; if not, it increments the age field. When a page fault occurs, any frame with an age greater than a can be considered for replacement.

The working set replacement policy described above is more complex than the clock policy, and if only a fixed number of frames are available, it serves no useful purpose on a single user system. On multiple user systems, however, the simple clock policy favors programs which have large well defined working sets while it takes page frames away from programs with ill defined or small working sets. A policy which, in effect, allocates frames to programs in proportion to the sizes of their observed working sets can lead to much better response times, especially for optimally configured systems, which (as noted previously) are on the verge of thrashing, because if they weren't, it would be fair to say that someone spent too much money on main memory.

Enhancing the Performance of Virtual Memory

The virtual memory page replacement policies outlined in the previous sections will sometimes copy pages back from main memory to disk even though the contents of those pages have not changed since they were last copied into main memory. This is clearly an unnecessary cost, and many virtual memory systems avoid it by using additional hardware to record, for each page, whether or not it has been modified since it was last copied into main memory. This takes only one additional bit per frame table entry or per page table entry (depending on which is used by the hardware). A page which has been modified is usually referred to as a dirty page, a page which has not been modified is usually referred to as a clean page, and the bit which records the difference is referred to as the clean/dirty bit.

The set of frames can be divided into five disjoint subsets, those which are ...

clean and recently used,
dirty and recently used,
clean and old,
dirty and old, and
unused.

Clearly, when there is a page fault, the unused frames should be used before any pages are replaced, and the clean old frames should be used before waiting for the disk I/O to complete on a dirty page that is being cleaned. Some systems exploit this by a modification to the clock replacement policy where the sweep of the clock hand past a dirty and old frame schedules it for cleaning (copying to disk), and the sweep of the clock hand only stops when it finds a clean old frame.

This can allow one page fault to schedule multiple write operations as well as one read operation, allowing a disk scheduler to do useful work, and there is no need to wait for the write operations scheduled on a page to finish until the decision is made to re-use that page, allowing writes to overlap with user computation. Of course, if a write is started to clean a page, and then the user dirties that page again, the dirtying makes the write irrelevent when it does finish. This policy can be described as using anticipatory page cleaning, since it tries to clean pages in advance of the need to use them in page replacement.

On systems where the backing storage is able to transfer multiple consecutive pages in a time only slightly longer than the time to transfer one page, anticipatory paging becomes a practical option. In the common demand paging policy, only the page which was actually referenced by the page fault is fetched. Anticipatory paging is based on the observation that a reference to a particular page is frequently a good sign that the next page in sequence will be referenced in the near future. This is especially true if the page size is relatively small. When anticipatory paging is done, the page which has been brought into memory in anticipation of future use may indeed not be needed, so it is a prime candidate for replacement. This is why this policy would be inappropriate if a new disk seek were required for each page fetched, but quite appropriate if pages were stored in consecutive sectors, so there is no latency penalty for anticipatory page transfers.

The above approaches to anticipating the behavior of programs are automatic and take no advantage of advice the programmer can provide about the future behavior of programs. Some virtual memory systems provide special system services by which the programmer can provide advice to the system about the future behavior of the program. For example, the system might provide a service which allows a program to initiate anticipatory fetches for pages which, in the programmers estimate, will be needed in the near future. Another possible advisory service might allow the program to tell the system that some page, although dirty, will not be used again, or that some recently used page is a good candidate for replacement. These services can be used to speed the transition of a program from one working set to another, but will typically be of little use when a program is in its steady state.

Some systems allow user programs to give advice by two services, one which indicates that the user program expects to exhibit poor locality, and the other indicating that good locality is expected. Although this kind of advice is quite vague, it can be quite useful on systems which try to make extensive use of locality. For example, the anticipatory policies mentioned here will improve performance for programs exhibiting good locality, but they will probably actually lead to performance degradation for programs with more nearly random behavior. In many cases, it is possible for the programmer to predict the points during the execution of a program where that program will change from one kind of behavior to the other, and advise the system of this so that the system can choose the more appropriate policies.

In any case, most users will not modify their programs to give the system advice about the expected behavior of their code, so these advisory services will not be commonly used. Nonetheless, on many systems, there are a few heavily used programs where slight improvements can have a significant impact on user satisfaction; for example, database managers or text editors. If a system provides advisory services such as are described here, such programs can frequently make good use of them.

Other Uses of Virtual Memory

Although the terms virtual memory and demand paging are considered to be almost synonymous by many people, virtual memory can be used to serve other purposes. As already noted, virtual address translation hardware can be viewed as performing a run-time relocation function. This is particularly useful on multi-user systems where additional security is gained by giving each user a different virtual address space.

The presence of virtual address translation hardware which uses data structures describing the address space of user programs provides an opportunity to further enhance security and provide for a better debugging environment by marking each page in the address space with access rights information. Thus, pages or segments containing code can be marked as executable, those containing constants can be marked as read-only, and those containing variables can be marked as read-write, with the memory management unit being given the job of enforcing the rules implied by these markings. Most modern virtual memory hardware enforces such restrictions, and most operating systems on such machines will mark pages appropriately; as a result, errors such as attempting to change the value of a constant or execute a variable are detected immediately and prevented by the hardware.

If each program running on a multi-user system is given its own virtual address space, as defined by a page table, it is natural to assume that the actual pages referenced by different programs will be different. In fact, this need not be the case! It is quite possible to have two different page table entries which refer to the same actual page; as a result, it is possible for one page to be shared by two or more different virtual address spaces. This provides for the possibility of tightly controlled sharing of information between programs on such a system.

Some virtual memory systems allow programs to request that a file be used as backing storage for some range of pages in their address space. This allows programs to treat a file which has been inserted in their virtual address space as a large array; any access to that array is equivalent to input-output on that file. This is sometimes called virtual file input-output. On the Multics operating system, where the address space consists of many segments, each of which consists of many pages, the only way to read or write a disk file was to insert that file in the virtual address space as a segment. To load and run a program on such a system, all that must be done is to map the object file into the virtual address space and then jump to the starting address! Both Microsoft's Windows system and the Unix system now provide file services based on this idea.

Large Page Tables and Paged-Segmented Virtual Addresses

The simple paged memory model given above is not widely used because the page table is large and unwieldy. If a virtual address is 32 bits and the byte-in-page field of the address is 12 bits, this means that the page-number field of the address is 20 bits, implying over a million entries in the page table. If each page table entry is itself 32 bits, this implies that the page table is 4 megabytes! This is larger than the memory required by many applications, and it is the reason that most modern virtual memory systems combine paging and segmenting!

The virtual address structure shown in Figure 13.13 is typical of modern 32-bit architectures:

      _______________________________________________________________
     |___________________|___________________|_______________________|
               |                   |                     |
        SEGMENT (10 bits)     PAGE (10 bits)        BYTE (12 bits)
                               in segment            in page

Figure 13.13. A Paged Segmented Virtual Address.

Typically, if this format of virtual address is used, the memory management table register (MMUTABLE in Figure 13.7) refers to the segment table of the current virtual address. The SEGMENT field of the virtual address is used as an index into the segment table to find the description of one page table. If that entry in the segment table is valid, the FRAME field of the entry identifies the page frame that holds the page table for one segment. The page-in-segment field of the virtual address PAGE is an index into this page table, used to select the page table entry, and if this is valid, it refers to the page frame holding the desired page.

The computations involved in a paged-segmented address translation can be described in the same graphical style as was used in Figure 13.5. The result is shown in Figure 13.14

      virtual address          -- from CPU
 _____________________________________________
|_____________|_______________|_______________|
segment number page in segment address in page
      |               |               |____________________________
      |               |____________                                |
      |     __________________     |                               |
      |    |_____|____________|    |                               |
      |    |_____|__         _|    |                               |
      |    |_____|__ segment _|    |                               |
 index --->|_____|__  table  _|    |                               |
           |_____|____________|    |                               |
           |_____|____________|    |                               |
            valid     frame        |                               |
              |    holding page    |     __________________        |
              |       table        |    |_____|____________|       |
              |                    |    |_____|__        __|       |
              |                    |    |_____|__  page  __|       |
              |               index --->|_____|__  table __|       |
              |                         |_____|____________|       |
              | segment-fault           |_____|____________|       |
              | request                  valid    frame            |
       <------                             |   holding page        |
To CPU <-----------------------------------          |             |
                page-fault                           |             |
                request                       _______|_____________|_______
                                             |_____________|_______________|
                                              frame number  address in page

Figure 13.14. Paged Segmented Virtual Address Translation.

The scheme described in figure 13.14 sounds like it raises the cost of one reference to memory by the application program to 3 physical memory references, one for the segment table entry, one for the page table entry, and one for the operand itself, but modern memory management unit hardware always uses some form of cache or associative memory technology, within the memory management unit, to largely eliminate the cost of these references.

If we assume the dimensions given in Figure 13.7 as modified in Figure 13.13, it is easy to see that each page is 4096 bytes and that the address space consists of up to 1024 segments, where each segment contains up to 1024 pages. Because of this, the page tables and the segment table are all the same size, and with 4 bytes per entry, each of them exactly fills one page frame. The popularity of this particular virtual address format follows directly from this!

With this scheme, the page tables for segments can be moved to and from disk on demand. If a program references a segment that is currently marked as being invalid, a segment fault will occur; the page fault service routine should bring the indicated segment table into main memory when this occurs. Segment tables should only be moved to disk when all of the pages of the corresponding segment have already been moved to disk. If the LRU replacement algorithm is used, we can assure this easily by considering each use of a virtual address to be the use not only of a particular page but also the use of the page table for the segment holding that page. Thus, after any particular memory reference, two page frames will be marked as being most recently used. Extending this logic to the clock algorithm is not difficult.

Implications for Direct Memory Access Input Output

In the description given in this chapter, we have assumed that the DISKREAD and DISKWRITE interface to the disk's input-output driver uses physical memory addresses and not virtual memory addresses. This is the natural assumption when reading and writing pages in the page-fault handler, and it is natural to design direct-memory-access input-output hardware in terms of physical addresses, but when a user requests a read or a write to a disk file, the user will naturally give the virtual address of the user's buffer and not its physical address!

Therefore, the user interface to the input-output system must translate the virtual address of the user's buffer to the physical address before passing that address to the direct-memory-access hardware. This translation must typically be done by software, since only a few machines have been built where the direct-memory-access interface uses the memory management unit to communicate with main memory.

Software to translate virtual addresses to physical addresses is straightforward. Given the declarations in Figure 13.8, for example, the code given in Figure 13.15 will do this

function makephysical( virtualaddress: va );
int page: pagenumber;
    pa: physicaladdress;
begin
     page :=  va.page
     if pagetable[page].valid then begin { valid and translatable }
         pa.frame = pagetable[page].frame;
         pa.byte  = va.byte;
     end else begin { in hardware, this would be a page fault }
         pa := nil;
     end;
     makephysical := pa;
end;

Figure 13.15. Software to convert virtual addresses to physical addresses

There is one important problem with this! The code shown here must have access to physical memory, since the page table itself is not typically addressible in the user's virtual memory! For that matter, interrupt service routines and the queues used to communicate with them are also typically outside the virtual address space of the user, so we must have some way for the user to turn off the memory management unit whenever communicating with an interrupt service routine or direct-memory-access device.

As mentioned in the initial description of the memory management unit, in most systems, things can be set up so that any interrupt or trap turns off the memory management unit, and the instruction or instruction sequence used to return from interrupt or trap service routines is always able to restore the system to its original state, including turning the memory management unit back on.

Therefore, on most systems that use virtual memory, all calls to input-output services are done using an instruction that forces a trap, and most architectures have a selection of instructions that are reserved for this purpose. On the Intel 80x86, for example, the family of int instructions are used for this purpose. The int instruction has an 8-bit argument that specifies which of the 256 interrupt service routines in the interrupt vector is to service the trap. Many of these are assigned to hardware devices, but most are available for interface to the operating system. Typically, on machines with this type of structure, each system function is written as a trap service routine, and calls to system functions are written using the appropriate instruction to force that trap. Passing parameters to system function can be a bit messy in this model, but there is always a way to do it.

When dealing with direct memory access devices in a virtual memory environment, there is another problem. While the user's input-output buffer is generally allocated, in the user's address space, as a contiguous block of memory, the buffer might not be contiguous in physical memory because the different pages of the buffer may be mapped to arbitrarily scattered page frames, and some pages of the user's buffer might even be currently on disk at the time the user requests a read or write.

Therefore, virtual memory operating systems must typically inspect the user's buffer prior to a direct-memory-access operation, possibly forcing several software initiated page faults to bring the pages of the buffer into memory, and possibly rearranging page frames in main memory in order to make the physical pages of the buffer contiguous. Alternatively, all direct-memory-access input-output can be done to buffers in the system space, with blocking and deblocking code used to copy to and from the user's buffer. In the latter case, it is still possible to optimize in the event that the user's buffer is entirely in one page, and because of this, there is some value in taking care to start each buffer on a page boundary if you know your code will run in a virtual memory environment and you know that the device you are referencing is likely to be a direct-memory-access device and you know that your transfer matches the sector or block structure of the device. The list of ifs in the previous sentence explains why most users don't bother, but the performance gain by avoiding unneeded memory-to-memory copying can be significant.

The Limits of Virtual Memory

It is tempting to think that demand paged virtual memory eliminates the need to consider memory limitations in the design of applications programs, but this is not the case. Different algorithms and different top-level designs for large programs can result in vastly different working set sizes, different page fault rates, and as a result, significant differences in performance. Since optimal systems have just enough memory to avoid thrashing, careless program design which leads to large working sets can lead to very bad performance.

As an example, consider the problem of writing a compiler. Before virtual memory was available, most compilers were organized as a series of passes, with intermediate files to hold the data produced by one pass before it is read by the next. For example, the lexical analyzer might produce a file of lexemes for consumption by the syntax analyzer. The availability of large address spaces (whether virtual or not) encourages an alternate design, in which the different passes are combined and run concurrently, with data flowing between them through procedure calls. For example, the syntax analyzer might call the lexical analyzer each time it needs a lexeme.

With the multipass approach mentioned above, the working sets of each pass can be quite small and an abrupt working set change occurs at the end of each pass. On the other hand, there is a large amount of explicit input-output used to communicate between passes. With the single pass approach, there is no input-output needed to communicate between passes, but the working set of the program is the combined working sets of all of the passes! It is not hard to estimate typical numbers for the simple example of lexical analysis: A typical lexeme description might occupy only a few words, so many lexemes can be moved per disk access to the intermediate file used in the multipass approach. A typical lexical analyzer will have a working set consisting of a few pages of memory (for the code) plus perhaps one page for the stack frame and a few pages for recently referenced symbols in the symbol table. Thus, it is possible that a single call to the lexical analyzer might involve as many as ten page faults, while processing the same lexeme in a multipass compiler would require only a fraction of one disk access.

As another example, consider the problem of locality in large linked lists or binary trees. If such structures grow and shrink during an extended computation, successive list elements or successive nodes visited during a traversal of the tree may be located randomly throughout memory. As a result, traversing the list or tree will have very poor locality!

In the late 1960's, this problem began to have a very serious impact on many list processing applications, particularly in the field of artificial intelligence, where huge list and tree structures are central to almost all of the computations being done. This led the developers of languages that are heavily used in artificial intelligence research to develop new list management strategies for those languages.

These new strategies did not require users to change their programming habits. What they did was to change the allocation function used to add an element to a list so that that element was preferentially stored in the smame page as the pointer to the element. As a result, traversing a list or tree tended to visit clumps of list elements or tree nodes in each page before using a new page, and the result was frequently a very noticable improvement in run-time.

Terminology

The reader should be familiar with the following general terms after reading this section.

pages                           segments
demand paging                   anticipatory paging
page replacement policy         virtual address translation
primary memory                  secondary memory
main memory                     backing storage
frames                          page faults
page tables                     frame tables
address translation traps       instruction restart
random page replacement         Belady's optimal replacement policy
first-in first-out replacement  least recently used replacement
clock replacement               working set replacement
locality of reference           thrashing
clean pages                     dirty pages
separate address spaces         shared pages
read-only pages                 virtual file input-output

Exercises

Why is there no such thing as a segment frame?
What problem would have to be faced by the segment fault trap service routine which is not faced by a page fault trap service routine?
Outline how you might use a policy such as the clock policy for replacing segments in main memory in response to a segment fault.
How would the code shown in Figure 13.9 change if the hardware knew about the frame table and not the a page table? Give the changes!
Assume the paged-segmented model given in Figures 13.13 and 13.14. Rewrite the code from Figure 13.9 to operate in this more complex domain. Assume that page faults and segment faults use the same service routine, and that the code of that routine must check a bit in the MMUSTATUS device register in order to determine whether it was a page fault or segment fault.
How would the code shown in Figure 13.11 change if the hardware used a frame table instead of a page table? Give the changes!
Modify the code and data structures shown in Figure 13.11 to use a dirty bit to avoid copying pages back to disk when they are clean. Assume that the hardware can be modified to set the appropriate dirty whenever there is a write to a particular page.
Rewrite the code in Figure 13.15 to use the paged-segmented model of Figures 13.13 and 13.14.
Consider the diskfilereadblock routine given in Figure 12.10; this works under the virtual memory model of this chapter only if you assume that the buffer pointer is a physical address. Write an intermediate layer of code, userdiskfilereadblock, that has the same interface specification as the original, except that the buffer address is a virtual address. This should translate virtual to physical addresses, and it should call diskfilereadblock for each disjoint piece of the user's buffer.
Assuming a machine with one accumulator and a memory which is addressed in units of words, where the page size is moderately large, how many different page faults could be caused by an attempt to fetch and execute one instruction, if that instruction is one of the following:
a) LOAD X -- a single word instruction, including the operand address?
b) MOVE A,B -- a 3 word instruction (opcode and 2 operand address words)?
c) ADDF A,B,C -- a 4 word instruction (opcode, 3 operand addresses), where each operand is two words long?
The working set policy requires that the parameter t be given a value.
a) Outline an experiment you could perform to determine the best value for t on a particular system.
b) Could the page fault service routine dynamically adjust t so that it is automatically optimized?
Consider the following program involving the procedures "a", "b", "c", "d" and "e".
```
   a { known to take about .03 seconds };
   b { known to take about 5 seconds };
   c { known to take about .01 seconds };
   d { known to take about .01 seconds };
   e { known to take about 10 seconds };
```
The times quoted in the comments above are the run times of each procedure when there are no page faults. If you know that the average latency of the disk system is about .02 seconds, and you know that the average lifetime of an unreferenced page in main memory before it is replaced is about 3 seconds,
a) estimate the run-time of the above code assuming that it takes one page fault to bring each procedure into memory.
b) for which of the above procedures would it be profitable to initiate anticipatory paging or anticipatory replacement? Modify the above code by including calls to "anticipate(x)" and "replace(x)" where these would lead to a maximum performance improvement.
c) where should calls be inserted in the above code to indicate that it has good locality, and where should the system be told it has bad locality? Rewrite the code with additional calls to "good" and "bad" for this purpose.
If 99% of the memory cycles on a system run at main memory speed, while 1% of the cycles involve page faults, and if main memory takes 1 microsecond per access but it takes 10 milliseconds for a disk access, what is the average memory speed? Assume that there is always a good supply of clean pages for page replacement.
The division of a 32 bit virtual address into 10 bits of segment number, 10 bits of page number, and 12 bits of byte number works out very nicely if each page table and segment table entry is 32 bits. What is the perfect size for a page table or segment table entry if the 32-bit virtual address is broken on a 9/9/14 pattern?

Readings

There is a good discussion of page replacement policies in Section 5.2 of

The Logical Design of Operating Systems by Alan C. Shaw's (Prentice Hall, 1974)

A brief introduction to virtual memory is included in Section 2.3.3 of

Operating System Elements by Peter Calingaert's (Prentice Hall, 1982)

The classic survey paper on virtual memory is still one of the best technical surveys of the subject.

"Virtual Memory," by Peter Denning in Computing Surveys, volume 2, number 2 (Sept. 1970) pages 153-189.

This has been reprinted in Sections 6.2 and 6.3 of

Software Systems Principles by Peter Freeman (SRA, 1975)

It is interesting to look back on the first system which used demand paging. This was the ATLAS system designed at Manchester University in England. A good source is

"The ATLAS Supervisor" by Kilburn and Payne in the Proceedings of the 1961 Eastern Joint Computer Conference, AFIPS Conference Proceedings Volume 20, pages 279 to 294,

This was reprinted on pages 661-682 of

Programming Systems and Languages, by Rosen, (McGraw Hill, 1967).

It should also be noted that, although the bulk of the research on page replacement policies was done in the late 1960's and early 1970's, work continues. A good example is

"Converting a Swap-Based System to do Paging in an Architecture Lacking Page-Referenced Bits," by Babaoglu and Joy, in Proceedings of the 8th Symposium on Operating System Principles (ACM, 1979) pages 78 to 86.

The title of this paper describes much of what it covers, but the system described there also does a large amount of anticipatory paging, and the system described is Berkeley Unix, BSD 4.2, an important system in the history of Unix.

Another good example of recent work is

"WSClock - A Simple and Effective Algorithm for Virtual Memory Management," by Carr and Hennessy, in Proceedings of the 8th Symposium on Operating System Principles (ACM, 1979) pages 87-95.

This paper describes a simple combination of the working set page replacement policy with the clock policy.