22C:116, Lecture Notes, Feb. 8, 1995

Douglas W. Jones
University of Iowa Department of Computer Science

  1. Page Tables and Frame Tables

    Note that the page table and the frame table are two different things:

    The Page Table describes the state of each page in virtual memory. There is one entry in the page table for every addressable page in the virtual address space.

    The Frame Table describes the state of each page frame in main memory. There is one entry for each page frame in the physical address space.

    Each page table entry typically has a field that describes where that page is currently stored -- for pages currently not in main memory, this would typically be the disk address where the page is stored; for pages in main memory, this would be the page frame holding the page.

    To support page replacement algorithms such as the clock algorithm, each frame table entry must hold, at the very least, a pointer to the corresponding page table entry.
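
    As an illustration, the two tables might be declared as follows in C; the field names and widths here are invented for this sketch and do not describe any particular machine:

      /* One entry per page in the virtual address space.  If the    */
      /* page is resident, location holds a frame number; if not,    */
      /* it holds the disk address where the page is stored.         */
      struct page_table_entry {
          unsigned int valid    :  1;   /* page is in main memory    */
          unsigned int dirty    :  1;   /* modified since loaded     */
          unsigned int location : 30;   /* frame number or disk addr */
      };

      /* One entry per page frame in main memory.  For the clock     */
      /* algorithm, each entry points back at the page table entry   */
      /* of the page currently occupying the frame.                  */
      struct frame_table_entry {
          struct page_table_entry *page;   /* null if frame is free  */
      };

    Note that the page table is indexed by page number and the frame table by frame number; the pointer in each frame table entry is what lets a page replacement algorithm find and update the page table entry for the page it evicts.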

  2. Address translation problems

    The easiest form of virtual-to-physical address translation hardware to understand holds the entire page table in the memory management unit:

                      Word in page
       Virt addr  |----------------------->| Phys Addr
       ---------->|  Page   ______  Frame  |--------->
                  |---     |      |   ---->|
                      |    | Page |  |
                      |    |Table |  |
                       --->|      |  |
            address        |______|  |
            invalid         |  |     |
          <-----------------    -----
    
    For large page tables, this is not implementable. For example, on the Intel 80386, the virtual address is 32 bits, of which 12 bits specify byte-in-page. Thus, the page table must contain a million entries. The entries themselves turn out to be 32 bits each, so this approach to implementing the memory management unit would require dedicating 4 megabytes of special purpose memory to page table management.
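
    The arithmetic behind those numbers can be checked with a few lines of C; the constants are the 80386 figures quoted above:

      #include <stdio.h>

      int main(void)
      {
          unsigned long va_bits     = 32;  /* virtual address width     */
          unsigned long offset_bits = 12;  /* byte-in-page field        */
          unsigned long entry_bytes = 4;   /* 32-bit page table entries */

          unsigned long entries = 1UL << (va_bits - offset_bits);
          printf("entries: %lu\n", entries);                /* 1048576  */
          printf("table size: %lu bytes\n",
                 entries * entry_bytes);                    /* 4194304  */
          return 0;
      }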

    The original ATLAS virtual memory system used an alternative approach to translation. The address translation hardware held a copy of the frame table. When a virtual address was presented to the hardware, it compared the desired page number with the page number stored in each frame table entry. If there was no match, a page fault was reported. If there was a match, the index of the frame table entry holding the match was used as the frame number. This comparison was done using fast parallel hardware, in what we would now call an associative memory.

    The ATLAS scheme only works if the number of frames is itself relatively small. On a modern machine with many megabytes of memory, the number of page frames may grow into the tens of thousands, so neither the entire frame table nor the entire page table can be held in the memory management unit registers.

    The usual solution to this problem is to hold the page table in main memory, and to bring only those entries that reference recently used pages into the registers in the memory management unit. In this case, we say that the memory management unit contains a virtual address translation cache or address translation lookaside buffer. From the viewpoint of system software, address translation appears to be done through an address translation table in main memory.
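
    What such a translation cache does can be sketched in C for concreteness; a real unit does this in hardware, and the direct-mapped organization and size used here are assumptions made only for the sketch:

      #define TLB_SIZE 64                  /* assumed number of entries */

      struct tlb_entry {
          int           valid;
          unsigned long page;              /* virtual page number       */
          unsigned long frame;             /* cached translation        */
      };

      static struct tlb_entry tlb[TLB_SIZE];

      /* Return 1 and set *frame on a hit; return 0 on a miss, in      */
      /* which case the translation must come from the page table      */
      /* held in main memory.                                          */
      int tlb_lookup(unsigned long page, unsigned long *frame)
      {
          struct tlb_entry *e = &tlb[page % TLB_SIZE];  /* direct mapped */
          if (e->valid && e->page == page) {
              *frame = e->frame;
              return 1;
          }
          return 0;
      }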

  3. The problem of large page tables

    A typical modern machine, whether a DEC VAX, an Intel 80586, or an IBM RS6000, has a virtual address space of 4 gigabytes (2^32 bytes), but it only has 4 to 16 megabytes of main memory. The typical virtual address map needed to describe such an address space contains over 1 million map entries, and requires on the order of 4 megabytes to store. Users with 4 megabytes of main memory would be quite unhappy if the system required the entire memory just to store the address map!

    The solution is to break up the virtual address map. In effect, the map itself is stored in virtual memory, and only those pages of the address map that contain map entries for recently used pages occupy page frames in main memory at any given time. Each page of the virtual address map is called a segment of the map.

    When such a system is discussed, the virtual address is typically broken up as follows:

     _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|
    |                   |                   |
           segment               page             word in page
    
    The segment field specifies what segment of the address map contains the desired map entry, while the page field specifies what entry in that segment of the map should be used.
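
    Using the split suggested by the picture above (a 10-bit segment field, a 10-bit page field, and a 12-bit byte-in-page field -- these widths are assumptions for the sketch), the three fields can be extracted from a 32-bit virtual address as follows:

      #define OFFSET_BITS 12      /* byte in page                     */
      #define PAGE_BITS   10      /* entry within one map segment     */
      #define SEG_BITS    10      /* segment of the address map       */

      unsigned long va_offset(unsigned long va)
      {
          return va & ((1UL << OFFSET_BITS) - 1);
      }

      unsigned long va_page(unsigned long va)
      {
          return (va >> OFFSET_BITS) & ((1UL << PAGE_BITS) - 1);
      }

      unsigned long va_segment(unsigned long va)
      {
          return (va >> (OFFSET_BITS + PAGE_BITS)) & ((1UL << SEG_BITS) - 1);
      }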

    Assuming that the virtual address translation hardware uses a cache, there are now four different outcomes possible for each address translation operation:

    Hit: The desired page table entry was in the hardware address translation cache. Address translation takes place as expected.

    Miss: The desired page table entry is not in the hardware address translation cache. The address translation unit performs one memory cycle to fetch the page table entry for the desired page. Address translation takes place as expected by the software, but it is slowed by the additional hardware activity.

    Segment Fault: A miss occurs, but the desired page table entry cannot be fetched because the desired segment of the address map is not in main memory. The address translation unit therefore requests a fault, aborting the current instruction and allowing the operating system segment fault trap service routine to bring in the desired segment.

    Page Fault: A miss occurs, the desired page table entry is available, but the page is marked as invalid. The address translation unit requests a fault, aborting the current instruction and allowing the page fault trap service routine to bring in the desired page.
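
    The way the last three outcomes are distinguished can be sketched in C; the structures below are hypothetical stand-ins for the in-memory map, and a hit never reaches this code because it is satisfied entirely inside the translation cache:

      enum outcome { MISS, SEGMENT_FAULT, PAGE_FAULT };

      struct page_entry { int valid; unsigned long frame; };
      struct seg_entry  { int valid; struct page_entry *page_table; };

      /* Called after a cache miss: either complete the translation   */
      /* with one extra memory reference, or report which fault the   */
      /* operating system must service.                               */
      enum outcome after_miss(struct seg_entry *seg, unsigned long page,
                              unsigned long *frame)
      {
          if (!seg->valid)                      /* map segment absent  */
              return SEGMENT_FAULT;
          if (!seg->page_table[page].valid)     /* page itself absent  */
              return PAGE_FAULT;
          *frame = seg->page_table[page].frame; /* ordinary miss       */
          return MISS;
      }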

  4. The Software View

    With a paged segmented virtual address space, the software view of address translation is summarized by the following picture:

                     ___  ____
                 Seg  |  |    | Segment
    Virt Addr  |----->|  |    | Table
    ---------->|---   |  |____|       ____
               |-  | --->|____|----->|    | Page
                 | |     |____|   |  |    | Table
                 | | Page         |  |    |      
                 |  ------------->|  |____|       ____
                 |               --->|____|----->|    | Page
                 |                   |____|   |  |    |
                 | Byte in Page               |  |    |
                  --------------------------->|  |____|
                                physical addr--->|____|
                                                 |____|
    
    To translate one virtual address to a physical address, first a word must be fetched from the segment table; this holds the address of the page table for that segment. Next, a word must be fetched from the page table for that segment; this holds the frame number of the desired page, from which the physical address may be computed.
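
    Ignoring the fault cases listed in the previous section, the two fetches and the final address computation amount to something like the following; the field widths and structures are the same assumed sketch as before:

      #define OFFSET_BITS 12
      #define PAGE_BITS   10
      #define SEG_BITS    10

      struct page_entry { int valid; unsigned long frame; };
      struct seg_entry  { int valid; struct page_entry *page_table; };

      unsigned long translate(struct seg_entry *segment_table,
                              unsigned long va)
      {
          unsigned long offset  =  va & ((1UL << OFFSET_BITS) - 1);
          unsigned long page    = (va >> OFFSET_BITS)
                                     & ((1UL << PAGE_BITS) - 1);
          unsigned long segment = (va >> (OFFSET_BITS + PAGE_BITS))
                                     & ((1UL << SEG_BITS) - 1);

          /* first fetch: segment table gives the page table's address */
          struct page_entry *page_table = segment_table[segment].page_table;

          /* second fetch: page table gives the frame number           */
          unsigned long frame = page_table[page].frame;

          /* physical address = frame number concatenated with offset  */
          return (frame << OFFSET_BITS) | offset;
      }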

    A cache is always included in the address translation hardware so that these extra memory accesses are only required when there is a cache miss. In fact, if the cache entries may be directly manipulated by software, the cache itself can be entirely managed by software, in response to page faults.
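
    On machines that expose the cache to software (the MIPS processors are the classic example, taking a trap on each cache miss), the refill code can be ordinary system software along these lines; walk_address_map and tlb_write are hypothetical names standing in for the system's table-walking code and the hardware's privileged cache-write operation:

      extern unsigned long walk_address_map(unsigned long page);
      extern void tlb_write(unsigned long page, unsigned long frame);

      /* Trap service routine run when the translation cache has no   */
      /* entry for the faulting page: consult the in-memory map and   */
      /* install the missing entry, then return to retry the access.  */
      void cache_miss_handler(unsigned long faulting_page)
      {
          unsigned long frame = walk_address_map(faulting_page);
          tlb_write(faulting_page, frame);
      }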

  5. Address Translation and Protection

    Virtual address translation hardware does more than just translate addresses. It can also play a central role in protection. In this discussion, the term protection domain is important. A protection domain describes, for one logical resource user, for example, a process or a person, what resources that user should be able to access and how. Each logical resource user therefore operates in a different domain.

    When there is one big virtual address space, a change of protection domains can be accomplished by changing the markings on the address map entries. Those map entries describing pages that the new user should not be able to access should be marked inaccessible; those entries describing pages the user should not be able to modify should be marked read-only, and so on.

    Editing address maps can be expensive, particularly if they are large, so it is more common to assign a separate virtual address map per domain. In this case, all that needs to be done when the domain changes is to change the pointer to the currently active segment table. The virtual address map then becomes the primary record of what pages are in each domain and what access each domain allows to each page.
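
    Under this arrangement a change of protection domain is very cheap; in a sketch it is nothing more than loading one register, where mmu_set_segment_table is a hypothetical name for whatever privileged operation the hardware actually provides:

      struct page_entry { int valid; unsigned long frame; };
      struct seg_entry  { int valid; struct page_entry *page_table; };

      /* Hypothetical privileged operation: load the register that    */
      /* points at the currently active segment table.                */
      extern void mmu_set_segment_table(struct seg_entry *table);

      struct domain {
          struct seg_entry *address_map;  /* this domain's segment table */
      };

      void enter_domain(struct domain *d)
      {
          mmu_set_segment_table(d->address_map);
      }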

    With one virtual address map per domain, it is common to find that most domains are fairly small; if a full virtual address map occupies a few megabytes, this can lead to a significant waste of resources, so many virtual addressing systems allow some mechanism to eliminate storage of irrelevant parts of the address map. For example, entire segments of the map may be marked as invalid in the segment table, so that no space must be allocated to the corresponding page tables, or there may be a map limit register that describes the size of the address map.

    With one virtual address map per domain, sharing of pages between domains is accomplished by replicating the page table entries for the shared pages between all processes that share those pages. When the map is segmented, shared segments are generally used instead of shared pages.
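
    With segmented maps, sharing can be arranged by having the segment tables of two domains point at the same page table, so the shared entries exist in only one place. A small sketch, with the same assumed structures and a 10-bit page field:

      struct page_entry { int valid; unsigned long frame; };
      struct seg_entry  { int valid; struct page_entry *page_table; };

      #define PAGES_PER_SEGMENT 1024      /* assumed: 10-bit page field */

      struct page_entry shared_segment[PAGES_PER_SEGMENT];

      /* Both domains map segment number seg onto the same page table, */
      /* so a change to the shared segment is seen by both domains.    */
      void share(struct seg_entry *map_a, struct seg_entry *map_b, int seg)
      {
          map_a[seg].valid = 1;
          map_a[seg].page_table = shared_segment;
          map_b[seg].valid = 1;
          map_b[seg].page_table = shared_segment;
      }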

  6. Emphasis added!

    In all cases, virtual memory rests on the fundamental empirical observation that most programs exhibit locality.

    The locality principle is that those pages that have been recently referenced are likely to be referenced again (temporal locality), and that, when new pages are referenced, they are likely to be near recently referenced pages (spatial locality).

    As a consequence, we can talk about the working set of a process, the set of pages it is currently using. When a process comes into execution, there will typically be a sequence of page faults as its working set is brought into memory, and then the system will settle down, with only an occasional page fault as the process changes its working set to work on new areas of memory.

    When the number of available page frames is insufficient to hold the working sets of the processes currently running on a system, the system typically runs very slowly, with a large fraction of all memory accesses causing page faults; with each page fault, some page that will be needed in the near future is swapped to disk, guaranteeing that there will be yet another page fault in the near future. A system with this behavior is said to be thrashing.

    The application or mix of applications determines the working set size. For any particular application, a graph of page-faults per second versus number of page frames made available to that application will tend to look like the following:

           |
           |______    Working set size
           |      --_ |
     page  |         -|
    faults |          _
     per   |Thrashing   CPU bound
    second |    or    |-
           |I/O bound | _ 
           |          |  -__
           |          W     -------
         --+------------------------
           |  page frames available
    
    In this graph, the working set size (marked W on the horizontal axis) is empirically determined by the location of the transition from normal CPU bound operation to the thrashing state.

    Some applications exhibit a sharp thrashing transition -- these are programs with high locality and a well-defined working set. Others exhibit a gentle transition to thrashing -- these are programs with poor locality and an ill-defined working set.