22C:116, Lecture Notes, Lecture 11, Fall 2001

Douglas W. Jones
University of Iowa Department of Computer Science

  1. Locality!

    In all cases, virtual memory rests on the fundamental empirical observation that most programs exhibit locality.

    The locality principle is that those pages that have been recently referenced are likely to be referenced again (temporal locality), and that, when new pages are referenced, they are likely to be near recently referenced pages (spatial locality).

    As a consequence, we can talk about the working set of a process, the set of pages it is currently using. When a process comes into execution, there will typically be a sequence of page faults as its working set is brought into memory, and then the system will settle down, with only an occasional page fault as the process changes its working set to work on new areas of memory.

    When the number of available page frames is insufficient to hold the number of pages in the working sets the processes currently running on a system, the system typically runs very slowly, with a large fraction of all memory accesses causing page faults; with each page fault, some page that will be needed in the near future is swapped to disk, guaranteeing that there will be yet another page fault in the near future. A system with this behavior is said to be thrashing.

    The application or mix of applications determines the working set size. For any particular application, a graph of page-faults per second versus number of page frames made available to that application will tend to look like the following:

           |
           |______    Working set size
           |      --_ |
     page  |         -|
    faults |          _
     per   |Thrashing   CPU bound
    second |    or    |-
           |I/O bound | _ 
           |          |  -__
           |          W     -------
         --+------------------------
           |  page frames available
    
    In this graph, the number of page frames in the working set is empirically determined by the location of the transition from normal CPU bound operation to the thrashing state.

    Some application exhibit a sharp thrashing transition -- these are programs with high locality and a well defined working set. Others exhibit a gentle transition to thrashing -- these are programs with poor locality and an ill defined working set.

  2. Page Replacement Policies

    What we want in a demand-paged virtual memory implementation is a system that keeps the working set in fast main memory while moving pages that are not in the working set back to slow disk. Thus, the policy governing the selection of what page to remove from main memory when there is a page fault determines, to a tremendous extent, the performance of a virtual memory system. The following policies will be discussed here:

  3. Page Tables and Frame Tables

    Note that many implementations of paged virtual memory rely on two distinct data structures, the page table and the frame table. Typically, only one of these is known to hardware, while the other is entirely the creation of system software. There are even some memory management units that have no complete knowledge of either table, but only store, in hardware, a cache of recently used page table entries, where this cache is managed by software.

    The page table describes the state of each page in the virtual address space. There is one entry in the page table for every addressable page in the virtual address space. This table frequently grows very large for large address spaces.

    The page table is indexed by the virtual address of the page, and each page table entry typically has a field that describes where that page is currently stored -- for pages currently not in main memory, this would typically be the disk address where the page is stored; for pages in main memory, this would be the page frame holding the page.

    The frame table is indexed by the physical address of the frame, and the entries describe the state of each page frame in main memory. For those frames currently in use, it describes what pages are is them. There is one entry for each page frame in the physical address space. At any instant, the frame table can typically be derived from the page table, but this derivation is computationally expensive.

    The minimum frame table is just an inverted index (a term from database management), holding the index of the page table entry holding the page currently stored in each page frame. On some systems, more information is stored in the frame table; such information as whether the frame has been recently used and whether the contents of the frame have been changed by the CPU are all as useful if stored in the frame table as they are if stored in the page table.

                    Page table                Frame table
    Page Number     __________    Frame       ___________
         |         |_|________|   number     |_|_________|
         |         |_|________|      |       |_|_________|
         |         |_|________|       ------>|_|_________|
          -------->|_|________|              |_|_________|
                   |_|________|              |_|_________|
                   |_|________|               |     |
                   |_|________|      <--------       ------->
                    |     |            mark        page number
         <----------       -------->
        rights               frame number
    
    All of the page replacement algorithms select a candidate for replacement by a scan of the frame table. The clock algorithm, for example, scans the frames for a frame holding a page with the mark bit set. It is natural to think of the mark bit as an attribute of the frame table entry, but in fact, it may be necessary to follow the frame table entry back to the page table to find the mark bit, since many MMU's know nothing of the frame table but only operate on the page table or on a cache of recently used page table entries.

  4. Hardware Support Needed for LRU

    LRU replacement requires special hardware to track every memory access, for example, to put a timestamp or serial number on each page frame or on each page-table entry. This timestamp is updated by the memory management unit with every memory reference.

          A Frame Table Entry
           _________________________________
          |__________|______________________|
          |  Serial  |     Page Number      |
          |  Number  |                      |
          |          |                      |
          |   added  |   original fields    |
    
    			or
    
          A Page Table Entry
           _________________________________
          |__________|______|_______________|
          |  Serial  |Rights| Frame Number  |
          |  Number  |      |               |
          |          |                      |
          |   added  |   original fields    |
    
    
    We can store the serial number in either the frame table or the page table, wherever the MMU can store it most conveniently. When there is a page fault, we search for the smallest (or oldest) frame-table entry (that is, the frame containing the page that was least recently used) and select it for replacement. If the serial number is stored in the page table, we must use the page number field of the frame table to find the page table entry that corresponds to the serial number.

    MULTICS on the GE 600 used true LRU replacement, back in the late 1960's, but few systems since that time have used true LRU. The MULTICS/GE600 implementation used a special high-speed core memory to implement its cache of all currently available page frames. The Following sequential algorithm was used by the address translation unit, implemented in fast hardware:

       I = 0
       Fetch entry(I) into temp
       while I < cache_size
       and temp.page /= virtual_address.page do
           I = I + 1;
           exchange temp with entry(I)
       endloop
       if temp.page = virtual_address.page
           Store temp into entry(0)
       else
           Report a page fault for virtual_address.page
           Report that temp.frame is the least recently used page frame
           Copy the map entry from main memory into cache entry(0).
    
    This keeps the entries in the cache ordered by recency of use. This ordering guarantees that the cache will be fast for references to recently used frames (those that are likely to be frequently used) while it is slow for frames that weren't recently used (those that are unlikely to be frequently used).

    The MULTICS/GE600 scheme is an elegant historical curiosity, of only limited practical value today because it is predicated on core memory technology. Modern VLSI technology is poorly matched to this algorithm.

    MULTICS was the first design for a large-scale timesharing system, developed with federal funds at Project MAC at MIT starting in the mid 1960's. The original development involved a consortium involving MIT, GE and Bell Labs, but Bell Labs dropped out of the project, and Honeywell bought GE's computer business and introduced the Honeywell 6000 as a successor to the GE 600.

    In the 1970's, MULTICS became one of two standard operating systems offered on Honeywell mainframes. the H6000 MMU abandoned the true LRU replacement of the GE 600 in favor of faster address translation schemes, and of course, all later versions of MULTICS, a system that survived into the late 1980's, used these more modern schemes.

    MULTICS was originally conceived of as a computer utility. The idea was that such a utility would be operated in much the same way as the telephone and power utilities, but instead of selling power or audio communication services, the computer utility would sell CPU time and disk storage to its customers. Today, this isn't a radical idea -- there are many computer utilities, ranging from national services like AOL to smaller local providers. The concept was a radical one in the mid 1960's, and this idea drove a number of technical developments.

  5. Clock page replacement

    The clock algorithm allows a good approximation of LRU replacement, although if there are too few page frames available, it degenerates to FIFO replacement.

    Clock replacement requires that there be one bit per page frame to record whether that page has been recently referenced. This bit is set by hardware when the page frame is referenced, and it is reset by the page-fault service routine, in the process of deciding what page frame to replace.

    To select a page frame to replace, the page fault service routine software sweeps through the page frames, resetting the referenced bit on each page frame and stopping when it finds a page frame with this bit already reset. The page in this frame is the one that is replaced (copied out to disk to make an empty frame).

    The clock algorithm is usually pictured with a clock face, where each minute on the clock face corresponds to a page frame. The clock hand remains fixed except during the execution of the page fault service routine. When a page fault occurs, the clock hand begins its sweep. The hand stops as soon as it sees an unmarked page frame. The clock hand removes the marks from each page frame it sweeps over.

    Data Structures and code for the Clock Algorithm

    The Clock Algorithm requires a frame table, that is, a table indexed by physical memory page frame number and containing, for each page frame, the virtual address or page number of the page in that frame, along with at least one status bit, the mark bit.

                              Frame Table
                            ________________
                           |_X__|___________|
                           |____|___________|
      Clock Hand -         |____|___________|
                   \       |____|___________|
                     ----->|____|___________|
                        |  |_X__|___________|
         Sweep Direction|  |_X__|___________|
                        |  |____|___________|
                        V  |_X__|___________|
                           |mark| what page |
    
    
    	find page to replace:
               while frame_table[clock_hand].mark is set do
                  reset frame_table[clock_hand].mark
                  clock_hand = (clock_hand + 1) mod table_size
               -- now, frame_table[clock_hand].what_page
               -- is the page number to be written back to disk,
               -- and clock_hand is the frame number to use with
               -- the page being brought in from disk.
    

  6. Modified Clock Replacement

    It turns out that the clock algorithm can be modified to work without a hardware supported referenced bit. The first large-scale use of this modification was in the DEC VAX family of computers, but many modern RISC systems also take advantage of this to simplify their memory management units.

    As the clock hand sweeps by a page frame, all page table entries referencing that frame are marked as invalid. In fact, the page is still in memory, but the memory management unit has been told that it is not there. This combination, where the page is in memory but the page-table entry for that page is marked as if it is not, is used to mean "not recently referenced".

    When the CPU references an invalid page, there is a page fault. If the page is actually in memory, this is called a soft page fault. The page table entry is simply changed from invalid to valid. This combination, page in memory and marked as such, is considered to mean "referenced".

    Modified Clock Page Table Entry

           _____________________________
          |______|______|_______________|
          | Real |Rights      Frame     |
          |Rights|                      |
          |      |                      |
          | soft |   hardware fields    |
          | bits |                      |
    
    Real Rights = Rights = valid

    Marked. This page is not a candidate for clock replacement; it has been recently used.

    Real Rights = valid; Rights = invalid

    Not Marked. A candidate for replacement. Faults involving this page are "soft"; this means, the page should be marked by setting the hardware rights to the real rights.

    Real Rights = Rights = invalid

    Paged out! Faults involving this page are "hard" -- page replacement and disk I/O are needed to resolve this fault.

    Real Rights = invalid; Rights = valid Strange! This combination could perhaps be used by software for something, but it is not clear what. Access to this page is legal and will not cause a fault.

  7. Dirty Pages

    A dirty page is a page that has been modified since it was copied from disk to main memory. Dirty pages must be copied back to disk when the frames that hold them are preempted to hold other pages. Clean pages, those that are not dirty, need not be copied back to disk if original copy of the page on disk is still available. This can cut the time needed to service some page faults in half.

    Tracking clean and dirty pages requires one bit per page frame; typically, we assume that this bit is set to mark the page as dirty whenever there is a write operation on that page, and we let the software reset this bit whenever it writes the contents of a page frame to disk. The latter operation is called cleaning the page frame.

    This can be combined with the clock algorithm as follows:

    As the clock hand sweeps by dirty page frames they are scheduled for cleaning; the actual write operations to disk take place in parallel with computation and are typically done in whatever order is convenient for the disk interface hardware and software.

    The clock hand only stops its sweep when it encounters a page frame that is both unreferenced and clean.

    This scheme (modified clock replacement) is typical of modern virtual memory operating systems. This scheme requires minimal hardware support and yet delivers performance comparable to LRU replacement.

    If the computer does not include hardware to record the fact that a page is dirty, this may be done by software by marking clean pages as read-only. When a write operation is made to a read-only page, the resulting soft page fault marks the page as read-write, which is interpreted to mean that it is a dirty page.

    Some implementations of such schemes are enhanced with anticipatory paging -- for example, when there is a page fault for page i, they might bring in pages i, i+1 and i+2. This assumes that a program that references page i is highly likely to use the following pages. Since this assumption is not always a good one, such systems usually provide a way to turn off the anticipation logic in the page replacement algorithm. Anticipatory paging works very well for programs that use numerous large arrays and it provides few benefits for programs dominated by complex linked lists jumbled into the heap.

    Exercise: Outline a procedure to be called from the page-fault service routine that will search the set of page frames and select a page to replace. Do this for the simple clock algorithm, the clock algorithm with clean-dirty page management, and each of the above with memory management unit hardware that does not mark used and dirty pages. Before you do this, you may want to solve the following:

    Subsidiary Exercise: What are the states of a page-frame.

    Subsidiary Exercise: The clock algorithm cyclicaly sweeps through the table of all page frames. This is not the same as the address map of the virtual address space! Describe the relationship between these two data structures!