22C:116, Lecture Notes, Lecture 11, Fall 2001

Douglas W. Jones
University of Iowa Department of Computer Science

Locality!
In all cases, virtual memory rests on the fundamental empirical observation that most programs exhibit locality.
The locality principle is that those pages that have been recently referenced are likely to be referenced again (temporal locality), and that, when new pages are referenced, they are likely to be near recently referenced pages (spatial locality).
As a consequence, we can talk about the working set of a process, the set of pages it is currently using. When a process comes into execution, there will typically be a sequence of page faults as its working set is brought into memory, and then the system will settle down, with only an occasional page fault as the process changes its working set to work on new areas of memory.
When the number of available page frames is insufficient to hold the number of pages in the working sets the processes currently running on a system, the system typically runs very slowly, with a large fraction of all memory accesses causing page faults; with each page fault, some page that will be needed in the near future is swapped to disk, guaranteeing that there will be yet another page fault in the near future. A system with this behavior is said to be thrashing.
The application or mix of applications determines the working set size. For any particular application, a graph of page-faults per second versus number of page frames made available to that application will tend to look like the following:
```
       |
       |______    Working set size
       |      --_ |
 page  |         -|
faults |          _
 per   |Thrashing   CPU bound
second |    or    |-
       |I/O bound | _ 
       |          |  -__
       |          W     -------
     --+------------------------
       |  page frames available
```
In this graph, the number of page frames in the working set is empirically determined by the location of the transition from normal CPU bound operation to the thrashing state.
Some application exhibit a sharp thrashing transition -- these are programs with high locality and a well defined working set. Others exhibit a gentle transition to thrashing -- these are programs with poor locality and an ill defined working set.
Page Replacement Policies
What we want in a demand-paged virtual memory implementation is a system that keeps the working set in fast main memory while moving pages that are not in the working set back to slow disk. Thus, the policy governing the selection of what page to remove from main memory when there is a page fault determines, to a tremendous extent, the performance of a virtual memory system. The following policies will be discussed here:
- Random if you do worse than this, shame!
- FIFO curiously, as bad as random.
- LIFO obviously horrible!
- LRU the best practical algorithm.
- Clock almost as good as LRU, commonly used.
- Belady's optimal policy impossible to use.
- Random page replacement requires no intelligence and works moderately well. When a page frame is needed, the random algorithm simply selects any page frame at random, dumps the page from that frame onto disk, and then uses it. If the memory reference pattern of the application program is very irregular, exhibiting very poor locality, random replacement may may even be the best policy. This occurs, for example, in some list processing applications.
- FIFO or first-in first-out replacement is an obvious policy, where the page that has been in main memory the longest is the one replaced. In practice, with most applications, this turns out to be as bad as random replacement. The fact that a page has been in memory for a long time does not imply that it is not part of the working set.
- LIFO or last-in first-out replacement is included in this list only for the sake of completeness. This policy is among the worst possible policies! The page most recently brought into main memory is frequently part of the working set, so sending it back to disk is almost always a very bad idea.
- LRU or least-recently-used replacement is usually the best choice, if the necessary hardware is available. This policy selects for replacement the page in main memory that has been least recently used. If a page has not been recently used, it is reasonably likely it will not be needed anytime soon. Unfortunately, LRU replacement requires that the MMU perform some fairly complex record keeping, for example, by keeping a running record of when each page in memory was accessed.
- Clock replacement is almost as good as LRU replacement. This scheme divides the set of pages in main memory into those that have recently been used, and those that are less recently used, and replaces one of the less recently used pages, using an algorithm to be discussed later. The MMU needs to maintain only one bit per page frame to support this policy, and there are clever ways to implement approximations of this policy with even less hardware support.
- Belady's optimal page replacement policy is unimplementable because it requires predicting the future. This policy orders all pages in main memory by how far into the future they will be needed, and replaces that page that will not be needed for the longest time into the future. Proof that this policy is optimal is straightforward, and we can determine the expected number of page faults a program would produce under this policy by retrospective analysis of the string of all memory addresses issued by a running program. As a result of this, we can compare the performance of practical policies with Belady's policy; such comparisons (first done by Belady in the late 1960's) show that LRU replacement usually comes surprisingly close to Belady's scheme. In effect, we conclude that how recently a page has been used is generally a good predictor of how likely it is to be needed again.
Page Tables and Frame Tables
Note that many implementations of paged virtual memory rely on two distinct data structures, the page table and the frame table. Typically, only one of these is known to hardware, while the other is entirely the creation of system software. There are even some memory management units that have no complete knowledge of either table, but only store, in hardware, a cache of recently used page table entries, where this cache is managed by software.
The page table describes the state of each page in the virtual address space. There is one entry in the page table for every addressable page in the virtual address space. This table frequently grows very large for large address spaces.
The page table is indexed by the virtual address of the page, and each page table entry typically has a field that describes where that page is currently stored -- for pages currently not in main memory, this would typically be the disk address where the page is stored; for pages in main memory, this would be the page frame holding the page.
The frame table is indexed by the physical address of the frame, and the entries describe the state of each page frame in main memory. For those frames currently in use, it describes what pages are is them. There is one entry for each page frame in the physical address space. At any instant, the frame table can typically be derived from the page table, but this derivation is computationally expensive.
The minimum frame table is just an inverted index (a term from database management), holding the index of the page table entry holding the page currently stored in each page frame. On some systems, more information is stored in the frame table; such information as whether the frame has been recently used and whether the contents of the frame have been changed by the CPU are all as useful if stored in the frame table as they are if stored in the page table.
```
                Page table                Frame table
Page Number     __________    Frame       ___________
     |         |_|________|   number     |_|_________|
     |         |_|________|      |       |_|_________|
     |         |_|________|       ------>|_|_________|
      -------->|_|________|              |_|_________|
               |_|________|              |_|_________|
               |_|________|               |     |
               |_|________|      <--------       ------->
                |     |            mark        page number
     <----------       -------->
    rights               frame number
```
All of the page replacement algorithms select a candidate for replacement by a scan of the frame table. The clock algorithm, for example, scans the frames for a frame holding a page with the mark bit set. It is natural to think of the mark bit as an attribute of the frame table entry, but in fact, it may be necessary to follow the frame table entry back to the page table to find the mark bit, since many MMU's know nothing of the frame table but only operate on the page table or on a cache of recently used page table entries.
Hardware Support Needed for LRU
LRU replacement requires special hardware to track every memory access, for example, to put a timestamp or serial number on each page frame or on each page-table entry. This timestamp is updated by the memory management unit with every memory reference.
```
      A Frame Table Entry
       _________________________________
      |__________|______________________|
      |  Serial  |     Page Number      |
      |  Number  |                      |
      |          |                      |
      |   added  |   original fields    |

			or

      A Page Table Entry
       _________________________________
      |__________|______|_______________|
      |  Serial  |Rights| Frame Number  |
      |  Number  |      |               |
      |          |                      |
      |   added  |   original fields    |
```
We can store the serial number in either the frame table or the page table, wherever the MMU can store it most conveniently. When there is a page fault, we search for the smallest (or oldest) frame-table entry (that is, the frame containing the page that was least recently used) and select it for replacement. If the serial number is stored in the page table, we must use the page number field of the frame table to find the page table entry that corresponds to the serial number.
MULTICS on the GE 600 used true LRU replacement, back in the late 1960's, but few systems since that time have used true LRU. The MULTICS/GE600 implementation used a special high-speed core memory to implement its cache of all currently available page frames. The Following sequential algorithm was used by the address translation unit, implemented in fast hardware:
```
   I = 0
   Fetch entry(I) into temp
   while I < cache_size
   and temp.page /= virtual_address.page do
       I = I + 1;
       exchange temp with entry(I)
   endloop
   if temp.page = virtual_address.page
       Store temp into entry(0)
   else
       Report a page fault for virtual_address.page
       Report that temp.frame is the least recently used page frame
       Copy the map entry from main memory into cache entry(0).
```
This keeps the entries in the cache ordered by recency of use. This ordering guarantees that the cache will be fast for references to recently used frames (those that are likely to be frequently used) while it is slow for frames that weren't recently used (those that are unlikely to be frequently used).
The MULTICS/GE600 scheme is an elegant historical curiosity, of only limited practical value today because it is predicated on core memory technology. Modern VLSI technology is poorly matched to this algorithm.

MULTICS was the first design for a large-scale timesharing system, developed with federal funds at Project MAC at MIT starting in the mid 1960's. The original development involved a consortium involving MIT, GE and Bell Labs, but Bell Labs dropped out of the project, and Honeywell bought GE's computer business and introduced the Honeywell 6000 as a successor to the GE 600.
In the 1970's, MULTICS became one of two standard operating systems offered on Honeywell mainframes. the H6000 MMU abandoned the true LRU replacement of the GE 600 in favor of faster address translation schemes, and of course, all later versions of MULTICS, a system that survived into the late 1980's, used these more modern schemes.
MULTICS was originally conceived of as a computer utility. The idea was that such a utility would be operated in much the same way as the telephone and power utilities, but instead of selling power or audio communication services, the computer utility would sell CPU time and disk storage to its customers. Today, this isn't a radical idea -- there are many computer utilities, ranging from national services like AOL to smaller local providers. The concept was a radical one in the mid 1960's, and this idea drove a number of technical developments.
Clock page replacement
The clock algorithm allows a good approximation of LRU replacement, although if there are too few page frames available, it degenerates to FIFO replacement.
Clock replacement requires that there be one bit per page frame to record whether that page has been recently referenced. This bit is set by hardware when the page frame is referenced, and it is reset by the page-fault service routine, in the process of deciding what page frame to replace.
To select a page frame to replace, the page fault service routine software sweeps through the page frames, resetting the referenced bit on each page frame and stopping when it finds a page frame with this bit already reset. The page in this frame is the one that is replaced (copied out to disk to make an empty frame).
The clock algorithm is usually pictured with a clock face, where each minute on the clock face corresponds to a page frame. The clock hand remains fixed except during the execution of the page fault service routine. When a page fault occurs, the clock hand begins its sweep. The hand stops as soon as it sees an unmarked page frame. The clock hand removes the marks from each page frame it sweeps over.
Data Structures and code for the Clock Algorithm
The Clock Algorithm requires a frame table, that is, a table indexed by physical memory page frame number and containing, for each page frame, the virtual address or page number of the page in that frame, along with at least one status bit, the mark bit.
```
                          Frame Table
                        ________________
                       |_X__|___________|
                       |____|___________|
  Clock Hand -         |____|___________|
               \       |____|___________|
                 ----->|____|___________|
                    |  |_X__|___________|
     Sweep Direction|  |_X__|___________|
                    |  |____|___________|
                    V  |_X__|___________|
                       |mark| what page |


	find page to replace:
           while frame_table[clock_hand].mark is set do
              reset frame_table[clock_hand].mark
              clock_hand = (clock_hand + 1) mod table_size
           -- now, frame_table[clock_hand].what_page
           -- is the page number to be written back to disk,
           -- and clock_hand is the frame number to use with
           -- the page being brought in from disk.
```
Modified Clock Replacement
It turns out that the clock algorithm can be modified to work without a hardware supported referenced bit. The first large-scale use of this modification was in the DEC VAX family of computers, but many modern RISC systems also take advantage of this to simplify their memory management units.
As the clock hand sweeps by a page frame, all page table entries referencing that frame are marked as invalid. In fact, the page is still in memory, but the memory management unit has been told that it is not there. This combination, where the page is in memory but the page-table entry for that page is marked as if it is not, is used to mean "not recently referenced".
When the CPU references an invalid page, there is a page fault. If the page is actually in memory, this is called a soft page fault. The page table entry is simply changed from invalid to valid. This combination, page in memory and marked as such, is considered to mean "referenced".
Modified Clock Page Table Entry
```
       _____________________________
      |______|______|_______________|
      | Real |Rights      Frame     |
      |Rights|                      |
      |      |                      |
      | soft |   hardware fields    |
      | bits |                      |
```
Real Rights = Rights = valid
Marked. This page is not a candidate for clock replacement; it has been recently used.
Real Rights = valid; Rights = invalid
Not Marked. A candidate for replacement. Faults involving this page are "soft"; this means, the page should be marked by setting the hardware rights to the real rights.
Real Rights = Rights = invalid
Paged out! Faults involving this page are "hard" -- page replacement and disk I/O are needed to resolve this fault.
Real Rights = invalid; Rights = valid Strange! This combination could perhaps be used by software for something, but it is not clear what. Access to this page is legal and will not cause a fault.
Dirty Pages
A dirty page is a page that has been modified since it was copied from disk to main memory. Dirty pages must be copied back to disk when the frames that hold them are preempted to hold other pages. Clean pages, those that are not dirty, need not be copied back to disk if original copy of the page on disk is still available. This can cut the time needed to service some page faults in half.
Tracking clean and dirty pages requires one bit per page frame; typically, we assume that this bit is set to mark the page as dirty whenever there is a write operation on that page, and we let the software reset this bit whenever it writes the contents of a page frame to disk. The latter operation is called cleaning the page frame.
This can be combined with the clock algorithm as follows:
As the clock hand sweeps by dirty page frames they are scheduled for cleaning; the actual write operations to disk take place in parallel with computation and are typically done in whatever order is convenient for the disk interface hardware and software.
The clock hand only stops its sweep when it encounters a page frame that is both unreferenced and clean.
This scheme (modified clock replacement) is typical of modern virtual memory operating systems. This scheme requires minimal hardware support and yet delivers performance comparable to LRU replacement.
If the computer does not include hardware to record the fact that a page is dirty, this may be done by software by marking clean pages as read-only. When a write operation is made to a read-only page, the resulting soft page fault marks the page as read-write, which is interpreted to mean that it is a dirty page.
Some implementations of such schemes are enhanced with anticipatory paging -- for example, when there is a page fault for page i, they might bring in pages i, i+1 and i+2. This assumes that a program that references page i is highly likely to use the following pages. Since this assumption is not always a good one, such systems usually provide a way to turn off the anticipation logic in the page replacement algorithm. Anticipatory paging works very well for programs that use numerous large arrays and it provides few benefits for programs dominated by complex linked lists jumbled into the heap.
Exercise: Outline a procedure to be called from the page-fault service routine that will search the set of page frames and select a page to replace. Do this for the simple clock algorithm, the clock algorithm with clean-dirty page management, and each of the above with memory management unit hardware that does not mark used and dirty pages. Before you do this, you may want to solve the following:
Subsidiary Exercise: What are the states of a page-frame.
Subsidiary Exercise: The clock algorithm cyclicaly sweeps through the table of all page frames. This is not the same as the address map of the virtual address space! Describe the relationship between these two data structures!