22C:116, Lecture Notes, Feb. 24, 1995

Douglas W. Jones
University of Iowa Department of Computer Science

Focus on Protection Mechanisms
The mechanism - policy distinction emerged with the work of Anita K Jones on the Hydra system at Carnegie Mellon University.
Prior to this, protection was frequently managed on an ad-hoc basis, or mechanisms were implemented to support a specific policy. The policy supported was frequently inadequate, and the fact that the mechanism could be generalized was usually accidental.
As a rule, those who "own" information should control the policy used to protect that information, and the system should allow for any reasonable policy.
Crude Protection Mechanisms
Crude protection mechanisms are still very common! The most common such mechanism is the 2-state system.
In User state, use of instructions that operate on input/output devices and the memory management unit are forbidden. If a forbidden instruction is executed, there is a trap.
System state allows all instructions to be executed. This state is entered when a trap occurs, and the instruction sequence to return from the trap typically resets the system to user state.
Such 2 state systems are typical of the marketplace. Far more sophisticated systems are possible -- for example, the 80286 (and higher) chips support a fully general mechanism, which, unfortunately, is not fully exploited by any commercially available operating system.
All unprivileged user code runs in user state on a 2 state system. The current state (system or user) is saved with the program counter when there is an interrupt or trap, and then system state is entered as part of the control transfer to the interrupt service routine. Return from interrupt (or return from trap) restores the state of the interrupted program.
Any memory management unit can be used, although the variety of policies that can be implemented depends on the unit. A simple MMU that merely limits the range of allowed addresses on fetch and store operations will effectively protect disjoint users from each other. A complex MMU that allows paged addressing will allow complex information sharing between users.
If there is a paged MMU, either the page holding the interrupt service routine must be in the user's address space, marked read/execute only, or there must be a system address space supported by the MMU -- for example, the current state (user or system) can be given to the MMU as an extra high-order bit of the address.
System calls
On systems with no protection mechanisms, system calls are merely procedure calls. Conceptually, this remains true when there is a protection mechanism, but with protection mechanisms, the implementation of system calls is more complex.
On systems with protection mechanisms, system calls typically involve not only a transfer of control from the user program to the code of the system, but they also involve a change of operating state from user state to system state.
Typically, the transfer of control from user code to system code is accomplished by deliberately using an unimplemented or privileged instruction, causing a trap. This both transfers control and changes the state of the system. The trap service code that intercepts system calls gives new, software defined meanings to unimplemented or privileged instructions that are used for system calls.
In an unprotected system, system calls are usually done with the standard calling sequence and standard parameter passing mechanisms used by user programs. The system code and all user code typically shares a single memory address space, and aside from the changes made to the library to prevent the loading of duplicate copies of the system code, linkage between user code and system code is done by the standard linkage editor.
In a protected system, the semantics of a system call is the same as in an unprotected system. The only difference lies in the reaction of the system to errors (does it detect the error quickly, or does chaos emerge slowly as the consequences of the error propagate through the system until the whole system crashes).
Aside. To hardware, a trap looks like an interrupt. To software a trap is a call to a system procedure from user code. An interrupt is an asynchronously initiated context switch.
To software, code executing in a trap service routine is viewed as part of the process that caused the trap. Thus, that code may call on scheduling primitives such as P and V, with all of their effects on the running process.
In a demand paged system, for example, this is essential! The page-fault-service routine must schedule the reading and writing of pages, and while the I/O is in progress, the program that caused the page-fault must wait. This is easily implemented by the same read and write logic that is used to block user code while I/O is in progress, and wile the process that caused the page fault waits, it is natural to schedule other processes.
To the software, the interrupt service routines are quite different. These are usually viewed as separate logical processes, but instead of running under the authority of the software process scheduler, they run under the authority of the interrupt hardware (which is usually viewed as being at a higher priority than any software priority level).
One result of this is that the normal scheduling services such as P and V are not available to the interrupt service routines. Parts of the scheduler are implemented by interrupt service routines, but if an interrupt service routine wants to wait, it must do a return from interrupt, and if it wants to do a signal operation, it may not usually relinquish control through the scheduler the same way a normal user process does.
Implementing System Calls
In UNIX, the trap instructions used to issue system calls are hidden from the user. If you read a user program and find a call to a system routine, such as the read service, it looks just like a procedure call. Inside the system code, you'll find the corresponding procedure, and it is easy to imagine that the user directly calls this procedure.
In fact, the situation is quite different! When a user calls read, this actually calls a "system call stub" that is part of the standard system library. This stub is a tiny procedure with a body that typically containis just one trap instruction. The trap service routine is responsible for determining, from the trap instruction that was executed, what system call was requested, then making the parameters accessible in system state and calling the appropriate system procedure. This is illustrated here:
```
   USER STATE              /  SYSTEM STATE
                          /
       Read( F, Buf )    /
       {                /
         TRAP ---------/---> Trap Service Routine
       }              /      {
                     /         Make F, Buf addressable;
                    /          Read( F, Buf );
                   /         }
```
The user's code and the internal code for Read may all be in a high level language (such as C). The only low level language code needed is the code to actually cause the trap and the code to transfer the arguments across the "firewall" between user state and system state. Other machine code may be needed to actually do the input/output and service interrupts, but again, this is usually carefully encapsulated so that only a few hundred lines of machine code are needed in the entire operating system.
Focus on Memory Protection
If access to the memory management unit is protected so that only the system can change the memory management registers, then the memory management registers can be considered to be a protected part of the user's process state.
This trivially allows each user to be given access to a disjoint region of the systems virtual memory.
Memory protection is a key element in protecting the resources of a single-CPU or shared memory machine. It is essential that users not be able to arbitrarily manipulate the variables or code of the system; if I/O is memory-mapped, memory protection can even be used to control access to I/O devices.
This trivial policy can be done with almost any MMU hardware, but if paged or segmented access is allowed, far more can be done. Here's an illustration, using a paged system:
```
        Trivial policy
                         USER 1           USER 2
                          ____             ____
        MMU mapping      |  o-|-----      |  o-|-----
        registers        |  o-|---  |     |  o-|---  |
                         |  o-|-  | |     |  o-|-  | |
                         |____| | | |     |____| | | |
                                | | |            | | |
                      _______   | | |  _______   | | |
        Pages on     |       |<-  | | |       |<-  | |
        disk or in   |_______|    | | |_______|    | |
        main memory     _______   | |    _______   | |
                       |       |<-  |   |       |<-  |
                       |_______|    |   |_______|    |
                          _______   |      _______   |
                         |       |<-      |       |<-
                         |_______|        |_______|
```
To switch between these two users, the system must not only change their CPU registers (Accumulator, Program counter, etc), but it must change the contents of the MMU registers.
If the user address space doesn't contain any system code, then the MMU must have a second set of mapping registers that apply in system state, or alternately, the mapping must be turned off in system state. The latter is simple, but has performance problems. The former (with two sets of mapping registers) is easy -- but it may make it difficult for system code to address arguments to system calls, since they're in the wrong address space. The alternative of allowing system and user code to both address the same memory requires that there be some mechanism that changes the addressability of system pages when system state is entered -- for example, the MMU could declare that access to system pages is legal only when the program is in system state.
Sharing Memory Between Users
Here s a picture showing how memory mapping units can support shared memory:
```
                         USER 1            USER 2
                          ____              ____
        MMU mapping      |  o-|-----       |  o-|-----
        registers        |  o-|---  |      |  o-|---  |
                         |  o-|-  | |      |  o-|-  | |
                         |____| | | |      |____| | | |
                                | | |             | | |
                      _______   | | |   _______   | | |
        Pages on     |       |<-  |  ->|       |<-  | |
        disk or in   |_______|    |    |_______|    | |
        main memory     _______   |       _______   | |
                       |       |<-       |       |<-  |
                       |_______|         |_______|    |
                                            _______   |
                                           |       |<-
                                           |_______|
```
There are some problems with this! How do users arrange to share pages (or segments)? What if shared pages are not at the same virtual address?
In most UNIX systems, read-only code segments are always shared between all processes that are executing the same program. Shared read-write segments are allowed under System V UNIX, using a scheme borrowed from the 1966 vintage Berkeley Timesharing System. This required the following two system calls:
```
  Create shared segment ( pages, name, security )

  Insert shared segment ( pages, name, security, address )
```
When a shared segment is created, it is given a name. Unfortunately, this name is in a name-space that is flat (unstructured) and there are no tools provided to prevent users from selecting names that conflict.
Sharable segments have a size, an identifying name, and some security attributes (on the Berkeley system, a password; on UNIX, a short access control list with the same format as that used for files).
When a process needs to address a shared segment, the user specifies the segment, the number of pages desired (which may be less than the size), and the virtual address where the segment should appear in the user's address space.
If two different processes insert the same segment at different virtual addresses, then pointers that refer to part of the segment from within the address space of one process will not refer to the same place from within the virtual address of the other process.
Alternative Approaches to Sharing Memory
On the Encore Multimax under the UMAX operating system, a variant of UNIX, there was only one system service used for memory sharing:
```
    Share( page )
      marks the page as shared, but has no immediate effect.
```
Marking a page as shared has no effect until a process executes a fork system call. At that point, all non-shared pages are copied (at least, this is the user's illusion), while pages marked as shared remain accessable by both the parent and child.
This scheme allows pages to be shared between processes only if the common ancestor of those processes established the sharing. This is sufficient to allow multiple processes to cooperate in solving some problem on behalf of a single user, but it doesn't allow sharing between users.
This scheme guarantees that a shared page will occur in the same place in all address spaces that share it.
On demand paged UNIX systems, the trivial implementation of Fork is expensive -- copying all non-shared pages of a process requires that all pages be pulled into memory and copied, a job that may require many disk accesses in a row. To avoid this, most virtual memory UNIX systems use what is called copy-on-write. After a fork, all pages are physically shared but marked as read-only. When a user tries to write on a private page, the trap service routine notes that the page is not supposed to be a shared page and makes the copy, at that point. Thus, only pages that are actually modified get copied, and the copying operation is distributed over the life of the child processes.
Alternative Approaches to Sharing Memory
The MULTICS system had only one operation that supported sharing:
```
    OPEN( file, address )
      Maps pages of the indicated file into the process's
      address space at the indicated address.
```
This relies on the file system to provide the necessary protection of files to determine who can open what.
The MULTICS system was built on the GE 600 by a consortium of MIT and GE. Honeywell bought out GE's computer division and continued selling MULTICS systems through the 1970's. There are still a few MULTICS systems running, and other manufacturers, notably Prime, have copied most of the ideas from MULTICS.
The name UNIX is a pun on MULTICS. UNIX was written at Bell Labs after Bell dropped out of the MULTICS consortium.
MULTICS used a paged-segmented model of virtual memory, and each segment could address one file. Mapping a file into the virtual address space of a process was the only way to read or write that file. (Sequential I/O to terminals or mag-tape was done through different mechanisms.)
If two users opened the same file at different virtual addresses (different segment numbers), then they had problems if they put any pointers in the file.