22C:116, Lecture Notes, Feb. 27, 1995

Douglas W. Jones
University of Iowa Department of Computer Science

  1. Focus on Protection Theory

    From a theoretical point of view, we can talk about any kind of object and any kind of user. It is worth asking if users are themselves a kind of object, but what matters is that users act on objects using operations, where the type of each object may determine the set of applicable operations.

    Here, we are not concerned with the nature of objects, other than the fact that objects may be operated on by users using operations selected from a finite set. Files fit this model, with the operations read and write. Directories fit this model, with the operations make-directory-entry and delete-directory-entry. Memory pages or segments also fit this model, with the operations load and store.

    We are also not concerned with the nature of users. From the the point of view of the theory, users may be humans, programs, or anything else that may envoke operations on objects.

  2. Static Protection Models

    The access matrix model is a universal model of the instantaneous state of a protected system. It shows, at one instant, all users, all objects, and the set of operations each user is allowed to apply to each object in the system. Pictorially, access matrices are generally presented as follows:

                   | Cake | Tea   |  Objects
             ------+------+-------|
              Fred | Eat  | Drink |
             ------+------+-------|
              Lucy | Bake | Drink |
                   |      | Brew  |
             ---------------------   Allowed
             Users                   Operations
    
    The above example illustrates the (rather steriotypical) interaction of two users, Fred and Lucy, centered around two objects, Cake and Tea. The operations Eat and Bake apply to cake; Fred can Eat Cake, but not Bake it, and Lucy can Bake Cake, but not Eat it. The operations Drink and Brew apply to Tea. From the point of view of the access matrix, neither the names of users, the names of objects, nor the names of operations have any meaning other than the explicit relations between users, objects and operations detailed in the matrix.

    Access matrices may be used to control access to any resource, by any kind of user. Their use is not limited to computers; for example, they have been used in the Russian Gulag to control the access of prisoner/laborers to rooms in a prison/factory.

    The access matrix describes allowed actions, but it does not describe an enforcement mechanism. In the case of the Russian prison, the rules of the matrix were enforced by guards with guns.

    On a computer, one could store the access matrix inside the operating system, and have the system check the matrix whenever a user tried to access a resource. The problem with this is that the matrix is usually quite sparse -- that is, on most systems, most matrix entries will say "no access".

    It is also important to note that the access matrix is a static description of who can do what operations on which resources. The model must be extended to describe how access rights are changed, and these extensions frequently differ markedly depending on how the access matrix is implemented.

  3. Implementing the Access Matrix

    There are two practical ways to implement the access matrix, access control lists and capabilty lists. Each of these allows efficient storage of the information in the access matrix.

    The access control list implementation of the access matrix given above is:

            Access Control Lists
                      --------       --------
            Objects  |  Cake  |     |  Tea   |
                      --------       --------
                         |              |
                      --------       --------
            Access   | Fred   |     | Fred   |
             Control |    Eat |     |  Drink |
              Lists  |--------|     |--------|
                     | Lucy   |     | Lucy   |
                     |   Bake |     |  Drink |
                      --------      |  Brew  |
                                     --------
    
    An access control list or ACL is what results when the access matrix is stored colum-wise. An access control list is a column of the matrix is stored with a protected object. Each entry in an access control list specifies a user and the operations that user may perform on that object.

    One problem with access control lists is that they may be quite large. It is easy to imagine the access control lists for some widely used files occupying more storage than the data in the file. The usual solution to this is to introduce named sets of users -- for example, OTHERS to specify the rights given to all users not explicitly mentioned in the list.

  4. Example the MULTICS file system.

    Under the classic MULTICS system, which was also used by the PRIMOS operating system, each file has an ACL, with access such as the following:

                Read
                Write
                Traverse -- for directories
                Execute -- for data
                EditACL
    
    The read and write rights should be obvious.

    The traverse right is used in the context of hierarchical directory structures. A user who has no right to read or write a directory can mention that directory in the path name of a file as long as the user has traverse rights to that directory -- the user is just passing through, but may not open the directory to list its contents.

    The right to edit an access control list illustrates one approach to allowing the contents of the access control matrix to be modified. In a sense, all users with the right to edit the access control list of an object can be thought of as co-owners of the object. If one user (the legitimate owner of an object) grants the right to edit that object's ACL to another user, it is like granting the other user a power of attourny over the object.

  5. Example the UNIX file system.

    Under UNIX, each file has an ACL, but the ACL is strictly limited in structure. Each UNIX ACL has 3 fixed entries:

         a) An entry for the owner of the file
         b) An entry for others in the same group as the the owner
         c) An entry for the general public
    
    The rights associated with each entry are:
                Read
                Write
                Execute/Traverse
    
    The MULTICS project originated as a joint project between MIT, GE and Bell Labs in the mid 1960's. In the late 1960's, Bell pulled out of the MULTICS project, largely because the technical staff of the computing group had concluded that MULTICS was getting too complicated. UNIX was born from this frustration, and the name UNIX is a pun on MULTICS -- suggesting a scaled down system designed to meet a scaled down set of goals.

    The simple to implement ACL scheme of UNIX is an example of this. Unlike the MULTICS ACL, it is not general, but it meets the needs of most users of small computer systems. As a result, every UNIX ACL fits exactly in 9 bits of the I-node of the file it describes!

    MULTICS was intended to meet the needs of a general information utility, and as such, it had to deal with the problems of large numbers of users, including users from competing organizations. The generality of the MULTICS access control list is well justified by this environment, and the success of PRIMOS in meeting the needs of compaines like Compuserve or The Source (modern examples of computer utilities) amply illustrates the appropriateness of the MULTICS design.

    When UNIX systems are extended beyond single departments, the short ACL of UNIX becomes a problem. Distributed versions of UNIX such as the Appolo Domain operating system have been forced to add general ACL's back onto the UNIX file model in order to meet the demands of the large user base that such a system can serve.

  6. Implementing the Access Matrix

    As mentioned above, capability are the alternative to access control lists. The capability list implementation of the sample access matrix given above is as follows:

             ------   --------------------
            | Fred |-| Tea     | Cake     |
             ------  |   Drink |      Eat |
                      --------------------
    
             ------   --------------------
            | Lucy |-| Tea     | Cake     |
             ------  |   Drink |     Bake |
                     |   Brew  |          |
                      --------------------
             Users    C-lists
    
    A Capability List is the dual (in the topological sense) of an access control list. Instead of storing one column of the access matrix with a protected object, a row of the access matrix is stored with the user.

    A capability list entry, known as a capability, consists of the name of an object and the rights one user has to that object.

    This does not change the rights a user has to a resource, and it does not change the set of policies that can be implemented, but it does change the economics of certain operations.

    With a capability-list based protection mechanism, it is clear that users should not be granted the right arbitrarily edit their own capability lists! Instead, it is common to consider the capability list itself to be an object to which rights can be granted. Thus, Fred in the above example might have the right to edit Lucy's capability list. Typically, users might have the right to move capabilities from any list they can read to any list they can write, but users would not be able to modify a capability, except by removing access rights from it.

  7. Capability Based Addressing

    The most important application of capability lists in modern computer systems is to addressing. With capability-based addressing, each address is formed from two parts, as follows:

             ADDRESS
             __________________
            |_______|__________|
                |         |
                |       Index into an addressed object
                |
              Index into a capability list
              used to select an object
    
    Capability based addressing, as such, was invented for the Chicago Magic Number computer, a paper machine that was described by people at the University of Chicago in the late 1960's. Capability lists were invented by people involved with the MULTICS project, although MULTICS was not itself a purely capability based machine.

    Capability based addressing solves some interesting problems. For example, on UNIX, users can frequently name objects that they may not access, and users may find themself in the odd situation where they own a file (and pay for the storage it occupies) yet they can't address it because they have no rights to the directory where that file is listed.

    Capability based addressing solves this problem by associating the right to access an object with the ability to name it. User's names of objects are always relative to some capability list, and any global absolute name for an object is hidden inside the capabilty and is never visible to the user.

  8. Examples

    The most common use of capability based addressing is in paged or segmented virtual memory address translation. Each entry in a page table clearly indicates both the object to which that entry grants access and the access rights granted, and the page table associated with each user is the page table for that user.

          VIRTUAL ADDRESS
          __________________
         |_______|__________|
             |         |
             |       Word in page
             |
           Index into the process's C-list for pages
    
    Note that it is difficult to imagine an efficient implementation of paged virtual address translation using one access control list per page. This would require either an associative memory or a search of some kind with each memory reference. In contract, a simple table lookup suffices to check the access rights in a conventional (capability list based) virtual address translation system.

    Another common example of the use of a capability list is in accessing open files. Even when access control lists are used to control the right to open a file, once the file is open, it is common to simply enter the file description into a list of open files attached to a process. Files in this list are referenced not by name, but by their position in the list, and the entries in this list indicate not only what file is involved but how it was opened -- that is, what access rights apply. Clearly, the list of open files is a capability list.

         Read( f, buffer, address )
               |            |
               |          address of a block in the file
               |
            index into the processes open file C-list
    
    These two examples illustrate a problem with the way capabilities are used in modern systems. Instead of a single capability list per user, where that list describes all the resources the user is allowed to access, we tend to give users a collection of lists, one list for each type of resource. This is awkward.

    Some systems have been built that solve this problem, notably the IBM AS/400 (and System 38). Another commercial system in this class is the Plessy System 250. Experimental systems that have solved this problem include the Cambridge CAP system and the Carnegie-Mellon Hydra and C.mmp systems. These will be discussed later.

    It is worth noting that, on Multics, when a file is opened, the sectors of that file become visible as a segment in the virtual address space of the process that requested the open operation. Thus, for Multics, there is no distinction between the capability list describing the segments a process may access and the capability list describing the files the process may access.

  9. Domains

    The domain of a process is the set of all objects that process may manipulate.

              Domain = C-list
    
    In the abstract, domain switching involves a change of the current working C-list. Domain switching is very important! It is very common, and systems must control it.

    For example, when a trap occurs, the system typically changes from a user domain to the system kernel's domain, and on return from trap, the system switches back to the user's domain.

    The entire reason for introducing two states in order to build a simple protected system is to create two domains, a system domain, that includes all objects, and a user domain, that includes only some objects.

    One of the most complex domain switching mechanisms was introduced on the Multics system. On this system, each process has a level between 0 and 16. In addition, each segment has a level, and when a process attempts a memory access, the hardware checks the process level against the level of the accessed segment.

    A Multics memory access is legal if the process has level higher than (or the same as) the segment, and illegal if the process level is below the segment level.

    The Multics hierarchy of domains is a generalization of the two domain model of the primitive systems discussed previously. Multics allowed the operating system itself to be subdivided, so that only the process manager operated at the highest level, level 0. The virtual memory manager was at level 1, accounting was at a more restricted level, things like line-printer spoolers were even more restricted, leaving a variety of levels were available to users.

    A user might run undebugged code at level 15, so that bugs in the code wouldn't damage other parts of the user's program.

    To further complicate the Multics model, some pages can be marked as gateways. A procedure call to address zero in a gateway page is the only legal memory reference from a process running at a numerically higher level.

    This causes the current level of the process to be pushed on the stack along with the return address. The called procedure then runs at the level of the segment containing it, and the level is reset to the original level when the procedure returns.

    A MULTICS gateway is the generalization of a trap in a two-level security system. In addition to allowing system calls to run at the system level, it allows users to divide their code into more secure and less secure segments.

    For example, consider a system where Joe, one of the users, owns a private database (say Chem Abstracts), and wishes to sell access to that database on a per-use basis. This means that Joe can't allow direct access to the database, he only wants other users to access it through his access code (which produces an appropriate billing record every time someone calls it).

    Joe would simply arrange for his access code to run at a low numbered level (also called an inner ring), and he would require that users wishing to run his code do so from a higher numbered level (or outer ring), crossing a gateway as they call his code.

    The above kind of consideration was a natural outcome of thinking about a computer utility -- the motivation behind MULTICS.

    Note, however, that the strictly hierarchic scheme of MULTICS does not solve some problems. For example, it fails to solve the problems of two mutually suspicious users. Either one or the other must be given greater access rights if code from both users must share an address space.

  10. The UNIX Setuid Mechanism

    The UNIX setuid mechanism provides another example of a domain switching mechanism. Under UNIX, there is one bit associated with each executable file indicating what user ID that file is to be run under. Recall that the user ID is needed to determine what files a running program may open.

    Under UNIX, the default user ID used when a new program is executed (by the exec system call) is the user ID of the program that requested the exec. If the setuid bit on an executable file is set, however, executing that file will cause it to run under the user ID of the program's owner.

    The Setuid bit completely solves the mutual suspicion problem. It was patented by Ken Thompson, the originator of UNIX.

    To illustrate this, consider the following scenario: Joe has a program that Fred needs to run, but Fred doesn't trust Joe and Joe doesn't trust Fred. If Joe releases the program so Fred has execute-only access to the code and the setuid bit is turned on, then Fred cannot steal Joe's secrets by examining the code, and when Fred runs the program, the program cannot access any data that Joe could not access directly.

    Furthermore, if the program requires data from some file of Fred's, Fred can preven Joe from directly reading that file while still allowing the program to access the file by opening the file prior to executing the program and then letting the program read the file as, for example, standard input.

    The result is that the only files of Fred that Joe's program can damage are the ones Fred left open when he called the program (of course, it can also access any of Fred's data that is publically accessible). Even though Fred ran it, Joe's program runs under Joe's user ID and has free access Joe's data.

    The UNIX setuid bit is considered dangerous, and it is. Users who need to solve the mutual suspicion problem must use it, but they should not do so until they understand exactly what they are doing. Careless use of this bit has created serious problems in the past, including major security holes in Emacs.

    Furthermore, more recent implementations of UNIX have actually defeated some of the functionality of setuid by distinguishing between a user's effective user ID (the ID used for opening files) and the user's real user ID. The problem is that the setuid system call allows a program to force execution with the real user ID of the current user, thus opening files that, under the original UNIX specification could be hidden from an untrusted program.