22C:116, Lecture 20, Spring 2002

Douglas W. Jones
University of Iowa Department of Computer Science

  1. File Systems -- Allocating Sectors to Files

    When a user writes off the end of a physical disk or a disk partition, the driver simply returns an error indication and there is no change to the nature of the drive or partition. When a user writes beyond the end of a disk file, the situation is quite different. We expect the system to enlarge the file in order to hold the additional data written.

    Whenever a write operation on any storage medium causes storage to be allocated to hold the data written, we refer to this as demand allocation. Demand allocation is frequently used for the stack segment in virtual memory systems -- an attempt to push beyond the current limit of the stack segment causes a page fault, and the system responds to this fault by enlarging the stack segment by one page frame. An attempt to read beyond the end of the stack segment is handled quite differently: it is an error.

    With disk files, similarly, we expect attempts to read from beyond the end-of-file to return an end-of-file indicator, while write operations beyond the end-of-file force demand allocation of additional space for that file.
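
    The write path of such a file system might look like the following sketch in C; the names here (struct file, alloc_sector, demand_allocate) are illustrative, not those of any real system:

        #include <stdint.h>

        #define SECTOR_SIZE 512
        #define MAX_SECTORS 1024

        /* hypothetical per-file metadata */
        struct file {
            uint32_t size;                  /* highest byte written, plus one  */
            uint32_t nsectors;              /* sectors currently allocated     */
            uint32_t sectors[MAX_SECTORS];  /* disk addresses of those sectors */
        };

        extern uint32_t alloc_sector(void);  /* take a sector from the free pool */

        /* called from the write path so that offset is backed by storage;
           assumes f->nsectors stays below MAX_SECTORS */
        void demand_allocate(struct file *f, uint32_t offset) {
            while (offset >= f->nsectors * SECTOR_SIZE)
                f->sectors[f->nsectors++] = alloc_sector();
            if (offset >= f->size)
                f->size = offset + 1;  /* the end-of-file moves with the write */
        }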

    Given a file and the request to add a sector to that file, what disk sector should be allocated? Most early file systems did nothing interesting here: they simply allocated the next available sector from the free-space pool, without regard to where that sector was or what file it was being added to.

    The problem with this is that the placement of each sector added to a file has an effect on the read time for that file. If the sectors of a file are arranged carefully, paying attention to the optimal interleave and skew patterns for the disk being used, and using consecutive tracks within a cylinder before changing cylinders, the effects of latency (both rotational and head-motion) will be minimized. On the other hand, if sectors are allocated at random, the latency will average half a rotation per sector, plus a random seek; on a 7200 RPM disk, half a rotation alone is about 4 milliseconds.

    The "aftermarket" solution to this problem is called a "disk defragmenter". This is a program that can be run on a disk that has been filled under such a poor allocation policy. A typical disk defragmenter rearranges the storage allocated to the files so that each file is stored in consecutive disk sectors and all of the free space is lumped into one free block. If this program is run periodically, access times to files will be far better than they would be under the basic file system the defragmenter was written for.

    A far better solution is to have the file system itself work to keep the files stored contiguously. To do this, the interface to the storage allocator for the file system must be changed. Instead of just asking for another sector, the user must specify where on disk that sector should be. When extending a disk file, the system gives the address of the last sector of the file to the allocator, and the allocator tries to find a free sector that can be reached with minimal latency from that sector. When creating a new disk file, things are more complex, because starting the file at the first available free space might prevent an older file from growing. Therefore, so long as there are unused disk tracks, it might be a good idea to start each new file at the start of a disk track, in order to allow plenty of room for growth. If the file never grows beyond one or two sectors, the extra space in that track will eventually be used to extend other files allocated in the same or adjacent cylinders.
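
    The changed allocator interface might be sketched as follows, with the free space kept as a bit vector; alloc_near and its simple nearest-sector scan are illustrative -- a real allocator would weight candidates by rotational position and head motion, not just distance in sector numbers:

        #include <stdint.h>

        #define DISK_SECTORS 65536

        extern uint8_t free_map[DISK_SECTORS];  /* 1 if the sector is free */

        /* allocate the free sector nearest the caller's hint, scanning
           outward in both directions; returns -1 if the disk is full */
        int32_t alloc_near(uint32_t hint) {
            for (uint32_t d = 0; d < DISK_SECTORS; d++) {
                if (hint + d < DISK_SECTORS && free_map[hint + d]) {
                    free_map[hint + d] = 0;
                    return hint + d;
                }
                if (d <= hint && free_map[hint - d]) {
                    free_map[hint - d] = 0;
                    return hint - d;
                }
            }
            return -1;
        }

    When extending a file, the caller would pass the disk address of the file's last sector as the hint.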

    Another idea is to use anticipatory allocation. So long as a file is open and being written, it makes sense to allocate disk space to that file in large blocks. This way, big files can be written very quickly with few references to the disk's free-space data structures, and the resulting storage allocation on disk will be readable with very little latency. When the file is finally closed, any allocated space beyond the end-of-file can be deallocated. (This assumes, of course, that the end-of-file is the highest byte written, not the highest byte allocated to the file -- this difference is easy to maintain in the software for the file system!)
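
    The following sketch shows the two halves of this idea; EXTENT, alloc_extent and free_tail are hypothetical names, and the extent size of 64 sectors is arbitrary:

        #include <stdint.h>

        #define SECTOR_SIZE 512
        #define EXTENT      64   /* sectors grabbed at a time; arbitrary */

        struct open_file {
            uint32_t size;    /* highest byte written, plus one             */
            uint32_t nalloc;  /* sectors allocated, possibly more than used */
        };

        /* hypothetical: grab n consecutive sectors / free sectors from 'from' up */
        extern void alloc_extent(struct open_file *f, uint32_t n);
        extern void free_tail(struct open_file *f, uint32_t from);

        /* while the file is open, extend it an extent at a time */
        void extend(struct open_file *f, uint32_t sectors_needed) {
            while (f->nalloc < sectors_needed) {
                alloc_extent(f, EXTENT);
                f->nalloc += EXTENT;
            }
        }

        /* on close, return any sectors past the end-of-file to the free pool */
        void trim_on_close(struct open_file *f) {
            uint32_t used = (f->size + SECTOR_SIZE - 1) / SECTOR_SIZE;
            if (used < f->nalloc) {
                free_tail(f, used);
                f->nalloc = used;
            }
        }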

    Most UNIX systems use a combination of the above strategies. Typically, the root directory of each file system is allocated in the middle of the disk or partition allocated to that file system, and each time a new directory is created, the system tries to create that directory in a different cylinder near the parent directory. When a new data file is created within a directory, the system tries to put it in the same cylinder as the directory, or in a nearby cylinder. As a result, the latency time needed to traverse the directory tree from root to a particular file remains small, and the latency time it takes to read the file itself is also small.
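
    The placement policy reduces to a rule of roughly this shape; lightly_loaded_near is a hypothetical helper that picks a nearby cylinder with free space:

        #include <stdint.h>

        extern uint32_t lightly_loaded_near(uint32_t cyl);  /* hypothetical */

        /* pick the cylinder in which to create a new file or directory */
        uint32_t place(int is_directory, uint32_t parent_cylinder) {
            if (is_directory)
                return lightly_loaded_near(parent_cylinder);  /* near, not in */
            return parent_cylinder;  /* data files live with their directory */
        }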

  2. File Systems -- File Names Versus I-Numbers

    The UNIX file system introduced a new idea: a hidden software layer dividing the directory manager from the low-level issues of file management. In previous systems, as well as in many later systems, it was common to store the data relating files to their locations on disk directly in the directory -- in effect, the directory manager was required to be aware of how files were stored on disk.

    In UNIX, on the other hand, the directory manager is comparatively trivial. A UNIX directory entry has the following (abstract) structure:

    	name     - textual - a component of the file name
    	i-number - integer - the i-node number of this file
    
    Thus, all the directory manager does is relate textual file names to file numbers, known as i-numbers. The ls command, in UNIX, lists the names of the files in a directory. The ls -i command lists names and i-numbers.
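
    In C, a directory entry of this form might be declared as follows; the 16-byte layout (a 2-byte i-number and a 14-byte name) matches the classic V7 UNIX directory, but treat the exact sizes as an assumption:

        #include <stdint.h>

        #define DIRSIZ 14  /* name bytes per entry in the classic layout */

        struct dirent {
            uint16_t d_ino;           /* i-number: index into the i-table */
            char     d_name[DIRSIZ];  /* one component of the file name   */
        };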

    The i-number of a file is an index into an array of i-nodes stored on disk; the i-node gives the actual attributes of the file. For all files, the i-node holds:

        File type -- normal file, directory, character device, or block device
        Owner ID -- each file is owned by a specific user
        Group ID -- users may belong to groups; each file is in one group
        Owner rights -- the access rights conferred to the owner
        Group rights -- the access rights conferred to group members
        Public rights -- the access rights conferred to everyone else
        Date of last modification
    
    To see this information, use the UNIX ls -l command; this lists these fields (and other information) from the i-node, along with the file names.

    For normal files and directories, the remainder of the i-node gives the file size and describes where the file is stored on disk, giving the disk addresses of the first few sectors of the file, the disk address of a sector holding the disk addresses of more sectors of the file, and so on.
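
    The lookup implied by this scheme can be sketched as follows; the counts and names (NDIRECT, bmap) are illustrative, and real i-nodes extend the idea with double and triple indirection:

        #include <stdint.h>

        #define NDIRECT   10   /* sector addresses held in the i-node itself    */
        #define PER_BLOCK 128  /* 4-byte addresses per 512-byte indirect sector */

        struct inode_addrs {
            uint32_t direct[NDIRECT];  /* first few sectors of the file */
            uint32_t indirect;         /* sector full of more addresses */
        };

        extern void read_sector(uint32_t addr, void *buf);  /* driver interface */

        /* map logical sector number n of a file to a disk sector address */
        uint32_t bmap(struct inode_addrs *ip, uint32_t n) {
            uint32_t addrs[PER_BLOCK];
            if (n < NDIRECT)
                return ip->direct[n];          /* no extra disk access  */
            read_sector(ip->indirect, addrs);  /* one extra disk access */
            return addrs[n - NDIRECT];
        }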

    The file types character and block are used for device drivers. In this case, the remainder of the i-node gives the major and minor device numbers, where the major device number names a driver, and the minor device number selects one of the devices that driver can manipulate. Some devices have a deeper numbering hierarchy, so there is also some support for unit and subunit numbers.
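
    Putting the pieces together, an i-node might be declared as in the sketch below; the field names and widths are illustrative, not those of any particular UNIX release:

        #include <stdint.h>

        struct inode {
            uint16_t mode;   /* file type plus owner/group/public rights */
            uint16_t uid;    /* owner ID */
            uint16_t gid;    /* group ID */
            uint32_t mtime;  /* date of last modification */
            union {
                struct {                  /* normal files and directories   */
                    uint32_t size;        /* highest byte written, plus one */
                    uint32_t direct[10];  /* first sectors of the file      */
                    uint32_t indirect;    /* sector of further addresses    */
                } file;
                struct {                  /* character and block devices */
                    uint16_t major;       /* selects a driver            */
                    uint16_t minor;       /* selects a device, unit, ... */
                } dev;
            } u;
        };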

  3. File Systems -- Disk Layout

    In a UNIX file system, the i-table, that is, the table of i-nodes describing files in that file system, is usually stored in the middle of the range of cylinders controlled by that file system. In theory, the i-table could be stored anywhere, but because access to files requires frequent lookup in the i-table, the latency between looking at an entry in the i-table and looking at a particular sector of data is minimized by putting the i-table near the center of that range of cylinders.

    When a file system is mounted on a system, the file system software needs to find the basic dimensions of the disk. The information needed for this is stored in what is called the superblock. This must be stored in an easy-to-find location such as sector 0 of the device (sector 0, track 0, cylinder 0). It is easy to infer that the superblock must contain the following information:

    	The range of disk addresses managed by this file system
    	Starting disk address of the i-table
    	Number of entries in the i-table
    	Some kind of free-space data structure
    
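
    A superblock holding these fields might look like this sketch; a bit-vector free map stands in for "some kind of free-space data structure", though real systems have also used free lists:

        #include <stdint.h>

        struct superblock {
            uint32_t first_sector;     /* range of disk addresses managed      */
            uint32_t last_sector;      /*   by this file system                */
            uint32_t itable_start;     /* starting disk address of the i-table */
            uint32_t itable_entries;   /* number of i-nodes in the i-table     */
            uint32_t freemap_start;    /* free-space structure: here, the disk */
            uint32_t freemap_sectors;  /*   address and size of a free bit map */
        };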

  4. Caches

    Opening a file involves a search of the directory structure for the file name being used. For each directory on the path from the root to the file, the i-node for that directory must be read into memory. Finally, the i-node for the file being opened must also be read into memory. I-nodes in memory are at the heart of the open-file data structure!
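
    This search can be sketched as the following loop, in the spirit of the UNIX namei routine; get_inode, lookup and next_component are hypothetical helpers, and i-number 0 is used here to mean "not found":

        #include <stdint.h>

        #define ROOT_INO 2  /* the root directory's i-number, by convention */

        struct inode;                                  /* as sketched earlier */
        extern struct inode *get_inode(uint32_t ino);  /* read one i-node     */
        extern uint32_t lookup(struct inode *dir, const char *name);
        extern char *next_component(char **path);      /* split path on '/'   */

        /* resolve a path such as /usr/jones/notes to an in-memory i-node */
        struct inode *path_to_inode(char *path) {
            struct inode *ip = get_inode(ROOT_INO);
            char *name;
            while ((name = next_component(&path)) != NULL) {
                uint32_t ino = lookup(ip, name);  /* one i-node per level */
                if (ino == 0)
                    return NULL;                  /* no such file or directory */
                ip = get_inode(ino);
            }
            return ip;
        }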

    Because so many i-nodes are inspected, fast access to i-nodes is crucial. This is accomplished in UNIX by maintaining a (software!) cache of recently used i-nodes in main memory. All of the i-nodes on the paths to popular files and directories tend to stay in this cache, so no disk accesses are usually needed for them.

    In addition, UNIX systems routinely maintain a cache of popular disk sectors, the sector cache. As a result, attempts to read sectors that have recently been read or written are generally handled by memory-to-memory copies from this cache instead of requiring disk accesses.
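
    A minimal sketch of such a cache, using a linear search over a small table of buffers; real systems hash on the disk address and keep the buffers on an LRU list:

        #include <stdint.h>
        #include <string.h>

        #define NBUF        32
        #define SECTOR_SIZE 512

        struct buf {
            uint32_t addr;               /* disk address; 0 marks an empty buffer */
            uint8_t  data[SECTOR_SIZE];
        };

        static struct buf cache[NBUF];

        extern void read_sector(uint32_t addr, void *buf);  /* driver interface */

        /* read through the cache: memory-to-memory on a hit, disk on a miss */
        void cached_read(uint32_t addr, void *dest) {
            for (int i = 0; i < NBUF; i++) {
                if (cache[i].addr == addr) {              /* cache hit */
                    memcpy(dest, cache[i].data, SECTOR_SIZE);
                    return;
                }
            }
            read_sector(addr, dest);                      /* cache miss */
            /* ... a real cache would now install dest in a victim buffer ... */
        }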

    This scheme offers performance comparable to that of a RAM disk, without the need to explicitly copy from hard disk to RAM disk at system startup time.