If an opened file is some kind of software that fits the model:
___ _______ ____________| _______ | READ | | | READ | |_______| Open | |_______| | WRITE | File | | WRITE | |_______|____________| |_______| |___What is the open() operation supported by the file manager? In effect, it creates an open-file data structure, linked as needed to the data structure describing the underlying disk and to the code that translates addresses of sectors within a file to addresses of sectors on the underlying disk.
For example, consider the following data structure for implementing the generic device interface:
record pointer to READ routine pointer to WRITE routine device-dependent fields end recordThe specialized version of this for an open file might look like the following:
record pointer to READ routine pointer to WRITE routine pointer to disk interface record mapping information for sectors of the file end recordGiven this, the open service takes a textual file name and returns a pointer to one of these open file records, newly created for access to the named file.
The primary job of a directory manager is to maintain a mapping between textual file names, as presented to the open function, and the information needed to build an open file description record.
The directory itself is the data structure on disk used by the directory manager in performing this mapping.
The simplest approach to directory management was used on many early computer systems, including most early floppy-disk based systems. On such systems, a fixed part of each disk was reserved for the directory, which took the form of a flat table, for example:
Directory = array [1..64] of record name: packed array [0..7] of char; start-sector: disk-address; sectors: integer; end record;Here, as in many crude file systems, a linear additive mapping scheme is used to describe the disk space occupied by a file. It is trivial to extend such a directory to include such features as the date of file creation, the length of the file, in bytes, and similar details.
In general, users don't like flat name spaces, and they quickly start building hierarchic spaces using simple conventions such as the now almost universal
filename.extensionformat. In fact, the name of this file,
/csf/jones/.public-html/opsys/notes/0220.htmlcould as easily be treated as a single long character string in a flat name space as it can be treated as a description of a path through a tree. Flat directory structures become unwieldy, though, so most modern systems support some variant on the tree-structured directory.
Tree-structured directories appear to have originated with proposals put forth by workers at project Mac for what eventually became the Multics system. These ideas were published in paper by Dennis and Van Horn, in Communications of the ACM, March 1966.
The key idea is to store directory data structures such as that outlined for a simple flat directory system in files. One bit attached to each file determines whether that file contains data or a directory. Whether it is a data file or a directory, all other details of storage allocation for the file are typically the same. Only the interpretation and the set of allowed operations on the file's contents changes.
The UNIX file system is a variant on the theme of hierarchically structured directories. Underlying the user-level directory manager is a primitive flat file system using integers as file names. The integers are simply indices into an array of I-nodes, the I-table, allocated out near the center of the disk:
___________________________________________________ |super| | | | |block| | I-table | | |_____|________________|_________|__________________|The superblock at the start of the disk contains the disk address of the I-table and its size, along with free-space data structures. In order to provide a degree of crash protection, the superblock is actually stored redundantly. The I-table is in the middle of the disk in order to minimize seek times in moving the heads from I-table to data and back.
Each I-node has one bit indicating whether the associated file is a directory or a data file. Other than that, all other attributes such as how storage is allocated for files are the same, whether it's a directory or a data file.
Under the UNIX system, each directory is a sequence of ordered
pairs
UNIX does allow data files to be referenced from multiple directories.
Thus, the same file may appear under different names in two or more
directories.
The redundancy of the UNIX directory structure (self-links and back links)
allows a program called fsck (file system check) to reconstruct the directory
structure in the face of many possible failures. On most UNIX systems,
fsck is routinely run as part of the startup script. The redundancy of the
UNIX directory structure was not designed with support for fsck in mind;
other systems, notably the Xerox Distributed File System (XDFS), have provided
far greater redundancy, so that the entire directory system can be
reconstructed from the data files themselves.
Under XDFS, the prefix on each sector of each file contains the sector
number of that file relative to the file it is in, along with the file
number of the file (analogous to a UNIX I-node). As a result, the entire
data structure describing mappings from
Under XDFS, sector zero of each file contains the file number of the
directory that references that file, as well as the name of the file and
any related information for the file. Thus, by finding sector zero of each
file on the system (by scanning the entire disk) the directory structure
of the file system can be reconstructed. The net result is that if some
sectors on a disk are damaged, no data not in those sectors will be lost.
Under UNIX, the prohibition on directory structures other than a tree
with back links (..) and self-links (.) is imposed in order to avoid problems
with reclamation of storage when links are deleted. Each I-node contains
a count of the number of incoming links; this is incremented when a link to
that I-node is added, and it is decremented when a link to that I-node is
removed. If the count reaches zero, the file is deleted.
The problem with this scheme is that it can't detect it when a directory
becomes inaccessible because the circular and self-links prevent the count
for the directory from going to zero. As a result, UNIX uses one command
to remove links to data files (rm) and another command to delete directories
from the directory tree (rmdir). The Cambridge File System eliminates this
problem by using dynamic garbage collection to reclaim unreachable files or
directories.