Homework 7 Solutions

22C:116, Fall 2001

Douglas W. Jones
  1. Problems from the text:
    a) Do problem 5 on page 449.
    If we had no open system call, the read and write calls would, of necessity, have to include the literal file name and file position in place of the file descriptor used in UNIX.

    Having no open file descriptors would change some aspects of the semantics of running programs! In UNIX, a running program can open and then access a file -- and we could still do this using the modified model of this assignment. However, in UNIX, one program can open the file and then run another program that reads or writes the file -- the open file descriptor is passed from one to the other, and the recipient will be able to access the file, as opened, even if the recipient could not open the file. The proposed change would not permit this!

    The second change would be one of efficiency. Opening the file involves traversing a directory structure and checking access rights. If we must do this with each access to a file, we introduce considerable overhead. This overhead could be reduced, possibly to an acceptable extent, if the system carefully maintained a per-process cache associating the file names each process has recently referenced with the relevant parts of the directory information.
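
    To make the proposed interface concrete, here is a minimal user-level sketch, written on top of ordinary POSIX calls purely for illustration. The names read_at and write_at are my own, not from the text; the point is that every call carries the path and position, so the directory traversal and rights check (the open) are repeated on every access.

        #include <fcntl.h>
        #include <unistd.h>
        #include <sys/types.h>

        /* read without a descriptor: name the file and position every time */
        ssize_t read_at(const char *path, off_t position, void *buf, size_t count)
        {
            int fd = open(path, O_RDONLY);   /* directory traversal + rights check */
            if (fd < 0)
                return -1;
            ssize_t n = pread(fd, buf, count, position);
            close(fd);                       /* no descriptor survives the call */
            return n;
        }

        /* write without a descriptor, same idea */
        ssize_t write_at(const char *path, off_t position, const void *buf, size_t count)
        {
            int fd = open(path, O_WRONLY);
            if (fd < 0)
                return -1;
            ssize_t n = pwrite(fd, buf, count, position);
            close(fd);
            return n;
        }

    The per-process name cache suggested above would sit inside routines like these (or inside the system itself), short-circuiting the repeated lookup for recently used names.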


    b) Do problem 9 on page 450.
    We can use this flat file system to imitate a hierarchical file system such as that of UNIX by simply using the entire path name as a file name. Thus, we might have a file named "usr/jones/bin/testprog" on this system. As far as the system is concerned, the slash has no special significance; it is just a character. As far as users are concerned, though, it is a path delimiter.

    We'd probably want our user-level interface routines for the file system to maintain a current working directory. If the user's file name begins with slash, it would be taken as a globally defined name and used as such. If the name doesn't begin with slash, it would be prefixed with the current working directory's name before being given to the system.
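
    A minimal sketch of this user-level naming convention; the working-directory string, the buffer sizes, and the leading-slash convention are illustrative choices, not part of the problem:

        #include <stdio.h>
        #include <string.h>

        static char cwd[256] = "/usr/jones";    /* kept by the user-level library */

        /* build the flat file name the system will actually see */
        void resolve(const char *name, char *flat, size_t len)
        {
            if (name[0] == '/')
                snprintf(flat, len, "%s", name);           /* global name, used as given */
            else
                snprintf(flat, len, "%s/%s", cwd, name);   /* prefix the working directory */
        }

        int main(void)
        {
            char flat[512];
            resolve("bin/testprog", flat, sizeof flat);
            printf("%s\n", flat);    /* prints /usr/jones/bin/testprog */
            return 0;
        }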


    c) Do problem 19 on page 450.
    Elinor is right, because the search of the in-RAM i-node table for an already-loaded copy is not done just to save memory! If two processes have the same file open, the system can only resolve interactions between the processes if it tracks the fact that the two are using the same i-node. For example, if the i-node were not shared and one process extended the file, changing its length, the other process would not learn of this change!
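
    For concreteness, here is a sketch of the kind of entry the shared in-RAM i-node table holds; the field names are illustrative, not the actual UNIX kernel declarations. On each open, the system first searches this table, and only reads a fresh copy from disk if no entry for that i-node is found.

        #include <sys/types.h>

        /* one in-core copy per distinct open file, shared by every process
           that has the file open; found by searching the table on each open */
        struct in_core_inode {
            int   inode_number;   /* which on-disk i-node this is a copy of      */
            int   ref_count;      /* how many open-file table entries point here */
            off_t length;         /* a single copy of the length, seen by all    */
            /* ... block pointers, owner, protection, times, and so on ...       */
        };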

    d) Do problem 24 on page 451.
    Changes that Oliver Owl makes to his thesis may not be properly backed up! If there is ever a system crash, when he restores his thesis from backup, he may get something very strange, with some parts of the thesis collected from a prior version and some parts reflecting changes he has made. Depending on the file system organization and the details of the backup program, he may even find arbitrary deletions and arbitrary inclusions of miscellaneous data from the same or other files!

    However, if the file system is written to interact with the backup program, so that files modified during the backup process are backed up again, or if the text editor application maintains a stable-storage model of the files being edited, Oliver Owl will be protected from his foolishness. Most file systems and most text editors do not do this, but it can be done!


    e) Do problem 30 on page 451.
    If the average seek is 13 cylinders, at 6 msec per cylinder, it will take 78 msec to seek. Assume a 100 msec rotational latency (1/2 revolution) per block, and 25 msec I/O time per block. Therefore, it takes 78 + 100 + 25 msec or 203 msec to read each block. At 100 blocks per file, the total is 20300 msec or 20.3 sec.

    If the average seek is 2 cylinders, at 6 msec per cylinder, it will take 12 msec to seek. Assume a 100 msec rotational latency (1/2 revolution) per block, and 25 msec I/O time per block. Therefore, it takes 12 + 100 + 25 msec or 137 msec to read each block. At 100 blocks per file, the total is 13700 msec or 13.7 sec.

    If we had an even better file system, it would try to select consecutive sectors in a track whenever possible; since this system has 8 sectors per track (inferred from the ratio of transfer time to rotational latency time) we can probably do fairly well with this optimization.
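
    The arithmetic above, collected into one place, using the figures assumed in the answer (6 msec per cylinder of seek, 100 msec average rotational latency, 25 msec transfer per block, 100 blocks per file):

        #include <stdio.h>

        int main(void)
        {
            const double seek_per_cyl = 6.0;     /* msec per cylinder of seek   */
            const double latency      = 100.0;   /* msec, half a revolution     */
            const double transfer     = 25.0;    /* msec to transfer one block  */
            const int    blocks       = 100;     /* blocks in the file          */
            const double avg_seek[]   = { 13.0, 2.0 };  /* random vs. clustered */

            for (int i = 0; i < 2; i++) {
                double per_block = avg_seek[i] * seek_per_cyl + latency + transfer;
                printf("average seek of %2.0f cylinders: %3.0f msec per block, %4.1f sec per file\n",
                       avg_seek[i], per_block, per_block * blocks / 1000.0);
            }
            return 0;
        }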

  2. The Question: A linear search for some sector of some file is inefficient! One alternative would be to treat the file number as the I-number of the file, and use an I-table, as in UNIX, to locate all of the sectors of each file. Another alternative, the one chosen by Xerox, was to concatenate the file number and sector number, and then use the resulting double-length (64-bit) integer as the index into a B-tree, stored on disk, used to gain fast access to the sectors of any file.

    Assuming that the disk has just been mounted on the host computer, how many read accesses would you expect it to take to find some sector of some file? Assume a 1 Gigabyte disk with 4K bytes per sector, assume that both the sector number and the file number are 32-bit integers, and assume that the disk is about half full, with the average file size being 8K but a few files being very large.

    1,000,000,000 bytes on disk, half used, so 500,000,000 bytes of data. The sector size is 4K bytes, so there are 125,000 sectors of data.

    The keys used in the B-tree are 64 bits (two 32-bit numbers), and selecting one 4K sector from the 250,000 sectors on the whole disk takes, at minimum, an 18-bit disk address. For the sake of simplicity, we'll use a 32-bit disk address, so each B-tree index-node entry takes 64 plus 32 or 96 bits, which is 12 bytes. Therefore, each B-tree index node contains, at most, 341 entries; if the average index node is 3/4 full, we have, on average, roughly a 256-ary tree.

    256² < 125,000 < 256³, so our B-tree has 3 levels of indexing, with all the leaves on level 4. Reading a leaf therefore takes 4 disk accesses if no B-tree index nodes are cached in main memory (and none are, because we just mounted the disk!).
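
    The same arithmetic, worked mechanically, under the assumptions made above (4K-byte index nodes, 12-byte entries, nodes 3/4 full, 125,000 data sectors to index):

        #include <stdio.h>
        #include <math.h>

        int main(void)
        {
            const int    node_bytes  = 4096;     /* one index node per sector   */
            const int    entry_bytes = 12;       /* 64-bit key + 32-bit address */
            const double fill        = 0.75;     /* assumed average occupancy   */
            const double sectors     = 125000.0; /* data sectors to index       */

            int max_fanout = node_bytes / entry_bytes;        /* 341         */
            int avg_fanout = (int)(max_fanout * fill + 0.5);  /* roughly 256 */

            /* smallest number of index levels with avg_fanout^levels >= sectors */
            int levels = (int)ceil(log(sectors) / log((double)avg_fanout));

            printf("fanout: at most %d entries, about %d on average\n",
                   max_fanout, avg_fanout);
            printf("%d levels of indexing, so %d reads per data sector on a freshly mounted disk\n",
                   levels, levels + 1);
            return 0;
        }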

    In contrast, if this is a UNIX system, a typical file (2 sectors long) will have the disk addresses of its sectors contained entirely in the I-node, so, given a sector number and an I-number for a file on a freshly mounted disk, all we need to do is read one I-node from the I-table on disk and then read the desired sector. This means just 2 accesses. UNIX is cleverly optimized for fast access to the first sectors of any file!

  3. The Question: Suppose a problem occurs that destroys some number of sectors somewhere on the disk, possibly including some sectors of file 0. How much of the file system can be reconstructed after this accident? Contrast this with a similar accident on a UNIX file system. Which system minimizes the damage? Assuming both systems use comparable RAM caches to speed disk access, what performance penalty would you expect to pay for this minimized damage?
    Under the XDFS, a linear search of the disk can find all data sectors that survived the disaster and reconstruct a B-tree allowing efficient access to those sectors by file number (equivalent to the UNIX concept of I-number).

    Traversing this B-tree, we can then find all sectors claiming to be sector 0 of some file. From these, we can verify, or if needed, reconstruct the directory entries that reference each file whose sector 0 survived the crash. From the directory entries that survived the crash, we can verify, or if needed, reconstruct sector zero of each data file. Only when both the directory entry for a file and its sector zero were destroyed in the crash will we be unable to give that file a proper home in the post-crash directory hierarchy.
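
    A toy, self-contained sketch of the salvage sweep; the assumption that every data sector carries its own (file number, sector-in-file) label follows from the way the XDFS keys its B-tree, but the structure and field names here are mine, chosen only for illustration:

        #include <stdio.h>
        #include <stdint.h>

        struct sector_label {
            int      in_use;           /* 0: destroyed in the crash, or free */
            uint32_t file_number;
            uint32_t sector_in_file;
        };

        int main(void)
        {
            /* a toy disk image: the third sector did not survive the crash */
            struct sector_label disk[] = {
                { 1, 7, 0 }, { 1, 7, 1 }, { 0, 0, 0 }, { 1, 9, 0 }, { 1, 7, 3 },
            };
            long total = (long)(sizeof disk / sizeof disk[0]);

            for (long s = 0; s < total; s++) {
                if (!disk[s].in_use)
                    continue;                 /* nothing to salvage here */
                /* the rebuilt index key: file number and sector number
                   concatenated, just as in normal operation */
                uint64_t key = ((uint64_t)disk[s].file_number << 32)
                             | disk[s].sector_in_file;
                printf("key %016llx -> disk sector %ld\n",
                       (unsigned long long)key, s);
            }
            return 0;
        }

    In a real salvager, each surviving (key, sector) pair would be inserted into a freshly built B-tree rather than printed.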

    In contrast, if the I-node describing a UNIX file is lost in a crash, there is no way to recover that file. The UNIX fsck utility can perform some recovery, however, using the . and .. links in each directory to rebuild the directory hierarchy after a crash. Once this is done, it is possible to sweep through unused disk sectors and find fragments of lost files, but there is no way to distinguish between files that were deleted, possibly some time ago, and files whose I-nodes were destroyed in the crash. Furthermore, there is no way to automatically organize the sectors of lost files into the correct order other than by examining them with attention to their content and knowledge of the expected data format.