Assignment 9, due July 18

Part of the homework for 22C:50, Summer 2003
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Background: The Unix I-node scheme works pretty well, but it seems awfully complicated. Here is an alternative scheme: Instead of using I-numbers to refer to entries in an I-table, we use I-numbers as the sector number of the sector holding information about the file. In all cases, the first 32 bytes of the sector describing the file hold things like the size of the file, in bytes, the date of last modification, etc. The use of the remainder of the sector depends on the file size as follows:
For small files, the remainder of the sector holds the actual data stored in the file.
For medium sized files, the remainder of the sector holds the disk addresses of the sectors of the file.
For large files, the remainder of the sector holds the disk addresses of index sectors, that is, sectors that hold the disk addresses of sectors in the file.
For huge files, the remainder of the sector holds the disk addresses of index sectors holding disk addresses of index sectors holding disk addresses of sectors in the file.
Part a) Assume sectors are 512 bytes and sector numbers are 32 bits (4 bytes) each. How big is the largest small file, the largest medium sized file, the largest large file and the largest huge file, in bytes? Show your work (that is, show the factors you multiplied to get each of the larger sizes)!
Part b) Assume sectors are 4096 bytes and sector numbers are 64 bits (8 bytes) each, and answer the same questions as were asked in part a.
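One way to check the hand arithmetic for parts a and b is a short Python sketch like the one below. It is only a sketch under stated assumptions: the name tier_capacities is hypothetical, the header is taken to be exactly 32 bytes, and index sectors are assumed to carry no header of their own, so every byte of an index sector holds disk addresses. If you read the scheme differently, change the factors accordingly.

```python
def tier_capacities(sector_size, addr_size, header_size=32):
    """Largest file size, in bytes, for each tier of the proposed scheme.

    Assumptions (not stated in the assignment): index sectors have no
    header, and partial addresses at the end of a sector are unused.
    """
    payload = sector_size - header_size      # bytes left after the header
    root_ptrs = payload // addr_size         # addresses in the file's own sector
    index_ptrs = sector_size // addr_size    # addresses in one index sector
    return {
        "small": payload,
        "medium": root_ptrs * sector_size,
        "large": root_ptrs * index_ptrs * sector_size,
        "huge": root_ptrs * index_ptrs ** 2 * sector_size,
    }


if __name__ == "__main__":
    # Part a and part b parameters, printed as (tier, bytes) pairs.
    for sector_size, addr_size in [(512, 4), (4096, 8)]:
        print(sector_size, addr_size, tier_capacities(sector_size, addr_size))
```

Each tier multiplies the previous one by one more factor of index_ptrs, which is the set of factors the question asks you to show.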
Part c) One measure of file system performance is how many disk sectors must be read in order to examine the first byte of a file, starting with just the I-number of the file. For what file sizes does the scheme proposed here outperform the Unix scheme? For what file sizes does the scheme proposed here perform worse than the Unix scheme? Assume 512-byte sectors and 32-bit sector numbers for this question.
Background: The old standard Unix backup utility, tar, when given a starting directory, traverses the tree of directories rooted at that directory and copies all the files it finds to the output, along with a record of each file's name and other attributes. Traditionally, the output was a tape drive (hence the name, Tape ARchive), but these days, it is as likely to be a CD-RW disk. The tar utility will also restore files from such a backup, recreating the desired directory tree and filling in all the files. (It has been common, from the start, to compress the output of tar; sometimes, people jokingly speak of a "tarred and feathered" archive to refer to what ought properly to be called a "tarred and compressed" archive.)
Part a) Explain why tar also serves as a disk defragmenter when you use it to make a backup of a disk, reformat that disk, and then restore from this backup.
Part b) The Unix directory structure allows symbolic links, that is, directory entries that associate a file name with a different file name, given textually, instead of associating a file name with a file number. It also allows multiple hard links to a file, that is, two or more directory entries, with different names and possibly in different directories, that refer to the same data file. If you use tar on a directory hierarchy that contains these, what kinds of problems might you want to check for after you restore that hierarchy from the archive?
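To make the two kinds of links concrete, here is a minimal Python sketch that inventories them in a directory tree. It does not answer the question. It only shows how one might compare a hierarchy before and after a restore. The function name survey_links is hypothetical, and the sketch assumes a Unix-like system where os.lstat reports meaningful device, I-number and link-count fields.

```python
import os
import stat
from collections import defaultdict


def survey_links(root):
    """Return (shared, dangling) for the directory tree under root.

    shared   -- sorted list of path groups that name the same regular
                file, identified by (device, I-number)
    dangling -- sorted list of symbolic links whose target is missing
    """
    by_inode = defaultdict(list)
    dangling = []
    # os.walk does not follow symbolic links by default; links to
    # directories show up in dirnames, so both lists are examined.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)              # describes the link itself
            if stat.S_ISLNK(st.st_mode):
                if not os.path.exists(path): # exists() follows the link
                    dangling.append(path)
            elif stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                by_inode[(st.st_dev, st.st_ino)].append(path)
    shared = sorted(sorted(p) for p in by_inode.values() if len(p) > 1)
    return shared, sorted(dangling)
```

Running survey_links on the original tree and again on the restored tree, then comparing the two results, is one way to see whether files that shared storage before the backup still do afterward. A file with a link count above one but only one name inside the tree is deliberately left out of shared, since its other names lie outside the hierarchy being examined.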