Assignment 6, Solutions

Part of the homework for 22C:112, Spring 2011
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Background: The simplest adequate implementation of a disk scheduler consists of a linear array of pointers to queues, one per cylinder of the disk. When a disk request is enqueued, it is added to the queue for the cylinder of that request. When the disk interrupt service routine begins to serve a cylinder, it sets that queue aside, replacing it with an empty queue, and then works on the set-aside queue. When successive interrupts empty the set-aside queue, the interrupt service routine moves to a new cylinder.
a) The need to set aside the queue before serving it is the result of what could otherwise happen if disk requests were enqueued in the queue for the cylinder currently being served. What is the problem and why is it solved by setting the queue aside before beginning to serve it? (0.5 points)

If the interrupt service did not set aside the queue for the current cylinder before working on it, and if a user kept enqueueing I/O requests for the current cylinder as fast as the interrupt service routine processed requests, the interrupt service routine would never service requests to other cylinders. This would violate our requirement that the scheduler be fair.
By setting aside the queue before starting to service it, new requests for the same cylinder fill a new queue. This queue will be serviced only after all other cylinders have been given their chance, therefore guaranteeing fair service to all cylinders.

b) Consider this alternative approach: The queue is maintained as a binary search tree, sorted by linearized disk address (of the various BST organizations, a splay-tree works best, but this is not relevant to the problem). The left child is for all requests less than or equal to a node. A unidirectional elevator algorithm is used, traversing the tree from left to right and deleting each node as it is served. Does this approach suffer from the problem identified above? Explain why or why not. (0.5 points)

No. The key is "the left child is for all requests less than or equal to the current node" combined with "left to right traversal". Because of this combination, new requests for the same disk address will always be placed "behind the elevator" as it moves forward, so it will always make forward progress. No pattern of enqueues can prevent fairness.
If left to right traversal were used with the rule that "the right child is for all requests greater than or equal to the current node" the situation would be different. In that case, new requests for the same disk address could prevent the scheduling policy from ever advancing to a different disk address.
Background: Consider the following organization for a file system: Each file in the file system is given a unique 32-bit file ID number. Each sector of the file has, of course, a sector number in that file; assume this is also a 32-bit number. Therefore, each sector in the file system can be given a unique number that is simply the 64-bit concatenation of the file number with the sector number.
One way to organize the sectors of a file on disk is to use this 64-bit sector number as the key in a search tree, where the leaves are the disk sectors of files, and the internal nodes of the tree are index sectors used to locate the correct leaf. The B-tree data structure is particularly appealing for this purpose (see the Wikipedia entry on B-trees).
Assume that disk sectors are 512 bytes, and that linearized sector numbers are 64 bits long.

Some preliminary work. Index sectors in this B-tree will need to hold pairs of the form [filesector,disksector] which are 128 bits long, that is 16 bytes. As a result, index sectors will hold at most 32 entries. A quick check of the definition of B-trees given on Wikipedia shows that in a B-tree, the average population of an index node is about 3/4 of the maximum, so we expect the B-tree to be approximately a 24ary tree.
For reference, here are the powers of 24:
24, 576, 13,824, 331,776, 7,962,624.

a) If the aggregate content of all data in all disk files occupies 1 gigabyte (that is 2³⁰ bytes), how many disk sectors would be occupied by the B-tree needed to provide access to this data. (0.5 points)
First, 2³⁰ bytes of data will at least 2²¹ sectors of 512 (2⁹) bytes each. That is about 2 million sectors.
The number of index sectors will be about 2²¹/24 which is 87,382 (plus 1/24th of this, 3641, plus 1/24th of that, 152, plus 1/24th more 7) which is exactly 91,182. Giving an exact answer is stupid in this case, since the exact answer depends on exactly how many index nodes there are, and this depends on the history of insertions and deletions in the B-tree, which we don't know. The best answer is "on the order of 100,000 index nodes."

b) If the aggregate content of all data in all disk files occupies 1 gigabyte, How many disk accesses would be required to find the location of a randomly selected disk sector. (0.5 points)

The height of this tree would be expected to be at least log₂₄2²¹, which (rounding up to the next integer) is 5, determined from the table of powers of 24 given above. This is the answer. It is probably exact, since the height of the tree is very stable over huge variations in how things are sliced and diced between the different index sectors.
Background: Consider the problem of allocating space to growing files. If data sectors were the only concern, and there were only one file, the optimum way to allocate space for one file would be to allocate every second or third sector along one track to the file (catching the interleaved sectors on successive revolutions), filling successive tracks of one cylinder until the cylinder is full, and then filling the adjacent cylinders successively. Interleaving is needed because the interrupt service routine needs time to work and as a result cannot schedule operations on the immediately succeeding sector.
a) If the file system uses some kind of index sectors to locate the data sectors of a file, how would you modify the above scheme to allow files to be read as quickly as possible? (0.5 points)

I would store the index sector that references each block of successive data sectors right before those data sectors, since it will need to be read before those data sectors are read. Aside from this, I would not change the sequential order suggested above.

b) Suppose your application has multiple files open for output and it occasionally adds sectors to each of them. Suggest an allocation policy that tries to appoximate an optimal allocation pattern without knowing in advance which files will end up being large files and which will stay small. (0.5 points)

Until disk space is very sparse, always store each file starting on a separate track. Basically, you allocate the whole track to the file assuming it will continue growing to fill that track, and only after the file is closed do you return the unused part of the track to the free-space pool. If an application opens several files for output, preferentially locate their initial tracks in the same cylinder, some cylinder that has enough free tracks available to start all of the files.