Before discussing file systems, it's useful to spend some time on a simpler class of devices, timers. In a sense, a timer is the simplest of all I/O devices, because it performs no I/O. There are many kinds of timers; the simplest is the periodic timer. All it does is request a periodic interrupt.
For example, consider a timer that fires every millisecond. This might present the following interface to the software:
expired -- once a millisecond, the timer sets this single bit interface register. When this bit is set, the timer also requests an interrupt. The timer interrupt service software must reset this bit before returning from interrupt or re-enabling interrupts in order to turn off the interrupt request.
This simple timer will be used for the remainder of this lecture, but note that far more complex timers are common. For example, interval timers contain a register that is decremented at a constant rate, for example, once per microsecond. The timer requests an interrupt when the register reaches zero, and the interrupt service routine may load a new count in the register to request the next interrupt.
It is rare to find software that requires a periodic interrupt at the same rate as the hardware period of the timer. When the application requires a periodic interrupt at longer intervals, however, this can by synthesized from the basic periodic interrupt of the hardware clock as follows:
periodic-interrupt-service-routine: reset( expired ) countdown := countdown - 1; if countdown = 0 call user-service returnHere, the procedure user-service is called at the end of an interval defined by the periodic decrementing of the variable countdown; user-service should reset countdown each time it is called. This allows periodic interrupts if it is always set the same, or aperiodic interrupts if it is set differently. In effect, this creates, in software, a programmable interval timer.
Application programs rarely want an interval timer. What they typically want is a service such as the following:
wait( t ) -- when a process calls this operating system service, that process will be delayed t time units (the particular time unit depends on the system; we will use milliseconds here).
On a system with only one process and an interval timer, this service would be trivial to implement. On a system with more than one process and a periodic timer, this service is far more interesting. In such a context, many processes may be awaiting different times, and there may easily be more than one process waiting for the same time instant.
Thus, there must be a list of waiting processes, with a record, for each process, of the time it is awaiting. When a timer expires, it must somehow search this list and wake up each process waiting for the current instant.
The need to block processes and then wake them up suggests that each timer record will have a semaphore on which the process blocks. This leads to the following data structure:
Timer record: link fields, as needed for list management delay -- the requested time delay sem -- semaphore used to block process Timer list: a linked list of timer recordsThe wait service seen by an application program would then be as follows"
wait( t ) get a timer record r initialize r.sem to a count of zero ) r.delay := t append( r, timer-list ) P( r.sem )A single timer record can be permanently attached to each process. The interrupt routine needed to support this service is as follows:
periodic-interrupt-service-routine: reset( expired ) for each timer record r in timer-list r.delay := r.delay - 1 if r.delay <= 0 then remove( r, timer-list ) V( r.sem ) endif endloop returnCode written along the above lines is actually used on some systems, but it begins to behave very badly on systems with large numbers of waiting processes. The problem is that interrupt service routines should never perform open ended computations such as scanning of linked lists. This work should be moved to an application process if at all possible.
Nonetheless, the above code implements what can properly be called a virtual interval timer available to each applications process. This notion of virtual timers is an important example of virtual device interfaces; its primary importance is that timer, as a class of device, have the simplest of semantics.
The key to avoiding complex interrupt processing in the implementation of timer services is to rely on a sorted list of timer records, each holding the time being awaited. This requires the following data structure:
Timer record: link fields, as needed for list management time -- the requested time at which the timer should fire sem -- semaphore used to block process Timer queue: a priority queue of timer records, sorted by time Clock: the current timeIf the system is to run for long periods, the variables used to hold clock and the time fields of timer records must have sufficient precision that overflows are not likely during the life of the system. A year's worth of milliseconds will overflow a 32 bit counter, so realistic systems should 48 or 64 bit counters.
Some systems maintain all time values in, for example, milliseconds since some arbitrary dawn of time. Others maintain time values in more expensive forms, with fields divided out for year, day, hour, minute, second and millisecond. There is little reason to do the latter, however, since these divisions are usually only needed for the output of dates, and internal time and interval arithmetic is simplified by holding time values in a simple form.
If high precision arithmetic is difficult, an alternative scheme for representing the time field in the timer record is as the incremental delay between the previous timer's firing and the current timer. The delay field of the head of the queue would then be the delay until that timer expired, and the delays of successive timers would each be relative to their predeciessors.
Assuming that the time fields are absolute times, in milliseconds, and that the timer interrupt service routine maintains a global variable called clock, measuring milliseconds from the beginning of time, the wait service can be coded as follows:
wait( t ) get a timer record r initialize r.sem to a count of zero ) r.time := clock + t enqueue( r, timer-queue ) P( r.sem )The interrupt routine needed to support this service is as follows:
periodic-interrupt-service-routine: reset( expired ) clock := clock + 1 -- count time while head( timer-queue ).time <= clock r := dequeue( timer-queue ) V( r.sem ) end loop returnThis code rests on a priority queue. The simplest such data structure is a sorted linked list, where the enqueue service sorts new items into the list in time order, the head of the list is always available, and the dequeue service simply removes the head element from the list.
This simple implementation moves most of the work to the enqueue service, where it belongs, but if the list is long, the enqueue service may take some time. There are faster algorithms that use binary trees or heap-ordered trees to achieve O(log n) operation times on the queue. You can get more information on this subject from An Empirical Comparison of Priority Queue and Event Set Implementations in Communications of the ACM, 29, April 1986, pages 300-311. Code for interesting priority queue algoritnms is available to WWW users.
Adapting the above code to a programmable interval timer poses some problems. Primary among these is the problem of maintaining the clock. Every time the interval timer expires, the clock can be updated to reflect the interval that has just passed, but for inquiries about the clock value between updates, the running count (countdown, in the example given earlier) must be inspected to get the current value.
The semantics of timers is not always immediately obvious. When a process says wait s seconds this does not mean to wait exactly s seconds and then continue. Rather, it means to wait at least s seconds. If there are other ready processes at the end of the wait, we generally make no attempt to predict which process will be running at the end of the wait; we merely insist that the waiting process become ready when the timer expires.
This is consistent with the usual weak assumptions made about parallel processes -- that they advance unpredictably, so that we can make no assumptions about how CPU resources are divided between them. Usually, proof of correctness under these weak assumptions is easier than proof under more complex assumptions, so we use these weak assumptions even when stronger assumptions are reasonable.
Note that the presence of a timed-wait or equivalent service is one of the defining characteristics of a real-time operating system! Real-time applications, demand stronger assumptions. Typically, the defining characteristic of a real-time system is the presence of deadlines by which the software must complete some actions.
When parallel programming is applied to real-time systems, we must ask if there is a feasible schedule. That is, we ask not only if there is enough CPU time overall to complete the entire computation, but we also ask if each task can accomplish the necessary computation before each deadline.
If there is no feasible schedule, the system cannot succede, but if there is a feasible schedule, we must also ask whether the scheduling algorithm is able to find the feasible schedule among the infinite set of possible schedules. Two classes of real-time scheduling algorithms have been widely studied.
Rate monotonic priority scheduling will always find feasible schedules if there is approximately 1/3 idle CPU capacity. In this scheme, processes are scheduled with priorities related to the intervals between deadlines. Processes dealing with deadlines that have a short interval are given high priorities, while those dealing with deadlines that have a long interval are given relatively lower priorities.
Usually, rate monotonic scheduling is discussed under the assumption that intervals are periodic, in which case, the priority of a process is fixed proportionally to the rate at which it must meet deadlines. In fact, there is no reason to limit this scheme to periodic deadlines, but introducing aperiodic deadlines does force the system to use dynamically varying priorities.
Deadline based scheduling uses the time of the next deadline to be met by a process as its priority. Thus, if a process must finish some job by time t, the priority of that job is t. Distant future deadlines have low priority, while imminent deadlines have high priority. Deadline basee scheduling is somewhat uncommon in practice, but it can find a feasible schedule even when there is no excess CPU capacity.