22C:116, Lecture 12, Fall 1999

Douglas W. Jones
University of Iowa Department of Computer Science

  1. Timer hardware

    Before discussing file systems, it's useful to spend some time on a simpler class of devices: timers. In a sense, a timer is the simplest of all I/O devices, because it performs no I/O. There are many kinds of timers; the simplest is the periodic timer. All it does is request a periodic interrupt.

    All modern operating systems, UNIX, Windows and Mac/OS included, use periodic timers. In the case of early UNIX the timer fired 60 times per second, a frequency derived from the power supply. Modern UNIX systems almost universally have the system clock fire 100 times per second. In the case of the DOS and Windows environments, the standard interval is 18.2 ticks per second. Most modern systems use hardware that is able to provide far more general timer service, but commercially available operating systems almost always ignore the generality of the hardware in favor of using it to provide a simple periodic interrupt.

    As an example of a very simple hardware interface, consider a timer that fires every millisecond. This might present the following interface to the software:

    expired -- once a millisecond, the timer sets this single bit interface register. When this bit is set, the timer also requests an interrupt. The timer interrupt service software must reset this bit before returning from or re-enabling interrupts in order to turn off the interrupt request.

    This simple timer will be used for the remainder of this lecture, but note that far more complex timers are common. For example, interval timers contain a register that is decremented at a constant rate, for example, once per microsecond. The timer requests an interrupt when the register reaches zero, and the interrupt service routine may load a new count in the register to request the next interrupt.

  2. Timer Software

    It is rare to find software that requires a periodic interrupt at the same rate as the hardware period of the timer. When the application requires a periodic interrupt at longer intervals, however, this can by synthesized from the basic periodic interrupt of the hardware clock as follows:

    periodic-interrupt-service-routine:
       reset( expired )
       countdown := countdown - 1;
       if countdown = 0 call user-service
       return
    
    Here, the procedure user-service is called at the end of an interval defined by the periodic decrementing of the variable countdown; user-service should reset countdown each time it is called. If user-service always sets countdown to the same value, the user routine will be called periodically; if user-service sets different values each time it is calle, it will be aperiodic. This creates, in software, a programmable interval timer, but most modern real-time-clock hardware directly supports this function, with a countdown register decrementing at several megahertz and the rule that the hardware will request an interrupt when the register reaches zero.

    Application programs rarely want just one interval timer. What they typically want is a service such as the following, callable from within any user process:

    wait( t ) -- when a process calls this operating system service, that process will be delayed t time units (the particular time unit depends on the system; we will use milliseconds for the examples presented here).

    Exercise: How can you implement the example wait() service in terms of the UNIX timer primitives -- see the man page for setitimer() for documentation.

    On a system with only one process and an interval timer, this service would be trivial to implement. On a system with more than one process and a periodic timer, this service is far more interesting. In such a context, many processes may be awaiting different times, and there may easily be more than one process waiting for the same time instant.

    Thus, there must be a list of waiting processes, with a record, for each process, of the time it is awaiting. When a timer expires, it must somehow search this list and wake up each process waiting for the current instant.

    The need to block processes and then wake them up suggests that each timer record will have a semaphore on which the process blocks. This leads to the following data structure:

    Timer record:
       link fields, as needed for list management
       delay -- the requested time delay
       sem -- semaphore used to block process
    
    Timer list:
       a linked list of timer records
    
    The wait service seen by an application program would then be as follows"
    wait( t )
       get a timer record r
       initialize r.sem to a count of zero )
       r.delay := t
       append( r, timer-list )
       P( r.sem )
    
    A single timer record can be permanently attached to each process. The interrupt routine needed to support this service is as follows:
    periodic-interrupt-service-routine:
       reset( expired )
       for each timer record r in timer-list
          r.delay := r.delay - 1
          if r.delay <= 0 then
             remove( r, timer-list )
             V( r.sem )
          endif
       endloop
       return
    
    Code written along the above lines is actually used on some systems, but it begins to behave very badly on systems with large numbers of waiting processes. The problem is that interrupt service routines should never perform open ended computations such as scanning of linked lists. This work should be moved to an application process if at all possible.

    Nonetheless, the above code implements what can properly be called a virtual interval timer available to each applications process. This notion of virtual timers is an important example of virtual device interfaces; its primary importance is that timer, as a class of device, have the simplest of semantics.

    Exercise: Explore the documentation for the UNIX timer services and write a critical essay on the deficiencies of these services. There are many!

  3. Improving Timer Performance

    The key to avoiding complex interrupt processing in the implementation of timer services is to rely on a sorted list of timer records, each holding the time being awaited. This requires the following data structure:

    Timer record:
       link fields, as needed for list management
       time -- the requested time at which the timer should fire
       sem -- semaphore used to block process
    
    Timer queue:
       a priority queue of timer records, sorted by time
    
    Clock:
       the current time
    
     _____________     ______     ______     ______
    | Timer Queue |-->| Link |-->| Link |-->| Link |--X
    |_____________|   |------|   |------|   |------|
                      | time |   | time |   | time |
                    / |------|   |------|   |------|
     Sort by order /  | sem  |   | sem  |   | sem  |
     of time fields!  |______|   |______|   |______|
    
    If the system is to run for long periods, the variables used to hold clock and the time fields of timer records must have sufficient precision that overflows are not likely during the life of the system. A year's worth of milliseconds will overflow a 32 bit counter, so realistic systems should 48 or 64 bit counters.

    Some systems maintain all time values in, for example, milliseconds since some arbitrary dawn of time. Others maintain time values in more expensive forms, with fields divided out for year, day, hour, minute, second and millisecond. There is little reason to do the latter, however, since these divisions are usually only needed for the output of dates, and internal time and interval arithmetic is simplified by holding time values in a simple form.

    Exercise: The classic Unix representation of the date uses a signed 32 bit count of the seconds since Midnight January 1, 1970, GMT. When will the UNIX date overflow? (It is interesting to note that MacOS and UNIX were both designed without Y2K errors in their operating kernels.)

    Assuming that the time fields are absolute times, in milliseconds, and that the timer interrupt service routine maintains a global variable called clock, measuring milliseconds from the beginning of time, the wait service can be coded as follows:

    wait( t )
       get a timer record r
       initialize r.sem to a count of zero )
       r.time := clock + t
       enqueue( r, timer-queue )
       P( r.sem )
    
    The interrupt routine needed to support this service is as follows:
    periodic-interrupt-service-routine:
       reset( expired )
       clock := clock + 1   -- count time
       while head( timer-queue ).time <= clock
          r := dequeue( timer-queue )
          V( r.sem )
       end loop
       return
    
    This code rests on a priority queue. The simplest such data structure is a sorted linked list, where the enqueue service sorts new items into the list in time order, the head of the list is always available, and the dequeue service simply removes the head element from the list.

    This simple implementation moves most of the work to the enqueue service, where it belongs, but if the list is long, the enqueue service may take some time. There are faster algorithms that use binary trees or heap-ordered trees to achieve O(log n) operation times on the queue. You can get more information on this subject from An Empirical Comparison of Priority Queue and Event Set Implementations in Communications of the ACM, 29, April 1986, pages 300-311. Code for interesting priority queue algoritnms is available to WWW users.

    Adapting the above code to a programmable interval timer poses some problems. Primary among these is the problem of maintaining the clock. Every time the interval timer expires, the clock can be updated to reflect the interval that has just passed, but for inquiries about the clock value between updates, the running count (countdown, in the example given earlier) must be inspected to get the current value.

  4. Date Overflow Considerations

    The above code relies on a global variable "clock" that can be incremented once per timer interrupt. This is a source of significant problems in real systems. For example, consider a timer that increments once per millisecond. This will increment by 1000 per second (so about 10 bits of timer precision are devoted to counting seconds. Counting 60 seconds per minute takes roughly another 6 bits of precision, and counting 60 minutes per hour takes another 6 bits, leaving 10 bits out of a typical 32 bit word available for counting hours. 1000 hours is just 40 days of 25 hours each, so we can quickly estimate that our 32 bit counter will overflow in just 6 weeks! If we replace rough estimates such as those used above with accurate arithmetic, we get a clock lifetime of 49.71027 days.

    Using the above algorithm, we would expect our system to run correctly for a month, and then we would expect a brief burst of irregular behavior as the clock rolled over, after which it would deliver correct performance for another month. This is hardly acceptable!

    Therefore, we seek a better solution!

    One approach is to simply use a higher precision counter. For example, we could replace our 32 bit counter with a 64 bit counter; with this, our system would last for millennia instead of mere months. We pay a price, however, in slow arithmetic, unless we're running on a machine with a 64 bit ALU.

    Another approach is to note that, while we may well want our system to run for longer than one month, most of the timed delays requested by users are likely to be on the order of seconds. We can easily rewrite the timer data structures to support this by replacing the time field in each timer record with a relative delay field:

    Timer record:
       link fields, as needed for list management
       delay -- the delay between the previous timer and this
       sem -- semaphore used to block process
    
    Timer queue:
       a sequential queue of timer records, sorted by expiration time
    
    Clock:
       the current time
    
     _____________     ______     ______     ______
    | Timer Queue |-->| Link |-->| Link |-->| Link |--X
    |_____________|   |------|   |------|   |------|
                      | delay|   | delay|   | delay|
     Relative delay / |------|   |------|   |------|
     from one record  | sem  |   | sem  |   | sem  |
     to the next.     |______|   |______|   |______|
    
    This data structure adds only marginally to the complexity of the timer services, while it allows very fast arithmetic to be used for typical short delays.

  5. Real Time Considerations

    The semantics of timers is not always immediately obvious. When a process says wait s seconds this does not mean to wait exactly s seconds and then continue. Rather, it means to wait at least s seconds. If there are other ready processes at the end of the wait, we generally make no attempt to predict which process will be running at the end of the wait; we merely insist that the waiting process become ready when the timer expires.

    This is consistent with the usual weak assumptions made about parallel processes -- that they advance unpredictably, so that we can make no assumptions about how CPU resources are divided between them. Usually, proof of correctness under these weak assumptions is easier than proof under more complex assumptions, so we use these weak assumptions even when stronger assumptions are reasonable.

    Note that the presence of a timed-wait or equivalent service is one of the defining characteristics of a real-time operating system! Real-time applications, demand stronger assumptions. Typically, the defining characteristic of a real-time system is the presence of deadlines by which the software must complete some actions.

    When parallel programming is applied to real-time systems, we must ask if there is a feasible schedule. That is, we ask not only if there is enough CPU time overall to complete the entire computation, but we also ask if each task can accomplish the necessary computation before each deadline.

    If there is no feasible schedule, the system cannot succede, but if there is a feasible schedule, we must also ask whether the scheduling algorithm is able to find the feasible schedule among the infinite set of possible schedules. Two classes of real-time scheduling algorithms have been widely studied.

    Rate monotonic priority scheduling will always find feasible schedules if there is approximately 1/3 idle CPU capacity. In this scheme, processes are scheduled with priorities related to the intervals between deadlines. Processes dealing with deadlines that have a short interval are given high priorities, while those dealing with deadlines that have a long interval are given relatively lower priorities.

    Usually, rate monotonic scheduling is discussed under the assumption that intervals are periodic, in which case, the priority of a process is fixed proportionally to the rate at which it must meet deadlines. In fact, there is no reason to limit this scheme to periodic deadlines, but introducing aperiodic deadlines does force the system to use dynamically varying priorities.

    Deadline based scheduling uses the time of the next deadline to be met by a process as its priority. Thus, if a process must finish some job by time t, the priority of that job is t. Distant future deadlines have low priority, while imminent deadlines have high priority. Deadline basee scheduling is somewhat uncommon in practice, but it can find a feasible schedule even when there is no excess CPU capacity.