22C:116, Lecture 26, Fall 2000

Douglas W. Jones
University of Iowa Department of Computer Science

  1. Clocks Revisited!

    Real-time systems need to be able to measure time! Consider the example of a program to control a stepping motor. Stepping motors are motors that rotate a fixed increment every time an electronic step signal is sent to them. Stepping motors are quite common in computer controlled mechanisms, and the step signals for the motor are usually issued by software. The following example illustrates a fragment of a simple stepping motor control program:

      loop until position = destination
        delay( steptime() );
        step(); position++;
      end loop
    
    The function steptime computes the appropriate time per step as a function of the current motor velocity and whether the motor should be accelerating or decelerating. Since real motor rotors have inertia and are unable to accelerate instantly, calling step too frequently when the motor is barely turning will cause the control system to lose control, and similarly, if the motor is spinning quickly, the software will lose control of the motor if it calls step too infrequently.

    On a centralized system, this bit of code requires a realtime clock with an interface that privides a separate delay timer for each process. In a distributed real-time system, the problem arises of how to deal with time. A natural solution is to install a single real-time clock somewhere on the system. A clock server process could await messages from client processes that wish to be informed of the arrival of some particular time or the passage of some particular interval.

    A typical message to the clock server process might have the meaning "send me a reply after one second" or "send me a message at ten AM". The clock server would maintain a data structure listing the outstanding requests, and when one of the requested times arrives, it would send a reply.

    There are three problems with this approach. First, if communication delays are significant, the accuracy of the clock service will be degraded. Second, if the system is large, the clock server may become a bottleneck, limiting system performance. Third, if messages are lost, recovery is difficult because most protocols for detecting message loss rely on the availability of a clock!

    All three of these objections lead to the desire to use multiple clock servers, usually one on each computer in the network; this eliminates the possibility of lost messages to the clock server and it eliminates communication delays.

    The problem with this is that all of these clocks must be synchronized if the distributed system is to provide useful real-time service!

    Absolute clock synchronization is impossible! The best that can be done is to guarantee that two clocks are within some error bound of each other. As a consequence, the relative times of two events on a system cannot be exactly compared. Either one event is clearly before the other, or the two events occurred within the error bound of the clocks used, in which case we cannot say which event occurred first, an in fact, different observers may draw different conclusions about which was first!

  2. Real Clocks

    Real clocks never keep exact time. Consider first an ideal clock. The clock hands on this clock show the real time, and the ideal clock ticks exactly once per second. Now, consider a real clock, built using any physical mechanism. Of course, the builder of the clock hopes it will tell real time and hopes that it will tick exactly once per second, but the actual interval between ticks will always vary slightly. This variation may be because of thermal effects, quantum fluctuations, vibration, electromagnetic interference, or any of a number of other causes. Whatever the cause, this variation is called clock jitter.

    As a result of jitter, real clocks do not tick exactly once per second, but only at an average rate that the builder hopes will be once per second, after it has been properly regulated. Real clocks are never regulated exactly; instead, they have average rates that run a bit fast or a bit slow. The slow but systematic departure of the real clock time from ideal time is called the clock drift. The mathematics of the random walk caused by the clock jitter guarantees that there will eventually be some clock drift, even for perfectly regulated clocks!

    If you graph the time of day shown on the real clock versus the time of day shown on the ideal clock, where you would hope for a straight diagonal line, you will see a slightly wavy line that starts out with errors centered on the diagonal but slowly and systematically departs from the diagnonal. The wavyness of this line is a random walk determined by the clock jitter, while the slow systematic departure from the diagonal is the error in clock regulation.

    As a result of these considerations, the times shown on any collection of independent real clocks will slowly diverge, even if they are all identical and perfectly calibrated.

    It should be noted that truly independent clocks are frequently very hard to make! When two clocks are near each other, they may influence each other in unexpected and subtle ways. There's a well known story of an 19th century observatory that had three pendulum clocks, and to the surprise of the timekeepers, the pendulums in all three clocks remained perfectly synchronized. It turned out that the clocks were so precisely tuned to the same frequency that once one clock was within a few milliseconds of ticking, the sound of the tick of another clock would jar its escapement free, forcing it to tick. As a result, if the clocks were started randomly, the natural random drift of the clocks would slowly bring them into near synchronization, and once they were synchronized, they remained exactly synchronized. Radio frequency coupling can similarly synchronize modern quartz clocks that are near each other.

  3. Clock Synchronization Using a Central Authority

    If we have a central authority that is trusted, for example, the National Institute of Standards atomic clocks in Fort Collins Colorado, we can synchronize all clocks to that authority. The time from the NIST clocks is broadcast continuously on the WWV radio station; For many years, Heathkit sold a WWV receiver with an RS-232 port on the back. This receiver even had a correction you could set to take into account the propagation delay of the time signal as it traveled from Fort Collins to wherever your receiver was located.

    The GPS (Global Positioning System) satelites provide another relatively authoritative time base; each of these satelites includes a local clock of very high quality, and the broadcast of their local time-of-day is available anywhere on or near the earth.

  4. Adjusting the time.

    Given an authority, for example, GPS or WWV, what do you do if the authority disagrees with your local clock? A simple assignment of a new value to the local time of day can cause chaos! If the time of day jerks forward suddenly, there may be no damage (but in real-time control systems, for example, the stepping motor example, even this may cause loss of control). If the time of day jerks backward, some time of day will occur twice, and time will briefly appear to move backward, with the consequence that, for example, some timestamps will appear to have been made in the future. This can lead to severe problems even in non-realtime applications such as database systems.

    The solution to this problem is to avoid direct assignments to the time of day. Instead, if the official time of day is ahead of the local realtime clock, the local clock can be run fast until the clocks agree, and if the official time of day is behind the local clock, the local clock can be run slower until the clocks agree.

    The UNIX system service adjtime() works this way. A call to adjtime() can be used to request the system clock run fast or slow until it gains or loses the required amount. This is, of course, a privileged system call!

    The adjustments to the speed of the clock made by these services do not involve changing the frequency of the oscillator inside the clock (typically a quartz crystal). Instead, the adjustments are done entirely in software.

    For clock software based on a periodic clock interrupt, for example, one that gives a nominal 100 interrupts per second, the clock might maintain the time of day by counting milliseconds. Normally, the clock software would add 10 milliseconds to the time of day every time there is a clock interrupt. If the clock is instructed to "run fast" in order to gain time, the clock software might add 11 milliseconds for every clock interrupt, while if the clock has been instructed to "run slow", it might add only 9 milliseconds per clock interrupt.

    The problem is a bit more complex when the underlying clock hardware is based on a programmable interval timer, but the solution is equivalent. If the next clock event is to be scheduled 100 milliseconds into the future and the clock has been instructed to make up time, it might set the timer to 90 milliseconds and then, when the timer interrupt arrives, it would update the time of day variable by 100 milliseconds. Similarly, if the clock has been instructed to lose time, it might set the timer to interrupt after 110 milliseconds.

    Note, in real systems, that the hardware clock is usually precise to better than 1 part in 1000, and that when clock is found to be in error, it is typically reset to its correct value by adjustments to the clock speed on the order of 1 part in 100. The 1 part in 10 adjustments used in the above examples would be too abrupt for many applications.

  5. Synchronizing one clock to another

    To keep one clock in a network synchronized to another more authoritative clock on the same network, the following code would typically be called periodically:

        send time request to remote server;
        t1 = local time;
        await t, the time sent by the remote server;
        t2 = local time;
        advance local time by t - (t2+t1)/2 time units.
    
    This code assumes that the remote server responds immediately when it gets a time request, and that the outgoing network delay is comparable to the incoming network delay. As a consequence, (t2+t1)/2 is an estimate of the time at which the remote server inspected its clock in order to get the time t.

    If the interval t2-t1 is significantly greater than the shortest measured interval in previous transactions, it may be prudent to assume that there was contention for resources somewhere in the system, and thus that (t2+t1)/2 is not a good estimate, and thus, that the returned time should not be accepted. This leads to the following more robust version of the code:

        send time request to remote server;
        t1 = local time;
        await t, the time sent by the remote server;
        t2 = local time;
        triptime = t2-t1;
        if triptime < mintriptime
          then mintriptime = triptime;
          else mintriptime = mintriptime + epsilon;
        if triptime < (mintriptime + tolerance)
          then try to gain t - (t2+t1)/2 time units;
    

    Given two computers, each with an unreliable clock, and each running the above algorithm periodically to keep their clocks synchronized, it is not hard to show that the maximum difference between the two clocks will be bounded by the rate at which they drift apart without synchronization times the interval at which they are resynchronized, and that the rate of drift of the composite time relative to real time will be the average of drift rates of the individual clocks.

    There are two generalizations of this algorithm to more than two clocks. One approach is to have each clock in the group synchronize to one other, for example, by arranging the clocks in a logical ring, with each clock synchronizing itself to the time given by the clock to its left.

    A second alternative is to have each clock estimate its error relative to each of the other clocks in the system and then compute an average of the error estimates, possibly weighting this average by how much trust each clock places in each of the others and discarding measurements from other clocks that appear to be unacceptable, for example, because of trip times that are too long. After each cycle of estimating errors, this average can be used to adjust the speed of the local clock.

    The expected drift rate of a composite clock constructed from a number of identical local clocks will be the average of the drift rates of each independent clock. If all clocks assign the same weights to each clock in their local computations, the resulting drift rate will be the weighted average of the independent clock drift weights.

    The expected error of any particularl clock relative to the composite clock will depend on the inherent drift rate of that clock and on the frequency with which it is resynchronized. Frequent resynchronization does not improve the drift rate of the composite, it merely keeps the times reported by the clocks that make up that composite closer to each other. Similarly, the use of a ring topology, where each clock synchronizes itself to only one other, does not change the drift rate, only the error of individual clocks relative to the composite.

    Variations on these kinds of clock synchronization algorithms are used by the various national timekeeping authorities (NIST in the United States) to reconcile their clocks. This is done within the set of clocks used by each national standars organization, and also between the clocks maintained independently by different nations in order to arrive at the value for Coordinated Universal Time, the composite representing all of the high quality national standards around the globe.