## Book: Distributed operating systems

## 3.1.3. Clock Synchronization Algorithms

If one machine has a WWV receiver, the goal becomes keeping all the other machines synchronized to it. If no machines have WWV receivers, each machine keeps track of its own time, and the goal is to keep all the machines together as well as possible. Many algorithms have been proposed for doing this synchronization (e.g., Cristian, 1989; Drummond and Babaoglu, 1993; and Kopetz and Ochsenreiter, 1987). A survey is given in (Ramanathan et al., 1990b).

All the algorithms have the same underlying model of the system, which we will now describe. Each machine is assumed to have a timer that causes an interrupt *H* times a second. When this timer goes off, the interrupt handler adds 1 to a software clock that keeps track of the number of ticks (interrupts) since some agreed-upon time in the past. Let us call the value of this clock *C*. More specifically, when the UTC time is *t*, the value of the clock on machine *p* is *C*_{p}(*t*). In a perfect world, we would have *C*_{p}(*t*) = *t* for all *p* and all *t*. In other words, *dC/dt* ideally should be 1.

Real timers do not interrupt exactly *H* times a second. Theoretically, a timer with *H* = 60 should generate 216,000 ticks per hour. In practice, the relative error obtainable with modern timer chips is about 10^{-5}, meaning that a particular machine can get a value in the range 215,998 to 216,002 ticks per hour. More precisely, if there exists some constant ρ such that

1 – ρ ≤ *dC/dt* ≤ 1 + ρ

the timer can be said to be working within its specification. The constant ρ is specified by the manufacturer and is known as the **maximum drift rate.** Slow, perfect, and fast clocks are shown in Fig. 3-5.

**Fig. 3-5.** Not all clocks tick precisely at the correct rate.

If two clocks are drifting from UTC in opposite directions, at a time Δ*t* after they were synchronized, they may be as much as 2ρΔ*t* apart. If the operating system designers want to guarantee that no two clocks ever differ by more than δ, clocks must be resynchronized (in software) at least every δ/2ρ seconds. The various algorithms differ in precisely how this resynchronization is done.
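The arithmetic above can be sketched in a few lines of Python. This is a toy illustration; the function name and the example numbers are invented:

```python
def max_resync_interval(delta, rho):
    """Longest allowed time between resynchronizations.

    Two clocks drifting from UTC in opposite directions separate at
    rate 2*rho, so after t seconds they may be up to 2*rho*t apart.
    Keeping every pair within delta therefore requires resyncing at
    least every delta / (2*rho) seconds.
    """
    return delta / (2 * rho)

# Example: maximum allowed skew delta = 1 ms, drift rate rho = 10^-5
print(max_resync_interval(1e-3, 1e-5))  # 50.0 seconds
```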

### Cristian's Algorithm

Let us start with an algorithm that is well suited to systems in which one machine has a WWV receiver and the goal is to have all the other machines stay synchronized with it. Let us call the machine with the WWV receiver a **time server.** Our algorithm is based on the work of Cristian (1989) and prior work. Periodically, certainly no more than every δ/2ρ seconds, each machine sends a message to the time server asking it for the current time. That machine responds as fast as it can with a message containing its current time, *C*_{UTC}, as shown in Fig. 3-6.

As a first approximation, when the sender gets the reply, it can just set its clock to *C*_{UTC}. However, this algorithm has two problems, one major and one minor. The major problem is that time must never run backward. If the sender's clock is fast, *C*_{UTC} will be smaller than the sender's current value of *C*. Just taking over *C*_{UTC} could cause serious problems, such as an object file compiled just after the clock change having a time earlier than the source which was modified just before the clock change.

**Fig. 3-6.** Getting the current time from a time server.

Such a change must be introduced gradually. One way is as follows. Suppose that the timer is set to generate 100 interrupts per second. Normally, each interrupt would add 10 msec to the time. When slowing down, the interrupt routine adds only 9 msec each time, until the correction has been made. Similarly, the clock can be advanced gradually by adding 11 msec at each interrupt instead of jumping it forward all at once.
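The gradual adjustment described above can be sketched as follows. The class and its method names are hypothetical, but the tick arithmetic (10 msec normally, 9 msec when slowing down, 11 msec when speeding up) follows the text:

```python
class SoftwareClock:
    """Toy software clock that slews rather than jumps.

    A timer fires 100 times a second; each interrupt normally adds
    10 ms, but only 9 ms while a backward correction is pending and
    11 ms while a forward correction is pending.
    """

    NORMAL_TICK_MS = 10

    def __init__(self):
        self.time_ms = 0          # software clock value
        self.correction_ms = 0    # remaining correction (signed)

    def adjust(self, offset_ms):
        """Schedule a gradual correction instead of setting the clock."""
        self.correction_ms += offset_ms

    def on_timer_interrupt(self):
        tick = self.NORMAL_TICK_MS
        if self.correction_ms > 0:      # clock is behind: run faster
            tick += 1
            self.correction_ms -= 1
        elif self.correction_ms < 0:    # clock is ahead: run slower
            tick -= 1
            self.correction_ms += 1
        self.time_ms += tick

clock = SoftwareClock()
clock.adjust(-5)                 # clock found to be 5 ms fast
for _ in range(10):
    clock.on_timer_interrupt()
print(clock.time_ms)  # 95: five 9-ms ticks, then five normal 10-ms ticks
```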

The minor problem is that it takes a nonzero amount of time for the time server's reply to get back to the sender. Worse yet, this delay may be large and vary with the network load. Cristian's way of dealing with it is to attempt to measure it. It is simple enough for the sender to record accurately the interval between sending the request to the time server and the arrival of the reply. Both the starting time, *T*_{0}, and the ending time, *T*_{1}, are measured using the same clock, so the interval will be relatively accurate, even if the sender's clock is off from UTC by a substantial amount.

In the absence of any other information, the best estimate of the message propagation time is (*T*_{1}–*T*_{0})/2. When the reply comes in, the value in the message can be increased by this amount to give an estimate of the server's current time. If the theoretical minimum propagation time is known, other properties of the time estimate can be calculated.

This estimate can be improved if it is known approximately how long it takes the time server to handle the interrupt and process the incoming message. Let us call the interrupt handling time *I*. Then the amount of the interval from *T*_{0 }to *T*_{1 }that was devoted to message propagation is *T*_{1}-*T*_{0}-*I*, so the best estimate of the one-way propagation time is half this. Systems do exist in which messages from *A *to *B *systematically take a different route than messages from *B *to *A, *and thus have a different propagation time, but we will not consider such systems here.
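Putting these pieces together, the sender's calculation can be sketched as follows; the function name and the sample numbers are invented for illustration:

```python
def cristian_estimate(t0, t1, c_utc, interrupt_time=0.0):
    """Estimate the current time from one Cristian-style exchange.

    t0, t1: request-send and reply-arrival times on the sender's own
            clock, so their difference is accurate even if that clock
            is far from UTC.
    c_utc:  time reported by the time server.
    interrupt_time: the server's known handling time I, if any.
    One-way propagation is estimated as (t1 - t0 - I) / 2.
    """
    propagation = (t1 - t0 - interrupt_time) / 2
    return c_utc + propagation

# Round trip of 10 ms, of which 2 ms was server interrupt handling:
print(cristian_estimate(0.000, 0.010, 100.0, interrupt_time=0.002))
# ≈ 100.004
```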

To improve the accuracy, Cristian suggested making not one measurement, but a series of them. Any measurements in which *T*_{1}–*T*_{0} exceeds some threshold value are discarded as being victims of network congestion and thus unreliable. The estimates derived from the remaining probes can then be averaged to get a better value. Alternatively, the message that came back fastest can be taken to be the most accurate since it presumably encountered the least traffic underway and thus is the most representative of the pure propagation time.
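The multi-probe variant using the fastest-reply rule can be sketched as below, assuming each probe is recorded as a (*T*_{0}, *T*_{1}, *C*_{UTC}) triple; the function name and threshold value are invented:

```python
def best_probe(probes, threshold):
    """Estimate the time from several (t0, t1, c_utc) probes.

    Probes whose round trip exceeds the threshold are discarded as
    victims of network congestion.  Of the survivors, the fastest
    reply is taken as most representative of the pure propagation
    time.  Returns None if every probe was discarded.
    """
    usable = [(t1 - t0, c_utc) for t0, t1, c_utc in probes
              if t1 - t0 <= threshold]
    if not usable:
        return None
    rtt, c_utc = min(usable)          # fastest surviving round trip
    return c_utc + rtt / 2

probes = [(0.0, 0.030, 100.0),   # 30 ms round trip: discarded
          (1.0, 1.008, 101.0),   # 8 ms round trip: fastest, wins
          (2.0, 2.012, 102.0)]   # 12 ms round trip: kept but slower
print(best_probe(probes, 0.025))  # ≈ 101.004
```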

### The Berkeley Algorithm

In Cristian's algorithm, the time server is passive. Other machines ask it for the time periodically. All it does is respond to their queries. In Berkeley UNIX, exactly the opposite approach is taken (Gusella and Zatti, 1989). Here the time server (actually, a time daemon) is active, polling every machine periodically to ask what time it is there. Based on the answers, it computes an average time and tells all the other machines to advance their clocks to the new time or slow their clocks down until some specified reduction has been achieved. This method is suitable for a system in which no machine has a WWV receiver. The time daemon's time must be set manually by the operator periodically. The method is illustrated in Fig. 3-7.

**Fig. 3-7.** (a) The time daemon asks all the other machines for their clock values. (b) The machines answer. (c) The time daemon tells everyone how to adjust their clock.

In Fig. 3-7(a), at 3:00, the time daemon tells the other machines its time and asks for theirs. In Fig. 3-7(b), they respond with how far ahead or behind the time daemon they are. Armed with these numbers, the time daemon computes the average and tells each machine how to adjust its clock [see Fig. 3-7(c)].
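One round of this computation might look like the following toy sketch. The function name and the `'daemon'` key are invented; the offsets are chosen to resemble the Fig. 3-7 scenario (one machine 10 s behind the daemon, one 25 s ahead):

```python
def berkeley_round(reported_offsets):
    """One round of Berkeley-style averaging (a sketch).

    reported_offsets maps each machine to (its clock - daemon's clock).
    The daemon averages all the clocks, counting its own offset as 0,
    and sends every machine the adjustment that moves it to the average.
    """
    offsets = [0] + list(reported_offsets.values())   # include the daemon
    average = sum(offsets) / len(offsets)
    adjustments = {m: average - off for m, off in reported_offsets.items()}
    adjustments['daemon'] = average                   # daemon's own adjustment
    return adjustments

# A is 10 s behind, B is 25 s ahead; the average is 5 s ahead of the
# daemon, so A advances 15 s, B slows by 20 s, the daemon advances 5 s.
print(berkeley_round({'A': -10, 'B': +25}))
```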

### Averaging Algorithms

Both of the methods described above are highly centralized, with the usual disadvantages. Decentralized algorithms are also known. One class of decentralized clock synchronization algorithms works by dividing time into fixed-length resynchronization intervals. The *i*th interval starts at *T*_{0}*+iR *and runs until *T*_{0}*+(i+1)R, *where *T*_{0 }is an agreed upon moment in the past, and *R *is a system parameter. At the beginning of each interval, every machine broadcasts the current time according to its clock. Because the clocks on different machines do not run at exactly the same speed, these broadcasts will not happen precisely simultaneously.

After a machine broadcasts its time, it starts a local timer to collect all other broadcasts that arrive during some interval *S*. When all the broadcasts arrive, an algorithm is run to compute a new time from them. The simplest algorithm is just to average the values from all the other machines. A slight variation on this theme is first to discard the *m* highest and *m* lowest values, and average the rest. Discarding the extreme values can be regarded as self-defense against up to *m* faulty clocks sending out nonsense.
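The trimmed-average variant can be sketched in a few lines (the function name is invented):

```python
def trimmed_average(times, m):
    """Average the broadcast clock values after discarding the m
    highest and m lowest, as self-defense against up to m faulty
    clocks sending out nonsense."""
    if len(times) <= 2 * m:
        raise ValueError("not enough values to discard 2*m of them")
    kept = sorted(times)[m:len(times) - m]
    return sum(kept) / len(kept)

# One wildly wrong clock (1000) is trimmed, along with the lowest value
print(trimmed_average([10, 12, 11, 1000, 9], 1))  # 11.0
```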

Another variation is to try to correct each message by adding to it an estimate of the propagation time from the source. This estimate can be made from the known topology of the network, or by timing how long it takes for probe messages to be echoed.

Additional clock synchronization algorithms are discussed in the literature (e.g., Lundelius-Welch and Lynch, 1988; Ramanathan et al., 1990a; and Srikanth and Toueg, 1987).

### Multiple External Time Sources

For systems in which extremely accurate synchronization with UTC is required, it is possible to equip the system with multiple receivers for WWV, GEOS, or other UTC sources. However, due to inherent inaccuracy in the time source itself as well as fluctuations in the signal path, the best the operating system can do is establish a range (time interval) in which UTC falls. In general, the various time sources will produce different ranges, which requires the machines attached to them to come to agreement.

To reach this agreement, each processor with a UTC source can broadcast its range periodically, say, at the precise start of each UTC minute. None of the processors will get the time packets instantaneously. Worse yet, the delay between transmission and reception depends on the cable distance and number of gateways that the packets have to traverse, which is different for each (UTC source, processor) pair. Other factors can also play a role, such as delays due to collisions when multiple machines try to transmit on an Ethernet at the same instant. Furthermore, if a processor is busy handling a previous packet, it may not even look at the time packet for a considerable number of milliseconds, introducing additional uncertainty into the time. In Chap. 10 we will examine how clocks are synchronized in OSF's DCE.
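A minimal sketch of combining the reported ranges, under the simplifying assumption that every source is correct, so that UTC must lie in all of the ranges at once; tolerating faulty sources requires a more elaborate agreement algorithm, and the function name and numbers here are invented:

```python
def intersect_ranges(ranges):
    """Combine the (low, high) UTC ranges reported by several sources.

    If every source is correct, UTC lies in each range, so their
    intersection is the tightest bound available.  Returns None when
    the ranges do not overlap, i.e., some source must be wrong.
    """
    low = max(lo for lo, hi in ranges)
    high = min(hi for lo, hi in ranges)
    return (low, high) if low <= high else None

# Three receivers report overlapping ranges around the same UTC second:
print(intersect_ranges([(100.0, 100.6), (100.2, 100.9), (99.9, 100.5)]))
# (100.2, 100.5)
```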
