OpenBSD Journal

OpenNTPD with adjtime/adjfreq tuning

Contributed by deanna from the you-want-it-when? dept.

Chris Kuethe (ckuethe@) writes:

This article briefly describes the improvements in the timekeeping capabilities of OpenBSD when an ntpd is able to discipline the clock with adjtime(2) and adjfreq(2). The word "NTP" refers to various versions of the Network Time Protocol; "ntpd" refers to the NTP daemon shipping with OpenBSD. The graphs below show the same data; the second and third are zoomed in to show the performance of kernel support more clearly.

All machines synchronize to a Symmetricom (formerly Datum) time server. To prevent this reference server from being overloaded, we front-end it with a set of SPARCstation LXes ("their clocks may be crap, but they're at least consistently crappy") running as NTP servers for campus. For this experiment, my amd64 workstation was used as a baseline timer while polling the reference clock and our three other servers every ten seconds with 'rdate -pnav <server>'. The SPARCstations are 2 hops and about 3.6ms away from the reference, while my workstation is 3 hops and about 2.7ms away. The lower ping time despite the extra hop can probably be attributed to the 1000Mbit Ethernet interface in my workstation compared to the 10Mbit Ethernet interface in the SPARCstations.
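
As a rough illustration of the polling described above, here is a minimal Python sketch that pulls the offset out of captured rdate output. The output line format it matches ("adjust local clock by N seconds") is an assumption, not something taken from this article, and the names parse_offset and collect_offsets are hypothetical; check the pattern against what rdate really prints on your system.

```python
import re

# ASSUMPTION: rdate reports the offset in a line such as
#   "rdate: adjust local clock by -0.000052 seconds"
# Verify this against real output before relying on the pattern.
OFFSET_RE = re.compile(r"adjust local clock by (-?\d+(?:\.\d+)?) seconds")


def parse_offset(line):
    """Return the offset in seconds found in one output line, or None."""
    match = OFFSET_RE.search(line)
    return float(match.group(1)) if match else None


def collect_offsets(lines):
    """Keep only the lines that carry an offset and return them as floats."""
    offsets = []
    for line in lines:
        value = parse_offset(line)
        if value is not None:
            offsets.append(value)
    return offsets
```

Feeding collect_offsets the lines captured from one server over the whole run would give the series of offsets that each trace plots.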

The green trace shows ntpd without kernel support trying to follow our reference clock. While xntpd is reputed to have some clever filtering and steering algorithms, it does need kernel support to be most effective; its performance in this configuration would be almost identical. Several years ago the NTP kernel hooks were removed from OpenBSD. With nothing more than a write-only interface to adjtime, any NTP daemon will have difficulty properly steering the system's time. It would have to poll the time source and call adjtime to try to keep the offset close to zero, which could cause fairly scary-looking changes in time. A cheap clock chip doesn't help matters any.

Recently, adjtime was extended to allow ntpd to query the kernel for the progress of any gentle time adjustments, and the adjfreq syscall was added to allow the kernel's timer frequency to compensate for variations in specific clock hardware. These changes are considerably simpler than the old NTP hooks and seem to perform quite well, as the graphs below show. Before these changes, ntpd suffered the same problem, and would often lurch from being fast to slow to fast again. The magnitude of the sawtooth was greatly reduced, however, by modifying ntpd to query the kernel for any adjtime adjustment still in progress and to use that to avoid requesting an excessively large correction.
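
To make the idea of frequency compensation concrete, here is a minimal Python sketch that estimates how fast a clock drifts from a series of offset measurements, using an ordinary least-squares slope. This is not OpenNTPD's algorithm; the function name estimate_drift_ppm, the sample data and the sign convention (positive offset means the local clock is ahead) are all assumptions made for illustration.

```python
def estimate_drift_ppm(samples):
    """Least-squares slope of offset versus time, in parts per million.

    samples: list of (elapsed_seconds, offset_seconds) pairs.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_o = sum(o for _, o in samples) / n
    num = sum((t - mean_t) * (o - mean_o) for t, o in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one instant")
    # Seconds of offset gained per second of elapsed time, scaled to ppm.
    return (num / den) * 1e6


# Hypothetical data: a clock gaining 20 microseconds every 10 seconds.
samples = [(10 * i, 20e-6 * i) for i in range(6)]
drift = estimate_drift_ppm(samples)  # about 2 ppm for this made-up data
```

A daemon with a frequency knob could apply a correction of the opposite sign, so that the offset stops growing instead of being knocked back to zero over and over.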

The red trace is from my amd64 workstation, synchronized directly to the time server. Looking at the raw output from rdate, it seems to be about 50µs behind the timeserver. This is close to two orders of magnitude less than the round-trip time to the reference clock and can probably be explained by network delays and processing overhead. The consistently low offset is to be expected, as the workstation is synchronized directly to our reference clock.

The blue and purple traces are from the second and third of our timeservers running ntpd. The spikes in the time offset are difficult to explain. No errors or warnings were reported by ntpd during this period; only frequency adjustments of a few hundredths to a tenth of a part per million. Perhaps time2 has a slightly better clock chip than time3, as its spikes seem smaller. No network errors and only three collisions were reported by netstat, suggesting that routing or switching delays might be responsible. This is a reasonable guess, as time1 often calls adjtime very close to the time that the other servers also exhibit a spike.

Future tests will replace the SPARCstations with either Soekris NET4801 or Nexcom EBS1563 systems. Both have 100Mbit Ethernet interfaces on a PCI bus; our test Soekris and Nexcom machines have 233MHz and 667MHz x86-compatible processors, respectively. Poul-Henning Kamp claims sub-microsecond precision from a Soekris with a rubidium frequency standard; we do not expect to replicate this using the stock oscillators, but these more modern machines should do a better job than the 50MHz SPARCstations.


Performance of an ntp daemon with and without kernel support:

Performance of kernel supported NTP daemons

Performance of kernel supported NTP daemons (zoomed in)


                           Sum of offsets (s)  Average offset (s)  StdDev (s)
Direct to refclock         -1.20004            -5.22143 x 10^-5    3.23350 x 10^-4
time1 (no kernel support)  1785.67             7.76952 x 10^-2     4.92617 x 10^-2
time2 (kernel support)     2.10263             9.14863 x 10^-5     5.34393 x 10^-4
time3 (kernel support)     3.31528             1.44249 x 10^-4     1.52957 x 10^-3

230611s, 22983 samples
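
For readers who want to reproduce the summary columns from their own measurements, here is a minimal Python sketch. The function name summarize is hypothetical, and using the population standard deviation is an assumption; the article does not say which form was used. The second half only checks that each average in the table equals the sum of offsets divided by the 22983 samples.

```python
import math


def summarize(offsets):
    """Return (sum, mean, population standard deviation) of offsets in seconds."""
    n = len(offsets)
    total = math.fsum(offsets)
    mean = total / n
    variance = math.fsum((x - mean) ** 2 for x in offsets) / n
    return total, mean, math.sqrt(variance)


# Values copied from the table: (sum of offsets, average offset), in seconds.
SAMPLES = 22983
table = {
    "Direct to refclock": (-1.20004, -5.22143e-5),
    "time1 (no kernel support)": (1785.67, 7.76952e-2),
    "time2 (kernel support)": (2.10263, 9.14863e-5),
    "time3 (kernel support)": (3.31528, 1.44249e-4),
}

# Recompute each average from the sum and compare it with the published one.
recomputed = {name: total / SAMPLES for name, (total, _) in table.items()}
consistent = all(
    math.isclose(recomputed[name], average, rel_tol=1e-4)
    for name, (_, average) in table.items()
)
```

The sample count also fits the polling interval: 230611s over 22983 samples is roughly one sample every ten seconds.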

Updates to the original paper, which is a work in progress, will appear on Chris Kuethe's site at University of Alberta.



Comments
  1. By frantisek holop (165.72.200.11) on

    it could have been pointed out that the pictures are scaled down.

    either turning them into links to the normal sized ones, or just leave them as they were. undeadly is almost html agnostic anyway.

    Comments
    1. By Anonymous Coward (64.231.233.75) on

      > it could have been pointed out that the pictures are scaled down.

      They are at the original proportions and there are larger images on Chris's site, which is linked to at the bottom of the article.

      Please send comments like this to editors@ undeadly instead of posting them here.

  2. By baldusi (200.68.102.49) on

    The Green line seems to have a very clear positive bias. Shouldn't an NTPd in this situation detect a faster clock and then undershoot when adjusting so as to on average be in the true time?

    Comments
    1. By Chris Kuethe (129.128.11.77) ckuethe@ on

      It doesn't just seem to have a bias, it does have a bias. Note that the sum of the offsets is about 1785s in 230611s, compared to two or three seconds for the other two servers.

      As for correcting the bias, that's the point of being able to call adjtime(NULL, &tv): you can get information back from the kernel about how much time adjustment is still outstanding. Without knowing how much time is left to be adjusted you can't tell whether you're waiting for adjtime to finish or not. Asking for a larger adjustment than is necessary is just going to cause you to bounce between being ahead and behind. In some cases, the oscillator's frequency is so far off that adjtime just can't keep up.

      The right thing to do is what kernel-assisted ntp daemons do. Calculate a frequency correction and use that to tune the local oscillator so it's not biased...

  3. By sthen (85.158.44.146) on

    > Poul Henning-Kamp claims sub-microsecond precision from a Soekris with a rubidium frequency standard

    Note that's not "the RTC on the Soekris is highly precise", it's "you can hook up a PPS to the GPIO lines and get precise signals that you can feed into NTP or other software to use as a timesource" (http://www.freebsd.org/cgi/man.cgi?query=CPU_ELAN and RFC2783 are relevant too). You can use this interface for things other than a timesource too - e.g. timestamping pulses (see http://phk.freebsd.dk/Gasdims/). (yes, I know, wrong OS but you might still be interested...)

  4. By Chris (24.76.100.162) on

    Very cool.

  5. By julien (81.57.235.161) on

    Hi

    can you show us how you made those graphs (a stats cmd with ntpdate or some other tool, and gnuplot, maybe)?

    thanks

    Comments
    1. By Chris Kuethe (129.128.11.75) ckuethe@ on

      The graphs were generated by repeatedly calling "rdate -pnav $server" inside a loop, the output of which was fed to awk to pull out just the fields I wanted. Plotted with gnuplot.

  6. By Alan Watson (132.248.81.29) alan@alan-watson.org on http://www.alan-watson.org/

    This is a great illustration of the improvements to timekeeping. I've just got a couple of comments.

    It has been possible to query the kernel for any remaining adjtime adjustment for as long as I can remember. However, until the change you mention, you had to be root. Now, you do not have to be root, which means in particular that the unprivileged ntpd process can obtain the remaining offset.

    Second, even if you do not have adjfreq, you can improve the precision of timekeeping dramatically by simply calling adjtime once per second to correct for the clock skew. Brutal, but effective. I showed Henning a patch to ntpd to implement this two years ago, but he rightly held out for implementing adjfreq in the kernel.

    Regards,

    Alan
