Help! My NTP setup won't synchronise. I'm running Tru64 5.1b and I'm trying to synchronise to a Cisco router.
I've stripped down the NTP setup so that my server synchronises with only one time server (I know this is bad practice, but it simplifies the problem). My source time server is usually synchronised (indicated by a * when running ntpq -p remotely, though I've occasionally seen a #, meaning it exceeds the maximum distance).
My ntp.conf contains only the entry for that one server and the path of the drift file (which contains 0).
Please find the ntpq -p output below. The tally indicator for my server is always ' ', implying that it fails the sanity checks. However, the other values indicate that NTP polling is taking place.
# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 <mymachine>     <ntpserver>      4 u   38   64  377    7.098  187.693 101.990
Running the ntpq 'as' command shows that the server is being 'rejected':
ntpq> as
ind assID status  conf reach auth condition  last_event cnt
===========================================================
  1 42572  9014   yes   yes  none    reject   reachable   1
However, if you look at the full details, it IS passing all of the sanity checks (indicated by flash = 00). The filtoffset value changes by 104 ms each poll (it loops through negative values too).
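For reference, the 9014 in the status column of the 'as' output can be unpacked by hand. This is a minimal Python sketch, assuming the peer status word layout described in the ntpq documentation (five flag bits, a three-bit selection field, a four-bit event count and a four-bit event code). The function name and tables are illustrative only, and xntpd on Tru64 may label some of the codes differently.

```python
# Flag bits in the top byte of the peer status word (layout assumed from
# the ntpq documentation; not verified against Tru64's xntpd).
STATUS_FLAGS = {
    0x80: "config",    # association is configured
    0x40: "authenb",   # authentication enabled
    0x20: "auth",      # authentication ok
    0x10: "reach",     # peer is reachable
    0x08: "bcst",      # broadcast association
}

# Three-bit selection field: how far the peer got through selection.
SELECT = {0: "reject", 1: "falsetick", 2: "excess", 3: "outlier",
          4: "candidate", 5: "backup", 6: "sys.peer", 7: "pps.peer"}


def decode_peer_status(word):
    """Split a 16-bit peer status word into its four fields."""
    top = (word >> 8) & 0xFF
    return {
        "flags": [name for bit, name in STATUS_FLAGS.items() if top & bit],
        "select": SELECT[top & 0x07],
        "event_count": (word >> 4) & 0x0F,
        "event_code": word & 0x0F,
    }


print(decode_peer_status(0x9014))
```

For 9014 this yields the config and reach flags, a selection of reject, an event count of 1 and event code 4, which lines up with the yes / yes / none / reject / reachable / 1 columns that ntpq printed.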
* Why will it not accept this time server as a valid synchronisation source? (Or how can I make it accept this time server as a valid synchronisation source)
* What is filtoffset? What do these step changes represent (are they bad)?
Due to corporate restrictions I am unable to try synching to any of the available internet NTP servers. My hunch is that my NTP source is 'not good enough' but I'd like to know why.
Any help, hints or document references would be greatly appreciated.
Thanks,
Becks
I have been running ntpdate before starting the daemon. Other forums suggested that maybe I needed some patience but I let it run over the last weekend and it hadn't synchronised by Monday.
I'm trying to synchronise to a Cisco router, not a Windows server - am I likely to encounter similar problems? :-(
NB The Cisco router is using an Alpha server as its time server. Though I guess this doesn't prove that a Cisco router can 'serve' time to an Alpha. Unfortunately, I can't access this Alpha Server directly.
The three lines identified as filtdelay, filtoffset and filtdisp reveal the roundtrip delay, clock offset and dispersion for each of the last eight measurement rounds, all in milliseconds. Note that the dispersion, which is an estimate of the error, increases as the age of the sample increases. From these data, it is usually possible to determine the incidence of severe packet loss, network congestion, and unstable local clock oscillators. There are no hard and fast rules here, since every case is unique;
--- however, if one or more of the rounds show large values or change radically from one round to another, the network is probably congested or lossy. ---
Thanks for the link - I'd recommend the debug section at the bottom. I'm currently reading about the intersection algorithm (the last check on this list)... though I'm not sure how relevant this is on my 1 ntp server system! NB I'm getting ' ' when running ntpq -pe which implies I'm failing sanity checks, so I'm not getting as far as the intersection algorithm.
My NTP server (the Cisco router) is directly connected to my Alpha server, so I'm not losing packets there. However, the NTP server (router) has a bad connection to its own NTP server (though it is just about maintaining its *).
Running ntpq -p <ntp server>, the offset on the Cisco router loops round as follows:
OFFSET
573.460 (jitter consistently around 200; delay around 30)
678.159
888.096
993.450
1098.87
1203.27
1308.72
1413.83
-321.82
LOST CONNECTION TO NTP SERVER
offset=1623.63
It then starts looping round again; jitter stabilises to 200; delay consistently 30.
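To make the pattern in those readings easier to see, here is a small Python sketch that computes the change between consecutive offsets. The values are copied from the readings above (up to the point where the offset resets); the function name is just illustrative.

```python
# Offsets in ms, copied from the ntpq -p readings above, up to the reset.
offsets_ms = [573.460, 678.159, 888.096, 993.450,
              1098.87, 1203.27, 1308.72, 1413.83]


def step_sizes(samples):
    """Return the change between each pair of consecutive samples."""
    return [round(b - a, 3) for a, b in zip(samples, samples[1:])]


steps = step_sizes(offsets_ms)
print(steps)
```

Every step comes out at roughly 105 ms except the second, which is about twice that. That would be consistent with one reading having been missed, though that is a guess.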
So the connection from my Alpha to my NTP server (router) is good. But the connection from the NTP server (router) to its stratum 2 server is dodgy - would this cause me to reject it as a time server?
Do I consider the jitter and offset values of an NTP server when deciding whether it's a valid time source? I thought it would be enough that it is time synchronised (as indicated by *)?
NTP is a robust protocol, so a temporary (hours-long) loss of sync or connectivity is not going to cause jumps in the time. The downstream clients just freewheel until the server appears to be stable. If there is a time drift, it will be slowly adjusted once the server is stable (assuming you are running xntpd and not just jumping the clock with ntpdate).
In your case, the Alpha to Cisco connection is clean but the Cisco to NTP source connection is questionable. But more important, no NTP server should have only one server. A minimum of 3 to 5 servers should be used in the Cisco box. That way, all the sources can be evaluated and a stable sync will be obtained.
It looks like your Cisco router has authentication enabled. The "reject" is listed in the "auth" column...
Remove any NTP authentication statements from your Cisco configuration and try again.
If you rely on authentication (or the Cisco router is not accessible), you need to put the NTP keys on your UNIX system and adjust the NTP configuration accordingly.
Authorisation is not an issue. If you 'Enable Authorisation' this allows you to respond to encoded requests as well as any other requests. I'm receiving replies from the server - if I had failed authorisation it wouldn't even reply. Thanks anyway; every penny helps!
I found something interesting in www.lnf.infn.it/computing/unix/ntp/debug.htm (Debug 8):
"While the algorithm can tolerate a relatively large frequency error (over 350 parts per million or 30 seconds per day), various configuration errors (and in some cases kernel bugs) can exceed this tolerance, leading to erratic behavior. This can result in frequent loss of synchronization, together with wildly swinging offsets. ... If the error increases by more than 22 milliseconds per 64-second poll interval, the intrinsic frequency must be reduced by some means."
This was written by the same bloke who wrote the protocol.
I was seeing 104 milliseconds per 64-second poll!! So that symptom is identified, but I still have no resolution.
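To put those figures on the same scale, here is a short Python sketch that converts a change in offset per poll interval into a frequency error in parts per million. The helper names are made up for illustration; the inputs are the 104 ms and 22 ms per 64-second poll quoted above.

```python
def drift_ppm(offset_change_ms, interval_s):
    """Frequency error in parts per million implied by an offset that
    changes by offset_change_ms over interval_s seconds."""
    # 1 ms per second is one part per thousand, i.e. 1000 ppm.
    return offset_change_ms * 1000.0 / interval_s


def seconds_per_day(ppm):
    """Time gained or lost per day at the given frequency error."""
    return ppm * 86400 / 1_000_000


observed = drift_ppm(104, 64)   # what I was seeing
limit = drift_ppm(22, 64)       # the figure quoted from the debug page

print(observed, seconds_per_day(observed))
print(limit, seconds_per_day(limit))
```

On these numbers, 22 ms per poll works out to about 344 ppm (roughly 30 seconds per day, in line with the quoted tolerance), whereas 104 ms per poll is 1625 ppm, or about 140 seconds per day.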
I also found out more details of the NTP configuration for my router and the servers upstream of it. These were not in line with NTP guidelines either. Consequently, errors were increasing down the NTP path.
So in summary: NTP works because you have a selection of time servers. It can make intelligent decisions about which are the best tickers. Please note that when it makes its decisions it uses the distance to the root and the root dispersion, as well as the delay, offset and jitter values for the peer, the time intersections and a whole lot else!
I can only assume that a number of these parameters fell outside the error bounds specified in the algorithm; unfortunately, I haven't found any definitive documentation to confirm this.
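As a rough illustration of how such a bound might bite, here is a Python sketch of a simplified root distance check. It is only loosely based on the root distance described in the NTP specification: the real calculation has further terms (for example, dispersion growing with sample age), the 1-second default limit is an assumption that differs between versions, and the function names are hypothetical. The peer delay and jitter are taken from my ntpq -p line; the root delay and both dispersions are made-up values, not measurements.

```python
def root_distance_s(root_delay_s, root_disp_s, peer_delay_s,
                    peer_disp_s, jitter_s):
    """Simplified root synchronisation distance, in seconds:
    half the total round-trip delay plus the accumulated error terms."""
    return ((root_delay_s + peer_delay_s) / 2.0
            + root_disp_s + peer_disp_s + jitter_s)


def is_selectable(distance_s, max_distance_s=1.0):
    """True if the distance is within the limit (assumed to be 1 s)."""
    return distance_s < max_distance_s


# peer_delay_s and jitter_s come from the ntpq -p output;
# root_delay_s, root_disp_s and peer_disp_s are invented for illustration.
d = root_distance_s(root_delay_s=0.030, root_disp_s=1.2,
                    peer_delay_s=0.007098, peer_disp_s=0.010,
                    jitter_s=0.101990)
print(d, is_selectable(d))
```

With an invented root dispersion of that size the server would fall outside the limit even though the local link is clean, which is the kind of effect I suspect, but I have not confirmed the real values.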
Due to corporate restrictions, it looks like I'm going to have to revert to a dial-up service. An alternative solution is to set up ntpdate as a cron job, but I don't trust my source enough to do that!!