Nagios lockup for about 8.5 hours

Kevin Keane subscription at kkeane.com
Sat Jul 11 12:27:21 CEST 2009


You know, I missed something crucial in your original post. You said 
that Nagios hung for several hours. How did you actually determine that, 
other than by the gap in the PNP graphs? Do you know whether the nagios 
daemon had stalled or had actually terminated? If Nagios did stall for 
about five hours, that would explain the "system time change" message - 
to Nagios, the five hours would appear to pass in a millisecond. And 
when it woke up again, it would be kind of disoriented: "Where am I? 
What day is it?"

Can you correlate any events in the system event log with this hanging?
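
One rough way to pin down when the stall began is to scan nagios.log for
unusually long gaps between consecutive epoch stamps. The following is only
a sketch: it assumes every log entry starts with "[epoch] " as in the lines
you quoted, the log path is a guess that depends on your install, and the
15-minute threshold and the function names are arbitrary choices of mine.

```python
import re
import sys
from datetime import datetime, timezone

# Nagios log entries start with "[<epoch seconds>] "; other lines are skipped.
STAMP_RE = re.compile(r"^\[(\d+)\]\s")


def find_gaps(lines, threshold=900):
    """Yield (previous_epoch, next_epoch, gap_seconds) for every pair of
    consecutive stamped lines that are more than `threshold` seconds apart."""
    previous = None
    for line in lines:
        match = STAMP_RE.match(line)
        if match is None:
            continue  # wrapped or unstamped line
        stamp = int(match.group(1))
        if previous is not None and stamp - previous > threshold:
            yield previous, stamp, stamp - previous
        previous = stamp


def fmt(epoch):
    """Render an epoch stamp as UTC so it can be compared with syslog."""
    moment = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


if __name__ == "__main__":
    # Assumed default location; pass the real path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/usr/local/nagios/var/nagios.log"
    with open(path, errors="replace") as log:
        for before, after, gap in find_gaps(log):
            print(f"{gap:>7d}s gap: {fmt(before)} -> {fmt(after)}")
```

The last stamp before a long gap is roughly when Nagios stopped writing, and
that is the time to look for in the system event log.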

Andrew Noonan wrote:
> Not that I can determine.  We run everything in Central time, and the
> nagios user is currently running in that TZ.  Actually, now that I
> think about it, it's a fairly moot point, as the logs that I listed
> don't have a timestamp, they are epoch stamps.  Since those show a ~5
> hour jump, this could not be a TZ change, but either a pause with
> Nagios, or the system clock would have to be changed, as that would
> affect the epoch values.  Given the consistency of the 5-minute cron
> entries I mentioned earlier, I don't think a system clock change
> happened either.
>
> On Thu, Jul 9, 2009 at 8:56 AM, Kevin Keane<subscription at kkeane.com> wrote:
>   
>> Has the time zone that Nagios runs under changed, maybe? That would not
>> affect the log files or NTP, since both usually run on UTC.
>>
>> Andrew Noonan wrote:
>>     
>>> Sorry Kevin, I was out yesterday or I would have responded earlier.  I
>>> don't think that's the case.  I forgot to mention it in the earlier
>>> email, but I checked the log files of a periodic cron job that also
>>> runs on the same server every 5 minutes, and its logs show an
>>> uninterrupted timestamp.  In addition, I also monitor NTP through
>>> nagios (and graph with PNP), and up until the outage, the local skew
>>> was less than a second.
>>>
>>> Thanks,
>>> Andrew
>>>
>>> On Tue, Jul 7, 2009 at 9:56 PM, Kevin Keane<subscription at kkeane.com> wrote:
>>>
>>>       
>>>> It seems to me that for some reason your system clock has changed by
>>>> about five hours. Did you change your system by any chance from local
>>>> time (Eastern time, probably, based on the five-hour difference) to UTC?
>>>> Or maybe your clock had drifted for a long time. When the clock skew
>>>> becomes too great, NTP refuses to update the time (because there is no
>>>> way to be sure that the time signal isn't the one that's incorrect). If
>>>> you restart NTP, it will set your clock regardless of the clock skew.
>>>>
>>>> The following "immediate check" messages probably occurred because
>>>> Nagios thought that these services hadn't been checked for five hours.
>>>>
>>>> Andrew Noonan wrote:
>>>>
>>>>         
>>>>> I've been testing out Nagios in general to replace our current system
>>>>> and I noticed a strange blank in my PNP graphs this morning.  When I
>>>>> looked closer, I found that nagios had basically hung for several
>>>>> hours.  Then, the log shows a warning of:
>>>>>
>>>>> [1246958195] Warning: A system time change of 0d 4h 56m 48s (forwards
>>>>> in time) has been detected.  Compensating...
>>>>>
>>>>> and then for several hours, messages like:
>>>>>
>>>>> [1246958830] Warning: The check of host 'superhost1' looks like it was
>>>>> orphaned (results never came back).  I'm scheduling an immediate check
>>>>> of the host...
>>>>>
>>>>> I'm running nagios 3.0.6 with ndo2db.  The system has under 1000
>>>>> services, most of which are nrpe checks to remote hosts.
>>>>>
>>>>> The nagios system was not terribly loaded at the time (about 50% idle)
>>>>> and mysql did not show any errors at the time.  Typically, the number
>>>>> of buffers used is only 2-3 out of the 4096.
>>>>>
>>>>> Any ideas as to what this could have been, or how I can detect this
>>>>> condition or log to gain more info?  I wouldn't think that this is
>>>>> normal, but my Google searches aren't turning up a lot.
>>>>>
>>>>> Thanks!
>>>>>           

