Inconsistent service states

Doug Ritchie dritchie at advanstar.com
Tue May 27 16:04:25 CEST 2003


Hey all,

Has anybody seen any strangeness with service states being inconsistent in
the nagios.log and archives?  We monitor about 300 services with Nagios, and
at the end of every month when we run availability reports, I find a handful
of services that appear to have gone down and never come back up.  When I
dig through the nagios.log and archives, I find on all of them a situation
where:

1) the service failed legitimately and caused a 'CRITICAL;HARD' entry
2) the host was rebooted, causing 'host down' and 'host up' events
3) the very next service check fails (perhaps NSClient hasn't started yet,
or somesuch)
4) the failure is now oddly recorded as 'CRITICAL;SOFT', attempt #1
5) the subsequent service check succeeds, and is recorded as 'OK;SOFT',
attempt #2

In the end, the archives contain a CRITICAL;HARD entry from when the service
failed, but no corresponding OK;HARD entry for when it was fixed.  While it
doesn't cause any problems in terms of notifications, it does throw the
availability reports way off, since from the point of the initial failure
all the way up until whenever the service fails/recovers properly next,
everything is recorded as critical time.
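For anyone who wants to check their own archives for the same pattern, here
is a rough Python sketch that flags services whose most recent HARD entry is
CRITICAL but which have since logged an OK;SOFT.  It assumes service alert
lines look like "[epoch] SERVICE ALERT: host;service;STATE;TYPE;attempt;output";
verify that against your own nagios.log before trusting it.  The function
name and the sample host/service names are made up for illustration.

```python
import re

# Assumed line layout (check against your own log):
# [epoch] SERVICE ALERT: host;service;STATE;TYPE;attempt;plugin output
ALERT_RE = re.compile(
    r"^\[(\d+)\] SERVICE ALERT: ([^;]*);([^;]*);([A-Z]+);(HARD|SOFT);(\d+);(.*)$"
)


def find_stuck_services(lines):
    """Map (host, service) -> timestamp of an OK;SOFT entry that followed
    a CRITICAL;HARD entry with no HARD entry in between."""
    last_hard = {}   # (host, service) -> state of the latest HARD entry
    suspects = {}    # (host, service) -> timestamp of the unmatched OK;SOFT
    for line in lines:
        m = ALERT_RE.match(line.strip())
        if not m:
            continue  # host alerts, notifications, etc. are ignored
        ts, host, svc, state, kind, _attempt, _output = m.groups()
        key = (host, svc)
        if kind == "HARD":
            # Any HARD entry resets the picture for this service.
            last_hard[key] = state
            suspects.pop(key, None)
        elif state == "OK" and last_hard.get(key) == "CRITICAL":
            suspects[key] = int(ts)
    return suspects


# Hypothetical sample mirroring the sequence described above.
sample = [
    "[1000] SERVICE ALERT: web1;CPU;CRITICAL;HARD;3;timeout",
    "[1100] HOST ALERT: web1;DOWN;HARD;1;unreachable",
    "[1200] HOST ALERT: web1;UP;HARD;1;ok",
    "[1300] SERVICE ALERT: web1;CPU;CRITICAL;SOFT;1;refused",
    "[1400] SERVICE ALERT: web1;CPU;OK;SOFT;2;fine",
]
stuck = find_stuck_services(sample)
```

Running it over each month's archive before generating the availability
report should list the services that need a closer look.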

I can fix this manually by modifying the archive files to make those OK;SOFT
entries OK;HARD, but this does seem like a weird bug.  Again, the only time
I've seen this is with the following sequence of events:

a) service was critical
b) host was rebooted
c) first service check failed immediately following the reboot
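The manual fix could be scripted along these lines.  This is only a sketch
under the same assumption about the line layout ("[epoch] SERVICE ALERT:
host;service;STATE;TYPE;attempt;output"), and the function name is made up.
It returns a corrected copy of the lines rather than touching the file, so
the original archive can be kept as a backup.

```python
def promote_soft_recoveries(lines):
    """Return a copy of the log lines in which an OK;SOFT service alert
    that follows a CRITICAL;HARD for the same service becomes OK;HARD."""
    last_hard = {}  # (host, service) -> state of the latest HARD entry
    fixed = []
    for line in lines:
        newline = "\n" if line.endswith("\n") else ""
        parts = line.rstrip("\n").split(";")
        head = parts[0]
        if "SERVICE ALERT: " in head and len(parts) >= 5:
            host = head.split("SERVICE ALERT: ", 1)[1]
            key = (host, parts[1])
            state, kind = parts[2], parts[3]
            if kind == "SOFT" and state == "OK" and last_hard.get(key) == "CRITICAL":
                # Rewrite only the state type; everything else is preserved.
                parts[3] = "HARD"
                kind = "HARD"
                line = ";".join(parts) + newline
            if kind == "HARD":
                last_hard[key] = state
        fixed.append(line)
    return fixed


# Hypothetical sample: critical service, host reboot, failed first check.
sample_log = [
    "[1000] SERVICE ALERT: web1;CPU;CRITICAL;HARD;3;timeout",
    "[1100] HOST ALERT: web1;DOWN;HARD;1;unreachable",
    "[1200] HOST ALERT: web1;UP;HARD;1;ok",
    "[1300] SERVICE ALERT: web1;CPU;CRITICAL;SOFT;1;refused",
    "[1400] SERVICE ALERT: web1;CPU;OK;SOFT;2;fine",
]
repaired = promote_soft_recoveries(sample_log)
```

Only the OK;SOFT entry is changed; the CRITICAL;SOFT entry and the host
alerts pass through untouched.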

Any ideas or prior experience with this kind of thing?

Thanks much,
Doug Ritchie







