BUG: Passive host check results are always in HARD state.

jan.david at agfa.com
Tue Jul 4 17:44:58 CEST 2006


Hi,

We have a distributed Nagios set-up with three (slave) check engines 
performing active checks and sending their results to a master server 
which collects all results and sends out alarms if need be.

Our department has had a lot of complaints about remote hosts connected
over a WAN link that generate a lot of false positives.

Because WAN links are more prone to packet loss than LAN links, we've set
the number of host retries to 10, figuring that this would avoid any false
alerts about hosts being down when in fact it is just a temporary glitch
on the line.

This setup did not work, however. Further investigation into the cause
revealed what I believe to be a bug.

When host check results are received in PASSIVE mode, the number of
retries is not taken into account and a negative response immediately
results in a HARD state, which in turn sends out alerts.
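
To make the difference concrete, here is a small Python model of the
behaviour I expected versus the behaviour I observed. It is only a sketch
of the rule described above, not Nagios source code; the function names
are made up for illustration, and only problem (non-UP) results are
modelled.

```python
def expected_state_type(attempt: int, max_attempts: int) -> str:
    """State type I would expect for a problem (non-UP) host result.

    The result stays SOFT until the configured number of attempts
    has been used up, and only then becomes HARD.
    """
    return "HARD" if attempt >= max_attempts else "SOFT"


def observed_state_type(attempt: int, max_attempts: int) -> str:
    """State type observed for a PASSIVE problem host result.

    The attempt counter and the retry limit are ignored, so every
    problem result is HARD straight away.
    """
    return "HARD"


if __name__ == "__main__":
    # Attempt 1 of 10, as in the reproduction steps in this report.
    print("expected:", expected_state_type(1, 10))
    print("observed:", observed_state_type(1, 10))
```

With 10 retries configured, a single lost packet should therefore leave
the host in a SOFT state and not trigger a notification.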

This is a very annoying bug because it can create a lot of unnecessary 
notifications if you're monitoring a machine over a WAN link.

I first experienced this bug while running Nagios 2.2 and have recently
upgraded to 2.4, to no avail.

In our normal setup, a slave machine performs an active host check and
sends the result through nsca to the master server. But that setup is not
necessary to reproduce the buggy behaviour. You can easily do it as
follows:

1) Pick a machine in your nagios configuration that you can play with.

As you can see from the first screenshot, the machine is currently in 
attempt 1/10, state type HARD and last result was passive:



2) Click on "Submit passive check result for this host"



3) Commit and wait a minute:



As can be seen, the passive check immediately results in a HARD state, 
even though the attempt is only 1/10. 
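
The same passive result can probably also be submitted without the web
interface, by writing a PROCESS_HOST_CHECK_RESULT external command to the
Nagios command file. The Python sketch below only builds that command
line. The host name and the command file path are assumptions (the path
depends on the command_file setting in nagios.cfg), and the helper names
are hypothetical.

```python
import time
from typing import Optional

# Host status codes used by passive host check results.
HOST_UP, HOST_DOWN, HOST_UNREACHABLE = 0, 1, 2


def format_host_check_result(host_name: str, status_code: int,
                             plugin_output: str,
                             timestamp: Optional[int] = None) -> str:
    """Build one external command line for a passive host check result."""
    if timestamp is None:
        timestamp = int(time.time())
    return (f"[{timestamp}] PROCESS_HOST_CHECK_RESULT;"
            f"{host_name};{status_code};{plugin_output}\n")


def submit(command_file: str, line: str) -> None:
    """Write the command to the Nagios command file (a named pipe)."""
    with open(command_file, "w") as fh:
        fh.write(line)


if __name__ == "__main__":
    # "wan-host-01" is a placeholder host name, not one from our setup.
    line = format_host_check_result("wan-host-01", HOST_DOWN,
                                    "Test: simulated DOWN result")
    print(line, end="")
    # Assumed default path; uncomment on a machine running Nagios.
    # submit("/usr/local/nagios/var/rw/nagios.cmd", line)
```

Submitting a single DOWN result this way should be enough to see whether
the host goes straight to a HARD state.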

Note that PASSIVE service checks work as expected; it's only host checks
that exhibit this behaviour.

Would it be possible to post a patch for this bug, or could a fix be
incorporated in a future release?

Best Regards,

Jan David
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/gif
Size: 47452 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20060704/770865f4/attachment.gif>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/gif
Size: 35289 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20060704/770865f4/attachment-0001.gif>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/gif
Size: 42080 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20060704/770865f4/attachment-0002.gif>