distributed host checks: freshness checking issues

Pascal Vandeputte nagios at asmodeus.be
Wed Jun 1 17:51:10 CEST 2011


Okay, I got some new information: when I look at the Scheduling Queue, I 
see that the master is still scheduling active checks for some reason.

In nagios.cfg I specified "execute_host_checks=0" and each host stanza has 
"check_interval 0" which should prevent any scheduled active host checking, 
right?
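
For reference, a minimal sketch of the master-side setup described above. This is not the actual config from this installation: host_name and address are placeholders, and the active_checks_enabled / passive_checks_enabled / accept_passive_host_checks lines are assumptions about a typical passive-only master, not settings mentioned in this mail.

```
# nagios.cfg on the master
# No scheduled active host checks; results are expected to arrive passively.
execute_host_checks=0
accept_passive_host_checks=1

# Host definition on the master (fragment; other required directives omitted)
define host{
        host_name               remote-host-01  ; placeholder
        address                 192.0.2.10      ; placeholder
        check_interval          0               ; no regularly scheduled checks
        active_checks_enabled   0               ; assumption, not from the mail
        passive_checks_enabled  1               ; assumption, not from the mail
        }
```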

It's also weird that the queue is falling behind fast. Half a day after 
the last Nagios reload, the "next check" of a host is scheduled several 
hours earlier than the time shown in the "last check" column.
In the attached screenshot it's only a couple of minutes behind, but Nagios 
had been reloaded just 10 minutes earlier.


Things I tried in the meantime in nagios.cfg:

use_retained_program_state=0
use_retained_scheduling_info=0
    (in case something from the state file was keeping Nagios from using
    the new settings)

check_result_reaper_frequency=2
    (this last change was suggested when running "nagios -s nagios.cfg")

But nothing seems to fix this.


Now, after reading
http://nagios.sourceforge.net/docs/3_0/hostchecks.html
one more time, I'm beginning to fear that it's impossible to make the 
master only run the check_command when doing freshness checks:

"If you set the check_interval option in your host definition to zero (0), 
Nagios will not perform checks of the hosts on a regular basis. It will, 
however, still perform on-demand checks of the host as needed for other 
parts of the monitoring logic."

Those other parts of the monitoring logic are e.g. the "host reachability 
logic" and some more things. If those still cause on-demand checks, which 
only result in a "stale" warning, then it looks quite bad for anyone trying 
to monitor hosts in remote private networks.
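
To make the "stale" behaviour concrete, here is a rough sketch of how a freshness check is usually wired up on a master. The command name, the message text and the 600-second threshold are made up for illustration; check_dummy is the standard plugin that simply returns the state it is given.

```
# nagios.cfg on the master
check_host_freshness=1
host_freshness_check_interval=60

# Object config (fragments; names and threshold are placeholders)
define command{
        command_name    host-is-stale
        command_line    $USER1$/check_dummy 2 "No passive host result received recently"
        }

define host{
        host_name               remote-host-01
        check_freshness         1
        freshness_threshold     600             ; seconds
        check_command           host-is-stale   ; run when the result is stale
        }
```

With a setup like this, anything that makes the master execute check_command (a freshness check, but also an on-demand check) ends up reporting the stale state rather than the real state of the remote host.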


And then I started fiddling with host check caching for on-demand host 
checks. http://nagios.sourceforge.net/docs/3_0/cachedchecks.html

After increasing cached_host_check_horizon to 300 seconds (the biggest host 
check_interval we use), all of these on-demand checks should get their data 
from the last cached check.
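
The change itself is a single line in nagios.cfg; 300 is the value from this setup and should be adapted to the largest host check_interval in use:

```
# nagios.cfg on the master
# Let on-demand host checks reuse the last known host state
# if it is at most 300 seconds old.
cached_host_check_horizon=300
```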

And indeed, no more wild mood swings in the host states! Yay! After a quick 
test it seems that my problem is now solved. Touch wood.

I'd rather just have an option to *really* disable host checks altogether. 
After all, that's what you think you're doing with "execute_host_checks=0", 
according to the documentation at 
http://nagios.sourceforge.net/docs/3_0/configmain.html

On the other hand, letting Nagios look at old results in the cache is 
probably not that different from doing no checks at all. I'm only worried 
that the caching may delay notifications in some cases, but I guess we'll 
have to see how it behaves in practice.


Can anyone confirm that my reasoning is correct? That the master will 
*always* keep on doing *some* host checks no matter what you configure?


Best regards,

Pascal
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nagios.png
Type: image/png
Size: 89969 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/users/attachments/20110601/d7f7103b/attachment.png>