Host check timing

Marc Powell lists at xodus.org
Thu Mar 18 16:42:37 CET 2010


On Mar 18, 2010, at 9:53 AM, David Dyer-Bennet wrote:

> I'm monitoring some far-away remote hosts, that we connect to via the
> public internet (well, there's an encrypted VPN involved).  I'm trying not
> to send notifications until an outage persists for a while.
> 
> In an example I looked at this morning, I see that it was repeating the
> host check every 10 seconds until it hit the retry count.

With nagios-2, when a host enters a non-OK state, nagios switches to a serial mode in which the host check is retried until max_check_attempts is reached. The 10 seconds has more to do with the check you are performing than with any nagios timing: that's apparently about how long each check run takes to complete (are you issuing 10 pings, perhaps?).

> Where does that 10 seconds time come from?  The manual is remarkably vague
> about host check scheduling; about all it says is that it does them on
> demand,

With nagios-2, host checks are only run when a service on the host fails. Nagios then runs the host's check_command, up to max_check_attempts times, to determine whether the host itself is down or only the service is.

> and "If the first host check returns a non-OK state, Nagios will
> keep pounding out checks of the host until either (a) the maximum number
> of host checks (specified by the max_attempts option in the host
> definition) is reached or (b) a host check results in an OK state."

Yup. The host check_command is run, each attempt immediately after the previous one, until max_check_attempts is reached (or a check returns OK). During this time nagios is doing *nothing* else besides checking this host.
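A rough way to picture that nagios-2 behaviour is the toy simulation below. It is not Nagios source: serial_host_check and its parameters are hypothetical, and it assumes every check takes the same fixed time and that the next attempt starts the moment the previous one finishes. Under those assumptions the attempts land exactly check_duration seconds apart, which matches the 10-second spacing you observed.

```python
"""Toy model of the nagios-2 serial host-check retry loop (not Nagios code).

Assumptions: each check takes a fixed check_duration seconds, and the next
attempt starts as soon as the previous one finishes (no retry interval).
"""

from typing import Callable, List, Tuple


def serial_host_check(
    run_check: Callable[[int], bool],
    max_check_attempts: int,
    check_duration: float,
) -> Tuple[bool, List[float]]:
    """Retry a host check back to back until it succeeds or attempts run out.

    Returns (host_ok, start_times), where start_times holds the simulated
    start time, in seconds, of every attempt that was made.
    """
    clock = 0.0
    start_times: List[float] = []
    for attempt in range(1, max_check_attempts + 1):
        start_times.append(clock)
        ok = run_check(attempt)    # in this model nothing else is scheduled meanwhile
        clock += check_duration    # the next attempt starts right after this one ends
        if ok:
            return True, start_times
    return False, start_times


# A host that never answers, 10-second checks, 6 attempts.
ok, starts = serial_host_check(lambda attempt: False, 6, 10.0)
print(ok, starts)  # False [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
```

The sixth attempt ends at 60 seconds, so the whole run ties up the scheduler for attempts × check duration.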

> Does this mean I have no control over the timing?  

Depends on your version of nagios. With nagios-2, yes, you have no control over the timing.

> Can I treat the 10 second observed delay as real (and then control total time delay by
> setting max_attempts high)?

Raising max_attempts is one way to deal with that, but if you're using nagios-2, you're stopping *all* other checks and processing until max_attempts is reached. If you set it to 6, then for roughly the next 60 seconds (6 checks at about 10 seconds each) nagios is doing nothing besides checking this single host. That matters if you have lots of other checks running.
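To size max_attempts for a target delay under nagios-2, the arithmetic is the desired delay divided by the per-check duration, rounded up. The helper below is a sketch with a hypothetical name (attempts_for_delay), assuming the roughly 10 seconds per check you observed stays constant; the resulting delay is also roughly how long nagios would be busy with this one host.

```python
import math


def attempts_for_delay(desired_delay: float, check_duration: float) -> int:
    """Smallest attempt count whose back-to-back checks cover desired_delay.

    Hypothetical helper assuming nagios-2 serial mode: attempts run one after
    another and each takes about check_duration seconds.
    """
    if check_duration <= 0:
        raise ValueError("check_duration must be positive")
    return max(1, math.ceil(desired_delay / check_duration))


print(attempts_for_delay(60, 10))   # 6
print(attempts_for_delay(300, 10))  # 30
```

So holding off notifications for five minutes would take about 30 attempts, which under nagios-2 also means about five minutes of checking nothing but that host.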

If you need more control over that, I'd suggest upgrading to nagios-3. Host check logic was greatly improved and is more in line with how service checks are done.
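For comparison, nagios-3 schedules host retries much like service retries instead of running them back to back, so the delay becomes something you configure. A rough sketch, assuming the nagios-3 host directives max_check_attempts and retry_interval and the global interval_length (60 seconds by default); the function name is hypothetical and check execution time is ignored:

```python
def nagios3_time_to_hard_state(
    max_check_attempts: int,
    retry_interval: float,
    interval_length: int = 60,
) -> float:
    """Approximate seconds from the first failed host check to the HARD state.

    Assumes nagios-3 semantics: after the first non-UP result the host is
    rechecked every retry_interval time units (interval_length seconds each)
    until max_check_attempts results have been collected. Check execution
    time and scheduler latency are ignored.
    """
    return (max_check_attempts - 1) * retry_interval * interval_length


# 5 attempts retried every 2 minutes: HARD DOWN about 8 minutes after the first failure.
print(nagios3_time_to_hard_state(5, 2))  # 480
```

The point of the design is that the waiting happens between scheduled checks, so other hosts and services keep being checked during the retry window.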

--
Marc


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null