Parent/child behaviour, WAS:Re: check_ping vs. check_icmp?

Greg Vickers g.vickers at qut.edu.au
Mon Oct 17 01:58:53 CEST 2005


Andreas,

Andreas Ericsson wrote:
> Andrew Laden wrote:
> 
>> How does using check_icmp compare to using check_fping?
>>
>> It seems that check_fping will return a down answer much faster. Since 
>> host checks are most often run when the host is down, that seems to be the
>> performance that we are concerned with.
> 
> This might seem to be the case, but it actually isn't. A hostcheck is 
> run each time a service changes from whatever to any non-OK state. In a 
> (somewhat) healthy network hostchecks are being run when the host is up 
> more often than when they're down. The opposite is of course true if 
> there are hosts being down for a long time or if a whole segment of the 
> network goes to lunch,

I thought that if parents were set up correctly, Nagios would not 
run any service or host checks on hosts that are children of the 
host causing the outage. There would be a delay while Nagios works out 
which parent host is down (i.e. the failing service checks trigger host 
checks 'up' the chain of parents, each with its own delay, until the 
'top' parent host is checked), but once the top-most parent has had its 
host check, no host or service checks would be run on the children 
until that parent recovers. You would therefore only see a delay in 
check scheduling/processing when the host check is run on that 'top' 
parent host.
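
To make that walk concrete, here is a minimal Python sketch of what I 
am describing. It models only my understanding, not how Nagios actually 
schedules checks. It assumes each host has at most one parent, and the 
names find_blocking_host, host_check, PARENT and UP_HOSTS are all made up 
for illustration.

```python
# Sketch of my understanding only; this is not Nagios source code.
# Assumes each host has at most one parent (Nagios allows several).
# find_blocking_host, host_check, PARENT and UP_HOSTS are hypothetical names.

PARENT = {"web1": "switch", "switch": "router", "router": None}
UP_HOSTS = {"router"}


def host_check(host):
    """Stand-in for a real host check (e.g. check_icmp); True means UP."""
    return host in UP_HOSTS


def find_blocking_host(host, parent, check):
    """Walk from a failed host towards the root of the parent chain.

    Returns (topmost host that failed its check, number of host checks run).
    """
    blocking = None
    checks_run = 0
    current = host
    while current is not None:
        checks_run += 1
        if check(current):
            break  # first ancestor that is up; stop walking
        blocking = current
        current = parent.get(current)
    return blocking, checks_run


result = find_blocking_host("web1", PARENT, host_check)
```

In this toy topology the walk runs one host check per level, so the 
delay I am asking about would grow with the depth of the parent chain 
and with how long each failing check takes to time out.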

Is this the expected and correct behavior or is it too early on Monday 
morning for me?

<snipity-snip-snip>

Ah-ha, I should RTFM before inserting foot in mouth. The 
networkoutages.html documentation page states:

"If all of the immediate child hosts of one of these flagged hosts is 
DOWN or UNREACHABLE and has no immediate parent host that is up, the 
flagged host is the cause of a network outage. If even one of the 
immediate children of a flagged host does not pass this test, then the 
flagged host is not the cause of a network outage."
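
One way to read the quoted rule is as a test applied to each immediate 
child of the flagged host. The Python sketch below encodes that reading 
only. It is not taken from the Nagios source, and the function name, the 
dict layout and the host names are assumptions for illustration.

```python
# Sketch of one reading of the quoted rule; not Nagios source code.
# is_outage_cause, PARENTS and STATE are hypothetical names.

UP, DOWN, UNREACHABLE = "UP", "DOWN", "UNREACHABLE"

# Each host maps to the list of its immediate parents.
PARENTS = {
    "router": [],
    "switch": ["router"],
    "backup-switch": ["router"],
    "web1": ["switch"],
    "web2": ["switch", "backup-switch"],
}

STATE = {
    "router": UP,
    "switch": DOWN,
    "backup-switch": UP,
    "web1": UNREACHABLE,
    "web2": DOWN,
}


def is_outage_cause(flagged, state, parents):
    """True if every immediate child of `flagged` passes the quoted test.

    A child passes when it is DOWN or UNREACHABLE and none of its own
    immediate parents is UP. A host with no children returns True here;
    the quoted text does not cover that case.
    """
    children = [h for h, ps in parents.items() if flagged in ps]
    for child in children:
        child_is_blocked = state[child] in (DOWN, UNREACHABLE)
        has_up_parent = any(state[p] == UP for p in parents[child])
        if not child_is_blocked or has_up_parent:
            return False
    return True


switch_is_cause = is_outage_cause("switch", STATE, PARENTS)
```

Under this reading the "parent host that is up" belongs to the child, 
not to the flagged host: web2 still has backup-switch up as a parent, so 
it fails the test and switch is not counted as the cause of an outage.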

So from this statement, I understand that all children will have host 
checks run to determine exactly which host is the cause of a network 
outage, and that this could cause a long delay if there are a lot of 
hosts to check. However, I don't understand the phrase "... has no 
immediate parent host that is up...". Shouldn't that read "... has a 
parent host up..."? Otherwise, how would Nagios reach that blocking host 
to test it?

It really could be too early...

Thanks,
-- 
Greg Vickers
Project Manager, IT Security
Information Technology Services
Queensland University of Technology
L12, 126 Margaret St, Brisbane

Phone: (07) 3864 9536
Email: g.vickers at qut.edu.au
IT Security web site: http://www.its.qut.edu.au/itsecurity/

CRICOS No. 00213J


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users