Host down, still doing active checks, causing multiple unwanted service failures

Toussaint OTTAVI t.ottavi at medi.fr
Tue Dec 9 15:17:22 CET 2008


Toussaint OTTAVI wrote:
>
> Following this idea, I will investigate the following:
> - Hosts are linked by parent/child relationships that follow the
> WAN topology (already working)
> - For each host, I will create a "parent" service with only a
> check_alive command
> - Every other service will be a child of this parent service

Answering myself... After some investigation and reading the docs :-) it
seems I confused "parent/child" with "dependency":

- Parent/child relationships are for hosts only, and should map the
network topology. When a host is DOWN, all its children are set to
UNREACHABLE. But this parent/child relationship does not exist for
services.

- Dependencies can be defined for either hosts or services. When the
object that is depended upon (the master) is in a failure state, the
dependent object is not checked. But no assumption is made about the
dependent object's status. Thus, it is not set to UNREACHABLE or
UNKNOWN, as it would be in a parent/child relationship.
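
To make the difference concrete, here is a minimal sketch of both
mechanisms for one remote host. It is an illustration only: the
"generic-host" template and the address are placeholders I made up, and
the failure criteria are just one plausible choice.

  ; Parent/child: declared on the host itself with "parents".
  define host{
    use                                generic-host       ; hypothetical template
    host_name                          REMOTE_HOST1
    address                            192.0.2.10         ; placeholder address
    parents                            Remote_WAN_Router
  }

  ; Dependency: a separate object. It suppresses checks and/or
  ; notifications of the dependent host, but does not change its state.
  define hostdependency{
    host_name                          Remote_WAN_Router
    dependent_host_name                REMOTE_HOST1
    execution_failure_criteria         d,u
    notification_failure_criteria      d,u
  }

Only the first form has an equivalent of UNREACHABLE. There is no
"parents" directive for services, so only the second form can be
expressed at the service level (as a servicedependency).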


Here's the current situation:

- Creating a dependency solves my problem of not checking services when
hosts are unreachable due to a WAN failure. This is a smarter solution
than my previous attempt using event_handlers and the
DISABLE_ALL_SVC_CHECKS external command. Using wildcards, I just have to
declare one dependency for all services on several hosts, like this:

  define servicedependency{
    host_name                          Remote_WAN_Router
    service_description                Remote WAN router ping test
    dependent_host_name                REMOTE_HOST1, REMOTE_HOST2, ..., REMOTE_HOSTn
    dependent_service_description      *
    inherits_parent                    1
    execution_failure_criteria         w,u,c
  }

- With that in place, when the WAN fails, the checks are not executed,
and the services keep their previous status. That's a good thing. But I
would have preferred that they get the status UNKNOWN or UNREACHABLE. In
fact, I would like the same parent/child behavior that exists for hosts,
but for services.

- I'm not sure it will solve the "latency" problem: if a service check
on a remote host runs before the remote WAN router is declared DOWN and
the dependency does its job, then I'll still get critical failures for
those services. The console will display a mix of CRITICAL services
(those checked before the WAN router check) and OK services (the
previous state of services that are no longer checked because of the
dependency). This display would be completely wrong!
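
Two settings that might narrow this window, assuming Nagios 3.x (I have
not verified either against this setup, and the horizon value below is
an arbitrary example). First, predictive dependency checks in the main
configuration file ask Nagios to run an on-demand check of the
depended-upon service when a dependent service changes state, so the
router failure may be noticed sooner:

  # nagios.cfg (main configuration file)
  enable_predictive_service_dependency_checks=1
  cached_service_check_horizon=15

Second, adding notification criteria to the dependency should at least
suppress notifications for the dependent services while the router test
is failing:

  define servicedependency{
    host_name                          Remote_WAN_Router
    service_description                Remote WAN router ping test
    dependent_host_name                REMOTE_HOST1
    dependent_service_description      *
    execution_failure_criteria         w,u,c
    notification_failure_criteria      w,u,c
  }

Neither setting changes the status shown for services that are not
checked. They would still display their last known state, not UNKNOWN.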

Again, in such a situation, I think the right display for services whose
status could not be determined would be "UNKNOWN", just as hosts are
shown as "UNREACHABLE".

Comments and ideas welcome.

Kind regards,
-- 

*Toussaint OTTAVI*
*MEDI INFORMATIQUE*
*Mail:* t.ottavi at medi.fr

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users