Dependency problem

Andreas Ericsson ae at op5.se
Wed Apr 7 22:55:32 CEST 2004


> ===================================================
> My Topology:
> ===================================================
>  
> Nagios machine --- RT1 -- RT2 -- RT3
>  
> 
> ====================================================
> The problem
> ====================================================
>  
> When RT1 goes down, or the RT1-RT2 link goes down, Nagios notices it
> at an arbitrary moment, while it is running a service check or the
> check-host-alive command against some part of the network that is
> down. Let's assume that the first host Nagios found dead was RT3.
>  
> Nagios didn't get any reply from RT3, so RT3 will be kept in SOFT down 
> state.
>  
> Next the retry process takes place. max_check_attempts is 30
> for each host. That's because the links are not reliable at all,
> so we want to be a little elastic with the notifications.
>  
This is where your problem is. A max_check_attempts of 30 is more than
just a little elastic: by the time all 30 attempts against RT3 have run,
the link may well have come back before RT2 is ever checked, which is
the race you describe. Set it to 10 or so instead, and things might run
a bit smoother.
Also, if the network really is in such a crappy state, you might want to
just stop monitoring it, since it's obviously not mission-critical for you.
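
As a rough sketch of that change, assuming the rest of the host
definition stays exactly as you posted it, RT3 would look something like
the block below. The value 10 is only an example, not a tuned
recommendation, and the same one-line edit would apply to RT1 and RT2.

```
define host{
        use                     generic-host
        host_name               RT3
        alias                   Wireless Internet
        address                 212.34.23.4
        parents                 RT2
        check_command           check-host-alive
        max_check_attempts      10      ; was 30, example value only
        notification_interval   0
        notification_period     24x7
        notification_options    d,u
        }
```

Fewer attempts shorten the time between the first failed check of RT3
and the check of its parent RT2, so there is less room for the link to
recover in between.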

> When we reach retry #30, Nagios assumes that RT3 is down, puts it
> in HARD DOWN state and looks for any dependencies associated with
> RT3. If you look below, RT3 is dependent upon RT2, so it will
> continue by trying to ping RT2.
>  
> While Nagios is trying to determine whether RT2 is alive or not,
> the RT1-RT2 link suddenly comes up and the whole network is
> reachable by Nagios again. I notice here that max_check_attempts
> has not run out yet, so Nagios gets a response from RT2 and puts
> it in a HARD OK state.
>  
> The result is that Nagios does NOT check RT3 again to see whether it
> is up like RT2. So a notification is sent reporting that RT3 is
> down. This is FALSE. The whole network was down!
>  
> Below I provide my configuration. Maybe something is wrong with my conf
> files.
>  
> Thanks in advance guys
>  
> ====================================================
> My dependencies.cfg file
> ====================================================
>  
> define hostdependency{
>         host_name                       RT2
>         dependent_host_name             RT3
>         notification_failure_criteria   d,u
>         }
>
> define hostdependency{
>         host_name                       RT1
>         dependent_host_name             RT2
>         notification_failure_criteria   d,u
>         }
>  
> 
> ===================================================
> My hosts.cfg
> ===================================================
>  
> define host{
>         use                     generic-host
>         host_name               RT1
>         alias                   Wireless 1
>         address                 213.5.0.34
>         check_command           check-host-alive
>         max_check_attempts      30
>         notification_interval   0
>         notification_period     24x7
>         notification_options    d,u
>         }
>
> define host{
>         use                     generic-host
>         host_name               RT2
>         alias                   tsapi.twmn
>         address                 10.107.13.1
>         parents                 RT1
>         check_command           check-host-alive
>         max_check_attempts      30
>         notification_interval   0
>         notification_period     24x7
>         notification_options    d,u
>         }
>
> define host{
>         use                     generic-host
>         host_name               RT3
>         alias                   Wireless Internet
>         address                 212.34.23.4
>         parents                 RT2
>         check_command           check-host-alive
>         max_check_attempts      30
>         notification_interval   0
>         notification_period     24x7
>         notification_options    d,u
>         }

-- 
Best regards
Andreas Ericsson
OP5 AB
+46 (0)733 709032
andreas.ericsson at op5.se


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users