Hierarchical host schedule queuing

Shawn Iverson shawn at nccsc.k12.in.us
Fri Mar 11 14:57:29 CET 2005




On Thursday, March 10, 2005 10:38 PM, Marc Powell wrote,
>
>On Thursday, March 10, 2005 6:39 PM Shawn Iverson wrote,
>> 
>> Greetings!
>> 
>> While simulating a network failure to test my nagios setup, I noticed
>> that nagios (using version 1.2) does not hierarchically proceed to check
>> upstream hosts when it concludes that a host is down hard.
>> 

<snip>

>
>It sounds to me that you're describing a feature that has been 
>in Nagios, and Netsaint prior, for years. If your idea is 
>somehow different, can you clarify?

Read below.


>
>http://nagios.sourceforge.net/docs/1_0/networkreachability.html
>
>"Monitoring Remote Hosts 
>
>Checking the status of remote hosts is a bit more complicated 
>than for local hosts. If Nagios cannot monitor services on a 
>remote host, it needs to determine whether the remote host is 
>down or whether it is unreachable. Luckily, the <parent_hosts> 
>option allows Nagios to do this. 
>
>If a host check command for a remote host returns a non-OK 
>state, Nagios will "walk" the dependency tree (as shown in the 
>figure above) until it reaches the top (or until a parent host 
>check results in an OK state). By doing this, Nagios is able 
>to determine if a service problem is the result of a down 
>host, a down network link, or just a plain old service failure.

It appears that Nagios is not "walking" properly on my setup then.  My
notification options are as follows from contacts.cfg:

define contact {
contact_name                   Shawn_email
alias                          Shawn Iverson Email
host_notification_period       24x7
service_notification_period    24x7
service_notification_options   u,w,c,r
host_notification_options      d,r
host_notification_commands     host-notify-by-email
service_notification_commands  notify-by-email
email                          shawn at nccsc.k12.in.us
}

define contact {
contact_name                   Shawn_pager
alias                          Shawn Iverson Pager
host_notification_period       24x7
service_notification_period    24x7
service_notification_options   u,w,c,r
host_notification_options      d,r
host_notification_commands     host-notify-by-email
service_notification_commands  notify-by-email
email                          ##########@paging.acswireless.com
}

Nagios is not "walking" my dependency tree for some reason.  Something
has gone astray for me.

Here is a sample of a parent tree of my hosts.cfg for reference (ip
addresses not included):

<hosts.cfg START>

define host {
name                           generic-host     ; The name of this host template - referenced in other host definitions, used for template recursion/resolution
notifications_enabled          1        ; Host notifications are enabled
event_handler_enabled          1        ; Host event handler is enabled
flap_detection_enabled         1        ; Flap detection is enabled
process_perf_data              1        ; Process performance data
retain_status_information      1        ; Retain status information across program restarts
retain_nonstatus_information   1        ; Retain non-status information across program restarts
register                       0        ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!

max_check_attempts             10
notification_interval          360
notification_period            24x7
notification_options           d,r
check_command                  check-host-alive
}

define host {
use                            generic-host
host_name                      6509
alias                          6509 Core Router/Switch
address                        #.#.#.#
}

define host {
use                            generic-host
host_name                      vlan_switch
alias                          VLAN Radio Switch
address                        #.#.#.#
parents                        6509
}

define host {
use                            generic-host
host_name                      eastomniA
alias                          East Omni A
address                        #.#.#.#
parents                        vlan_switch
}

define host {
use                            generic-host
host_name                      eastomniB
alias                          East Omni B
address                        #.#.#.#
parents                        vlan_switch
} 

define host {
use                            generic-host
host_name                      rileyA
alias                          Riley Radio A
address                        #.#.#.#
parents                        eastomniA
}

define host {
use                            generic-host
host_name                      rileyB
alias                          Riley Radio B
address                        #.#.#.#
parents                        eastomniB
}

define host {
use                            generic-host
host_name                      rileyrouter
alias                          Riley Router
address                        #.#.#.#
parents                        rileyA, rileyB
}

define host {
use                            generic-host
host_name                      rileySW2
alias                          Riley Switch 2
address                        #.#.#.#
parents                        rileyrouter
}

define host {
use                            generic-host
host_name                      rileyserver
alias                          Riley Server
address                        #.#.#.#
parents                        rileySW2
}

<hosts.cfg END>
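
To make the parent relationships above easier to check, here is a minimal
Python sketch (my own illustration, not part of Nagios). It parses a trimmed
copy of the host definitions, keeping only host_name and parents, and lists
every host upstream of each one. The function names parse_parents and
upstream are hypothetical, and the parser only handles the simple
"directive value" layout used in this file.

```python
import re

# Trimmed copy of the host definitions above: only host_name and parents.
HOSTS_CFG = """
define host {
host_name   6509
}
define host {
host_name   vlan_switch
parents     6509
}
define host {
host_name   eastomniA
parents     vlan_switch
}
define host {
host_name   eastomniB
parents     vlan_switch
}
define host {
host_name   rileyA
parents     eastomniA
}
define host {
host_name   rileyB
parents     eastomniB
}
define host {
host_name   rileyrouter
parents     rileyA, rileyB
}
define host {
host_name   rileySW2
parents     rileyrouter
}
define host {
host_name   rileyserver
parents     rileySW2
}
"""


def parse_parents(text):
    """Map each host_name to its list of parents (empty list if none)."""
    parents = {}
    for block in re.findall(r"define\s+host\s*\{(.*?)\}", text, flags=re.S):
        fields = {}
        for line in block.splitlines():
            line = line.split(";", 1)[0].strip()  # drop trailing ; comments
            parts = line.split(None, 1)
            if len(parts) == 2:
                fields[parts[0]] = parts[1].strip()
        if "host_name" in fields:  # templates have no host_name and are skipped
            raw = fields.get("parents", "")
            parents[fields["host_name"]] = [
                p.strip() for p in raw.split(",") if p.strip()
            ]
    return parents


def upstream(parents, host):
    """Return every host above `host`, nearest first (breadth-first)."""
    seen = []
    queue = list(parents[host])
    while queue:
        current = queue.pop(0)
        if current not in seen:
            seen.append(current)
            queue.extend(parents[current])
    return seen


if __name__ == "__main__":
    tree = parse_parents(HOSTS_CFG)
    for name in tree:
        print(name, "<-", ", ".join(upstream(tree, name)) or "(top of tree)")
```

Every host below vlan_switch lists vlan_switch among its upstream hosts, so
a failure there cuts off all of them.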

This is what is happening with my setup.  Say that the vlan_switch goes
down and cannot route packets, and the scheduling queue is as follows at
that moment:

Host         Service  Last Check           Next Check           Active Checks
rileyserver  PING     03-11-2005 08:32:35  03-11-2005 08:37:35  ENABLED
rileyrouter  PING     03-11-2005 08:33:35  03-11-2005 08:38:35  ENABLED
rileyB       PING     03-11-2005 08:34:35  03-11-2005 08:39:35  ENABLED
eastomniA    PING     03-11-2005 08:35:35  03-11-2005 08:40:35  ENABLED
rileySW2     PING     03-11-2005 08:36:35  03-11-2005 08:41:35  ENABLED
6509         PING     03-11-2005 08:37:35  03-11-2005 08:42:35  ENABLED
vlan_switch  PING     03-11-2005 08:38:35  03-11-2005 08:43:35  ENABLED
rileyA       PING     03-11-2005 08:39:35  03-11-2005 08:44:35  ENABLED
eastomniB    PING     03-11-2005 08:40:35  03-11-2005 08:45:35  ENABLED

Here is the order of events that is occurring on my box:

1) I receive an email alert that rileyserver is in a DOWN state.
2) I receive an email alert that rileyrouter is in a DOWN state.
3) I receive an email alert that rileyB is in a DOWN state.
4) I receive an email alert that eastomniA is in a DOWN state.
5) I receive no email alert for rileySW2 because it is in an UNREACHABLE
state.
6) The 6509 is UP, so no email is sent.
7) I finally receive an alert, after the previous five, that vlan_switch
is in a DOWN state.
8) I receive no email alert for rileyA because it is in an UNREACHABLE
state.
9) I receive no email alert for eastomniB because it is in an
UNREACHABLE state.

>DOWN vs. UNREACHABLE Notification Types 
>
>I get lots of email from people asking why Nagios is sending 
>notifications out about hosts that are unreachable. The answer 
>is because you configured it to do that. If you want to 
>disable UNREACHABLE notifications for hosts, modify the 
>notification_options argument of your host definitions to not 
>include the u (unreachable) option. More information can be 
>found in this FAQ."
>

I am not receiving any alerts for hosts in an UNREACHABLE state, but I
am receiving false alerts for hosts that should be in an UNREACHABLE
state, not a DOWN state.   
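
To spell out what I expected, here is a rough Python sketch of the
classification as I read the documentation quoted above. It is my own model,
not Nagios code: a host that fails its check is DOWN if it has no parents or
at least one parent is UP, and UNREACHABLE if every parent is DOWN or
UNREACHABLE. The names PARENTS, responds, classify and expected_alerts are
hypothetical, and "failed" is the set of hosts I assume have actually broken.

```python
# Parent tree from my hosts.cfg, written out by hand (an illustration only).
PARENTS = {
    "6509": [],
    "vlan_switch": ["6509"],
    "eastomniA": ["vlan_switch"],
    "eastomniB": ["vlan_switch"],
    "rileyA": ["eastomniA"],
    "rileyB": ["eastomniB"],
    "rileyrouter": ["rileyA", "rileyB"],
    "rileySW2": ["rileyrouter"],
    "rileyserver": ["rileySW2"],
}


def responds(host, failed):
    """A host answers ping if it has not failed and some path upstream is intact."""
    if host in failed:
        return False
    parents = PARENTS[host]
    return not parents or any(responds(p, failed) for p in parents)


def classify(host, failed):
    """Return UP, DOWN or UNREACHABLE for one host, walking up the parents."""
    if responds(host, failed):
        return "UP"
    parents = PARENTS[host]
    # Assumption: a non-responding host is DOWN only if something directly
    # above it is still UP (or it sits at the top of the tree).
    if not parents or any(classify(p, failed) == "UP" for p in parents):
        return "DOWN"
    return "UNREACHABLE"


def expected_alerts(failed, options=("d", "r")):
    """Hosts that should trigger a problem notification for the given options."""
    wanted = set()
    if "d" in options:
        wanted.add("DOWN")
    if "u" in options:
        wanted.add("UNREACHABLE")
    return [h for h in PARENTS if classify(h, failed) in wanted]


if __name__ == "__main__":
    failed = {"vlan_switch"}
    for host in PARENTS:
        print(host, classify(host, failed))
    print("alerts with d,r:", expected_alerts(failed))
```

Under these assumptions, with vlan_switch failed and notification options
d,r, the only host that comes out DOWN is vlan_switch itself. Everything
below it is UNREACHABLE, regardless of the order in which the checks happen
to be scheduled.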

>--
>Marc
>

--

Shawn 

