nrpe timeouts, dependencies, no connectivity

Andreas Ericsson ae at op5.se
Wed Oct 4 14:57:49 CEST 2006


David Miller wrote:
> Hi All;
> 
> I'm pulling out what little is left of my hair on this one:(
> 
> I've got a setup where a nagios host at data center A is monitoring 
> services on 30+ hosts at data center B.  The bulk of the monitoring is 
> via nrpe.
> 
> Once every blue moon or so I lose connectivity between the two data 
> centers.  I then get 30 hosts * avg_number_services_monitored pages 
> about problems, and a similar number of recovery messages.
> 
> I'm on debian stable (sarge), running their package (1.3-cvs.200504).  I 
> set up a simple service to test connectivity between data centers:
> 
> define service{
>         use                             generic-service         ; Name of service template to use
>         hostgroup_name                  check-inap
>         service_description             Check INAP Connection
>         is_volatile                     0
>         check_period                    24x7
>         max_check_attempts              5
>         normal_check_interval           5
>         retry_check_interval            2
>         contact_groups                  dmiller
>         notification_interval           120
>         notification_period             24x7
>         notification_options            w,u,c,r
>         check_command                   check-inap
>         }
> 
> Added it to hostgroups:
> 
> define hostgroup {
>         hostgroup_name  check-inap
>         alias           Check INAP Connection
>         contact_groups  dmiller
>         members         css.int
>         }
> 
> Here is the actual check command:
> 
> define command{
>         command_name    check-inap
>         command_line    /usr/lib/nagios/plugins/check_icmp 192.168.120.100
> }
>        
> (yes, that's a valid IP address over our VPN)
> 
> This is a typical dependency entry:
> 
> define servicedependency{
>         host_name                       css.int
>         service_description             Check INAP Connection
>         dependent_host_name             groupware.int
>         dependent_service_description   Check Disk Utilization
>         execution_failure_criteria      w,u,c   ; These are the criteria for which check execution will be suppressed
>         notification_failure_criteria   w,u,c   ; These are the criteria for which notifications will be suppressed
>         }
> 
> 
> What's happening is that even if "Check INAP Connection" gets an 
> NRPE_timeout, which should be a condition "unknown", the check of disk 
> utilization for groupware.int is executed, as is the notification.
> 
> What am I missing?  Is this something fixed in more recent versions of 
> nagios?
> 

You're making it overly complicated. Since traffic to the monitored hosts 
goes through your nrpe "proxy" at the data center, you'd be better off 
setting up parent/child relations that take this into account, and 
disabling "unreachable" notifications either globally or for the hosts 
and services monitored through the proxy.
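A minimal sketch of the parent/child approach in Nagios object-config syntax, assuming hosts are built from a generic-host template and that a check-host-alive command exists, as in the stock sample configs. The template name dcb-host, the host name dcb-gateway and the address 192.0.2.10 are hypothetical placeholders; only the VPN address 192.168.120.100 and groupware.int come from the thread. Hosts in data center B name the gateway in their parents directive, and the template's host notification_options of d,r leaves out u, so no "unreachable" notifications go out when the link drops.

```
define host{
        name                    dcb-host        ; hypothetical template for hosts in data center B
        use                     generic-host    ; assumes the stock generic-host template
        check_command           check-host-alive ; assumes this command is defined, as in the sample configs
        notification_options    d,r             ; notify on down and recovery only, never on unreachable
        register                0               ; template only, not a real host
        }

define host{
        use                     dcb-host
        host_name               dcb-gateway     ; hypothetical name for the VPN endpoint / nrpe proxy
        alias                   Data center B gateway
        address                 192.168.120.100 ; the VPN address used by check-inap
        }

define host{
        use                     dcb-host
        host_name               groupware.int
        alias                   Groupware server
        address                 192.0.2.10      ; placeholder, replace with the real address
        parents                 dcb-gateway     ; reached only through the gateway
        }
```

With this in place, losing the gateway should leave its children UNREACHABLE rather than DOWN, and Nagios does not send service notifications for services whose host is down or unreachable, which covers the services monitored through the proxy as well.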

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null