Problem with "retry check interval"

Andreas Ericsson ae at op5.se
Mon Sep 6 14:36:40 CEST 2004


Alexander Schaefer wrote:
> Hello,
> 
> I am using Nagios 1.2 for monitoring in my network. I have about 90 hosts and
> 130 services in the Nagios configuration. The problem I am confronted with now
> is the following. By default, each host is checked for availability with the
> check_host_alive plugin. The check occurs every 2 minutes, and if the state of
> this service changes from OK to CRITICAL or WARNING, then the "retry check
> interval" for this service should check it 3 times at an interval of 1 minute
> before sending notifications. That is defined (at the host group level) for
> all hosts in my monitored environment:
> 
> define service {
>     use    generic-service
>     hostgroup_name    firewalls,mail-server,routers,switchs,win-server,WLAN
>     service_description    PING
>     contact_groups    nagios
>     check_period    24x7
>     notification_interval    120
>     notification_options    w,u,c,r
>     notification_period    24x7
>     check_command    check-host-alive
>     max_check_attempts    3
>     normal_check_interval    2
>     retry_check_interval    1
> #   comment:    Check hosts availability
> }
> 
> My problem-host X1_voicegate is defined in the group routers:
> define hostgroup {
>     hostgroup_name    routers
>     alias    routers
>     contact_groups    router-admins
>     members    X1_ras,X1_voicegate,rou2,rou3,rou-internet,rou-internet2
> #   comment:    Router host-group
> }
> 
> 
> But I cannot understand why, for the particular host X1_voicegate, the "retry
> check interval" works with a check delay of only 3 seconds. I can see this
> information in the Nagios event log:
> 
> [09-06-2004 13:46:12] HOST ALERT: X1_voicegate;DOWN;HARD;3;PING CRITICAL - Host Unreachable (10.4.11.13)
> [09-06-2004 13:46:09] HOST ALERT: X1_voicegate;DOWN;SOFT;2;PING CRITICAL - Host Unreachable (10.4.11.13)
> [09-06-2004 13:46:06] HOST ALERT: X1_voicegate;DOWN;SOFT;1;PING CRITICAL - Host Unreachable (10.4.11.13)
> 
> 
> Is this a bug in Nagios, or is there some other hidden configuration option
> that I am missing?
> 

Nagios does host checks in a serialized manner, to prevent service 
checks from running when they're targeted at an unreachable host. That 
means host check 2 will execute as soon as host check 1 is complete, and 
that explains why you're seeing the log entries above: they are HOST 
ALERTs, so the 3-second spacing is roughly the time each host check took, 
and the "retry check interval" of your PING service doesn't apply to 
them. It's not a bug, it's a feature. Nagios is much less picky about 
services, so for those the retry check interval will work smoothly, but 
the logic implies that the host needs to be up for services to be 
checked, so it can't put the host check retries off and risk having 
service checks being executed in the meantime.
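
To make the difference concrete, here is a rough Python sketch of the two 
retry patterns. It is a simplified model, not Nagios's actual scheduler 
code: the function names are made up for illustration, and the 3-second 
check duration is an assumption read off the spacing of the log entries 
quoted above.

```python
from datetime import datetime, timedelta


def host_check_times(first_check, max_check_attempts, check_duration):
    """Host check retries run back to back: each attempt starts as soon
    as the previous one has finished (assumed constant check_duration)."""
    return [first_check + i * check_duration for i in range(max_check_attempts)]


def service_check_times(first_check, max_check_attempts, retry_check_interval):
    """Service check retries are rescheduled retry_check_interval apart."""
    return [first_check + i * retry_check_interval for i in range(max_check_attempts)]


# First failing check, taken from the quoted event log.
start = datetime(2004, 9, 6, 13, 46, 6)

# max_check_attempts 3; host checks assumed to take about 3 seconds each.
host_times = host_check_times(start, 3, timedelta(seconds=3))
# max_check_attempts 3; retry_check_interval 1 (minute), as in the PING service.
service_times = service_check_times(start, 3, timedelta(minutes=1))

for label, times in (("host", host_times), ("service", service_times)):
    print(label, [t.strftime("%H:%M:%S") for t in times])
```

Under those assumptions the host attempts land at 13:46:06, 13:46:09 and 
13:46:12, matching the log, while a service with the same settings would 
be retried at 13:46:06, 13:47:06 and 13:48:06.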


> Thanks for your ideas
> 

You're welcome.

> Alex
> 

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Lead Developer


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users