Problem with time between soft down checks

Josh Van As JVanas at finncorp.com
Thu Mar 18 15:00:29 CET 2004


We just upgraded from Nagios 1.1 to 1.2.  We had the problem described
below in 1.1 and were hoping that 1.2 would fix it.
It did not.

When a service or host soft fails, we want Nagios to wait 1 minute and
then re-check, repeating until there have been 5 failed checks in total
(the 5th being HARD) before it sends out a notification.

The problem we are having, as you can see from the sample below, is that
Nagios is only waiting 3 seconds between soft fail checks.  Instead of a
host / service taking 4 minutes to fail 4 additional times (before
notification), it only takes about 12 seconds.
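
To make the arithmetic explicit, here is a rough Python sketch of the
expected versus observed time to reach the HARD state.  It assumes
interval_length in nagios.cfg is at the default of 60 seconds (that
setting is not shown in this message); the other numbers come from the
templates and the log sample in this post.  The function name is made
up for illustration.

```python
# Assumption: interval_length is the nagios.cfg default of 60 seconds.
INTERVAL_LENGTH_S = 60
RETRY_CHECK_INTERVAL = 1   # retry_check_interval from the service template
MAX_CHECK_ATTEMPTS = 5     # max_check_attempts from both templates
OBSERVED_GAP_S = 3         # gap between SOFT alerts in the log sample


def time_to_hard_state(gap_seconds, max_attempts):
    """Seconds from the first failed check to the final (HARD) check."""
    # The first failure is attempt 1, so there are max_attempts - 1 gaps.
    return gap_seconds * (max_attempts - 1)


expected = time_to_hard_state(INTERVAL_LENGTH_S * RETRY_CHECK_INTERVAL,
                              MAX_CHECK_ATTEMPTS)
observed = time_to_hard_state(OBSERVED_GAP_S, MAX_CHECK_ATTEMPTS)
print(f"expected: {expected} s, observed: {observed} s")
# prints: expected: 240 s, observed: 12 s
```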

We are getting a lot of false pages because just about any network
glitch can last 12 seconds.

Has anyone seen this before?  Can you please help?  We love this
product, but this is driving us crazy with pages!  Is this a problem
with our Perl installation?  Are we missing a module or something?  Or
do we have the config files set up wrong?


TIA!
-Josh


Sample problem:

[03-18-2004 08:44:02] HOST NOTIFICATION: rich;fcprt0013;DOWN;host-notify-by-epager;/bin/ping -n -U -c 1 172.16.1.86
[03-18-2004 08:44:02] HOST ALERT: fcprt0013;DOWN;HARD;5;/bin/ping -n -U -c 1 172.16.1.86
[03-18-2004 08:43:59] HOST ALERT: fcprt0013;DOWN;SOFT;4;/bin/ping -n -U -c 1 172.16.1.86
[03-18-2004 08:43:56] HOST ALERT: fcprt0013;DOWN;SOFT;3;/bin/ping -n -U -c 1 172.16.1.86
[03-18-2004 08:43:53] HOST ALERT: fcprt0013;DOWN;SOFT;2;/bin/ping -n -U -c 1 172.16.1.86
[03-18-2004 08:43:50] HOST ALERT: fcprt0013;DOWN;SOFT;1;/bin/ping -n -U -c 1 172.16.1.86
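
In case it helps anyone check their own logs for the same pattern, here
is a minimal Python sketch that measures the gap between consecutive
HOST ALERT entries.  The log text is the sample above with each entry on
a single line; the regular expression and the function name are my own
for illustration and only cover the fields visible in this sample.

```python
import re
from datetime import datetime

# HOST ALERT entries from the sample, one entry per line.
LOG = """\
[03-18-2004 08:44:02] HOST ALERT: fcprt0013;DOWN;HARD;5;/bin/ping -n -U -c 1 172.16.1.86
[03-18-2004 08:43:59] HOST ALERT: fcprt0013;DOWN;SOFT;4;/bin/ping -n -U -c 1 172.16.1.86
[03-18-2004 08:43:56] HOST ALERT: fcprt0013;DOWN;SOFT;3;/bin/ping -n -U -c 1 172.16.1.86
[03-18-2004 08:43:53] HOST ALERT: fcprt0013;DOWN;SOFT;2;/bin/ping -n -U -c 1 172.16.1.86
[03-18-2004 08:43:50] HOST ALERT: fcprt0013;DOWN;SOFT;1;/bin/ping -n -U -c 1 172.16.1.86
"""

# Matches: [timestamp] HOST ALERT: host;state;state_type;attempt;...
ALERT = re.compile(
    r"^\[(?P<ts>[^\]]+)\] HOST ALERT: "
    r"(?P<host>[^;]+);(?P<state>[^;]+);(?P<type>[^;]+);(?P<attempt>\d+);"
)


def alert_gaps(log_text):
    """Return seconds between consecutive check attempts, in attempt order."""
    alerts = []
    for line in log_text.splitlines():
        match = ALERT.match(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%m-%d-%Y %H:%M:%S")
            alerts.append((int(match.group("attempt")), ts))
    alerts.sort()  # the log is newest-first, so order by attempt number
    return [(b[1] - a[1]).total_seconds() for a, b in zip(alerts, alerts[1:])]


gaps = alert_gaps(LOG)
print(gaps)       # [3.0, 3.0, 3.0, 3.0]
print(sum(gaps))  # 12.0
```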


Here is the service definition for this service:

define service{
        name                            generic-service
        active_checks_enabled           1
        passive_checks_enabled          1
        parallelize_check               1
        obsess_over_service             0
        check_freshness                 1
        freshness_threshold             0
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    0
        max_check_attempts              5
        normal_check_interval           1
        retry_check_interval            1
        check_period                    24x7
        notification_interval           60
        notification_period             wakinghours
        notification_options            w,c,r
        register                        0
}

define service{
        use                             generic-service
        host_name                       fcprt0013
        service_description             ping
        check_command                   check-host-alive
        contact_groups                  finncontacts
}


Here is the host definition for this host:

define host{
        name                            generic-host
        checks_enabled                  1
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        low_flap_threshold              0
        high_flap_threshold             0
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    0
        max_check_attempts              5
        notification_interval           60
        notification_period             wakinghours
        notification_options            d,r
        register                        0
}

define host{
        use                             generic-host
        host_name                       fcprt0013
        alias                           fcprt0013.finncorp.com
        address                         172.16.1.86
        check_command                   check-host-alive
        parents                         fcnet0007
}


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null




