Need help with service failure recheck interval

Marc Powell marc at ena.com
Wed Mar 1 22:48:11 CET 2006



> -----Original Message-----
> From: nagios-users-admin at lists.sourceforge.net [mailto:nagios-users-
> admin at lists.sourceforge.net] On Behalf Of prosolutions at gmx.net
> Sent: Wednesday, March 01, 2006 3:02 PM
> To: nagios-users at lists.sourceforge.net
> Subject: [Nagios-users] Need help with service failure recheck interval
> 
> 
> Using nagios 2.0b3 and have the following host and related service
> defined:
> 

[removed host and service definitions. Thanks!]
 
> test_client is a simple script that tests the connectivity of the custom
> service that is listening at 172.16.0.50.  It is set to time out and fail
> at 35 seconds.
> 
> interval_length is set to 1, therefore normal_check_interval=1500 should
> be 1500s=25m and retry_check_interval=20 should be 20s.  However, last
> night test_client failed and gave initial notification of state CRITICAL
> at 00:37:21.  Manually testing by me immediately after receiving this
> notification showed that the service was indeed back up, therefore I am
> almost 100% certain that subsequent test_client checks by nagios would
> have succeeded.  However, I did not receive a RECOVERY alert until
> 01:26:46, which is almost exactly 50 minutes after the initial CRITICAL
> alert.

I would expect this to have been ~25 minutes.
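
For reference, here is the arithmetic behind those numbers as a small
Python sketch. The values are the ones quoted above; the helper name
to_seconds is only illustrative, and nothing here claims to model how
Nagios actually schedules checks.

```python
from datetime import datetime

# Values taken from the quoted message.
INTERVAL_LENGTH = 1            # seconds per time unit (interval_length)
NORMAL_CHECK_INTERVAL = 1500   # time units
RETRY_CHECK_INTERVAL = 20      # time units


def to_seconds(units, interval_length=INTERVAL_LENGTH):
    """Convert a check interval in time units to seconds (illustrative helper)."""
    return units * interval_length


normal_s = to_seconds(NORMAL_CHECK_INTERVAL)   # 1500 s = 25 min
retry_s = to_seconds(RETRY_CHECK_INTERVAL)     # 20 s

# Reported notification times; only the time of day matters here.
critical = datetime.strptime("00:37:21", "%H:%M:%S")
recovery = datetime.strptime("01:26:46", "%H:%M:%S")
gap_s = (recovery - critical).total_seconds()

# The second service recovered "almost exactly 75 minutes" later.
other_gap_s = 75 * 60

print(f"normal interval: {normal_s} s, retry interval: {retry_s} s")
print(f"first gap: {gap_s:.0f} s = {gap_s / normal_s:.2f} normal intervals")
print(f"second gap: {other_gap_s} s = {other_gap_s / normal_s:.2f} normal intervals")
```

Both gaps come out close to whole multiples of the 1500 s normal interval
rather than the 20 s retry interval, but without the log entries that is
only an observation, not a diagnosis.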

> Also, I have another test for a similar service with identical
> parameters, and in that case it didn't send a RECOVERY alert until
> almost exactly 75 minutes after the CRITICAL alert.

Have you checked nagios.log to verify that your assumptions about the
check results in both cases are accurate? We'd need those and related
entries to know what's really going on; otherwise we're just speculating.
If you're not seeing sufficient log entries, make sure you've enabled the
log_ options in nagios.cfg. You may also want to consider enabling debug
options at compile time for additional logging. As noted in nagios.cfg,
using an interval_length other than 60 isn't common and isn't thoroughly
tested.
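
If it helps, something like the following Python sketch could pull the
relevant entries out of nagios.log. It assumes SERVICE ALERT lines have
the layout "[epoch] SERVICE ALERT: host;service;state;state_type;attempt;output";
check that against your own log before relying on it. The function name
service_alerts is made up for this example.

```python
import re
from datetime import datetime, timezone

# Assumed line layout (verify against your own nagios.log):
# [epoch] SERVICE ALERT: host;service;state;state_type;attempt;plugin output
ALERT_RE = re.compile(
    r"^\[(\d+)\] SERVICE ALERT: ([^;]*);([^;]*);([^;]*);([^;]*);(\d+);(.*)$"
)


def service_alerts(lines, service):
    """Yield (time, state, state_type, attempt) for one service description."""
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if match is None:
            continue  # not a SERVICE ALERT line
        epoch, _host, svc, state, state_type, attempt, _output = match.groups()
        if svc != service:
            continue
        # Log timestamps are epoch seconds; shown here in UTC.
        when = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
        yield when, state, state_type, int(attempt)
```

Passing an open log file and the service description to service_alerts
and printing each tuple would show when each SOFT and HARD state change
was logged, which is what's needed to compare against the 20 s retry
interval.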

I for one have not experienced notification issues such as this using
many versions of Netsaint/Nagios.

Also, you should be using the 2.0 release now to make sure you're not
hitting a bug that's already been fixed.

--
Marc

