max_check_attempts, retry_check_interval, and notifications: confusion

John P. Rouillard rouilj at cs.umb.edu
Mon Mar 6 02:09:22 CET 2006


In message <20060305051646.GB3860 at think.alaya.net>,
prosolutions at gmx.net writes:
>I am trying to configure the following behavior from nagios:
>1. check a service every normal_check_interval
>2. if service check fails, up the check rate to retry_check_interval
>3. if 2 successive service checks fail, send notification
>4. continue to check at retry_check_interval until service check
>   succeeds and send notification

Yup, 4 is the tough one.

>my understanding is that once max_check_attempts is reached the service
>check interval returns to normal_check_interval even if the service
>is still down.

Correct.

>but this does not make sense to me.

The idea for the shortened retry check interval is to allow faster
checks while the service is in a soft error state. This way you can
run multiple (soft) checks within the time it would take to perform a
single normal check. Once the service reaches a hard state, the
"normal" check interval resumes. It would probably have been better
if the intervals were called "hard_check_interval" and
"soft_check_interval" rather than normal/retry, since "normal" makes
it sound like it should be used only for the "normal" state, which one
hopes is "ok" 8-).
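
To make the soft/hard distinction concrete, here is a rough sketch of
the scheduling rule as described above. It is a simplified model, not
Nagios source code: the function names are made up for illustration,
and it ignores details such as soft recoveries.

```python
def next_check_delay(state_ok, attempt, max_attempts,
                     normal_interval, retry_interval):
    """Return the delay before the next check, in interval units."""
    # Only a non-OK result that has not yet used up max_check_attempts
    # counts as a soft error and gets the shorter retry interval.
    in_soft_error = (not state_ok) and attempt < max_attempts
    return retry_interval if in_soft_error else normal_interval


def simulate(results, max_attempts, normal_interval, retry_interval):
    """Walk a list of check results (True = OK); return (state, delay) pairs."""
    timeline = []
    attempt = 0
    for ok in results:
        # Count consecutive failures, capped at max_attempts.
        attempt = 0 if ok else min(attempt + 1, max_attempts)
        if ok:
            label = "OK"
        elif attempt < max_attempts:
            label = "SOFT"
        else:
            label = "HARD"
        delay = next_check_delay(ok, attempt, max_attempts,
                                 normal_interval, retry_interval)
        timeline.append((label, delay))
    return timeline
```

With max_check_attempts=2, normal_check_interval=5 and
retry_check_interval=1, only the first failure is followed by a
1-unit delay; the second failure is already a hard state, so the
5-unit interval comes back even though the service is still down.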

>if a service is
>down - it seems logical to up the check interval and try a couple more
>checks before sending an alert.  but if the service has not recovered i
>don't want the check interval to go back to normal.

You don't say what version of Nagios you are using, but there are a
couple of ways to handle this. I believe I saw a patch for Nagios 1.x
that added another check_interval option. I want to say
error_check_interval, but that's not getting any hits on Google.  I
think it was on the Nagios developers' list, but it could have been
nagios-users.

For Nagios 2.x you can use the adaptive monitoring (see manual)
external command
'CHANGE_NORMAL_SVC_CHECK_INTERVAL;<host_name>;<service_description>;<check_interval>'
to change the interval from an event handler. I would suggest using
the objects.cache file to determine the configured
normal_check_interval and retry_check_interval. You may have to cache
that info for your event handler, as I am not sure whether that file
is rewritten when the intervals change.
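
A minimal sketch of such an event handler follows. Treat it as an
illustration under assumptions, not a tested handler: the command file
path is the usual default and may differ on your install, the argument
order is something I made up for the example, and the two interval
values are passed in by hand rather than read from objects.cache.

```python
import sys
import time

# Assumed default location of the external command file; adjust as needed.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def pick_interval(state, state_type, normal_interval, retry_interval):
    """Keep the fast rate while the service sits in a hard non-OK state."""
    if state != "OK" and state_type == "HARD":
        return retry_interval
    # Soft states already use retry_check_interval; OK restores normal.
    return normal_interval


def build_command(host, service, interval, now):
    """Format one external command line: [timestamp] COMMAND;args..."""
    return "[%d] CHANGE_NORMAL_SVC_CHECK_INTERVAL;%s;%s;%d\n" % (
        now, host, service, interval)


def main(argv):
    # Hypothetical argument order, e.g. from a command definition passing
    # $HOSTNAME$ $SERVICEDESC$ $SERVICESTATE$ $SERVICESTATETYPE$ <normal> <retry>
    host, service, state, state_type, normal, retry = argv[1:7]
    interval = pick_interval(state, state_type, int(normal), int(retry))
    with open(COMMAND_FILE, "w") as cmd:
        cmd.write(build_command(host, service, interval, int(time.time())))


# Only act when called with the six expected arguments.
if __name__ == "__main__" and len(sys.argv) == 7:
    main(sys.argv)
```

The trick is that the handler overwrites the "normal" interval with
the retry value once the service goes hard non-OK, and puts the
configured value back on recovery. That is why the handler needs to
know the original normal_check_interval from somewhere other than the
running Nagios.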

				-- rouilj
John Rouillard
===========================================================================
My employers don't acknowledge my existence much less my opinions.

