Recovery not being fired off under certain circumstances (repost)

srunschke at abit.de srunschke at abit.de
Wed Nov 30 09:44:21 CET 2005


(this is a repost from nagios-devel as no one answered)

Hi,

lately I stumbled over a few discrepancies in our network monitoring:
we were getting Warnings but never received a Recovery,
even though it was pretty obvious that the service had recovered.
I was finally able to pin down the reason for it.

Sadly I am unsure whether it has to be seen as "working as intended" or
whether it really is unexpected behaviour. Personally I'd call it
"broken as intended".

Excerpt from the config that reproduces the problem:

define service {
host_name                       RMS
use                             generic-SNMP
service_description             RZ_TEMPERATUR
servicegroups                   SMS-SERVICEGROUP
register                        1
check_command                   check_snmp!abit-management!1.3.6.1.4.1.2769.10.4.1.1.3.1!1!30!35
notification_interval           10
stalking_options                c,w,u
notification_options            c,w,u,r
}

define serviceescalation {
host_name                       RMS
service_description             RZ_TEMPERATUR
first_notification              1
last_notification               0
contact_groups                  HOST-CONTACTGROUP-SMS
escalation_period               24x7
escalation_options              c,r,u
}

As this is the temperature check of our monitoring system for our main
datacenter, I do want it to mail me on a Warning state, but I do not care
enough about warnings to want an SMS for them. The contact groups of
RZ_TEMPERATUR are therefore mail-only groups.
I escalate c,r,u into another contact group which has the relevant
contacts with their pagers in it. Now if the service throws a Warning, we
get the mail. But if it recovers, we get neither mail nor SMS.

The reason is that the recovery falls into the territory of the
escalation, which then checks who received the notification for this
recovery in the first place. That check yields no information for the
escalation, so no recovery is fired off at all.
Even IF the check for that info were tweaked, it would still fire the
recovery via SMS, which is not the behaviour I intend.
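
To make the sequence concrete, here is a minimal Python sketch of the
behaviour as described above. It is a toy model, not Nagios code: the
contact names and the recipients() function are made up for illustration,
and the rule that a recovery only reaches contacts who were told about
the problem is my reading of what happens, not something taken from the
source.

```python
# Toy model of the behaviour described in this mail; NOT Nagios source code.
# All names below are hypothetical stand-ins.

SERVICE_CONTACTS = {"mail-admin"}      # stand-in for the mail-only groups
ESCALATION = {
    "contacts": {"sms-pager"},         # stand-in for HOST-CONTACTGROUP-SMS
    "options": {"c", "r", "u"},        # escalation_options from the config
}


def recipients(state, notified_of_problem):
    """Return who would be notified for state 'w', 'c', 'u' or 'r'."""
    # A matching escalation takes over the contact list.
    if state in ESCALATION["options"]:
        candidates = ESCALATION["contacts"]
    else:
        candidates = SERVICE_CONTACTS
    # Assumption: a recovery only goes to contacts that were
    # notified about the preceding problem.
    if state == "r":
        return candidates & notified_of_problem
    return set(candidates)


# Warning: not covered by the escalation, so the mail group is notified.
notified = recipients("w", set())
# Recovery: covered by the escalation, but nobody in the SMS group
# was told about the Warning, so the result is empty.
recovery = recipients("r", notified)
print(notified, recovery)
```

In this model the Warning goes to the mail contact, while the Recovery is
matched against the SMS contacts only and therefore reaches nobody.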

How do you guys see this particular problem?
Should Nagios be able to act in a more differentiated way on these kinds of
problems, or is it my burden to find a hacky-hack solution via nested
contacts/escalations for this? ;)

I'd appreciate some insights into this matter.

regards
        sash

--------------------------------------------------
Sascha Runschke
Netzwerk Administration
IT-Services

ABIT AG
Robert-Bosch-Str. 1
40668 Meerbusch

Tel.:+49 (0) 2150.9153.226
Mobil:+49 (0) 173.5419665
mailto:SRunschke at abit.de

http://www.abit.net
http://www.abit-epos.net
---------------------------------
Sicherheitshinweis zur E-Mail Kommunikation /
  Security note regarding email communication:
http://www.abit.net/sicherheitshinweis.html

