BUG: Recovery notifications sent to contacts which never received the initial problem notification

Sascha.Runschke at gfkl.com
Wed Aug 20 16:25:08 CEST 2008


Greetings,

It seems I have triggered a bug in our new Nagios instance, as it is showing
rather strange behaviour.
Quoting from the Nagios 3.x documentation, section "Service and Host Filters":
http://nagios.sourceforge.net/docs/3_0/notifications.html

"Note: Notifications about host or service recoveries are only sent out if
a notification was sent out for the original problem. It doesn't make sense
to get a recovery notification for something you never knew was a problem..."

This is what happened:

1. Service went CRITICAL -> notifications to the contacts user1-mail, user2-mail
2. Service went WARNING -> notifications to the contacts user1-mail, user2-mail
3. Service went OK -> notifications to the contacts user1-mail, user2-mail, user1-sms, user2-sms

Host     Service  State     Time                 Contact     Command            Output
vmctx02  CPU      CRITICAL  18-08-2008 16:24:50  user1-mail  mail-notification  CRITICAL: 15m: average load 100% critical
vmctx02  CPU      CRITICAL  18-08-2008 16:24:50  user2-mail  mail-notification  CRITICAL: 15m: average load 100% critical
vmctx02  CPU      WARNING   18-08-2008 16:31:50  user1-mail  mail-notification  WARNING: 15m: average load 99% warning
vmctx02  CPU      WARNING   18-08-2008 16:31:50  user2-mail  mail-notification  WARNING: 15m: average load 99% warning
vmctx02  CPU      OK        18-08-2008 16:32:50  user1-sms   sms-notification   OK: 15m: average load 92%
vmctx02  CPU      OK        18-08-2008 16:32:50  user2-sms   sms-notification   OK: 15m: average load 92%
vmctx02  CPU      OK        18-08-2008 16:32:50  user1-mail  mail-notification  OK: 15m: average load 92%
vmctx02  CPU      OK        18-08-2008 16:32:50  user2-mail  mail-notification  OK: 15m: average load 92%

I do not understand why the two SMS contacts were notified: they never
received a problem notification in the first place. An escalation triggered
those SMS messages, but in my opinion it should not have. In our environment
this seems to happen only if exactly 2 notifications were sent before the
recovery.
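
To make the expectation concrete, here is a minimal Python sketch that replays
the log above against a per-contact reading of the documented rule (a contact
should only see a recovery if it was told about the problem). This is a
hypothetical model for illustration, not Nagios source code; the names
Notification and unexpected_recoveries are made up for this example.

```python
# Hypothetical model of the documented rule, read per contact.
# This is NOT Nagios source code; it only replays the log from this report.
from collections import namedtuple

Notification = namedtuple("Notification", "state contact")

# Entries in the order they appear in the notification log.
log = [
    Notification("CRITICAL", "user1-mail"),
    Notification("CRITICAL", "user2-mail"),
    Notification("WARNING", "user1-mail"),
    Notification("WARNING", "user2-mail"),
    Notification("OK", "user1-sms"),
    Notification("OK", "user2-sms"),
    Notification("OK", "user1-mail"),
    Notification("OK", "user2-mail"),
]


def unexpected_recoveries(entries):
    """Return contacts that got a recovery without an earlier problem notification."""
    told_about_problem = set()
    unexpected = []
    for entry in entries:
        if entry.state == "OK":
            if entry.contact not in told_about_problem:
                unexpected.append(entry.contact)
        else:
            told_about_problem.add(entry.contact)
    return unexpected


print(unexpected_recoveries(log))  # ['user1-sms', 'user2-sms']
```

Under that reading, only the two SMS contacts are flagged, which matches what
looks wrong in the log.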

These are the relevant configs:


Contacts and Templates (user1 and user2 are identical):


define contact {
        name                            generic-contact-mail
        host_notification_period        24x7
        service_notification_period     24x7
        host_notification_options       d,r
        service_notification_options    u,c,w,r
        host_notification_commands      mail-notification
        service_notification_commands   mail-notification
        register                        0
}

define contact {
        contact_name                    user1-mail
        use                             generic-contact-mail
        alias                           User1
        email                           user1 at firma.com
}

define contact {
        name                            generic-contact-sms
        host_notification_period        24x7
        service_notification_period     24x7
        host_notification_options       d,r
        service_notification_options    u,c,r
        host_notification_commands      sms-notification
        service_notification_commands   sms-notification
        register                        0
}

define contact { 
        contact_name                    user1-sms
        use                             generic-contact-sms
        alias                           S R
        pager                           +49-DONT-CALL-ME
} 


Service Templates and Service:


define service {
        name                            generic-service
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              3
        normal_check_interval           1
        retry_check_interval            3
        active_checks_enabled           1
        passive_checks_enabled          1
        parallelize_check               1
        obsess_over_service             0
        check_freshness                 1
        freshness_threshold             120
        notifications_enabled           1
        notification_interval           60
        notification_period             24x7
        notification_options            u,c,w,r
        event_handler_enabled           1
        flap_detection_enabled          1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        register                        0
}

define service {
        service_description             CPU
        use                             generic-service
        host_name                       vmctx01
        check_command                   check_nrpe_cpu!99%!100%
}


Service Escalation Templates and Escalations (the escalation_period at that time was workhours):


define serviceescalation {
        name                            service-minor-nonworkhours
        first_notification              4
        last_notification               4
        notification_interval           60
        escalation_period               nonworkhours
        escalation_options              r,c
        register                        0
} 
 
 
define serviceescalation {
        name                            service-minor-workhours
        first_notification              2
        last_notification               2
        notification_interval           60
        escalation_period               workhours
        escalation_options              r,c
        register                        0
}

define serviceescalation {
        use                             service-minor-nonworkhours
        host_name                       essctxsir06,essctx10,essctx04,essctxulg04,essctx11,essctxulg03,essctxsir03,essctxj01,essctxsir02,essctx03,essctxb06,essctxulg02,essctxsir05,essctxb01,essctxulg05,essctxulg01,essctx07,essctxulg06,essctxtest01,essctxtest01a,vmctx01,vmctx02,vmctx03,vmctx05,vmnrzctxulg03,vmnrzctxulg02,vmnrzctxulg01,nrzctxsir02,nrzctxsir01,nrzctxpps02,nrzctxpps01,nrzctxpcs01,nrzctxpcs02,vmnrzctxpcs02
        service_description             *
        contact_groups                  citrixadmins,citrixadmins-sms
}


define serviceescalation {
        use                             service-minor-workhours
        host_name                       essctxsir06,essctx10,essctx04,essctxulg04,essctx11,essctxulg03,essctxsir03,essctxj01,essctxsir02,essctx03,essctxb06,essctxulg02,essctxsir05,essctxb01,essctxulg05,essctxulg01,essctx07,essctxulg06,essctxtest01,essctxtest01a,vmctx01,vmctx02,vmctx03,vmctx05,vmnrzctxulg03,vmnrzctxulg02,vmnrzctxulg01,nrzctxsir02,nrzctxsir01,nrzctxpps02,nrzctxpps01,nrzctxpcs01,nrzctxpcs02,vmnrzctxpcs02
        service_description             *
        contact_groups                  citrixadmins,citrixadmins-sms
}
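
One possible reading of the workhours escalation, offered as an assumption
rather than a diagnosis: if the recovery is evaluated against the number of
the last problem notification (2), it falls inside the first_notification /
last_notification window and 'r' is in escalation_options, while the WARNING
notification with the same number is filtered out because 'w' is not listed.
The Python sketch below models only that reading. It is not Nagios source
code, and escalation_applies is a hypothetical helper.

```python
# Hypothetical model of one possible explanation; NOT Nagios source code.


def escalation_applies(number, state, first, last, options):
    """True if the notification number is within [first, last] and the state is listed."""
    return first <= number <= last and state in options


# Values taken from the service-minor-workhours template.
FIRST, LAST = 2, 2
OPTIONS = {"r", "c"}  # recovery and critical only; warning is absent

# (notification number, state letter) for the three events in the report.
# ASSUMPTION: the recovery is checked against number 2, i.e. the number of
# the last problem notification that was sent.
events = [(1, "c"), (2, "w"), (2, "r")]

results = [escalation_applies(n, s, FIRST, LAST, OPTIONS) for n, s in events]
print(results)  # [False, False, True]
```

If that assumption holds, the escalation contact groups would first come into
play at the recovery, which would fit the observation that the problem only
shows up after exactly 2 notifications.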

-- 
Sascha Runschke
Netzwerk-  und  Systemmanagement
Telefon : +49 (201) 102-1879 Mobil : +49 (173) 5419665 Fax : +49 (201) 102-1102105



GFKL Financial Services AG
Vorstand: Dr. Peter Jänsch (Vors.), Jürgen Baltes, Dr. Till Ergenzinger, Dr. Tom Haverkamp
Vorsitzender des Aufsichtsrats: Dr. Georg F. Thoma
Sitz: Limbecker Platz 1, 45127 Essen, Amtsgericht Essen, HRB 13522
_______________________________________________
Nagios-devel mailing list
Nagios-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-devel

