BUG: Recovery notifications sent to contacts which never received the initial problem notification

CHRIS TSENG (ULI-HK) CHRISTSENG at UnitedLuminous.com
Wed Aug 20 16:30:39 CEST 2008


Hello,

I am having a notification issue as well; the version I am using is 3.03.
Email alerts are hard to set up. Do you have any ideas?

Many thanks,

Chris

----- Original Message -----
From: nagios-devel-bounces at lists.sourceforge.net <nagios-devel-bounces at lists.sourceforge.net>
To: Nagios Developers List <nagios-devel at lists.sourceforge.net>
Sent: Wed Aug 20 22:25:08 2008
Subject: [Nagios-devel] BUG: Recovery notifications sent to contacts which never received the initial problem notification


Greetings, 

It seems I have triggered a bug in our new Nagios instance, which is showing quite strange behaviour.
Quoting from the Nagios 3.x documentation (http://nagios.sourceforge.net/docs/3_0/notifications.html),
section "Service and Host Filters":

"Note: Notifications about host or service recoveries are only sent out if a notification was sent out 
for the original problem. It doesn't make sense to get a recovery notification for something you never 
knew was a problem... " 

This is what happened: 

1. Service went CRITICAL -> Notifications to the contacts user1-mail, user2-mail 
2. Service went WARNING -> Notifications to the contacts user1-mail, user2-mail 
3. Service went OK -> Notifications to the contacts user1-mail,user2-mail,user1-sms,user2-sms 

Host     Service  State     Time                 Contact     Command            Output
vmctx02  CPU      CRITICAL  18-08-2008 16:24:50  user1-mail  mail-notification  CRITICAL: 15m: average load 100% critical
vmctx02  CPU      CRITICAL  18-08-2008 16:24:50  user2-mail  mail-notification  CRITICAL: 15m: average load 100% critical
vmctx02  CPU      WARNING   18-08-2008 16:31:50  user1-mail  mail-notification  WARNING: 15m: average load 99% warning
vmctx02  CPU      WARNING   18-08-2008 16:31:50  user2-mail  mail-notification  WARNING: 15m: average load 99% warning
vmctx02  CPU      OK        18-08-2008 16:32:50  user1-sms   sms-notification   OK: 15m: average load 92%
vmctx02  CPU      OK        18-08-2008 16:32:50  user2-sms   sms-notification   OK: 15m: average load 92%
vmctx02  CPU      OK        18-08-2008 16:32:50  user1-mail  mail-notification  OK: 15m: average load 92%
vmctx02  CPU      OK        18-08-2008 16:32:50  user2-mail  mail-notification  OK: 15m: average load 92%
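
The mismatch in the listing above can be checked mechanically. The following is a minimal Python sketch, not part of Nagios: the tuples are transcribed by hand from the log rows above, and the function name `unexpected_recovery_contacts` is made up for illustration. It lists contacts that received an OK notification without having received a problem notification first.

```python
# Notification rows transcribed from the listing above, in chronological
# order: (state, contact).
NOTIFICATIONS = [
    ("CRITICAL", "user1-mail"),
    ("CRITICAL", "user2-mail"),
    ("WARNING", "user1-mail"),
    ("WARNING", "user2-mail"),
    ("OK", "user1-sms"),
    ("OK", "user2-sms"),
    ("OK", "user1-mail"),
    ("OK", "user2-mail"),
]


def unexpected_recovery_contacts(rows):
    """Return contacts that got an OK notification but no earlier problem notification."""
    notified_of_problem = set()
    unexpected = []
    for state, contact in rows:
        if state == "OK":
            if contact not in notified_of_problem and contact not in unexpected:
                unexpected.append(contact)
        else:
            # Any non-OK state counts as a problem notification.
            notified_of_problem.add(contact)
    return unexpected


print(unexpected_recovery_contacts(NOTIFICATIONS))
# prints ['user1-sms', 'user2-sms']
```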

I do not understand why the two SMS contacts were notified, since they never received a
problem notification in the first place. It was an escalation that triggered those SMS
messages, but in my opinion it should not have. In our environment this only seems to
happen when exactly 2 notifications were sent before a recovery.

These are the relevant configs: 


Contacts and Templates (user1 and user2 are identical): 


define contact { 
            name                                generic-contact-mail 
            host_notification_period            24x7 
            service_notification_period         24x7 
            host_notification_options           d,r 
            service_notification_options        u,c,w,r 
            host_notification_commands          mail-notification 
            service_notification_commands       mail-notification 
            register                            0 
} 

define contact { 
            contact_name                        user1-mail 
            use                                 generic-contact-mail 
            alias                               User1 
            email                               user1 at firma.com 
} 

define contact { 
        name                            generic-contact-sms 
        host_notification_period        24x7 
        service_notification_period     24x7 
        host_notification_options       d,r 
        service_notification_options    u,c,r 
        host_notification_commands      sms-notification 
        service_notification_commands   sms-notification 
        register                        0 
} 

define contact {                         
        contact_name                    user1-sms 
        use                             generic-contact-sms 
        alias                           S R 
        pager                           +49-DONT-CALL-ME 
}       
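
The two contact templates differ in one detail: the mail template lists `w` in `service_notification_options`, the SMS template does not. Below is a small Python sketch of how such an option string acts as a per-contact filter. The function name `contact_accepts` and the state-to-letter table are illustrative only; this is not Nagios code.

```python
# Letters used in service_notification_options in the templates above:
# w = warning, u = unknown, c = critical, r = recovery (OK).
STATE_LETTER = {"WARNING": "w", "UNKNOWN": "u", "CRITICAL": "c", "OK": "r"}


def contact_accepts(options, state):
    """Return True if the comma-separated option string enables this state."""
    enabled = {opt.strip() for opt in options.split(",")}
    return STATE_LETTER[state] in enabled


MAIL_OPTIONS = "u,c,w,r"  # generic-contact-mail
SMS_OPTIONS = "u,c,r"     # generic-contact-sms

for state in ("CRITICAL", "WARNING", "OK"):
    print(state, contact_accepts(MAIL_OPTIONS, state), contact_accepts(SMS_OPTIONS, state))
```

With these option strings, only the WARNING state is filtered out for the SMS contacts; CRITICAL and OK pass for both templates.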


Service Templates and Service: 


define service { 
            name                                generic-service 
            is_volatile                         0 
            check_period                        24x7 
            max_check_attempts                  3 
            normal_check_interval               1 
            retry_check_interval                3 
            active_checks_enabled               1 
            passive_checks_enabled              1 
            parallelize_check                   1 
            obsess_over_service                 0 
            check_freshness                     1 
            freshness_threshold                 120 
            notifications_enabled               1 
            notification_interval               60 
            notification_period                 24x7 
            notification_options                u,c,w,r 
            event_handler_enabled               1 
            flap_detection_enabled              1 
            process_perf_data                   1 
            retain_status_information           1 
            retain_nonstatus_information        1 
            register                            0 
} 

define service { 
            service_description                 CPU 
            use                                 generic-service 
            host_name                           vmctx01 
            check_command                       check_nrpe_cpu!99%!100% 
} 


Service Escalation Templates and Escalations: (the escalation_period at that time was workhours) 


define serviceescalation { 
            name                                service-minor-nonworkhours 
            first_notification                  4 
            last_notification                   4 
            notification_interval               60 
            escalation_period                   nonworkhours 
            escalation_options                  r,c 
            register                            0 
}           
            
            
define serviceescalation { 
            name                                service-minor-workhours 
            first_notification                  2 
            last_notification                   2 
            notification_interval               60 
            escalation_period                   workhours 
            escalation_options                  r,c 
            register                            0 
} 

define serviceescalation { 
            use                                 service-minor-nonworkhours 
            host_name                           essctxsir06,essctx10,essctx04,essctxulg04,essctx11,essctxulg03,essctxsir03,essctxj01,essctxsir02,essctx03,essctxb06,essctxulg02,essctxsir05,essctxb01,essctxulg05,essctxulg01,essctx07,essctxulg06,essctxtest01,essctxtest01a,vmctx01,vmctx02,vmctx03,vmctx05,vmnrzctxulg03,vmnrzctxulg02,vmnrzctxulg01,nrzctxsir02,nrzctxsir01,nrzctxpps02,nrzctxpps01,nrzctxpcs01,nrzctxpcs02,vmnrzctxpcs02 
            service_description                 * 
            contact_groups                      citrixadmins,citrixadmins-sms 
} 


define serviceescalation { 
            use                                 service-minor-workhours 
            host_name                           essctxsir06,essctx10,essctx04,essctxulg04,essctx11,essctxulg03,essctxsir03,essctxj01,essctxsir02,essctx03,essctxb06,essctxulg02,essctxsir05,essctxb01,essctxulg05,essctxulg01,essctx07,essctxulg06,essctxtest01,essctxtest01a,vmctx01,vmctx02,vmctx03,vmctx05,vmnrzctxulg03,vmnrzctxulg02,vmnrzctxulg01,nrzctxsir02,nrzctxsir01,nrzctxpps02,nrzctxpps01,nrzctxpcs01,nrzctxpcs02,vmnrzctxpcs02 
            service_description                 * 
            contact_groups                      citrixadmins,citrixadmins-sms 
} 
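
One way to reason about the reported sequence is to model escalation matching in a few lines of Python. This is a hypothesis sketch only, not Nagios source code. It assumes that an escalation applies whenever the notification number falls within `first_notification`..`last_notification` and the state letter is listed in `escalation_options`, and that a recovery is matched against the number of the last problem notification. The `Escalation` class and the `escalation_applies` and `replay` functions are illustrative names. `escalation_period` is left out because the post states that it was workhours at the time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Escalation:
    first_notification: int
    last_notification: int
    options: frozenset  # state letters, e.g. {"r", "c"}


# Mirrors the service-minor-workhours template from the post.
WORKHOURS = Escalation(first_notification=2, last_notification=2,
                       options=frozenset({"r", "c"}))

STATE_LETTER = {"OK": "r", "WARNING": "w", "CRITICAL": "c", "UNKNOWN": "u"}


def escalation_applies(esc, state, notification_number):
    """ASSUMED rule: number within range and state letter enabled."""
    in_range = esc.first_notification <= notification_number <= esc.last_notification
    return in_range and STATE_LETTER[state] in esc.options


def replay(events, esc):
    """Replay state changes; return (state, escalated) for each notification.

    ASSUMPTION: a recovery is matched against the number of the last
    problem notification instead of getting a new number of its own.
    """
    results = []
    problem_count = 0
    for state in events:
        if state == "OK":
            number = problem_count
        else:
            problem_count += 1
            number = problem_count
        results.append((state, escalation_applies(esc, state, number)))
    return results


for state, escalated in replay(["CRITICAL", "WARNING", "OK"], WORKHOURS):
    print(state, "escalated" if escalated else "not escalated")
```

Under these assumptions only the recovery matches the escalation: notification 1 is outside the range, notification 2 is a WARNING and `w` is not in `escalation_options`, and the recovery is matched against number 2 with `r` enabled. That would be consistent with the log in the post, and with the observation that it only happens after exactly 2 notifications, but whether Nagios really behaves this way would need to be confirmed against its source.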

-- 
Sascha Runschke
Netzwerk-  und  Systemmanagement
Telefon : +49 (201) 102-1879 Mobil : +49 (173) 5419665 Fax : +49 (201) 102-1102105 


GFKL Financial Services AG
Vorstand: Dr. Peter Jänsch (Vors.), Jürgen Baltes, Dr. Till Ergenzinger, Dr. Tom Haverkamp
Vorsitzender des Aufsichtsrats: Dr. Georg F. Thoma
Sitz: Limbecker Platz 1, 45127 Essen, Amtsgericht Essen, HRB 13522
