Overlapping escalations not working as documented?

Gwyn Connor gwyn.connor at googlemail.com
Thu Aug 6 16:35:10 CEST 2009


Hi,

I am trying to get Nagios 3.1.2 to alert me by SMS every morning at 8 am about
all current service failures.

Services are currently checked 24/7, but notifications are only sent during
work hours (08:00-20:00) and only every 3 hours. Now if a service goes down
shortly before the notification_period starts, it takes 3 hours until the next
notification is sent, which is too long.

I have tried using escalations to get notified at 8 am, but it is not working:

# 'workhours' timeperiod definition
define timeperiod{
        timeperiod_name workhours
        alias           "Normal" Working Hours
        monday          08:00-20:00
        tuesday         08:00-20:00
        wednesday       08:00-20:00
        thursday        08:00-20:00
        friday          08:00-20:00
        }

# 'morningchecktime' timeperiod definition
define timeperiod{
        timeperiod_name morningchecktime
        alias           Morning Check Time
        monday          07:49-08:00
        tuesday         07:49-08:00
        wednesday       07:49-08:00
        thursday        07:49-08:00
        friday          07:49-08:00
        }

define contact{
        contact_name                    c-sms-morning
        alias                           Morning alert via SMS
        service_notification_period     morningchecktime
        host_notification_period        morningchecktime
        service_notification_options    c,r
        host_notification_options       d,r
        service_notification_commands   notify-service-by-sms
        host_notification_commands      notify-host-by-sms
        email                           <email-address>
        }
define contactgroup{
        contactgroup_name       sms-morning
        alias                   morning SMS
        members                 c-sms-morning
        }

define service{
        name                            test-service
        use                             service
        check_period                    24x7
        max_check_attempts              6
        normal_check_interval           5
        retry_check_interval            2
        contact_groups                  admins
        notification_options            w,u,c,r
        notification_interval           180
        notification_period             24x7
        register                        0
        }

# Test
define service{
        use                             test-service
        host_name                       test
        service_description             Disk /
        check_command                   check_snmp_disk!/!10!20
        }
define serviceescalation{
        host_name                       test
        service_description             Disk /
        contact_groups                  admins
        first_notification              1
        last_notification               0
        notification_interval           180
        escalation_period               24x7
        escalation_options              c,r
        }
define serviceescalation{
        host_name                       test
        service_description             Disk /
        contact_groups                  sms-morning
        first_notification              1
        last_notification               0
        notification_interval           5
        escalation_period               morningchecktime
        escalation_options              c,r
        }

In the documentation it says about overlapping service escalations:
"In any case where there are multiple valid escalation definitions for a
particular notification, Nagios will choose the smallest notification interval."

However, in my case Nagios seems to use the largest interval. Example:
1. The service goes CRITICAL into HARD state at 6:00 am.
2. The admins are not notified, because it is not yet workhours.
3. Time passes until 07:49.
4. Since the service is checked every 5 minutes, it will also be checked
   at least once within the morningchecktime escalation period (07:49-08:00).
   The sms-morning contact group should be notified now (its
   notification_period is also morningchecktime). But it isn't notified.
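
The expectation in step 4 can be written down as a small Python sketch. This
is only a model of the quoted documentation sentence, not Nagios's actual
scheduling code. The ESCALATIONS table and both function names are
hypothetical, and weekdays are ignored for brevity.

```python
from datetime import time

# Hypothetical model of the two serviceescalation definitions above:
# (contact_group, notification_interval in minutes, period start, period end)
ESCALATIONS = [
    ("admins", 180, time(0, 0), time.max),        # escalation_period 24x7
    ("sms-morning", 5, time(7, 49), time(8, 0)),  # escalation_period morningchecktime
]

def valid_escalations(now):
    """Return the escalations whose period contains the given time of day."""
    return [e for e in ESCALATIONS if e[2] <= now < e[3]]

def expected_interval(now):
    """Smallest notification_interval among the valid escalations,
    following the quoted sentence from the documentation."""
    valid = valid_escalations(now)
    return min(e[1] for e in valid) if valid else None

print(expected_interval(time(6, 0)))   # only the 24x7 escalation is valid
print(expected_interval(time(7, 50)))  # both escalations overlap here
```

Under this reading, a check at 07:50 falls inside both escalation periods, so
the 5-minute interval should apply and sms-morning should be notified.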

When I changed the morningchecktime period to cover more time (07:49-11:00),
notifications were sent at 9:00 am, exactly 180 minutes after the failure, to
both admins AND sms-morning. It looks like Nagios is using the larger
notification_interval of the two overlapping escalations.
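
The timing above can be checked with a short Python sketch. It only restates
the arithmetic of the observation and makes no claim about Nagios internals.
The date and the helper names next_notification and in_window are
hypothetical.

```python
from datetime import datetime, time, timedelta

# Hypothetical date; the service goes HARD CRITICAL at 06:00.
failure = datetime(2009, 8, 6, 6, 0)

def next_notification(start, interval_minutes):
    """Time of the next notification, one interval after `start`."""
    return start + timedelta(minutes=interval_minutes)

def in_window(moment, start, end):
    """True if the time of day of `moment` lies in [start, end)."""
    return start <= moment.time() < end

# Next notification if the 180-minute interval is the one in effect.
with_180 = next_notification(failure, 180)

print(with_180.time())                                # 09:00:00
print(in_window(with_180, time(7, 49), time(8, 0)))   # original morningchecktime
print(in_window(with_180, time(7, 49), time(11, 0)))  # widened morningchecktime
```

A 180-minute interval puts the next notification at 09:00, which lies outside
07:49-08:00 but inside 07:49-11:00. That matches what was observed: no SMS
with the original period, and an SMS at 9:00 am with the widened one.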

Any ideas on how to make this work? Maybe there is still an error in my
config file that I have overlooked?

Gwyn

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users