nagios on call schedule w/ escalations?

Jon Angliss jon at netdork.net
Thu Oct 2 10:13:35 CEST 2008



On Tue, 30 Sep 2008 16:22:16 -0500, Charlie Reddington
<charlie.reddington at gmail.com> wrote:

>Hi guys / gals,
>
>I am working on the final stages of my nagios setup, but I'm entering
>territory I haven't been in before and could use some guidance.

You've probably already taken a peek at the "On Call Rotations"
details in the documentation:

  http://nagios.sourceforge.net/docs/3_0/oncallrotation.html

There are plenty of examples there to give you a good idea.

>Here's what I'm trying to achieve. We have a team of 3 admins, and we
>rotate who is on call each week. Of course, it isn't strictly every 3rd
>week, because of people having vacation time, etc. So some weeks
>people are on call for 2 weeks in a row, or every 2 weeks, etc.
>
>What we'd like is to have a schedule set up where the primary guy gets
>woken up first. But if he doesn't answer his call after an hour, it
>drops down to the rest of us admins. No matter if you're just at home
>sleeping, or if you're on vacation, you get pinged. After that it goes
>up to our manager.

>I can figure out the setting of people's initial schedule, as I have  
>it looking something like this....
>
># contacts
>
>define contact{
>         contact_name                    user1
>         use                             generic-contact
>         alias                           user1
>         email                           user1
>         host_notification_period        user1_oncall
>         service_notification_period     user1_oncall
>         }
>
>define contact{
>         contact_name                    user2
>         use                             generic-contact
>         alias                           user2
>         email                           user2
>         host_notification_period        user2_oncall
>         service_notification_period     user2_oncall
>         }
>
>define contact{
>         contact_name                    user3
>         use                             generic-contact
>         alias                           user3
>         email                           user3
>         host_notification_period        user3_oncall
>         service_notification_period     user3_oncall
>         }
>
>define contact{
>         contact_name                    manager1
>         use                             generic-contact
>         email                           manager1
>         }
>
># groups
>
>define contactgroup{
>         contactgroup_name               admins
>         members                         user1,user2,user3
>}
>define contactgroup{
>         contactgroup_name               managers
>         members                         manager1
>}
>
># Time periods
>
>define timeperiod{
>         timeperiod_name user1_oncall
>         september 29 - october 5      00:00-24:00
>         october 20 - october 26       00:00-24:00
>         november 17 - november 23     00:00-24:00
>         december 1 - december 7       00:00-24:00
>         december 15 - december 21     00:00-24:00
>}
>
>define timeperiod{
>         timeperiod_name user2_oncall
>         october 6 - october 12        00:00-24:00
>         november 3 - november 9       00:00-24:00
>         november 24 - november 30     00:00-24:00
>         december 22 - december 23     00:00-24:00
>}
>
>define timeperiod{
>         timeperiod_name user3_oncall
>         october 13 - october 19       00:00-24:00
>         october 27 - november 2       00:00-24:00
>         november 10 - november 16     00:00-24:00
>         december 8 - december 14      00:00-24:00
>}

>Would / do escalations trump the initial contacts?
>
># First escalations
>define serviceescalation{
>         hostgroup_name          Servers
>         service_description     *
>         first_notification      2
>         last_notification       3
>         notification_interval   30
>         contact_groups          admins
>}
>
># Second escalations
>define serviceescalation{
>         hostgroup_name          Servers
>         service_description     *
>         first_notification      3
>         last_notification       8
>         notification_interval   60
>         contact_groups          admins,managers
>}
>
>So I know this isn't quite right, as our admins are part of the admins
>group, but I'm also trying to restrict when they get contacted. So I'm
>not really sure how to proceed with this.

You might want to read up on notifications and service escalations,
too. Looking at the timeperiods you've defined, only one of the admins
will be reachable by notifications at any given time. This is because
a contact's notification timeperiods stop Nagios from sending that
contact notifications outside those periods. For example, if a problem
occurs at 21:00 on Oct 15th, only user3 will be notified, even after
the escalations kick in. The first two notifications go to user3
alone; it only reaches another person when the 3rd notification goes
out and engages the "managers" contact group.
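As a rough worked example (untested), here is how the two
serviceescalation definitions above would play out on Oct 15th. This
assumes the affected service's own contact_groups is admins, that
manager1 inherits a 24x7 notification period from generic-contact,
and that the problem stays unresolved:

  Notification   Definition(s) in effect               Notified
  1              service's own contact_groups          user3
  2              first escalation (admins)             user3
  3              first + second escalation             user3, manager1
  4-8            second escalation (admins,managers)   user3, manager1
  9 onward       service's own contact_groups          user3

Where escalation ranges overlap, as they do at notification 3, Nagios
notifies the contacts from every matching escalation. Once a
notification number falls outside all escalation ranges, Nagios goes
back to the contact groups in the service definition.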

Depending on how many users/admins you're looking at, you could use a
trick with templating and inheritance: keep your base users as you
have them above, then build escalation users and groups on top of them.

define timeperiod {
    timeperiod_name     AllTimes
    alias               All Times
    sunday              00:00-24:00
    monday              00:00-24:00
    tuesday             00:00-24:00
    wednesday           00:00-24:00
    thursday            00:00-24:00
    friday              00:00-24:00
    saturday            00:00-24:00
}

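# Template only (register 0): templates are referenced by "name", not
# "contact_name", when another object lists them in "use".
define contact {
    name                            disable_times
    host_notification_period        AllTimes
    service_notification_period     AllTimes
    register                        0
}

# "name" lets the escalation contacts below inherit from these real
# contacts; "contact_name" keeps them registered as normal contacts.
define contact{
         name                            user1
         contact_name                    user1
         use                             generic-contact
         alias                           user1
         email                           user1
         host_notification_period        user1_oncall
         service_notification_period     user1_oncall
}

define contact{
         name                            user2
         contact_name                    user2
         use                             generic-contact
         alias                           user2
         email                           user2
         host_notification_period        user2_oncall
         service_notification_period     user2_oncall
}

# The first template listed wins, so AllTimes overrides the on-call
# periods while email etc. still come from user1/user2.
define contact {
    use                 disable_times,user1
    contact_name        user1_esc
}

define contact {
    use                 disable_times,user2
    contact_name        user2_esc
}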

define contactgroup {
    contactgroup_name   admins
    members             user1,user2
}

define contactgroup {
    contactgroup_name   admins_esc
    members             user1_esc,user2_esc
}

Then point your service escalations at admins_esc instead of admins.
I've not tested it, but looking at the way inheritance works, you
should be OK.
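For example, the two escalations from your original message would
become something like the following. This is an untested sketch; it
assumes your service definitions keep contact_groups admins (so the
first notification still reaches only whoever is on call), that the
Servers hostgroup exists as in your config, and that you add a
user3_esc contact to admins_esc the same way as user1_esc and user2_esc:

# Escalation 1 (sketch): notifications 2-3 go to every admin,
# on call or not, via the AllTimes-based *_esc contacts.
define serviceescalation{
         hostgroup_name          Servers
         service_description     *
         first_notification      2
         last_notification       3
         notification_interval   30
         contact_groups          admins_esc
}

# Escalation 2 (sketch): notifications 3-8 add the manager as well.
define serviceescalation{
         hostgroup_name          Servers
         service_description     *
         first_notification      3
         last_notification       8
         notification_interval   60
         contact_groups          admins_esc,managers
}

With this in place, notifications 2 and 3 reach all admins whether
they're home asleep or on vacation, and from notification 3 the
manager is included too.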

-- 
Jon Angliss




_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users