[Nagios-devel] FW: Timeperiods and oncall rotation with UK Public holidays

Daniel Rich drich at employees.org
Thu Apr 22 18:45:24 CEST 2010


This is one of my pet peeves with Nagios -- notifications are not particularly flexible.  That is why we have two sets of contacts for everyone, one that notifies via pager and e-mail and one that notifies only via e-mail, as there isn't a way to have Nagios do that for us.

We do something very similar to what you are trying to do.  We have a contact "groupname-oncall" that gets assigned to hosts and services a particular group is responsible for.  That alias exists both in our mail system and our paging system, and is updated automatically by a script to point to the correct individual.  The hardest part of this is writing the script to do the updates, as it has to be able to parse a configuration file with the dates/times your individuals are on-call and update the mail and paging systems with the correct information.  In our case, the aliases are stored in LDAP, so it is trivial for us to make the updates.
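As a rough illustration of the lookup half of such a script (not our actual code), here is a minimal Python sketch. It assumes the four-person weekly rota and the 17:30/08:00 hours from the original message quoted below. The names ROTA, HOLIDAYS, oncall_person, is_oncall_hours and alias_target are made up for the example, and the holiday set is only a sample of 2010 UK bank holidays. The step that actually pushes the result into LDAP or the paging system is left out.

```python
from datetime import date, datetime, time

# Hypothetical rota data: the names and the 2010-03-29 start date follow the
# quoted timeperiods; in practice this would be read from a configuration file.
ROTA = ["person1", "person2", "person3", "person4"]
ROTA_START = date(2010, 3, 29)      # a Monday; the week's shift begins at 17:30
HANDOVER = time(17, 30)             # end of the working day, weekly handover
DAY_START = time(8, 0)              # start of the working day
# Sample of 2010 UK public holidays (Good Friday, Easter Monday, early May).
HOLIDAYS = {date(2010, 4, 2), date(2010, 4, 5), date(2010, 5, 3)}


def oncall_person(now):
    """Return the rota holder at `now`, with handover every Monday at 17:30."""
    start = datetime.combine(ROTA_START, HANDOVER)
    weeks = (now - start).days // 7          # whole weeks since the first handover
    return ROTA[weeks % len(ROTA)]


def is_oncall_hours(now):
    """True all day on weekends and holidays, else outside 08:00-17:30."""
    day = now.date()
    if day.weekday() >= 5 or day in HOLIDAYS:
        return True
    return not (DAY_START <= now.time() < HANDOVER)


def alias_target(now):
    """Who the on-call alias should point at, or None during working hours."""
    return oncall_person(now) if is_oncall_hours(now) else None


if __name__ == "__main__":
    print(alias_target(datetime(2010, 4, 22, 18, 45)))
```

Because the holiday check lives in the script rather than in each engineer's timeperiod, a public holiday only has to be listed once, and whoever holds the rota that week picks it up automatically.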

For years I have wanted to write a back-end notification script for Nagios that would make notifications more flexible, but I just haven't had the time.  I want to be able to do things like:
  o Notify via e-mail on warnings but send a text message for criticals, or for some services/hosts send text messages for everything
  o Only notify the on-call individuals after hours but notify everyone during business hours
  o Remove duplicate messages
  o Allow for two-way messages so a reply can be sent via e-mail or SMS to ack an issue (I almost have this in place today)
  o Have a configuration file that drives all of the above
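
To make the wish list a little more concrete, here is a small Python sketch of how the first three items might be driven from one configuration structure. This is not an existing script. The Alert class, the CONFIG keys, and the channel, recipients and Router names are all invented for the example, and the business hours are borrowed from the quoted message. Two-way acknowledgement is not covered, and the duplicate check never expires entries, which a real script would need to do.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class Alert:
    host: str
    service: str
    state: str          # "WARNING" or "CRITICAL"
    when: datetime


# Hypothetical configuration; a real script would load this from a file.
CONFIG = {
    "business_hours": (time(8, 0), time(17, 30)),
    "sms_for_everything": {("dbhost1", "DB Conn Check")},
    "everyone": ["person1", "person2", "person3", "person4"],
}


def channel(alert, config):
    """Text message for criticals and for listed services, e-mail otherwise."""
    always_sms = (alert.host, alert.service) in config["sms_for_everything"]
    return "sms" if alert.state == "CRITICAL" or always_sms else "email"


def recipients(alert, config, oncall):
    """Everyone during weekday business hours, only the on-call person after."""
    start, end = config["business_hours"]
    in_hours = alert.when.weekday() < 5 and start <= alert.when.time() < end
    return list(config["everyone"]) if in_hours else [oncall]


class Router:
    """Drops repeats of an alert it has already routed (entries never expire)."""

    def __init__(self, config):
        self.config = config
        self.seen = set()

    def route(self, alert, oncall):
        key = (alert.host, alert.service, alert.state)
        if key in self.seen:
            return []                        # duplicate message: send nothing
        self.seen.add(key)
        how = channel(alert, self.config)
        return [(who, how) for who in recipients(alert, self.config, oncall)]
```

Nagios would call something like this from a notification command, passing the host, service and state macros as arguments.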
 
On Apr 22, 2010, at 02:42, Deborah Martin wrote:

>  
> Is anybody able to help with this ?
>  
> Thanks,
> Deborah
> 
> From: Deborah Martin [mailto:Deborah.Martin at Kognitio.com] 
> Sent: 21 April 2010 12:25
> To: nagios-users at lists.sourceforge.net
> Subject: [Nagios-users] Timeperiods and oncall rotation with UK Public holidays
> Importance: High
> 
> Folks,
> 
> I'm using SLES 10 and Nagios 3.2.0.
> 
> We have 4 oncall engineers which rotate over a 4 week period, each being oncall one week at a time. 
> The oncall period is 17:30 - 08:00 each working day and then the whole period for any weekend or UK public holiday.
> 
> My definitions are :-
> 
> define timeperiod{ 
>         timeperiod_name 24x7 
>         alias           24 Hours A Day, 7 Days A Week 
>         sunday          00:00-24:00 
>         monday          00:00-24:00 
>         tuesday         00:00-24:00 
>         wednesday       00:00-24:00 
>         thursday        00:00-24:00 
>         friday          00:00-24:00 
>         saturday        00:00-24:00 
>         }
> 
> This is for all normal monitoring of our systems.
> 
> Each oncall engineer is defined :-
> 
> define timeperiod{ 
>         timeperiod_name person1-oncall 
>         alias           person1-oncall 
>         2010-03-29 / 28 17:30-24:00             ; Monday 
>         2010-03-30 / 28 00:00-08:00,17:30-24:00 ; Tuesday 
>         2010-03-31 / 28 00:00-08:00,17:30-24:00 ; Wednesday 
>         2010-04-01 / 28 00:00-08:00,17:30-24:00 ; Thursday 
>         2010-04-02 / 28 00:00-08:00,17:30-24:00 ; Friday 
>         2010-04-03 / 28 00:00-24:00             ; Saturday 
>         2010-04-04 / 28 00:00-24:00             ; Sunday 
>         2010-04-05 / 28 00:00-08:00             ; Monday 
>         }
> 
> define timeperiod{ 
>         timeperiod_name person2-oncall 
>         alias          person2-oncall 
>         2010-04-05 / 28 17:30-24:00             ; Monday 
>         2010-04-06 / 28 00:00-08:00,17:30-24:00 ; Tuesday 
>         2010-04-07 / 28 00:00-08:00,17:30-24:00 ; Wednesday 
>         2010-04-08 / 28 00:00-08:00,17:30-24:00 ; Thursday 
>         2010-04-09 / 28 00:00-08:00,17:30-24:00 ; Friday 
>         2010-04-10 / 28 00:00-24:00             ; Saturday 
>         2010-04-11 / 28 00:00-24:00             ; Sunday 
>         2010-04-12 / 28 00:00-08:00             ; Monday 
>         }
> 
> define timeperiod{ 
>         timeperiod_name person3-oncall 
>         alias           person3-oncall 
>         2010-04-12 / 28 17:30-24:00             ; Monday 
>         2010-04-13 / 28 00:00-08:00,17:30-24:00 ; Tuesday 
>         2010-04-14 / 28 00:00-08:00,17:30-24:00 ; Wednesday 
>         2010-04-15 / 28 00:00-08:00,17:30-24:00 ; Thursday 
>         2010-04-16 / 28 00:00-08:00,17:30-24:00 ; Friday 
>         2010-04-17 / 28 00:00-24:00             ; Saturday 
>         2010-04-18 / 28 00:00-24:00             ; Sunday 
>         2010-04-19 / 28 00:00-08:00             ; Monday 
>         }
> 
> define timeperiod{ 
>         timeperiod_name person4-oncall 
>         alias           person4-oncall 
>         2010-04-19 / 28 17:30-24:00             ; Monday 
>         2010-04-20 / 28 00:00-08:00,17:30-24:00 ; Tuesday 
>         2010-04-21 / 28 00:00-08:00,17:30-24:00 ; Wednesday 
>         2010-04-22 / 28 00:00-08:00,17:30-24:00 ; Thursday 
>         2010-04-23 / 28 00:00-08:00,17:30-24:00 ; Friday 
>         2010-04-24 / 28 00:00-24:00             ; Saturday 
>         2010-04-25 / 28 00:00-24:00             ; Sunday 
>         2010-04-26 / 28 00:00-08:00             ; Monday 
>         }
> 
> I have escalations set for one particular client which will happen during oncall hours only and depending on the notification number, (4,5,6) will send an SMS alert to the relevant person oncall.
> 
> ## Escalation ONE: 
> define serviceescalation { 
>         host_name               dbhost1 
>         service_description     DB Conn Check 
>         first_notification      4 
>         last_notification       6 
>         notification_interval   15 
>         escalation_options      c       ; Only escalate for CRITICAL alerts 
>         escalation_period       oncall 
>         contact_groups          wx2-sms-oncall-group 
>         }
> 
> define timeperiod{ 
>         timeperiod_name oncall 
>         alias           Oncall Hours 
>         sunday          00:00-24:00 
>         monday          00:00-08:00,17:30-24:00 
>         tuesday         00:00-08:00,17:30-24:00 
>         wednesday       00:00-08:00,17:30-24:00 
>         thursday        00:00-08:00,17:30-24:00 
>         friday          00:00-08:00,17:30-24:00 
>         saturday        00:00-24:00 
>         }
> 
> And the wx2-sms-oncall-group defined for the service escalation includes all 4 oncall engineers, but only the person actually oncall should get the SMS alert, based on their oncall timeperiods.
> 
> 
> define contactgroup{ 
>       contactgroup_name       wx2-sms-oncall-group 
>       alias                   WX2 Oncall 
>       members                 person1-oncall, person2-oncall, person3-oncall, person4-oncall 
>       }
> 
> However, I've now hit a snag - how do I define UK public holiday periods as being 24 hours (particularly if they fall on a weekday) and put that timeperiod into each oncall engineer's timeperiod, so whoever is oncall on a particular UK public holiday will get the escalation alerts for the entire 24 hour period rather than the usual defined oncall period of "00:00-08:00 and 17:30-24:00"?
> 
> I'd rather not explicitly define a UK holiday date to an oncall engineer as this would need to be maintained. I'd rather just have to update the timeperiod if the person rota'ed cannot cover that particular timeperiod as this will be few and far between in comparison.
> 
> If any more info is required please let me know. I'm probably missing something obvious but I've read the docs over a few times and can't seem to see what I want to do in there.
> 
> Any pointers, help would be really appreciated.
> 
> Thanks, 
> Deborah
> 
> 
> 
> 
> 
> 
> ***************************************************************************
> This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. 
> 
> Any unauthorised distribution or copying is strictly prohibited. 
> Whilst Kognitio Limited takes steps to prevent the transmission of viruses via e-mail, we can not guarantee that any email or attachment is free from computer viruses and you are strongly advised to undertake your own anti-virus precautions. 
> 
> Kognitio grants no warranties regarding performance, use or quality of any e-mail or attachment and undertakes no liability for loss or damage, howsoever caused. 
> 
> Kognitio Limited, a company registered in England and Wales. Registered number 0212 7833. Registered Office: 3a Waterside Park, Cookham Road, Bracknell, Berks, RG12 1RB. VAT number 864 4378 92.
> 
> Kognitio Inc, a company incorporated in Delaware, principal office 180 North Stetson, Suite 3500, Chicago, IL 60601, USA
> ***************************************************************************
> ------------------------------------------------------------------------------
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
> ::: Messages without supporting info will risk being sent to /dev/null
> ------------------------------------------------------------------------------
> _______________________________________________
> Nagios-devel mailing list
> Nagios-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-devel



Dan Rich <drich at employees.org> |   http://www.employees.org/~drich/
                               |  "Step up to red alert!"  "Are you sure, sir?
                               |   It means changing the bulb in the sign..."
                               |          - Red Dwarf (BBC)

