Dumb Escalation Question

Chris Stankaitis chris.stankaitis at datawire.net
Thu Jan 16 16:48:33 CET 2003


Hey all,

I have Nagios installed and working about 99%, with lots of active and
passive checks going on, and it's all happy except for one thing: I am
supposed to have an escalation matrix kick in when a box or service goes
down, and it's not escalating. I am sure it's something dumb on my part,
but with the complexity of the escalation/notification setup I am having
a hard time getting my head around it.

What I want is...

1) When a BOX or SERVICE goes down (changes state), it should take 5-6
minutes to get through the SOFT state, rechecking several times during
that window; if it recovers before the window runs out, no one gets
paged. If it goes into a HARD state, it needs to do the following: page
Level 1 and give him/her 15 minutes to acknowledge the problem,
re-paging every 5 minutes. After those first 15 minutes it escalates to
Level 2 and gives Level 1 and Level 2 another 15 minutes to acknowledge,
again paging every 5 minutes. If no one has acknowledged the issue 30
minutes after the start of the HARD state, page Level 1, Level 2 and the
Manager a couple of times.
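To make the matrix above concrete, here is a rough Python sketch of the
page schedule it implies. It is only a model of the prose, not Nagios
output: it assumes pages are numbered from 1 at the start of the HARD
state, that every re-page is exactly 5 minutes apart, and that "a couple
of times" means two manager pages. The group names level1/level2/manager
are hypothetical stand-ins.

```python
# Simplified model of the paging policy described in point 1).
# Assumptions (not from Nagios): notification 1 is sent at the start of the
# HARD state, re-pages are exactly 5 minutes apart, and the manager is paged
# twice. Group names are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    first: int     # first notification number covered (1-based)
    last: int      # last notification number covered (inclusive)
    groups: tuple  # contact groups paged for these notifications


DESIRED = [
    Tier(1, 3, ("level1",)),                      # t+0, 5, 10
    Tier(4, 6, ("level1", "level2")),             # t+15, 20, 25
    Tier(7, 8, ("level1", "level2", "manager")),  # t+30, 35
]


def schedule(tiers, interval_min=5):
    """Return (minutes after HARD state, notification number, groups) per page."""
    pages = []
    for tier in tiers:
        for n in range(tier.first, tier.last + 1):
            pages.append(((n - 1) * interval_min, n, tier.groups))
    return pages


for minute, n, groups in schedule(DESIRED):
    print(f"t+{minute:2d} min  notification {n}: {', '.join(groups)}")
```

Read that way, Level 2 would first be paged on notification 4 and the
manager on notification 7, with a 5-minute gap between every pair of
pages, including the gaps after notifications 1-3.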

Below are examples of my current configs, with just some names edited
out. Please help me if you can :)


define host{
         use                     generic-host
         host_name               host1
         alias                   A Server
         address                 0.0.0.0
         parents                 gatewayrouter
         check_command           check-host-alive
         max_check_attempts      3
         notification_interval   120
         notification_period     24x7
         notification_options    d,r
         }

define service{
         use                             generic-service
         host_name                       host1,host2,host3,host4,host5
         service_description             SSH
         is_volatile                     0
         check_period                    24x7
         max_check_attempts              3
         normal_check_interval           2
         retry_check_interval            1
         contact_groups                  poor-oncall-guy
         notification_interval           120
         notification_period             24x7
         notification_options            w,u,c,r
         check_command                   check_ssh
         }
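A quick back-of-the-envelope check of the SOFT window for the SSH service
above, as a Python sketch. It assumes the default interval_length of 60
seconds (so the *_check_interval values are minutes) and that the state
goes HARD on attempt number max_check_attempts, with retries exactly
retry_check_interval apart. It ignores the up-to-normal_check_interval
delay before the first failing check and any scheduler latency, so treat
the numbers as estimates only.

```python
# Rough estimate of time spent in a SOFT state before a service goes HARD.
# Assumes interval_length=60 (intervals in minutes) and evenly spaced retries;
# real timing also depends on scheduling latency.

def soft_window_minutes(max_check_attempts, retry_check_interval):
    # The first failing check is attempt 1; the HARD state is reached on
    # attempt max_check_attempts, i.e. after (max_check_attempts - 1) retries.
    return (max_check_attempts - 1) * retry_check_interval


print(soft_window_minutes(3, 1))  # values from the SSH service above
print(soft_window_minutes(6, 1))  # one combination giving about 5 minutes
print(soft_window_minutes(7, 1))  # one combination giving about 6 minutes
```

Under those assumptions, max_check_attempts 3 with retry_check_interval 1
gives roughly 2 minutes of SOFT state, short of the 5-6 minutes described
in point 1).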

define hostgroupescalation{
         hostgroup_name                  hostgroupnamehere
         first_notification              2
         last_notification               5
         contact_groups                  poor-oncall-guy
         notification_interval           5
         }

define hostgroupescalation{
         hostgroup_name                  hostgroupnamehere
         first_notification              5
         last_notification               6
         contact_groups                  poor-oncall-guy,rest-of-unix-admins
         notification_interval           15
         }

define hostgroupescalation{
         hostgroup_name                  hostgroupnamehere
         first_notification              7
         last_notification               10
         contact_groups                  poor-oncall-guy,rest-of-unix-admins,manager
         notification_interval           15
         }
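Here is a small Python sketch of what the three hostgroupescalation
entries above cover for each host notification number. It models the
documented overlap rule (when escalations overlap, contact groups are
merged and the smallest notification_interval is used) and assumes the
gap after notification n comes from whatever applies to n, falling back
to the host's own contact groups and notification_interval (120 above)
when no escalation covers n. Intervals are treated as minutes, assuming
the default interval_length. This is a simplified model, not Nagios code.

```python
# Which contact groups / interval apply to each host notification number,
# given the hostgroupescalation entries above. Simplified model: overlapping
# escalations are merged (union of groups, smallest interval), and uncovered
# notifications fall back to the host's defaults.

ESCALATIONS = [  # (first, last, contact_groups, notification_interval)
    (2, 5, {"poor-oncall-guy"}, 5),
    (5, 6, {"poor-oncall-guy", "rest-of-unix-admins"}, 15),
    (7, 10, {"poor-oncall-guy", "rest-of-unix-admins", "manager"}, 15),
]

DEFAULT_INTERVAL = 120  # host notification_interval from the config above


def effective(n, escalations=ESCALATIONS, default_interval=DEFAULT_INTERVAL):
    valid = [e for e in escalations if e[0] <= n <= e[1]]
    if not valid:
        # None stands for "the host's normal contact groups".
        return None, default_interval
    groups = set().union(*(e[2] for e in valid))
    return groups, min(e[3] for e in valid)


for n in range(1, 11):
    groups, interval = effective(n)
    who = "host default" if groups is None else ", ".join(sorted(groups))
    print(f"notification {n:2d}: {who}; next in {interval} min")
```

In this model, notification 1 is not covered by any escalation, so the
gap before notification 2 would be the host's 120 minutes, and
notification 5 falls into two overlapping ranges. Note also that
hostgroupescalation entries apply to host notifications only; no
serviceescalation definitions for the SSH service appear in the excerpt.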






More information about the Users mailing list