Recovery Notifications and Escalation

Brett Henrich bhenrich at gmail.com
Tue Feb 28 20:35:07 CET 2006

Hi Group,

I've got Nagios 2.0 installed on a Solaris 10/x86 machine and am having an
issue with the following sequence of events:

1. Services being monitored by Nagios go down.
2. Nagios sends an initial notification to the contact group associated with
the service.
3. The services stay down long enough to cause the service to escalate.
4. The service escalation sends out a notification to a pager.
5. The service escalation sends out another notification to the pager.
6. The services being monitored by Nagios recover.
7. Nagios sends a recovery notification to the pager.
8. Nagios does not send a recovery notification to the original contact group
associated with the service.

In a nutshell, the primary contact group is an internal mailing list I use
to track problems associated with a client's systems. It is designed
primarily to act as a paper trail.

The pager is there to alert me to critical system failures outside of work
hours.

I get the pager recovery message, but the internal list does not receive the
recovery notification. As a result, other technicians believe that the
systems are still down when they check the list at 8am the next morning.

Can anyone shed some light on this? I have searched the list but cannot find
anything about this specific problem.

Below is a cut-down version of my configuration:

Regards,

Brett Henrich

[From services.cfg]

# AnImportantClient Webmail Server

define service{
        use                     generic-service         ; Name of service template to use
        host_name               animportantclient.webmail
        service_description     HTTP
        is_volatile             0
        check_period            24x7
        max_check_attempts      3
        normal_check_interval   60
        retry_check_interval    10
        contact_groups          internal-list-animportantclient
        notification_interval   1200
        notification_period     24x7
        notification_options    w,u,c,r
        check_command           check_http
        }

[from escalations.cfg]

define serviceescalation{
        host_name               animportantclient.webmail
        service_description     HTTP
        contact_groups          pager
        first_notification      2
        last_notification       0
        notification_interval   1200
        escalation_period       normalbusinesshours
        }
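
A possible adjustment, offered only as a sketch: it assumes that once an
escalation matches, its contact_groups replace the service's own contact
groups, and that the recovery goes only to whoever received the last problem
notification. Under that assumption, listing the internal list alongside the
pager in the escalation would keep the list on the escalated notifications
and on the recovery. Nothing else in the definition is changed, and this has
not been verified on this Nagios 2.0 install.

[sketch for escalations.cfg, untested]

define serviceescalation{
        host_name               animportantclient.webmail
        service_description     HTTP
        ; Assumption: list every group that should keep being notified,
        ; including the service's original contact group.
        contact_groups          pager,internal-list-animportantclient
        first_notification      2
        last_notification       0
        notification_interval   1200
        escalation_period       normalbusinesshours
        }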

[from timeperiods.cfg]

define timeperiod{
        timeperiod_name normalbusinesshours
        alias           "Normal" Working Hours
        sunday          00:00-24:00
        monday          08:00-17:00
        tuesday         08:00-17:00
        wednesday       08:00-17:00
        thursday        08:00-17:00
        friday          08:00-17:00
        saturday        00:00-24:00
        }
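
The pager is described above as being for failures outside of work hours,
while the period as posted covers weekdays 08:00-17:00 plus all of Saturday
and Sunday. If after-hours paging is the intent (an assumption, since the
post does not say so explicitly), a period along the following lines might be
closer. The name outsidebusinesshours and the alias are hypothetical, and the
escalation_period directive would need to point at it.

[sketch for timeperiods.cfg, untested]

define timeperiod{
        timeperiod_name outsidebusinesshours    ; hypothetical name
        alias           Outside Working Hours
        sunday          00:00-24:00
        monday          00:00-08:00,17:00-24:00
        tuesday         00:00-08:00,17:00-24:00
        wednesday       00:00-08:00,17:00-24:00
        thursday        00:00-08:00,17:00-24:00
        friday          00:00-08:00,17:00-24:00
        saturday        00:00-24:00
        }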