Recovery Notifications and Escalation

Brett Henrich bhenrich at gmail.com
Tue Feb 28 20:35:07 CET 2006


Hi Group,

I've got Nagios 2.0 installed on a Solaris 10/x86 machine and am having an
issue when the following sequence of events occurs:

1. A service monitored by Nagios goes down.
2. Nagios sends an initial notification to the contact group associated with
the service.
3. The service stays down long enough for its notifications to be
escalated.
4. The service escalation sends a notification to a pager.
5. The service escalation sends another notification to the pager.
6. The service recovers.
7. Nagios sends a recovery notification to the pager.
8. Nagios does not send a recovery notification to the original contact group
associated with the service.
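The sequence above can be sketched as a small, hypothetical Python model of who receives each notification. It assumes (these are assumptions, not confirmed Nagios internals) that an escalation with first_notification 2 and last_notification 0 covers notification 2 and every later one, that while it applies its contact groups replace the service's own contact_groups instead of adding to them, and that a recovery goes to whoever received the last problem notification. The escalation_period is ignored. The function names are made up for illustration.

```python
# Hypothetical model of the notification sequence; not Nagios source code.
# Assumptions: escalated groups REPLACE the service's contact_groups, and a
# recovery is sent to the recipients of the last problem notification.
# The escalation_period (normalbusinesshours) is ignored here.

SERVICE_GROUPS = ["internal-list-animportantclient"]
ESCALATION = {"first": 2, "last": 0, "groups": ["pager"]}  # 0 = no upper limit


def groups_for(notification_number):
    """Return the contact groups assumed to receive a given notification."""
    first, last = ESCALATION["first"], ESCALATION["last"]
    in_range = notification_number >= first and (
        last == 0 or notification_number <= last
    )
    return ESCALATION["groups"] if in_range else SERVICE_GROUPS


def simulate(problem_notifications):
    """List (type, notification number, groups) for a problem then a recovery."""
    log = [("PROBLEM", n, groups_for(n)) for n in range(1, problem_notifications + 1)]
    # Recovery reuses the recipients of the last problem notification.
    log.append(("RECOVERY", problem_notifications, groups_for(problem_notifications)))
    return log


for kind, number, groups in simulate(3):
    print(kind, number, groups)
```

With three problem notifications this prints the internal list for notification 1 and the pager for notifications 2, 3 and the recovery, which matches the steps above. Under the same assumption, the internal list would only receive the recovery if it were also listed in the escalation's contact_groups.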

In a nutshell, the primary contact group is an internal mailing list I use
to track problems with a client's systems; it is designed primarily to act
as a paper trail.

The pager is there to alert me to critical system failures outside of work
hours.

I get the recovery message on the pager, but the internal list does not
receive the recovery notification, which leads other technicians to believe
that the systems are still down when they check the list at 8am the next
morning.

Can anyone shed some light on this? I have searched the list archives but
cannot find anything about this specific problem.

Below is a cut-down version of my configuration:

Regards,

Brett Henrich


[from services.cfg]

# AnImportantClient Webmail Server

define service{
        use                             generic-service         ; Name of service template to use
        host_name                       animportantclient.webmail
        service_description             HTTP
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              3
        normal_check_interval           60
        retry_check_interval            10
        contact_groups                  internal-list-animportantclient
        notification_interval           1200
        notification_period             24x7
        notification_options            w,u,c,r
        check_command                   check_http
        }



[from escalations.cfg]

define serviceescalation {
        host_name               animportantclient.webmail
        service_description     HTTP
        contact_groups          pager
        first_notification      2
        last_notification       0
        notification_interval   1200
        escalation_period       normalbusinesshours
}



[from timeperiods.cfg]

define timeperiod{
        timeperiod_name normalbusinesshours
        alias           "Normal" Working Hours
        sunday          00:00-24:00
        monday          08:00-17:00
        tuesday         08:00-17:00
        wednesday       08:00-17:00
        thursday        08:00-17:00
        friday          08:00-17:00
        saturday        00:00-24:00

        }

