No notification on hard state change after an acknowledgement

Scott Gwartney scott.gwartney at nwea.org
Wed Apr 30 23:06:21 CEST 2008


Sorry for the long subject and post. We're running Nagios 2.10 on CentOS 5.
When we acknowledge a service problem while it is in a WARNING state, we
do not receive a notification when it later goes CRITICAL.

 

For example: we're monitoring the E drive on a file server. The drive
goes into a warning state, Nagios sends an alert, and an acknowledgement
is entered. Later the drive goes critical, but an alert is never sent.
Following are the relevant log entries and config files. Thanks for the
help!

 

Log File:

E drive goes into warning

Apr 29 15:10:38 DataCenterMon nagios: SERVICE NOTIFICATION: XX;X;Disk Usage E Drive;WARNING;notify-by-epager;e:\ - total: 263.99 Gb - used: 243.89 Gb (92%) - free 20.10 Gb (8%)
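To make the field layout of these entries explicit, here is a minimal Python sketch that splits a `SERVICE NOTIFICATION` entry on `;`. The field order (contact; host; service; notification type; command; plugin output, plus author and comment on acknowledgements) is inferred from the entries in this post rather than from a format specification, and the names `parse_service_notification` and `ServiceNotification` are hypothetical.

```python
from typing import NamedTuple, Tuple

MARKER = "SERVICE NOTIFICATION: "

class ServiceNotification(NamedTuple):
    contact: str
    host: str
    service: str
    notification_type: str   # e.g. WARNING, CRITICAL, ACKNOWLEDGEMENT (WARNING)
    command: str             # notification command, e.g. notify-by-epager
    output: str              # plugin output that triggered the notification
    extra: Tuple[str, ...]   # trailing fields, e.g. (author, comment) on acknowledgements

def parse_service_notification(line: str) -> ServiceNotification:
    """Split a syslog 'SERVICE NOTIFICATION:' entry into its ';'-separated fields."""
    if MARKER not in line:
        raise ValueError("not a SERVICE NOTIFICATION line")
    fields = line.split(MARKER, 1)[1].rstrip().split(";")
    contact, host, service, ntype, command, output = fields[:6]
    return ServiceNotification(contact, host, service, ntype, command, output, tuple(fields[6:]))

# Two entries copied from the log excerpts in this post (raw strings keep the "e:\").
warning = (r"Apr 29 15:10:38 DataCenterMon nagios: SERVICE NOTIFICATION: XX;X;Disk Usage E Drive;"
           r"WARNING;notify-by-epager;e:\ - total: 263.99 Gb - used: 243.89 Gb (92%) - free 20.10 Gb (8%)")
ack = (r"Apr 29 15:11:26 DataCenterMon nagios: SERVICE NOTIFICATION: XX;X;Disk Usage E Drive;"
       r"ACKNOWLEDGEMENT (WARNING);notify-by-email;e:\ - total: 263.99 Gb - used: 243.89 Gb (92%)"
       r" - free 20.10 Gb (8%);Nagios Admin;jf")

for entry in (warning, ack):
    n = parse_service_notification(entry)
    print(n.notification_type, n.command, n.extra)
```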

 

E drive is acknowledged

Apr 29 15:11:26 DataCenterMon nagios: EXTERNAL COMMAND: ACKNOWLEDGE_SVC_PROBLEM;X;Disk Usage E Drive;2;1;1;Nagios Admin;jf
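As we read the Nagios external command documentation, the fields after `ACKNOWLEDGE_SVC_PROBLEM` are host; service; sticky; notify; persistent; author; comment. A sticky value of 2 means the acknowledgement is meant to remain until the service returns to OK, which would be consistent with the missing CRITICAL notification; a non-sticky acknowledgement is removed when the service changes state. The sketch below decodes the logged command; `parse_ack` and `SvcAck` are hypothetical names, not part of Nagios.

```python
from typing import NamedTuple

class SvcAck(NamedTuple):
    host: str
    service: str
    sticky: int       # 2 = sticky: ack stays until the service returns to OK
    notify: int       # 1 = send ACKNOWLEDGEMENT notifications to contacts
    persistent: int   # 1 = keep the ack comment across Nagios restarts
    author: str
    comment: str

def parse_ack(command: str) -> SvcAck:
    """Decode an ACKNOWLEDGE_SVC_PROBLEM external command into its fields."""
    # maxsplit=7 keeps any ';' inside the free-text comment intact.
    name, host, service, sticky, notify, persistent, author, comment = command.strip().split(";", 7)
    if name != "ACKNOWLEDGE_SVC_PROBLEM":
        raise ValueError(f"unexpected command: {name}")
    return SvcAck(host, service, int(sticky), int(notify), int(persistent), author, comment)

ack = parse_ack("ACKNOWLEDGE_SVC_PROBLEM;X;Disk Usage E Drive;2;1;1;Nagios Admin;jf")
if ack.sticky == 2:
    print("sticky: notifications stay suppressed across WARNING -> CRITICAL until OK")
else:
    print("non-sticky: acknowledgement is cleared on the next state change")
```

If the intent is to be re-notified when a WARNING escalates to CRITICAL, the sticky field is the first thing worth checking.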

 

Acknowledgement notifications are sent

Apr 29 15:11:26 DataCenterMon nagios: SERVICE NOTIFICATION: XX;X;Disk Usage E Drive;ACKNOWLEDGEMENT (WARNING);notify-by-email;e:\ - total: 263.99 Gb - used: 243.89 Gb (92%) - free 20.10 Gb (8%);Nagios Admin;jf

Apr 29 15:11:27 DataCenterMon nagios: SERVICE NOTIFICATION: XX;X;Disk Usage E Drive;ACKNOWLEDGEMENT (WARNING);notify-by-epager;e:\ - total: 263.99 Gb - used: 243.89 Gb (92%) - free 20.10 Gb (8%);Nagios Admin;jf

 

E drive goes critical; no alert is sent

Apr 30 10:07:16 DataCenterMon nagios: SERVICE ALERT: X;Disk Usage E Drive;CRITICAL;HARD;3;e:\ - total: 263.99 Gb - used: 251.33 Gb (95%) - free 12.67 Gb (5%)

Apr 30 11:04:16 DataCenterMon nagios: EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;X;Disk Usage E Drive;1209578654
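The SERVICE ALERT entry suggests the check itself behaved as configured: the service reached CRITICAL as a HARD state on attempt 3, which matches `max_check_attempts 3` in the Critical_Service template, so the missing piece is the notification, not the check. A small Python sketch that pulls those fields out; the field order is inferred from this log entry, and `parse_service_alert` and `ServiceAlert` are hypothetical names.

```python
from typing import NamedTuple

class ServiceAlert(NamedTuple):
    host: str
    service: str
    state: str        # OK, WARNING, CRITICAL or UNKNOWN
    state_type: str   # SOFT while retrying, HARD once max_check_attempts is reached
    attempt: int
    output: str

MARKER = "SERVICE ALERT: "

def parse_service_alert(line: str) -> ServiceAlert:
    """Split a syslog 'SERVICE ALERT:' entry into its ';'-separated fields."""
    if MARKER not in line:
        raise ValueError("not a SERVICE ALERT line")
    payload = line.split(MARKER, 1)[1].rstrip()
    host, service, state, state_type, attempt, output = payload.split(";", 5)
    return ServiceAlert(host, service, state, state_type, int(attempt), output)

line = (r"Apr 30 10:07:16 DataCenterMon nagios: SERVICE ALERT: X;Disk Usage E Drive;"
        r"CRITICAL;HARD;3;e:\ - total: 263.99 Gb - used: 251.33 Gb (95%) - free 12.67 Gb (5%)")
alert = parse_service_alert(line)
MAX_CHECK_ATTEMPTS = 3  # value from the Critical_Service template in this post
print(alert.state, alert.state_type, alert.attempt == MAX_CHECK_ATTEMPTS)
```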

 

Acknowledgement is removed and an alert is sent.

Apr 30 11:05:19 DataCenterMon nagios: EXTERNAL COMMAND: REMOVE_SVC_ACKNOWLEDGEMENT;X;Disk Usage E Drive

Apr 30 11:05:49 DataCenterMon nagios: EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;X;Disk Usage E Drive;1209578747

Apr 30 11:05:57 DataCenterMon nagios: SERVICE NOTIFICATION: XX;X;Disk Usage E Drive;CRITICAL;notify-by-email;e:\ - total: 263.99 Gb - used: 254.71 Gb (96%) - free 9.29 Gb (4%)
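Replaying the logged sequence against a toy model of acknowledgement handling reproduces what was observed, under the assumption that a sticky acknowledgement is kept until the service returns to OK while a non-sticky one is cleared on any state change. The model is a simplification and hypothetical (`ServiceModel` is not Nagios code): it ignores notification periods, intervals, escalations and recovery notifications.

```python
from dataclasses import dataclass

@dataclass
class ServiceModel:
    """Toy model of problem-notification suppression by acknowledgements."""
    state: str = "OK"
    acknowledged: bool = False
    sticky: bool = False

    def acknowledge(self, sticky: bool) -> None:
        self.acknowledged = True
        self.sticky = sticky

    def remove_acknowledgement(self) -> None:
        # Corresponds to the REMOVE_SVC_ACKNOWLEDGEMENT command in the log.
        self.acknowledged = False

    def change_state(self, new_state: str) -> None:
        if new_state != self.state and self.acknowledged:
            # Non-sticky acks clear on any state change; sticky acks only on return to OK.
            if not self.sticky or new_state == "OK":
                self.acknowledged = False
        self.state = new_state

    def problem_notification_allowed(self) -> bool:
        return self.state != "OK" and not self.acknowledged

svc = ServiceModel(state="WARNING")
svc.acknowledge(sticky=True)              # sticky field was 2 in the logged command
svc.change_state("CRITICAL")
print(svc.problem_notification_allowed())  # suppressed, like the silent 10:07:16 CRITICAL
svc.remove_acknowledgement()
print(svc.problem_notification_allowed())  # allowed, like the 11:05:57 notification
```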

 

 

# Host Template for Critical Hosts -- [E]Pager and Email Notification to x 24x7

define host{
                name                            Critical_Host   ; The name of this host template - referenced in other host definitions, used for template recursion/resolution
                notifications_enabled           1               ; Host notifications are enabled
                event_handler_enabled           1               ; Host event handler is enabled
                flap_detection_enabled          1               ; Flap detection is enabled
                process_perf_data               1               ; Process performance data
                retain_status_information       1               ; Retain status information across program restarts
                retain_nonstatus_information    1               ; Retain non-status information across program restarts
                notification_period             24x7            ; Notifies 24x365
                notification_options            d,u,r           ; Down, Unreachable, Recovery
                notification_interval           5               ; Sends page/email every 5 minutes
                check_command                   check_ping!1000.0,20%!30000.0,100%      ; Warns at 20% packet loss or round-trip time > 1000 ms; critical at 100% packet loss or round-trip time > 30000 ms
                max_check_attempts              5               ; Checks host 5 times before generating an alert
                contact_groups                  x
                register                        0               ; DON'T REGISTER THIS DEFINITION - IT'S NOT A REAL HOST, JUST A TEMPLATE!
                }
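The `check_command` comment describes two thresholds, each a pair of round-trip time and packet loss joined by "or". A sketch of that decision rule in Python, assuming a value at or above a threshold breaches it (the plugin's exact comparison may differ); `classify_ping` is a hypothetical helper, not part of check_ping.

```python
from typing import Tuple

def classify_ping(rta_ms: float, loss_pct: float,
                  warn: Tuple[float, float] = (1000.0, 20.0),
                  crit: Tuple[float, float] = (30000.0, 100.0)) -> str:
    """Classify a ping result against (round-trip ms, packet-loss %) thresholds.

    Assumption: a threshold counts as breached when either value reaches it (>=).
    Defaults mirror check_ping!1000.0,20%!30000.0,100% from the template above.
    """
    if rta_ms >= crit[0] or loss_pct >= crit[1]:
        return "CRITICAL"
    if rta_ms >= warn[0] or loss_pct >= warn[1]:
        return "WARNING"
    return "OK"

for rta, loss in [(12.0, 0.0), (1500.0, 0.0), (50.0, 20.0), (50.0, 100.0)]:
    print(rta, loss, classify_ping(rta, loss))
```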

 

# 'NWWEBNAS' host definition

define host{
                use                     Critical_Host           ; Name of host template to use
                host_name               X
                alias                   Production File Server
                address                 x.x.x.x
                parents                 X
                }

 

# Critical Service definition template

define service{
                name                            Critical_Service        ; The 'name' of this service template, referenced in other service definitions
                active_checks_enabled           1       ; Active service checks are enabled
                passive_checks_enabled          1       ; Passive service checks are enabled/accepted
                parallelize_check               1       ; Active service checks should be parallelized (disabling this can lead to major performance problems)
                obsess_over_service             1       ; We should obsess over this service (if necessary)
                is_volatile                     0
                check_freshness                 0       ; Default is to NOT check service 'freshness'
                notifications_enabled           1       ; Service notifications are enabled
                event_handler_enabled           1       ; Service event handler is enabled
                flap_detection_enabled          1       ; Flap detection is enabled
                process_perf_data               1       ; Process performance data
                retain_status_information       1       ; Retain status information across program restarts
                retain_nonstatus_information    1       ; Retain non-status information across program restarts
                event_handler_enabled           1       ; Event handler is enabled
                check_period                    24x7_With_Maintenance_Window    ; Checks 24x7x365
                normal_check_interval           10      ; When the service is OK it is checked every 10 minutes
                max_check_attempts              3       ; When the service is not OK it is checked 3 times before an alert is sent
                retry_check_interval            1       ; Retries every 1 minute once the service is not OK; after max_check_attempts has been reached it rechecks at normal_check_interval
                notification_interval           10      ; Sends notifications every 10 minutes
                notification_period             24x7    ; Notifies 24x365
                notification_options            w,u,c,r ; Sends alerts on Warning, Unknown, Critical and Recovery
                contact_groups                  x       ; Emails ISOpsOnCall and pages ISOnCallCell
                register                        0       ; DON'T REGISTER THIS DEFINITION - IT'S NOT A REAL SERVICE, JUST A TEMPLATE!
                }
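The comments on `retry_check_interval`, `max_check_attempts` and `notification_interval` together describe a timeline. A rough Python sketch of the arithmetic, assuming the default `interval_length` of 60 seconds (so intervals are minutes) and ignoring scheduling latency; the function names are hypothetical.

```python
from typing import List

def minutes_to_hard_state(max_check_attempts: int, retry_check_interval: int) -> int:
    """Minutes from the first non-OK result (SOFT, attempt 1) to the HARD state."""
    # Attempt 1 is the first failure; each further attempt follows one retry interval later.
    return (max_check_attempts - 1) * retry_check_interval

def notification_offsets(notification_interval: int, count: int) -> List[int]:
    """Minutes after the first HARD-state notification at which re-notifications
    would go out while the problem stays unacknowledged."""
    return [i * notification_interval for i in range(count)]

# Values from the Critical_Service template above.
print(minutes_to_hard_state(max_check_attempts=3, retry_check_interval=1))
print(notification_offsets(notification_interval=10, count=4))
```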

 

# Service definition

define service{
                use                     Critical_Service        ; Name of service template to use
                host_name               X
                service_description     Disk Usage E Drive
                check_command           check_nt_disk!e!80!95
                }
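The `check_nt_disk` command definition is not included in the post, so the following assumes its last two arguments are warning and critical thresholds for percent of space used, and that reaching a threshold counts as breaching it. Under those assumptions, reclassifying the readings from the log agrees with the WARNING and CRITICAL states Nagios reported; `disk_state` is a hypothetical helper.

```python
def disk_state(used_gb: float, total_gb: float,
               warn_pct: float = 80.0, crit_pct: float = 95.0) -> str:
    """Classify disk usage against percentage-used thresholds.

    Assumption: check_nt_disk!e!80!95 means warn at >= 80% used, critical at >= 95% used.
    """
    used_pct = 100.0 * used_gb / total_gb
    if used_pct >= crit_pct:
        return "CRITICAL"
    if used_pct >= warn_pct:
        return "WARNING"
    return "OK"

# Readings taken from the log entries in this post (total 263.99 Gb).
for used in (243.89, 251.33, 254.71):
    print(used, disk_state(used, 263.99))
```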

 


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users