Checks and notifications don't work after time period exception

Seth Simmons ssimmons at cymfony.com
Mon Aug 25 15:05:09 CEST 2008


We have a QA group overseas that works on our customer sites during the
US overnight. To avoid false alerts, I added a time exception so that
notifications are not sent out between 4am and 5:30am. The problem is
that after the exception period ends, Nagios (3.0.3) doesn't send
notifications, nor does it perform checks, for any sites that use the
exception. If a site goes critical either shortly after 4am or (if the
QA group starts early) right before 4am, checks do not resume after
5:30am. When I look at Nagios later, it shows the service as critical,
with the last check at 3:58am and the next check scheduled for midnight
the next day.

Let me give some more specific examples:

Server-A is running abc.customer.com for us, and our QA group takes the
site down at 3:55am, before the 4am exception starts. Nagios then shows
the service as critical until either midnight the next day or until I
force a check. So if I look at it at 8am, the service is critical, with
the last check at 3:55am and the next scheduled check at 12am tomorrow.
When I force a check, it resumes the normal check schedule and sends a
notice that the service is OK.

Server-B is also running a site, and Tomcat is stopped at 4:10am. This
service uses the same time period, with the 4am-5:30am exception, as its
notification period (its check period is 24x7). After 5:30am it never
sends a notification. At 8am it is still performing checks and reporting
the service as critical, but the service details show that no
notifications have been sent. Forcing a check doesn't trigger one
either. If I restart Nagios, the first check after the restart does send
the first notification. I don't see anything wrong with my time period
definitions, so I'm not sure where the issue is, or whether anyone else
has run into this before.

Here is what I have for that time period and for the checks in the
examples above:

 

define timeperiod{
        timeperiod_name         url-monitor
        alias                   url-monitor
        sunday                  00:00-23:59
        monday                  00:00-23:59
        tuesday                 00:00-23:59
        wednesday               00:00-23:59
        thursday                00:00-23:59
        friday                  00:00-23:59
        saturday                00:00-23:59
        exclude                 recycle
        }

define timeperiod{
        timeperiod_name         recycle
        alias                   recycle
        sunday                  04:00-05:30
        monday                  04:00-05:30
        tuesday                 04:00-05:30
        wednesday               04:00-05:30
        thursday                04:00-05:30
        friday                  04:00-05:30
        saturday                04:00-05:30
        }

define command{
        command_name            check_http_abc
        command_line            $USER1$/check_http -H abc.company.com
        }

define service{
        use                     generic-service
        host_name               Server-A
        service_description     site abc
        is_volatile             0
        check_period            url-monitor
        max_check_attempts      2
        normal_check_interval   5
        retry_check_interval    5
        contacts                nagiosadmin
        notification_interval   30
        notification_period     url-monitor
        notification_options    w,c,r
        check_command           check_http_abc
        }

define service{
        use                     local-service
        host_name               Server-B
        service_description     HTTP
        check_period            24x7
        max_check_attempts      2
        normal_check_interval   3
        retry_check_interval    5
        contacts                nagiosadmin
        notification_interval   60
        notification_period     url-monitor
        notification_options    w,c,r
        check_command           check_http
        }
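To spell out what I expect to happen, here is a rough Python model of the two time periods above. It is my own illustration, not Nagios code. It assumes that time ranges are start-inclusive and end-exclusive at minute resolution, that every day of the week behaves the same, and that interval_length is left at the default of 60 seconds (so normal_check_interval 5 means five minutes). The helper names (hhmm, is_valid, next_valid_time, fmt) are made up for this sketch.

```python
# Rough model of "url-monitor" (every day 00:00-23:59) minus the
# "recycle" exclusion (every day 04:00-05:30).
# Assumption: ranges are [start, end) at minute resolution; not Nagios source.

MINUTES_PER_DAY = 24 * 60


def hhmm(s):
    """Convert 'HH:MM' to minutes since midnight."""
    h, m = s.split(":")
    return int(h) * 60 + int(m)


URL_MONITOR = [(hhmm("00:00"), hhmm("23:59"))]
RECYCLE = [(hhmm("04:00"), hhmm("05:30"))]


def in_ranges(minute, ranges):
    """True if a minute-of-day falls inside any [start, end) range."""
    return any(start <= minute < end for start, end in ranges)


def is_valid(t):
    """t is an absolute minute count; every day uses the same ranges here."""
    minute = t % MINUTES_PER_DAY
    return in_ranges(minute, URL_MONITOR) and not in_ranges(minute, RECYCLE)


def next_valid_time(t):
    """Earliest minute >= t inside url-monitor but outside recycle."""
    for candidate in range(t, t + 2 * MINUTES_PER_DAY):
        if is_valid(candidate):
            return candidate
    return None  # nothing valid within two days


def fmt(t):
    """Format an absolute minute count as 'day N HH:MM'."""
    day, minute = divmod(t, MINUTES_PER_DAY)
    return f"day {day} {minute // 60:02d}:{minute % 60:02d}"


# Server-A: last check at 03:55, next one due 5 minutes later (04:00, excluded)
print(fmt(next_valid_time(hhmm("03:55") + 5)))  # day 0 05:30
# Server-B: Tomcat stops at 04:10, inside recycle, so notification is held
print(fmt(next_valid_time(hhmm("04:10"))))      # day 0 05:30
```

If those assumptions hold, neither service should be waiting until midnight: Server-A's next check and Server-B's first notification should both come due at 05:30.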

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null