recurring CRITICAL alerts

Daniel dstewardson at btinternet.com
Sun Sep 10 12:27:09 CEST 2006


I am running Nagios 1.2 and have a passive service check that receives results every 5 minutes from a monitored server. The check is also configured to send notifications only between 04:00 and 22:30, because the service being monitored does not actually run outside those hours.

As expected, the service goes into a CRITICAL state during the non-notification period and returns to an OK state when the service is running again. But then there is a strange situation where it toggles between the previous CRITICAL state and an OK state. Here is an example of what I mean (read from the bottom up). Notice how the CRITICAL alert saying "last updated 2006/09/06 22:21:09" keeps recurring, even though valid OK results have arrived in between:


--------------------------------------------------------------------------
     September 07, 2006 05:00  
--------------------------------------------------------------------------
     


[07-09-2006 05:26:04] Nagios 1.2 starting... (PID=3385)
[07-09-2006 05:22:28] Caught SIGTERM, shutting down...
[07-09-2006 05:20:21] SERVICE ALERT: BTLT-RADGOLDEN;QAT File Age;CRITICAL;HARD;1;QAT File last updated 2006/09/06 22:21:09



--------------------------------------------------------------------
           September 07, 2006 04:00  
--------------------------------------------------------------------
           



      [07-09-2006 04:55:19] SERVICE ALERT: BTLT-RADGOLDEN;QAT File Age;OK;HARD;1;QAT File last updated 2006/09/07 04:38:26
      [07-09-2006 04:36:38] SERVICE ALERT: BTLT-RADGOLDEN;QAT File Age;CRITICAL;HARD;1;QAT File last updated 2006/09/06 22:21:09
      [07-09-2006 04:35:13] SERVICE ALERT: BTLT-RADGOLDEN;QAT File Age;OK;HARD;1;QAT File last updated 2006/09/07 04:29:20
      [07-09-2006 04:34:07] SERVICE ALERT: BTLT-RADGOLDEN;QAT File Age;CRITICAL;HARD;1;QAT File last updated 2006/09/06 22:21:09
      [07-09-2006 04:19:49] SERVICE ALERT: BTLT-RADGOLDEN;QAT File Age;OK;HARD;1;QAT File last updated 2006/09/07 04:14:04



--------------------------------------------------------------------
           September 06, 2006 22:00  
--------------------------------------------------------------------
           



      [06-09-2006 22:58:58] SERVICE ALERT: BTLT-RADGOLDEN;QAT File Age;CRITICAL;HARD;1;QAT File last updated 2006/09/06 22:21:09
     
        

Not sure if this has anything to do with suppressing notifications; perhaps something in the Nagios logic isn't getting cleared down? Has anyone any ideas what may be causing this?
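
In case it helps anyone reproduce the pattern, here is a minimal Python sketch that scans SERVICE ALERT lines like the ones above and flags any alert whose "last updated" timestamp is older than one already reported by an earlier alert. The function name find_stale_alerts, the regular expression and the SAMPLE list are my own illustration, not part of Nagios; the log format is assumed from the excerpt only.

```python
import re
from datetime import datetime

# Assumed line format, taken from the excerpt in this post:
# [DD-MM-YYYY HH:MM:SS] SERVICE ALERT: host;service;STATE;TYPE;attempt;... last updated YYYY/MM/DD HH:MM:SS
ALERT_RE = re.compile(
    r"\[(?P<logged>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\] SERVICE ALERT: "
    r"(?P<host>[^;]+);(?P<service>[^;]+);(?P<state>[A-Z]+);(?P<type>[A-Z]+);\d+;"
    r".*last updated (?P<updated>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})"
)


def find_stale_alerts(lines):
    """Return (logged, state, updated) for alerts that report an older
    'last updated' time than an alert logged before them."""
    alerts = []
    for line in lines:
        m = ALERT_RE.search(line)
        if m:
            alerts.append((
                datetime.strptime(m["logged"], "%d-%m-%Y %H:%M:%S"),
                m["state"],
                datetime.strptime(m["updated"], "%Y/%m/%d %H:%M:%S"),
            ))

    # The web log view lists newest first, so sort into chronological order.
    alerts.sort(key=lambda a: a[0])

    stale = []
    newest = None  # newest 'last updated' value seen so far
    for logged, state, updated in alerts:
        if newest is not None and updated < newest:
            stale.append((logged, state, updated))
        if newest is None or updated > newest:
            newest = updated
    return stale


# Alert lines copied from the excerpt above (hypothetical variable name).
SAMPLE = [
    "[07-09-2006 05:20:21] SERVICE ALERT: BTLT-RADGOLDEN;QAT File Age;CRITICAL;HARD;1;QAT File last updated 2006/09/06 22:21:09",
    "[07-09-2006 04:55:19] SERVICE ALERT: BTLT-RADGOLDEN;QAT File Age;OK;HARD;1;QAT File last updated 2006/09/07 04:38:26",
    "[07-09-2006 04:36:38] SERVICE ALERT: BTLT-RADGOLDEN;QAT File Age;CRITICAL;HARD;1;QAT File last updated 2006/09/06 22:21:09",
    "[07-09-2006 04:35:13] SERVICE ALERT: BTLT-RADGOLDEN;QAT File Age;OK;HARD;1;QAT File last updated 2006/09/07 04:29:20",
    "[07-09-2006 04:34:07] SERVICE ALERT: BTLT-RADGOLDEN;QAT File Age;CRITICAL;HARD;1;QAT File last updated 2006/09/06 22:21:09",
    "[07-09-2006 04:19:49] SERVICE ALERT: BTLT-RADGOLDEN;QAT File Age;OK;HARD;1;QAT File last updated 2006/09/07 04:14:04",
    "[06-09-2006 22:58:58] SERVICE ALERT: BTLT-RADGOLDEN;QAT File Age;CRITICAL;HARD;1;QAT File last updated 2006/09/06 22:21:09",
]

for logged, state, updated in find_stale_alerts(SAMPLE):
    print(f"{logged} {state} repeats older result (last updated {updated})")
```

This only identifies which alerts carry an out-of-date result; it does not say whether the repeated result comes from Nagios itself or from whatever is submitting the passive check.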
