question about recovery messages

Paul Lynch Paul_Lynch at lenox.com
Tue Jun 14 17:50:43 CEST 2011


Hi Everyone,
 
I just joined the forum today, so I will apologize up front for the
somewhat basic nature of my question.  I have not been able to find
anything about it yet.  It's possible I haven't spent enough time
searching, but if someone can point me in the right direction it would
be appreciated.
 
So I have been running Nagios for well over a year in a very limited
capacity in my environment.  I originally installed it as 3.0.3 and set
it up to monitor CPU, memory and disk utilization on about a dozen
Windows servers.  For this it has been great.
 
I knew there was much more Nagios could help with, so I've been
looking for opportunities where it can add value in supporting our
infrastructure.  A few weeks ago someone in our web group complained
about constantly having to check our website to see whether it is up,
as there have been some stability issues with it.  The site runs on
six load-balanced web servers.  I suggested a Nagios service check.
 
So I am using check_website_response by Chris Freeman from the exchange,
and every now and then I get critical messages, but I never get a
recovery message afterwards.  Based on my current settings I would also
expect a reminder an hour later, and I don't get that either.
 
I am just curious to know if anyone else has seen inconsistencies with
email alerts on state changes?
 
Thanks in advance.
 
-Paul
 
-------------------------- IP addresses and names have been changed to
protect the innocent.....
 
RESPONSE: CRITICAL - http://10.1.0.131 does not contain any data

My template looks like this:

#==============================================================================
# Service Templates
#------------------------------------------------------------------------------
define service{
  name                         NWKService
  register                     0 ; DON'T REGISTER THIS DEFINITION - IT'S NOT A REAL SERVICE, JUST A TEMPLATE!
  active_checks_enabled        1 ; Active service checks are enabled
  passive_checks_enabled       1 ; Passive service checks are enabled/accepted
  parallelize_check            1 ; Active service checks should be parallelized
  obsess_over_service          1 ; We should obsess over this service (if necessary)
  check_freshness              0 ; Default is to NOT check service 'freshness'
  notifications_enabled        1 ; Service notifications are enabled
  event_handler_enabled        1 ; Service event handler is enabled
  flap_detection_enabled       1 ; Flap detection is enabled
  failure_prediction_enabled   1 ; Failure prediction is enabled
  process_perf_data            1 ; Process performance data
  retain_status_information    1 ; Retain status information across program restarts
  retain_nonstatus_information 1 ; Retain non-status information across program restarts
  is_volatile                  0 ; The service is not volatile
  check_period              24x7 ; The service can be checked at any time of the day
  max_check_attempts           3 ; Re-check the service up to 3 times in order to determine its final (hard) state
  normal_check_interval        1 ; Check the service every 1 time unit (1 minute with the default interval_length of 60) under normal conditions
  retry_check_interval         1 ; Re-check the service every 1 time unit until a hard state can be determined
  contact_groups websiteresponse ; Notifications get sent out to everyone in the 'websiteresponse' group
  notification_options     u,c,r ; Send notifications about unknown, critical, and recovery events
  notification_interval      120 ; Re-notify about service problems every 120 time units (2 hours with the default interval_length of 60)
  notification_period       24x7 ; Notifications can be sent out at any time
}
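
As a rough sketch of how the interval values in this template turn into
wall-clock time: each one is multiplied by interval_length from the main
configuration file.  The fragment below assumes interval_length is left at
its default of 60 seconds.  The actual nagios.cfg is not shown in this
post, so that value is an assumption.

```
# nagios.cfg (main configuration file) - assumed value, not copied from the real file
# Number of seconds in one "time unit" used by the interval directives.
interval_length=60

# With interval_length=60 the NWKService template values work out to:
#   normal_check_interval   1   -> checked every 1 minute while OK
#   retry_check_interval    1   -> re-checked every 1 minute while in a soft state
#   max_check_attempts      3   -> hard state after the 3rd consecutive failed check
#   notification_interval 120   -> re-notification every 120 minutes (2 hours)
```

Notifications are only sent for hard states, so a check that fails and
then returns to OK before the third consecutive failure would produce
neither a problem message nor a recovery message.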


############################## Episode3

define host{
  use   NwkHost
  host_name  Episode3
  alias   Episode3
  address  Episode3
  parents  3524,3524B
}

define service{
  use                   NWKService
  host_name             Episode3
  service_description   Response Time - Homepage
  servicegroups         www-response-time
  check_command         check_website_response!"http://10.1.0.131/"!5000!30000
}
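
The command definition behind check_website_response is not included
above.  A minimal sketch of what it might look like follows, assuming the
plugin is installed as check_website_response.sh in the plugin directory
($USER1$) and takes the URL, warning and critical thresholds through
-u, -w and -c.  The script name, path and flag mapping are assumptions
and should be checked against the plugin's own usage output.

```
# Hypothetical command definition; adjust the path and flags to the installed plugin.
define command{
  command_name  check_website_response
  ; $ARG1$ = URL, $ARG2$ = warning threshold (ms), $ARG3$ = critical threshold (ms)
  command_line  $USER1$/check_website_response.sh -u $ARG1$ -w $ARG2$ -c $ARG3$
}
```

With the service definition above, $ARG1$ would expand to
"http://10.1.0.131/", $ARG2$ to 5000 and $ARG3$ to 30000.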


 
 