False alerts on http service

Andreas Ericsson ae at op5.se
Wed Sep 12 20:05:01 CEST 2012


On 09/12/2012 07:03 PM, francis picabia wrote:
> We have used nagios successfully for many years and never seen
> a case like this.  I cannot get nagios service to see the remote
> http service is up, although the check command indicates it is up
> and the remote apache log shows nagios visited with no error.
> 
> The site to monitor runs webwork, a math quiz system.  I have it
> set to redirect / to /webwork and also redirect insecure to https.
> 
> At first I did a plain check_http.
> 
> I switched to -S option and added -u with the full URL to avoid hitting
> the redirects, so I can get a clean code 200 returned, in case that
> was muddling things.  No difference.
> 
> When I look at the apache log, I can see the visits from nagios.
> For the early morning visits, there is no one using the system,
> so it can't be unresponsive.
> 
> Here is my check command:
> 
> 
> # 'check_www_ssl' command definition
> define command{
>          command_name    check_www_ssl
>          command_line    $USER1$/check_http -S -I $HOSTADDRESS$ -f follow -w 5 -c 20 -t 60 -u $ARG1$
>          }
> 
> Here is my service:
> 
> 
> define service{
>          use                             generic-service
>          host_name                       webwork
>          is_volatile                     0
>          service_description             Webwork Web Service
>          check_command                   check_www_ssl!'https://webwork.example.com/webwork/'
>          check_period                    24x7
>          contact_groups                  unix-admins
>          max_check_attempts              3
>          normal_check_interval           3
>          retry_check_interval            1
>          notification_interval           120
>          notification_period             24x7
>          notification_options            w,u,c,r
>          }
> 

This is the service definition (will be relevant later)...

> Of course I have changed the actual domain to example.com in the above.
> 

But you forgot to change it in the apache log ;)

> The alert report:
> 
> ***** Nagios 3.2 *****
> 
> Notification Type: PROBLEM
> Host: webwork
> State: DOWN
> Address: 131.162.201.91
> Info: Server answer:
> 
> Date/Time: Wed Sept 12 06:59:04 ADT 2012
> 
> 
> Here is a sample visit from nagios in the webwork apache log file
> before this time.
> 
> XXX.YYY.2.50 - - [12/Sep/2012:06:58:50 -0300] "GET
> https://webwork.acadiau.ca/webwork/ HTTP/1.0" 200 5015 "-"
> "check_http/v1.4.14 (nagios-plugins 1.4.14)"
> 
> Our apache logs show nagios is visiting every 3 minutes, 24 hours a day.  None
> of these visits results in an error.
> 
> In a nagios log, this is all that appears for webwork for the day:
> 
> # grep webwork nagios-09-11-2012-00.log
> [1347246000] CURRENT HOST STATE: webwork;DOWN;HARD;1;Server answer:
> [1347246000] CURRENT SERVICE STATE: webwork;Webwork Web
> Service;OK;HARD;1;HTTP OK: HTTP/1.1 200 OK - 4053 bytes in 0.274
> second response time

So according to these two log entries, the service (which you're saying
never turns OK) is OK, but the host itself appears to be down. I think
you need to rethink which check is actually failing.

> 
> If I do the check_http manually, I seem to get through fine:
> 
> # /usr/lib/nagios3.2/libexec/check_http -S -I webwork -f follow -w5
> -c 20 -t 60 -u https://webwork.example.com/webwork
> HTTP OK: HTTP/1.1 200 OK - 5162 bytes in 0.025 second response time
> |time=0.024700s;5.000000;20.000000;0.000000 size=5162B;;;0
> 
> Can anyone spot a reason why this alert is not set up properly or
> there is a better way to do it?
> 

Examine the *host* check, not the service check, if you want to figure
out why the host appears to be down.
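
As a rough sketch of where to look: your mail doesn't show the host
definition, so the template name and the check command below are only
assumptions taken from the stock Nagios 3 sample configuration. Whatever
check_command your "webwork" host object actually uses (directly or
through its template) is what produces the DOWN state and the
"Server answer:" output, independently of check_www_ssl.

```
define host{
        use                     generic-host       ; assumed template name
        host_name               webwork
        address                 131.162.201.91
        check_command           check-host-alive   ; assumption: replace with what your config really has
        max_check_attempts      3                  ; illustrative value
        }
```

Once you know which command that is, run it by hand against the host
address the same way you ran check_http, and compare its output with
what shows up in the notification.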

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users