Critical Plugin Timed Out

Andreas Ericsson ae at op5.se
Fri Aug 31 13:12:26 CEST 2007


Patrick M. wrote:
> Hi all,
> 
> I've been running Nagios 2.6 for about 6 months now, and every now and 
> then we get critical pages about a machine being down, or at least 
> Nagios can't connect to it.  It causes the CEO to freak out and believe 
> something is up with our network.
> 
> To me, it seems like the box is getting stressed out during the tests 
> and is causing the plugins to time out.
> 
> Here are some of the alerts from this morning:
> 
> #######################################
> [08-30-2007 09:24:10] HOST ALERT: tu.xyz.com;DOWN;SOFT;1;CRITICAL - Plugin timed out after 10 seconds
> [08-30-2007 09:23:40] SERVICE ALERT: pule.xyz.com;PING;WARNING;SOFT;1;PING WARNING - Packet loss = 44%, RTA = 3.64 ms
> #######################################
> 

Are you noticing any slowdown in normal network traffic while all this is
happening?

Most of the checks that have timed out are ICMP-based. If you're
running some wonky QoS setup (Windows has that built in...), it's
not hard to guess that ICMP is right at the bottom of the
priority list.
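To check how many of your alerts really are ICMP-based, a rough tally of the alert log helps. Below is a minimal Python sketch, not a Nagios tool. It assumes alert lines shaped like the ones quoted above (it also accepts the bracketed Unix timestamps the raw nagios.log uses), and it assumes that host checks are ping-based (commonly a check-host-alive command wrapping check_ping) and that ICMP service checks have "PING" in their service description. Adjust both assumptions to your own config.

```python
import re
from collections import Counter

# Matches "[<timestamp>] HOST ALERT: ..." and "[<timestamp>] SERVICE ALERT: ...".
# The timestamp may be "08-30-2007 09:24:10" or a Unix epoch like "1188458650".
ALERT_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] (?P<kind>HOST|SERVICE) ALERT: (?P<fields>.*)$")


def classify(line):
    """Return 'icmp', 'other', or None if the line is not an alert.

    Assumptions (adjust to your config):
      * host checks use a ping-based command such as check-host-alive;
      * ICMP service checks have 'PING' in their service description.
    """
    m = ALERT_RE.match(line.strip())
    if m is None:
        return None
    if m.group("kind") == "HOST":
        return "icmp"
    # SERVICE ALERT fields: host;service;state;state type;attempt;output
    fields = m.group("fields").split(";")
    service = fields[1] if len(fields) > 1 else ""
    return "icmp" if "PING" in service.upper() else "other"


def tally(lines):
    """Count ICMP-based vs other alerts in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        kind = classify(line)
        if kind is not None:
            counts[kind] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[08-30-2007 09:24:10] HOST ALERT: tu.xyz.com;DOWN;SOFT;1;CRITICAL - Plugin timed out after 10 seconds",
        "[08-30-2007 09:23:40] SERVICE ALERT: pule.xyz.com;PING;WARNING;SOFT;1;PING WARNING - Packet loss = 44%, RTA = 3.64 ms",
    ]
    print(dict(tally(sample)))  # {'icmp': 2}
```

If "other" dominates, QoS deprioritizing ICMP is a less likely explanation.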

> 
> The machine is a P4 2.4 GHz with 1 GB RAM.
> 

How many checks are you running per minute? It should be
capable of handling 500-800 per minute without any problems
at all.
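A rough way to answer that is to add up the scheduled check rate from your service definitions. Below is a minimal Python sketch, assuming Nagios 2.x semantics where normal_check_interval is counted in units of interval_length (60 seconds by default). The service mix in the example is hypothetical, and retries and host checks are ignored, so the real load while things are failing will be higher.

```python
def checks_per_minute(intervals, interval_length=60):
    """Estimate steady-state scheduled service checks per minute.

    `intervals` holds each service's normal_check_interval, expressed in
    units of interval_length seconds (60 by default in nagios.cfg).
    Retries (retry_check_interval) and host checks are not counted.
    """
    total = 0.0
    for interval in intervals:
        seconds = interval * interval_length
        total += 60.0 / seconds  # how often this service runs per minute
    return total


if __name__ == "__main__":
    # Hypothetical example: 300 services every 5 minutes, 100 every minute.
    intervals = [5] * 300 + [1] * 100
    rate = checks_per_minute(intervals)
    print(f"~{rate:.0f} checks/minute")  # ~160 checks/minute
```

A figure well below the 500-800 range suggests raw check volume is not what is overloading the box.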

> I'm not sure how to troubleshoot this - any ideas?

Check QoS settings in the network. If it's not that, try
removing half the checks and see whether the problem goes away. If
it does, you've got either a really bad network or underpowered
hardware.


If checks other than the ICMP-based ones are acting up and you
mostly see lots of false alarms within a short (10-30 second)
window, make sure you haven't got your network card set to
autonegotiate transfer speed and duplex.
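To see whether the false alarms really cluster in short windows, you can group the alert timestamps. The Python sketch below is one way to do it, not a Nagios feature. It assumes alert lines formatted like those quoted above or with the raw nagios.log Unix timestamps, and the 30-second window and three-alert threshold are arbitrary defaults chosen for illustration. The hosts web.xyz.com and db.xyz.com in the sample are hypothetical.

```python
import re
from datetime import datetime

# Matches "[<timestamp>] HOST ALERT: ..." and "[<timestamp>] SERVICE ALERT: ...".
ALERT_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] (?:HOST|SERVICE) ALERT: (?P<rest>.*)$")


def parse_ts(ts):
    """Accept a Unix timestamp (raw nagios.log) or 'MM-DD-YYYY HH:MM:SS'."""
    if ts.isdigit():
        return datetime.fromtimestamp(int(ts))
    return datetime.strptime(ts, "%m-%d-%Y %H:%M:%S")


def find_bursts(lines, window=30, min_alerts=3):
    """Group alerts falling within `window` seconds of the first alert in
    the group; return only groups with at least `min_alerts` entries."""
    alerts = []
    for line in lines:
        m = ALERT_RE.match(line.strip())
        if m:
            alerts.append((parse_ts(m.group("ts")), m.group("rest")))
    alerts.sort(key=lambda item: item[0])

    bursts, current = [], []
    for ts, rest in alerts:
        if current and (ts - current[0][0]).total_seconds() > window:
            if len(current) >= min_alerts:
                bursts.append(current)
            current = []
        current.append((ts, rest))
    if len(current) >= min_alerts:
        bursts.append(current)
    return bursts


if __name__ == "__main__":
    sample = [
        "[08-30-2007 09:23:40] SERVICE ALERT: pule.xyz.com;PING;WARNING;SOFT;1;PING WARNING",
        "[08-30-2007 09:23:55] SERVICE ALERT: web.xyz.com;HTTP;CRITICAL;SOFT;1;timeout",
        "[08-30-2007 09:24:10] HOST ALERT: tu.xyz.com;DOWN;SOFT;1;CRITICAL",
        "[08-30-2007 11:00:00] SERVICE ALERT: db.xyz.com;MYSQL;CRITICAL;SOFT;1;timeout",
    ]
    for burst in find_bursts(sample):
        print(burst[0][0], f"{len(burst)} alerts")  # 2007-08-30 09:23:40 3 alerts
```

Bursts that hit many unrelated hosts at once are more likely to point at the monitoring server's own link than at the monitored machines.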

I assume you haven't set the Nagios server to obtain its address via
DHCP, as renewing a lease can sometimes have a funny impact on
monitoring, but while you're at it, make sure (by triple-checking)
that only one machine has the IP of the monitoring server.


>  What can I provide 
> you folks in order to help me out?
> 

Money, or evidence of having tried things on your own. Both are
hard currency when asking for help in a tech-savvy forum.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users