surge in "Service Check Timed Out" messages

Miles O'Neal meo at intrinsity.com
Fri Jul 7 21:13:50 CEST 2006


I've been running nagios 1.2 for some time with
no problems.  It's on a fairly lightly loaded system.
In the last month or so, I've started seeing an
odd problem.  Once or twice a day, anywhere from 10
to 30 checks will suddenly report "Service Check Timed
Out", resulting in a bunch of critical flags.
These will hang around until I either restart or
reload nagios; then everything goes back to green.
It's a consistent fix.  But it's rather annoying.

Eventually the conditions will clear if left alone,
but it can take 10 to 60 minutes.  They tend to clear
in batches.  If, for example, I suddenly have 29
critical messages, 10-15 minutes later I suddenly
have 19.  A couple of minutes later I'll have 4, and
a couple of minutes later, back to 0.

Some of the monitored systems are swamped, but others
are under just a normal, medium load.  I *believe*
all the timeouts are related to nagios-statd checks,
but I can't swear to that.  Certainly most are.  And,
no, we haven't changed the nagios-statd version lately.

We haven't expanded the scope of what's monitored;
we've actually reduced the system and service count in
nagios as we've upgraded to fewer, faster systems in
our compute farm and infrastructure.  The network
isn't any busier than it was.  I'm not seeing any
evidence of NIC problems.

Even though the nagios server is lightly loaded, I
have tried backing off the timings.  No change.

The results do show up in the status.log file;
this isn't a GUI issue.

Anyone know what's going on?


I searched the archives and found nothing useful.
(The sourceforge search interface is really annoying;
is there a way to get results in threaded view,
instead of every post in a thread having its own
link in the results?)


Thanks,
Miles

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null