Scheduled checks falling far behind

Frost, Mark {PBC} mark.frost1 at pepsico.com
Sat Oct 23 03:53:33 CEST 2010


Matthew,

You don't say, but my guess would be that you have high latencies. That is, for one of several reasons, Nagios is not able to run checks when it thinks it should. You can see this information and other stats by looking at the Performance item near the bottom of the navigation pane in the Nagios web interface.
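As a rough illustration of how far behind a check can get, the sketch below uses the timestamps from the service details quoted further down. It assumes they are in the month-day-year format the CGI displays, with the "Last Updated" time rewritten in that same format. The check that was due at 00:13:28 still has not run at 00:19:02, so it is already at least 334 seconds late. The helper name `lag_seconds` is hypothetical.

```python
from datetime import datetime

# Date format used in the quoted service details (month-day-year).
FMT = "%m-%d-%Y %H:%M:%S"


def lag_seconds(now, last_check, next_check):
    """Return (seconds past the scheduled check, seconds since the last check)."""
    now_t, last_t, next_t = (
        datetime.strptime(s, FMT) for s in (now, last_check, next_check)
    )
    return (now_t - next_t).total_seconds(), (now_t - last_t).total_seconds()


behind, since_last = lag_seconds(
    "10-23-2010 00:19:02",  # "Last Updated", rewritten in the same format
    "10-23-2010 00:10:30",  # Last Check Time
    "10-23-2010 00:13:28",  # Next Scheduled Check
)
print(behind, since_last)  # 334.0 512.0
```

334 seconds is 5m34s past the scheduled time, and 512 seconds is 8m32s since the last check, which matches the "almost 6" and "almost 9" minutes in the original question.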

If memory serves, you can also run the "nagiostats" command in your Nagios "bin" directory to see the same information. I actually feed that nagiostats data into a custom check and graph a lot of those latencies and other Nagios performance stats.
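A minimal sketch of the kind of custom check described above, not Mark's actual script. It assumes your nagiostats accepts `--mrtg` and `--data=` (check `nagiostats --help` on your version), and that in MRTG mode it prints one value per line with latencies in milliseconds. The install path, the 60 s / 300 s thresholds, and all function names are hypothetical.

```python
import subprocess

# Assumed install path; point this at the nagiostats in your Nagios "bin" directory.
NAGIOSTATS = "/usr/local/nagios/bin/nagiostats"

# MRTG-mode variables for average and maximum active service check latency.
# Assumption: in MRTG mode these are reported in milliseconds, one value per line.
VARS = ["AVGACTSVCLAT", "MAXACTSVCLAT"]

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = ["OK", "WARNING", "CRITICAL", "UNKNOWN"]


def parse_mrtg(output, names):
    """Pair each requested variable with its line of MRTG-mode output."""
    values = [line.strip() for line in output.splitlines() if line.strip()]
    if len(values) < len(names):
        raise ValueError("expected %d values, got %d" % (len(names), len(values)))
    return {name: float(value) for name, value in zip(names, values)}


def classify(latency_s, warn_s, crit_s):
    """Map a latency in seconds to a Nagios plugin exit code."""
    if latency_s >= crit_s:
        return CRITICAL
    if latency_s >= warn_s:
        return WARNING
    return OK


def main(warn_s=60.0, crit_s=300.0):
    """Run nagiostats, print a status line with perfdata, return the exit code."""
    try:
        out = subprocess.run(
            [NAGIOSTATS, "--mrtg", "--data=" + ",".join(VARS)],
            capture_output=True, text=True, check=True,
        ).stdout
        stats = parse_mrtg(out, VARS)
    except (OSError, subprocess.CalledProcessError, ValueError) as exc:
        print("LATENCY UNKNOWN - %s" % exc)
        return UNKNOWN
    avg_s = stats["AVGACTSVCLAT"] / 1000.0  # ms -> s (assumed unit)
    max_s = stats["MAXACTSVCLAT"] / 1000.0
    code = classify(max_s, warn_s, crit_s)
    # Text after "|" is performance data that a grapher can pick up.
    print("LATENCY %s - avg=%.3fs max=%.3fs | avg=%.3fs max=%.3fs;%g;%g"
          % (LABELS[code], avg_s, max_s, avg_s, max_s, warn_s, crit_s))
    return code
```

To run it as a plugin, add `import sys` and end the script with `sys.exit(main())`. Alerting on the maximum rather than the average is a deliberate choice here, since a single runaway check is what tends to hide behind a healthy-looking average.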

From my own experience: I didn't pay attention to this information when I started using Nagios, then read about it, made a few tweaks to improve things, and forgot about it. Then, as our installation grew and grew, some things got worse again and I had to consider other tuning options.

I would recommend that you first read the "Tuning Nagios For Maximum Performance" section of the docs:

http://nagios.sourceforge.net/docs/3_0/tuning.html

If nothing else, this will give you an idea of some things that can affect latencies.
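To check whether a configuration really is running on defaults, a small sketch that reads nagios.cfg and reports a few scheduling-related directives. The directive list is an illustrative selection of Nagios 3 options that affect check throughput, not an exhaustive one, and the helper name is hypothetical. A value of `None` means the directive is not set in the file, so the compiled-in default applies.

```python
# Illustrative selection of nagios.cfg directives that affect check scheduling.
DIRECTIVES = [
    "max_concurrent_checks",
    "check_result_reaper_frequency",
    "max_check_result_reaper_time",
    "service_inter_check_delay_method",
    "service_interleave_factor",
    "use_large_installation_tweaks",
]


def read_directives(text, names=DIRECTIVES):
    """Return {name: value or None} for the given nagios.cfg directives."""
    found = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        found[key.strip()] = value.strip()  # a later occurrence overrides an earlier one
    return {name: found.get(name) for name in names}
```

Usage would be something like `read_directives(open(path).read())`, where `path` is wherever your nagios.cfg lives.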

Additionally, you may find that your average latencies look fine, but then see something with a whopping huge max latency. It can be hard to track down what that is in the UI. What I've done is look up that max latency, quickly search the status.dat file for the service with the same matching latency, and dig into that. You could, for example, have a few checks that never actually time out, so a check may take 10 minutes or more to complete, which would really screw up your overall latencies: those checks wouldn't have finished before the next time they were supposed to run.
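The status.dat lookup described above can be scripted. A minimal sketch, assuming the Nagios 3 status.dat layout of `servicestatus {` ... `}` blocks containing `key=value` lines such as `host_name`, `service_description`, `check_latency` and `check_execution_time`. The path and function names are hypothetical; the real path is set by `status_file` in nagios.cfg.

```python
import re

# Assumed location; the real path comes from status_file in nagios.cfg.
STATUS_DAT = "/usr/local/nagios/var/status.dat"

BLOCK_START = re.compile(r"^(\w+)\s*\{$")


def parse_status(text):
    """Yield (block_type, fields) for each 'type {' ... '}' block in status.dat."""
    block_type, fields = None, {}
    for line in text.splitlines():
        line = line.strip()
        m = BLOCK_START.match(line)
        if m:
            block_type, fields = m.group(1), {}
        elif line == "}" and block_type is not None:
            yield block_type, fields
            block_type = None
        elif block_type is not None and "=" in line:
            # Split on the first "=" only; plugin output may contain more.
            key, _, value = line.partition("=")
            fields[key] = value


def worst_latencies(text, count=5):
    """Return (latency, duration, host, service) tuples, highest latency first."""
    rows = []
    for block_type, f in parse_status(text):
        if block_type == "servicestatus" and "check_latency" in f:
            rows.append((float(f["check_latency"]),
                         float(f.get("check_execution_time", "0")),
                         f.get("host_name", "?"),
                         f.get("service_description", "?")))
    rows.sort(reverse=True)
    return rows[:count]
```

Calling `worst_latencies(open(STATUS_DAT).read())` lists the worst offenders. A service whose execution time is long compared with its check interval is a good first suspect.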

Mark

________________________________________
From: Litwin, Matthew [mlitwin at stubhub.com]
Sent: Friday, October 22, 2010 8:29 PM
To: nagios-users at lists.sourceforge.net
Subject: [Nagios-users] Scheduled checks falling far behind

I have been chasing my tail trying to figure out why my RRD files were very sparsely populated, and I am realizing that my checks are falling behind their scheduled times by up to 3 times their check interval. Take, for example, a service that should be checked every 5 minutes. In the example below, the time is 00:19:02, the last check was at 00:10:30, and the next scheduled check time is 00:13:28. This means it is almost 6 minutes behind schedule and almost 9 minutes since the last check!

I find that even if I shorten the check interval to, say, 3 minutes, it still behaves about the same. The server has very low load and Nagios is hardly working at all (usually below 4% CPU). I haven't touched any of the tuning settings, and from what I have read the defaults appear to be unthrottled. Is there any way to make it "work harder"?

--Service information--
Last Updated: Sat Oct 23 00:19:02 UTC 2010

--Service State Information--
Current Status:         OK  (for 7d 16h 14m 46s)
Status Information:     CPU STATISTICS OK : user=0.12% system=0.00% iowait=0.00% idle=99.88%
Performance Data:       0.12;0.00;0.00;99.88;80;90
Current Attempt:        1/3  (HARD state)
>>> Last Check Time:    10-23-2010 00:10:30  <<<<
Check Type:     ACTIVE
Check Latency / Duration:       612.633 / 2.052 seconds
>>> Next Scheduled Check:       10-23-2010 00:13:28 <<<
Last State Change:      10-15-2010 08:04:16
Last Notification:      N/A (notification 0)
Is This Service Flapping?       NO  (0.00% state change)
In Scheduled Downtime?  NO
Last Update:    10-23-2010 00:18:33  ( 0d 0h 0m 29s ago)



------------------------------------------------------------------------------
Nokia and AT&T present the 2010 Calling All Innovators-North America contest
Create new apps & games for the Nokia N8 for consumers in  U.S. and Canada
$10 million total in prizes - $4M cash, 500 devices, nearly $6M in marketing
Develop with Nokia Qt SDK, Web Runtime, or Java and Publish to Ovi Store
http://p.sf.net/sfu/nokia-dev2dev
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null