Scheduled checks falling far behind

Litwin, Matthew mlitwin at stubhub.com
Sat Oct 23 18:48:29 CEST 2010


Here are my stats. I definitely have a problem if latencies are between 5 and 10 minutes!

check_result_reaper_frequency was set to 10, which seems high. I am going to try 5, as used in the Nagios Core guide, and see what that does.

Nagios Stats 3.2.1
Copyright (c) 2003-2008 Ethan Galstad (www.nagios.org)
Last Modified: 03-09-2010
License: GPL

CURRENT STATUS DATA
------------------------------------------------------
Status File:                            /usr/local/nagios/var/status.dat
Status File Age:                        0d 0h 0m 29s
Status File Version:                    3.2.1

Program Running Time:                   0d 0h 4m 9s
Nagios PID:                             17295
Used/High/Total Command Buffers:        0 / 0 / 4096

Total Services:                         4987
Services Checked:                       4987
Services Scheduled:                     4970
Services Actively Checked:              4987
Services Passively Checked:             0
Total Service State Change:             0.000 / 16.970 / 0.007 %
Active Service Latency:                 0.034 / 526.244 / 351.201 sec
Active Service Execution Time:          0.013 / 17.745 / 0.393 sec
Active Service State Change:            0.000 / 16.970 / 0.007 %
Active Services Last 1/5/15/60 min:     205 / 1353 / 3568 / 4970
Passive Service Latency:                0.000 / 0.000 / 0.000 sec
Passive Service State Change:           0.000 / 0.000 / 0.000 %
Passive Services Last 1/5/15/60 min:    0 / 0 / 0 / 0
Services Ok/Warn/Unk/Crit:              4969 / 11 / 1 / 6
Services Flapping:                      0
Services In Downtime:                   0

Total Hosts:                            241
Hosts Checked:                          241
Hosts Scheduled:                        241
Hosts Actively Checked:                 241
Host Passively Checked:                 0
Total Host State Change:                0.000 / 0.000 / 0.000 %
Active Host Latency:                    0.000 / 487.501 / 216.928 sec
Active Host Execution Time:             0.149 / 4.310 / 3.780 sec
Active Host State Change:               0.000 / 0.000 / 0.000 %
Active Hosts Last 1/5/15/60 min:        38 / 131 / 199 / 241
Passive Host Latency:                   0.000 / 0.000 / 0.000 sec
Passive Host State Change:              0.000 / 0.000 / 0.000 %
Passive Hosts Last 1/5/15/60 min:       0 / 0 / 0 / 0
Hosts Up/Down/Unreach:                  241 / 0 / 0
Hosts Flapping:                         0
Hosts In Downtime:                      0

Active Host Checks Last 1/5/15 min:     49 / 135 / 135
   Scheduled:                           48 / 131 / 131
   On-demand:                           1 / 4 / 4
   Parallel:                            48 / 131 / 131
   Serial:                              0 / 0 / 0
   Cached:                              1 / 4 / 4
Passive Host Checks Last 1/5/15 min:    0 / 0 / 0
Active Service Checks Last 1/5/15 min:  313 / 1353 / 1353
   Scheduled:                           313 / 1353 / 1353
   On-demand:                           0 / 0 / 0
   Cached:                              0 / 0 / 0
Passive Service Checks Last 1/5/15 min: 0 / 0 / 0

External Commands Last 1/5/15 min:      0 / 0 / 0
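
To pull the latency figures out of output like the above without reading it by eye, here is a minimal Python sketch. It assumes the three figures on each line are minimum / maximum / average, which is how the values above fall, and it only handles the "x / y / z sec" and "x / y / z %" lines. The function name parse_triples is made up for illustration.

```python
import re

# Matches summary lines such as
# "Active Service Latency:   0.034 / 526.244 / 351.201 sec"
TRIPLE = re.compile(
    r"^(?P<label>[^:]+):\s+"
    r"(?P<min>[\d.]+) / (?P<max>[\d.]+) / (?P<avg>[\d.]+) (?P<unit>sec|%)\s*$"
)

def parse_triples(text):
    """Return {label: (min, max, avg)} for every min/max/avg line."""
    stats = {}
    for line in text.splitlines():
        m = TRIPLE.match(line.strip())
        if m:
            stats[m["label"]] = (float(m["min"]), float(m["max"]), float(m["avg"]))
    return stats

# A few lines copied from the nagiostats output in this message.
sample = """\
Active Service Latency:                 0.034 / 526.244 / 351.201 sec
Active Service Execution Time:          0.013 / 17.745 / 0.393 sec
Active Host Latency:                    0.000 / 487.501 / 216.928 sec
Services Ok/Warn/Unk/Crit:              4969 / 11 / 1 / 6
"""

stats = parse_triples(sample)
lo, hi, avg = stats["Active Service Latency"]
print(f"avg service latency: {avg / 60:.1f} min, max: {hi / 60:.1f} min")
```

Lines with four fields or no unit (the Ok/Warn/Unk/Crit counts, for instance) are skipped on purpose.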

On Oct 22, 2010, at 6:53 PM, Frost, Mark {PBC} wrote:

> Matthew,
> 
> You don't say, but my guess would be that you have high latencies.  That is, for one of several reasons, Nagios is not able to run checks when it thinks it should.  You can see this information and other stats by looking at the Performance item near the bottom of the Nav pane in the Nagios web interface.
> 
> You can also run, if memory serves, the "nagiostats" command located in your Nagios "bin" directory to see this information as well.  I actually use that nagiostats data in a custom check and graph a lot of those latencies and other Nagios performance related info.
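
The custom check mentioned here is not shown in the thread, so the following is only a rough Python sketch of what such a check could look like, not Mark's implementation. It takes an average latency in seconds (obtained however you like, for example from nagiostats output) and turns it into a plugin-style result. The thresholds of 60 and 300 seconds and the name evaluate_latency are assumptions chosen for illustration.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def evaluate_latency(avg_sec, warn_sec=60.0, crit_sec=300.0):
    """Map an average latency to (exit_code, plugin_output_line).

    warn_sec and crit_sec are hypothetical thresholds; tune them locally.
    """
    if avg_sec >= crit_sec:
        code, word = CRITICAL, "CRITICAL"
    elif avg_sec >= warn_sec:
        code, word = WARNING, "WARNING"
    else:
        code, word = OK, "OK"
    # Text before '|' is the status message; text after it is performance
    # data in label=value[unit];warn;crit form, which graphing tools can use.
    line = (
        f"LATENCY {word} - average active service latency {avg_sec:.3f}s"
        f" | avg_latency={avg_sec:.3f}s;{warn_sec:g};{crit_sec:g}"
    )
    return code, line

# Example with the average active service latency reported in this thread.
code, line = evaluate_latency(351.201)
print(code, line)
```

A real plugin would print the line and exit with the returned code so that Nagios picks up the state.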


> 
> From my own experience, I did not pay attention to this information when I started using Nagios. I then read about it, made a few tweaks to make it better, and forgot about it.  Then, as our installation grew and grew, I found that some things got worse again and I had to consider different tuning options.
> 
> I would recommend that you first read the "Tuning Nagios For Maximum Performance" section of the docs:
> 
> http://nagios.sourceforge.net/docs/3_0/tuning.html
> 
> If nothing else, this will give you an idea of some things that can affect latencies.
> 
> Additionally, you may find that you see your average latencies, but then see something with a whopping huge max latency.  It can be hard to track down what that is in the UI.  I've just looked up that max latency and then quickly looked in the status.dat file to find the service that had that same latency and dug into that.  You could, for example, have a few checks that aren't really timing out, so a check may take 10 minutes or more to complete, which would really screw up your overall latencies: the checks wouldn't have finished before the next time they were supposed to run.
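
The status.dat lookup described above can be scripted. This is a minimal Python sketch, assuming the Nagios 3 layout where each service is a "servicestatus { ... }" block of key=value lines that includes host_name, service_description and check_latency. The function names and the sample hosts (web01, db01) are made up for illustration.

```python
def iter_service_blocks(text):
    """Yield one dict of key=value pairs per 'servicestatus { ... }' block."""
    block = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("servicestatus") and line.endswith("{"):
            block = {}
        elif line == "}":
            if block is not None:
                yield block
            block = None
        elif block is not None and "=" in line:
            key, _, value = line.partition("=")
            block[key] = value

def worst_latency(text, top=3):
    """Return up to `top` (latency, host, service) rows, highest latency first."""
    rows = [
        (float(b["check_latency"]),
         b.get("host_name", "?"),
         b.get("service_description", "?"))
        for b in iter_service_blocks(text)
        if "check_latency" in b
    ]
    return sorted(rows, reverse=True)[:top]

# Hypothetical excerpt; a real file has many more keys per block.
sample = """\
hoststatus {
    host_name=web01
    check_latency=0.250
    }
servicestatus {
    host_name=web01
    service_description=CPU
    check_latency=612.633
    }
servicestatus {
    host_name=db01
    service_description=Disk
    check_latency=4.120
    }
"""

for latency, host, service in worst_latency(sample):
    print(f"{latency:10.3f}s  {host}  {service}")
```

Against a live install you would pass in the contents of the status file instead of the sample (the nagiostats output in this thread reports it at /usr/local/nagios/var/status.dat).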
> 
> Mark
> 
> ________________________________________
> From: Litwin, Matthew [mlitwin at stubhub.com]
> Sent: Friday, October 22, 2010 8:29 PM
> To: nagios-users at lists.sourceforge.net
> Subject: [Nagios-users] Scheduled checks falling far behind
> 
> I have been chasing my tail trying to figure out why my RRD files were very sparsely populated, and I am realizing that my checks are falling behind their scheduled times by up to 3 times their set check interval. Take, for example, a service that should be checked every 5 minutes. In the example below, the time is 00:19:02, the last check was at 00:10:30, and the next scheduled check time is 00:13:28. This means it is almost 6 minutes behind schedule, and almost 9 minutes have passed since the last check!
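
The arithmetic in that paragraph can be checked with a few lines of Python. This is just a sketch using the three timestamps quoted above, in the date format the service detail page shows; the function name schedule_lag is made up for illustration.

```python
from datetime import datetime

# Timestamp format used on the service detail page, e.g. "10-23-2010 00:10:30".
FMT = "%m-%d-%Y %H:%M:%S"

def schedule_lag(now, last_check, next_scheduled):
    """Return (seconds overdue, seconds since last check)."""
    now_t = datetime.strptime(now, FMT)
    overdue = (now_t - datetime.strptime(next_scheduled, FMT)).total_seconds()
    since_last = (now_t - datetime.strptime(last_check, FMT)).total_seconds()
    return overdue, since_last

overdue, since_last = schedule_lag(
    now="10-23-2010 00:19:02",
    last_check="10-23-2010 00:10:30",
    next_scheduled="10-23-2010 00:13:28",
)
print(f"overdue: {overdue:.0f}s, since last check: {since_last:.0f}s")
# prints "overdue: 334s, since last check: 512s"
```

That is 5m 34s behind schedule and 8m 32s since the last check, for a service meant to run every 5 minutes.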
> 
> I find that even if I shorten the check interval to, say, 3 minutes, it still behaves about the same. The server has very low load and Nagios is hardly working at all (usually below 4% CPU). I haven't touched any of the tuning on this, and from what I have read the default settings appear unthrottled. Is there any way to make it "work harder"?
> 
> --Service information--
> Last Updated: Sat Oct 23 00:19:02 UTC 2010
> 
> --Service State Information--
> Current Status:         OK (for 7d 16h 14m 46s)
> Status Information:     CPU STATISTICS OK : user=0.12% system=0.00% iowait=0.00% idle=99.88%
> Performance Data:       0.12;0.00;0.00;99.88;80;90
> Current Attempt:        1/3  (HARD state)
> >>> Last Check Time:    10-23-2010 00:10:30  <<<
> Check Type:     ACTIVE
> Check Latency / Duration:       612.633 / 2.052 seconds
> >>> Next Scheduled Check:       10-23-2010 00:13:28 <<<
> Last State Change:      10-15-2010 08:04:16
> Last Notification:      N/A (notification 0)
> Is This Service Flapping?       NO (0.00% state change)
> In Scheduled Downtime?  NO
> Last Update:    10-23-2010 00:18:33  ( 0d 0h 0m 29s ago)
> 
> 
> 
> ------------------------------------------------------------------------------
> Nokia and AT&T present the 2010 Calling All Innovators-North America contest
> Create new apps & games for the Nokia N8 for consumers in  U.S. and Canada
> $10 million total in prizes - $4M cash, 500 devices, nearly $6M in marketing
> Develop with Nokia Qt SDK, Web Runtime, or Java and Publish to Ovi Store
> http://p.sf.net/sfu/nokia-dev2dev
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null

