Check load plugin configuration on a local machine.

Eric Stanley estanley at nagios.com
Fri Jul 27 14:01:32 CEST 2012


Bryan,

You're on the right track understanding check_load. There are 3 values 
for warning level and 3 values for the critical level, one each for the 
1-minute, 5-minute, and 15-minute load averages. For the check_load 
plugin, a warning or critical state is achieved if any one (not all 
three) of the load average thresholds is exceeded.
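
As a rough illustration of the "any one, not all three" rule, here is a 
minimal Python sketch. The function name load_state is hypothetical, and 
the strict greater-than comparison is an assumption rather than something 
taken from the plugin source; the thresholds and typical load are the ones 
in the quoted message below.

```python
def load_state(loads, warn, crit):
    """Return 'CRITICAL', 'WARNING', or 'OK' for three load averages.

    Each argument is a (1-min, 5-min, 15-min) tuple. A single exceeded
    threshold is enough to raise the state. The strict '>' comparison is
    an assumption, not taken from the plugin source.
    """
    if any(l > c for l, c in zip(loads, crit)):
        return "CRITICAL"
    if any(l > w for l, w in zip(loads, warn)):
        return "WARNING"
    return "OK"


# Thresholds and typical load from the quoted message.
warn = (20, 18, 16)
crit = (22, 19, 18)
print(load_state((1.88, 2.08, 2.16), warn, crit))  # OK
print(load_state((21.0, 2.0, 2.0), warn, crit))    # WARNING: only the 1-minute value is over
print(load_state((2.0, 2.0, 18.5), warn, crit))    # CRITICAL: only the 15-minute value is over
```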

Depending on what you're trying to measure, you may want to change your 
thresholds. Since the load is the number of processes ready to run 
(including those running), the ideal situation is that you have one 
process ready to run on each core at all times. In other words, on a 24 
core box, if your 1-, 5-, and 15-minute load averages are all 24, you're 
perfectly utilizing all of your CPU capacity.
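
One way to read this is to divide each load average by the core count, so 
that 1.0 means one runnable process per core. The sketch below is only an 
illustration; per_core_load is a hypothetical helper, and the live reading 
at the end assumes a Unix-like system where os.getloadavg() is available.

```python
import os


def per_core_load(loads, cores):
    """Scale (1-min, 5-min, 15-min) load averages by the number of cores."""
    return tuple(l / cores for l in loads)


# A 24-core box with all three averages at 24 is exactly "one per core".
print(per_core_load((24.0, 24.0, 24.0), 24))  # (1.0, 1.0, 1.0)

# The typical load from the quoted message is far below that.
print(per_core_load((1.88, 2.08, 2.16), 24))

# Live values for the current machine, where the platform supports it.
if hasattr(os, "getloadavg"):
    print(per_core_load(os.getloadavg(), os.cpu_count() or 1))
```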

Assuming you're monitoring for excessive load, you'll probably want to 
set your thresholds higher than the number of cores. Based on 
experience, I've set warning thresholds for systems I monitor to 9n, 6n, 
and 3n for 1-, 5-, and 15-minute load averages respectively and the 
critical thresholds to 15n, 10n, and 5n, where n is the number of cores. 
These may seem like very high thresholds, especially for the shorter 
duration averages, but I can tolerate short spikes in load. It's 
long-term excessive load that concerns me. Again, this is based on 
experience; prior to implementing these settings, I was getting a lot of 
alerts and much less sleep. :-)
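
For a concrete number, the multipliers above can be turned into thresholds 
for a given core count. This is a small sketch, not a recommended 
generator: scaled_thresholds and check_command are hypothetical names, and 
the "!"-separated argument order (three warning values, then three critical 
values) simply mirrors the check_command line in the quoted message, so it 
assumes the same local command definition.

```python
def scaled_thresholds(cores):
    """Return (warn, crit) tuples for 1-, 5-, and 15-minute load averages.

    Uses the 9n/6n/3n warning and 15n/10n/5n critical multipliers
    described in the text, where n is the number of cores.
    """
    warn = tuple(m * cores for m in (9, 6, 3))
    crit = tuple(m * cores for m in (15, 10, 5))
    return warn, crit


def check_command(cores):
    """Format the thresholds as a '!'-separated check_command value.

    The argument order is an assumption based on the quoted service
    definition; adjust it to match your own command definition.
    """
    warn, crit = scaled_thresholds(cores)
    return "check_load!" + "!".join(str(v) for v in warn + crit)


print(check_command(24))  # check_load!216!144!72!360!240!120
```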

Hope that helps.

Eric

On 7/26/2012 3:08 PM, bryan hunt wrote:
> I've got a 24 core box over here, obviously I need to tweak the
> configuration of the check_load plugin as it seems designed for a single
> core machine by default.
>
> define service{
> use                             generic-service
> host_name                       localhost
> service_description             Current Load
> check_command                   check_load!20!18!16!22!19!18
> }
>
>
>
> My understanding is that this breaks down as follows
>
> 1, 5, 15 minute load average.
>
> I've set it to the following.
>
> Warning thresholds. (17 is 70% of 24)
> 20!18!16
>
> So warn if it is currently 20, or averaging 17.
>
> Critical thresholds.
> 22!19!18
>
> Only one core, not maxed out, bad. Average above 22, bad.
>
> Anyhow, my question is. Is this a sane configuration. It's pretty
> generous with load. My usual load average is actually:
>
> 1.88 2.08 2.16
>
> Any advice appreciated,
>
> Bryan Hunt
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null


-- 
Eric Stanley
___
Developer
Nagios Enterprises, LLC
Email:  estanley at nagios.com
Web:    www.nagios.com

