check_load --- divide by number of cpus?

Mike Emigh maemigh at gmail.com
Fri May 9 15:40:42 CEST 2008


On Fri, May 9, 2008 at 9:31 AM, Mike Emigh <maemigh at gmail.com> wrote:
> On Fri, May 9, 2008 at 4:48 AM, Thomas Guyot-Sionnest <dermoth at aei.ca> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> On 08/05/08 10:31 AM, Mike Emigh wrote:
>>> On Thu, May 8, 2008 at 9:49 AM, Terry <td3201 at gmail.com> wrote:
>>>> I am sitting here racking my brain on this one.  Is dividing the load
>>>> average by the number of CPUs a smart thing to do?  The 'uptime'
>>>> command on a box does not do this.  So, if I have an 8-core box
>>>> sitting with a load average of 8, it's the same as a single-core box
>>>> sitting with a load average of 1?  I'll see the same type of server
>>>> response?  Thoughts?
>>>>
>>>
>>> It's not necessarily the same.  If you have a single-threaded process
>>> pegging one of the cores, your load average can hit 8 even if the 7
>>> other cores are sitting idle.
>>
>> Not possible. The definition of load average is simply "the number of
>> processes in the run queue". If you have only one process running (no
>> multi-threading) it can only run on a single CPU at any given time, so
>> the run queue can only be 1. A multi-threaded application will, on the
>> other hand, be able to run on multiple CPUs (obviously depending on its
>> design) and cause higher loads.
>>
>> When comparing load averages between servers you should divide by the
>> number of CPUs, because the more CPUs you have, the faster the run queue
>> is processed. Think of it like a reservoir with pipes: if you have one
>> with 8 pipes and another with only one, the 8-pipe reservoir will be
>> able to take 8 times as much water and still empty as fast as the
>> one-pipe one.
>>
>
> You're right, it wouldn't work with just a single-threaded process.
> I'm not sure of the specifics of how this would happen, but with
> Oracle we've seen it maxing two CPUs and raising the load to 56 while
> the 6 other cores in the 8-core machine sat idle.  As these situations
> are possible, simply dividing by the number of cores wouldn't provide
> precise insight into what's going on.
>

I guess I should have also mentioned that this becomes more likely with
virtual machines, zones, etc., which can be bound to a single processor
or to a subset of the processors.
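
To make the arithmetic concrete, here is a rough Python sketch (not how
check_load itself works) that divides the load averages by the number of
CPUs the process is actually allowed to run on, rather than by the total
core count. The helper names usable_cpus and per_cpu_load are made up for
illustration, and os.sched_getaffinity is only available on some platforms
such as Linux, so the code falls back to os.cpu_count() elsewhere.

```python
import os


def usable_cpus():
    """Return how many CPUs this process may be scheduled on."""
    # sched_getaffinity is platform-specific (e.g. Linux); it reflects
    # CPU binding, which a plain core count does not.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Fallback: total CPU count, or 1 if it cannot be determined.
    return os.cpu_count() or 1


def per_cpu_load(loads, ncpus):
    """Divide each load average by the given CPU count."""
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    return tuple(load / ncpus for load in loads)


if __name__ == "__main__":
    try:
        # 1-, 5- and 15-minute load averages, as reported by 'uptime'.
        loads = os.getloadavg()
    except (OSError, AttributeError):
        # getloadavg is not available or failed on this platform.
        loads = None
    if loads is not None:
        print("raw load averages:", loads)
        print("per usable CPU:   ", per_cpu_load(loads, usable_cpus()))
```

With the numbers from the Oracle case above, per_cpu_load((56.0,), 8) gives
7.0, while per_cpu_load((56.0,), 2) gives 28.0 if only the two busy CPUs
are counted. Which divisor is the right one depends on how the workload is
bound, which is why a single per-CPU figure can hide what is going on.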

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null




