check_load gone crazy

Mike Chesnut mikec at aggregateknowledge.com
Wed Sep 8 02:39:28 CEST 2010


I'm wondering if this is a known bug, and/or if anybody else has seen 
similar behavior...

We're using Nagios 3.2.1 on Linux, monitoring several Linux systems.  We 
run the check_load probe against every system.  Occasionally, at 
irregular intervals, Nagios will freak out and alert on the load 
average of many (sometimes *all*) systems at once.  When this occurs, it 
reports the *same* load averages for each system, and the weirdest part 
is that these load averages are completely bogus.

Then, over the course of the next 20 or so minutes, the load averages 
being reported gradually decrease (they go from CRITICAL to WARNING to 
OK), always staying in sync across *every* system.

Again, when this happens, the load averages being reported are 
completely unrelated to the real load averages on any of the systems 
being checked.
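
One way to confirm from logged plugin output that several hosts really 
are reporting identical values is sketched below in Python.  It is only 
a sketch: it assumes each check result contains the text 
"load average: a, b, c", and the function names and host names are 
hypothetical.

```python
import re
from collections import defaultdict

# Assumed output shape: "CRITICAL - load average: 35.10, 30.02, 20.50|perfdata"
LOAD_RE = re.compile(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")


def parse_load(output):
    """Return the (1min, 5min, 15min) load triple, or None if not found."""
    match = LOAD_RE.search(output)
    if match is None:
        return None
    return tuple(float(value) for value in match.groups())


def hosts_sharing_load(results, min_hosts=2):
    """Group hosts by reported load triple.

    `results` maps a host name to its plugin output string. Only triples
    reported by at least `min_hosts` hosts are returned.
    """
    groups = defaultdict(list)
    for host, output in results.items():
        load = parse_load(output)
        if load is not None:
            groups[load].append(host)
    return {
        load: sorted(hosts)
        for load, hosts in groups.items()
        if len(hosts) >= min_hosts
    }
```

Running this over the results of one check cycle would show whether the 
identical triples cover every host or only a subset.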

Any ideas on how I can get to the bottom of why this happens?

Thanks,
Mike
