Strange load average with Nagios 3

Andreas Ericsson ae at op5.se
Tue Apr 21 16:15:21 CEST 2009


Yann Jouanin wrote:
>>> Has anyone else observed the same thing?
> 
>> I've never seen anything like it before. Does the 4-hour interval coincide
>> with your check-interval? Does it align with some performance-data
>> processing?
> 
> No, we process performance data every 15 seconds (and the pattern still
> reproduces even when processing is stopped).
> Our check intervals are 1 min and 5 min, which don't seem to be correlated
> with the 4-hour slot.
> 
>>> Can something in Nagios' behavior explain
>>> this load?
> 
>> Not really, no. I suppose a database logging application could display a
>> load pattern such as this if it manages its tables really poorly and then
>> vacuums them at the peak of the load, but since you mentioned nothing
>> about NDOUtils or anything similar I'll just assume you have no such
>> things installed.
> 
> 
> Neither MySQL nor NDOUtils is running on these servers.
> Only Nagios and PNP (NPCD + BULK).
> 
>>> Our servers are running different Linux distributions, and we determined
>>> that the pattern is certainly due to Nagios.
>>>
> 
>> How did you ascertain this? Sorry for being skeptical, but I've seen
>> "Oh I'm really, really sure" followed by "oops turned out I was wrong"
>> too many times to trust others' eyes ;-)
> 
> I can understand your skepticism. Let's say we strongly suspect (instead of
> being certain!) that this is due to Nagios:
> 	- Stopping NPCD doesn't change the pattern.
> 	- We checked the cron jobs; nothing runs with a 4-hour periodicity.
> 	- The different servers don't run the same services (e.g. some have
> 	  backups with Bacula, some don't).
> 	- Unfortunately we cannot stop the Nagios process (because it's
> 	  production!), but the amplitude of the lobes seems to be quite
> 	  correlated with the number of services.
> 

That seems to rule out the basics and some of the esoterics at least.

Does this state persist if you restart Nagios, or does it sort of grow
into place after it's been running a while?

It would be nice to be able to see the average run-time of plugins over the
time span of that graph. I could imagine long-running checks sort of piling
up until they spill over and miss one of their check windows, but that
*should* mean the load slowly increased and then stayed at a small plateau.
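
One rough way to collect such a number is to sample the Nagios status
file at regular intervals and average the recorded execution times. The
following is only a sketch under assumptions: it assumes the Nagios 3
status file format with "servicestatus { ... }" blocks that carry a
"check_execution_time=" line, and the helper names and the SAMPLE text
are made up for illustration.

```python
import re
from statistics import mean

def execution_times(status_text):
    """Collect check_execution_time (seconds) from each servicestatus block."""
    times = []
    # Assumed layout: "servicestatus {" ... key=value lines ... "}" on its own line.
    for block in re.findall(r"servicestatus\s*\{(.*?)\n\s*\}", status_text, re.S):
        match = re.search(r"^\s*check_execution_time=([0-9.]+)\s*$", block, re.M)
        if match:
            times.append(float(match.group(1)))
    return times

def summarize(status_text):
    """Return count, mean and max execution time, or None if nothing was found."""
    times = execution_times(status_text)
    if not times:
        return None
    return {"count": len(times), "mean": mean(times), "max": max(times)}

# Hypothetical excerpt standing in for a real status file.
SAMPLE = """
hoststatus {
    host_name=web1
    check_execution_time=9.000
    }

servicestatus {
    host_name=web1
    service_description=HTTP
    check_execution_time=0.250
    check_latency=0.100
    }

servicestatus {
    host_name=web1
    service_description=DISK
    check_execution_time=0.750
    check_latency=0.200
    }
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Running summarize() from cron on the file named by status_file in
nagios.cfg and logging the result with a timestamp would give a series
that can be laid over the load graph. If the mean or max climbs ahead of
each lobe, that would point towards checks piling up.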

In short: I have no idea what causes this behaviour.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.





More information about the Developers mailing list