Strange load average with Nagios 3

Andreas Ericsson ae at op5.se
Tue Apr 21 19:13:41 CEST 2009


Thomas Guyot-Sionnest wrote:
> Andreas Ericsson wrote:
>> Yann Jouanin wrote:
>>>>> Has anyone the same observation ?
>>>> I've never seen anything like it before. Does the 4-hour interval coincide
>>>> with your check-interval? Does it align with some performance-data
>>>> processing?
>>>
>>> No, we process performance data every 15 seconds (and even with
>>> processing stopped, the pattern still reproduced).
>>> Our check intervals are 1 min and 5 min; they don't seem to be correlated
>>> with the 4-hour slot.
>>>
>>>>> Can something in Nagios behavior explain
>>>>> this load ?
>>>> Not really, no. I suppose a database logging application could display a
>>>> load pattern such as this if it manages its tables really poorly and then
>>>> vacuums them at the peak of the load, but since you mentioned nothing about
>>>> NDOUtils or anything similar I'll just assume you have no such things
>>>> installed.
>>> There is no mysql nor NDOUtils running on these servers.
>>> Only Nagios and PNP (NPCD + BULK)
>>>
>>>>> Our servers are running different Linux distributions, and we have
>>>>> established that the pattern is certainly due to Nagios.
>>>>>
>>>> How did you ascertain this? Sorry for being skeptical, but I've seen
>>>> "Oh I'm really, really sure" followed by "oops turned out I was wrong"
>>>> too many times to trust others' eyes ;-)
>>> I can understand your skepticism. Let's say we strongly suspect (rather
>>> than being certain!) that this is due to Nagios:
>>> 	- stopping NPCD doesn't change the pattern.
>>> 	- we checked the cron jobs; nothing runs with a 4-hour periodicity.
>>> 	- the different servers don't run the same services (e.g. some
>>> 	  have backups with Bacula, some don't).
>>> 	- unfortunately we cannot stop the Nagios process (because it's
>>> 	  in production!), but the amplitude of the lobes seems to be quite
>>> 	  correlated with the number of services.
>>>
>> That seems to rule out the basics and some of the esoterics at least.
>>
>> Does this state persist if you restart Nagios, or does it sort of grow
>> into place after it's been running a while?
>>
>> It would be nice to be able to see average run-time of plugins over the
>> time of that graph. I could imagine long-running checks to sort of pile
>> up until they spill over and miss one of their check-windows, but that
>> *should* mean load slowly increased and then stayed at a small plateau.
> 
> For the record, I noticed this on the very first day I switched to
> 3.0.1-cvs (close to 3.0.2) on two nagios servers and this behavior has
> persisted since then (Sent an email to the mailing list back then - can
> retrieve it if you like). It has absolutely nothing to do with
> check_interval, cpu usage, IO usage or anything, and is very consistent
> across restarts, server reboots, etc. Everything is running fine though
> (and CPU usage was consistent between the two versions); I just had to
> adjust the load thresholds on these servers to cope with it.
> 


Right. In that case it's at least bisectable, and the range is nicely
short.

   git log -p nagios-3-0-1..nagios-3-0-2 -- base common

has nothing that seems even remotely in the same area though.

Are you sure it was between 3.0.1 and 3.0.2?
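
If the range does hold, a bisect limited to the same two directories could
be started roughly like this. It is only a sketch: start_bisect is a
hypothetical helper name, the tag names are simply the ones from the log
command above, and since the pattern has a 4-hour period each step would
need to run long enough to show (or not show) a lobe.

```shell
# Sketch only: start a bisect between a known-good and a known-bad tag,
# limited to the given paths. "start_bisect" is a hypothetical helper.
start_bisect() {
    good="$1"
    bad="$2"
    shift 2
    # git bisect start takes the bad revision first, then the good one(s)
    git bisect start "$bad" "$good" -- "$@"
}

# Inside a Nagios checkout one would run, for example:
#   start_bisect nagios-3-0-1 nagios-3-0-2 base common
# then build and run each candidate and mark the result:
#   git bisect good    # load pattern absent
#   git bisect bad     # load pattern present
#   git bisect reset   # when finished
```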

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
