Nagios 3.1.1 eats cpu like mad

Ricardo Maraschini ricardo.maraschini at opservices.com.br
Wed Jul 15 23:07:33 CEST 2009


Hi,

----- "Hiren Patel" <hir3npatel at gmail.com> wrote:
> could you provide simple configuration that can be used to replicate the
> problem on 3.1.2? thanks.

It's a little hard to provide a configuration that replicates the problem.
The "bug" only occurs on timeperiod (or timezone) changes, but you can take a look at the description in [1] and figure out a way to simulate it.

I find it very strange that the patch causes this overhead, so I looked at the code and simulated the problem with [2] and without [3] the patch, and nothing strange occurred.
I configured Nagios with 600 services, 200 of which used a timeperiod that I changed to simulate the problem, and the load remained stable.
What I noticed is that add_event() (the function responsible for inserting a new event at the appropriate position relative to the other events) scans event_list from the end to the beginning. This can cause overhead when a hundred services are rescheduled to the beginning of event_list, since each insertion has to walk past every event already queued after it. Could this be the root cause of the problem?
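To make that concern concrete, here is a minimal sketch in C, assuming a simplified sorted doubly-linked list. The `event` struct, `add_event_from_tail()` and `simulate()` are hypothetical names for illustration only, not the actual Nagios code. It keeps the list ordered by run_time, inserts by scanning from the tail as described above for add_event(), and counts how many nodes each insertion walks past when many events are rescheduled to run before everything already queued.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical, simplified event node (NOT the real Nagios struct). */
typedef struct event {
    time_t run_time;
    struct event *prev;
    struct event *next;
} event;

typedef struct {
    event *head;
    event *tail;
} event_list;

/* Insert ev keeping the list sorted by run_time, scanning from the tail.
 * Returns the number of nodes walked past, to make the cost visible. */
size_t add_event_from_tail(event_list *list, event *ev) {
    size_t walked = 0;
    event *cur = list->tail;

    /* Walk backwards until we find a node that runs no later than ev. */
    while (cur != NULL && cur->run_time > ev->run_time) {
        cur = cur->prev;
        walked++;
    }

    if (cur == NULL) {
        /* ev becomes the new head. */
        ev->prev = NULL;
        ev->next = list->head;
        if (list->head != NULL)
            list->head->prev = ev;
        else
            list->tail = ev;
        list->head = ev;
    } else {
        /* ev goes right after cur. */
        ev->prev = cur;
        ev->next = cur->next;
        if (cur->next != NULL)
            cur->next->prev = ev;
        else
            list->tail = ev;
        cur->next = ev;
    }
    return walked;
}

/* Queue `existing` future events, then insert `rescheduled` events that all
 * run earlier than every queued one (as after a timeperiod change).
 * Returns the total number of nodes walked past by the rescheduled inserts. */
size_t simulate(size_t existing, size_t rescheduled) {
    event *pool = calloc(existing + rescheduled, sizeof *pool);
    event_list list = { NULL, NULL };
    size_t walked = 0;

    if (pool == NULL)
        return 0;

    /* Increasing run_times: each of these is a cheap append at the tail. */
    for (size_t i = 0; i < existing; i++) {
        pool[i].run_time = (time_t)(1000 + i);
        add_event_from_tail(&list, &pool[i]);
    }

    /* Rescheduled events land near the head, so each walks the whole tail. */
    for (size_t i = 0; i < rescheduled; i++) {
        event *ev = &pool[existing + i];
        ev->run_time = 100;
        walked += add_event_from_tail(&list, ev);
    }

    /* Sanity check: the list must still be sorted by run_time. */
    for (const event *e = list.head; e != NULL && e->next != NULL; e = e->next)
        assert(e->run_time <= e->next->run_time);

    free(pool);
    return walked;
}
```

With 400 events queued and 200 rescheduled, simulate(400, 200) walks 80000 nodes: every rescheduled insert passes all 400 later events, so the cost grows as existing × rescheduled. Whether the real add_event() hits this pattern depends on how the rescheduled run times compare with the rest of event_list.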

What kind of test would you like me to run? Just let me know.

-rm


[1] http://tiny.cc/ugmXy
[2] http://pastebin.com/f4d3b8bea
[3] http://pastebin.com/f34e66d22


More information about the Developers mailing list