[PATCH] common/macros.c:2185:grab_standard_servicegroup_macro() speed up & Service check execution problem report

Andreas Ericsson ae at op5.se
Tue Jan 4 11:54:18 CET 2011


On 01/04/2011 11:25 AM, Stephane LAPIE wrote:
> On 01/04/2011 06:38 PM, Andreas Ericsson wrote:
>> http://www.op5.org/community/plugin-inventory/op5-projects/merlin
>> http://git.op5.org/git/?p=nagios/merlin.git;a=blob;f=HOWTO;hb=master
>> http://git.op5.org/git/?p=nagios/merlin.git;a=blob;f=README;hb=HEAD
>>
>> Make especially sure you read the first paragraph of the README.
> 
> Oh, I see.
> 
> I have been working with Nagios 3.2.0 since around Nov 2009 for our
> monitoring setup, so I went and implemented my own thing using ssh keys,
> a control shell script with specific commands, tuned configurations to
> limit duplication of files, and enhancements to NSCA client/server to
> make them workable, and such.
> 

Sounds like you made the detour more pleasant, whereas Merlin cuts a
new path. If your way works to your satisfaction, it should keep on
working for you.

> I never tried to touch the DB side of things with a vanilla Nagios base,
> because it wouldn't be proper to handle that as a side-hack, and would
> most likely kill any measure of performance.
> 

You needn't bother with the db parts of Merlin if you don't want to,
but its distributed/loadbalanced nature has some perks that you just
can't get without an eventbroker module (such as command forwarding
and automagic loadbalancing).

> I'll have to give this a look :) Thanks a lot.
> 

You're welcome. Let me know how it pans out. We've had some troubles
on *BSD systems in the past, but they should all be ironed out by now.
I have limited testing capabilities though, so feedback is most welcome.

>> Disable environment macros instead. If you're not using that macro on
>> the command-line, your checks will continue to work. It's not a bug in
>> Nagios as such; it's just that environment variables and the command
>> line share memory space, and that space is limited. With your 300k+ list
>> of servicegroup members, you exhaust that space very quickly, and check
>> execution fails.
> 
> Oh, so THIS is why in most cases the script would not even be executed.
> I would have expected the error to be more straightforward, or have a
> hint pointing to it. :)
> 
> Anyhow, thanks for the explanation, it now makes perfect sense, I should
> have realized environment space was not unlimited. I had never stumbled
> upon a case where I used up all of the space provided for ENV before.
> 
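To illustrate the limit discussed above, here is a minimal sketch (not Nagios code) that runs a trivial command with a single environment variable of a chosen size and reports whether execve() rejected it. The function name exec_with_env_size() and the variable name BIG are made up for this example, and it assumes a POSIX system with a shell at /bin/sh. The exact limit differs between systems; sysconf(_SC_ARG_MAX) reports the combined budget for arguments and environment.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run "/bin/sh -c 'exit 0'" with one environment
 * variable BIG whose value is value_len bytes long.
 * Returns 1 if the exec succeeded, 0 if execve() failed with E2BIG
 * (argument/environment space exhausted), -1 on any other failure.
 */
int exec_with_env_size(size_t value_len)
{
    /* "BIG=" (4 bytes) + value + terminating NUL */
    char *big = malloc(value_len + sizeof("BIG="));
    if (big == NULL)
        return -1;
    memcpy(big, "BIG=", 4);
    memset(big + 4, 'x', value_len);
    big[4 + value_len] = '\0';

    pid_t pid = fork();
    if (pid < 0) {
        free(big);
        return -1;
    }
    if (pid == 0) {
        char *argv[] = { "sh", "-c", "exit 0", NULL };
        char *envp[] = { big, NULL };
        execve("/bin/sh", argv, envp);
        /* Only reached if execve() failed; encode the reason. */
        _exit(errno == E2BIG ? 100 : 101);
    }

    free(big);
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 0)
        return 1;
    return WEXITSTATUS(status) == 100 ? 0 : -1;
}
```

Calling exec_with_env_size(16) should succeed, while a value of several megabytes is expected to fail with E2BIG before the child program ever starts, which matches the "script would not even be executed" symptom.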
>>> 2) A performance problem : The MACRO_SERVICEGROUPMEMBERS code is
>>> painfully slow and extremely costly in CPU performance. The attached
>>> patch file is my attempt at fixing the most obvious issues :
>>>    - Repetitive malloc/realloc (I initially caught on to this by ktrace-ing
>>> the processes and realizing Nagios was mapping/unmapping a lot of memory).
>>>    - Repetitive string duplications and length calculations
>>>
>>> The above code has been tested for a few hours on a busy Nagios setup
>>> and performs much faster, as expected. (It reduces several thousand
>>> malloc/realloc calls to 1 by initially calculating the memory size to
>>> be allocated, thus avoiding unneeded system calls and duplication of
>>> memory areas.)
>>>
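The approach described above (measure first, allocate once, then fill) can be sketched as follows. This is not the attached patch: struct member and join_members() are hypothetical stand-ins for the Nagios member list and the macro code, and the "host,service,host,service" output layout is assumed for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one servicegroup member entry. */
struct member {
    const char *host_name;
    const char *service_description;
    struct member *next;
};

/*
 * Build "host1,svc1,host2,svc2,..." using a single malloc().
 * Returns NULL for an empty list or on allocation failure.
 * The caller frees the result.
 */
char *join_members(const struct member *list)
{
    const struct member *m;
    size_t total = 0;

    /* Pass 1: compute the exact size. Each member needs its two
     * strings, one comma between them, and one more byte that is
     * either the comma before the next member or the final NUL. */
    for (m = list; m != NULL; m = m->next)
        total += strlen(m->host_name) + strlen(m->service_description) + 2;

    if (total == 0)
        return NULL;

    char *buf = malloc(total);
    if (buf == NULL)
        return NULL;

    /* Pass 2: write each entry at the current end of the string,
     * so nothing is reallocated or rescanned. */
    char *p = buf;
    for (m = list; m != NULL; m = m->next)
        p += sprintf(p, "%s%s,%s", (m == list) ? "" : ",",
                     m->host_name, m->service_description);

    return buf;
}
```

Compared with growing the buffer by realloc() for every member, this walks the list twice but allocates once, and it avoids recomputing the length of the growing result on every append.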
>>
>> Nice patch. I'll apply it tomorrow when it's my Nagios day. Any chance
>> you could whip up something similar for HOSTGROUPMEMBERS until then?
> 
> Sure, please check out the attached file. It works on the same principle
> as my previous patch, which means that short of the sprintf() arguments,
> it's nearly a copy/paste. I ran it through my configuration for a test
> run for an hour or so, and it seems to be doing fine so far.
> 

Excellent. Many thanks :)

> 
> Again, thanks a lot for your time.

Likewise. Getting patches is always a big thumbs up :)

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
