Memory leak in Nagios head

Andreas Ericsson ae at op5.se
Tue Nov 30 13:45:23 CET 2004


The "repeated SIGHUP" crash occurs in 
find_host(temp_hostextinfo->host_name), called from pre_flight_check().

The attached patch makes Nagios at least survive the HUPs (although 
the memory leak is still there, so it should still crash eventually when 
it hits the memory limit). I haven't tested whether this affects the GUI.
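
For readers without the attachment, the following is a minimal sketch of a defensive lookup of the kind that keeps a bad pointer from taking the daemon down. It is not the attached patch. The `host_sketch` type and `find_host_checked()` are hypothetical stand-ins for the real Nagios structures, and the sketch assumes the crash comes from a NULL name or list reaching the lookup, which this thread does not confirm.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the real Nagios host struct. */
typedef struct host_sketch {
    const char *name;
    struct host_sketch *next;
} host_sketch;

/* Linear lookup that tolerates a NULL list, a NULL key and NULL entry
 * names instead of dereferencing them. Returns NULL when nothing matches. */
host_sketch *find_host_checked(host_sketch *list, const char *name)
{
    if (name == NULL)
        return NULL;

    for (host_sketch *h = list; h != NULL; h = h->next) {
        if (h->name != NULL && strcmp(h->name, name) == 0)
            return h;
    }
    return NULL;
}
```

A caller such as a pre-flight check would then treat a NULL result as "host not found" and log an error rather than continue with the pointer.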

Note that it has only been tested together with Matthew's patch (which 
didn't fix the problem), so I don't know whether it works on its own or 
whether the two have to be combined to do the trick.

Andreas Ericsson wrote:
> Matthew Kent wrote:
> 
>> On Mon, 2004-11-29 at 15:34, Andreas Ericsson wrote:
>>
>>> Matthew Kent wrote:
>>>
>>>> Forwarding this on in case anyone else has seen this behaviour and has
>>>> some suggestions. I'll give it a run through valgrind and see if I can
>>>> spot anything this evening.
>>>>
>>>
>>> Thanks, Matt.
>>>
>>> A small update;
>>>
>>> After running the daemon for about 10 hours on a test system, memory 
>>> consumption has escalated from roughly 1MB to around 24MB. Not very 
>>> nice figures. Each HUP seems to make memory consumption jump a 
>>> little (usually by around 20K).
>>
>>
>>
>> Well I may have trapped the HUP problem after some passes through
>> valgrind. Seems reset_variables was getting called twice, right after
>> receiving a sighup and immediately after at the start of the main do()
>> loop in nagios.c
> 
> 
> I'll get to testing right away.
> 
>> I've removed the call to it from cleanup() as it's only called when
>> erroring out anyway, and resetting the variables at this point is a bit
>> of a lost cause ;)
>>
>> I also fixed a couple of other minor items reported by valgrind, although
>> I couldn't figure out this last one:
>>
>> 64 bytes in 8 blocks are definitely lost in loss record 66 of 118
>>    at 0x1B904EDD: malloc (vg_replace_malloc.c:131)
>>    by 0x808F4D4: xodtemplate_add_host_to_hostlist (xodtemplate.c:10665)
>>    by 0x808F456: xodtemplate_add_hostgroup_members_to_hostlist (xodtemplate.c:10640)
>>    by 0x808EF0E: xodtemplate_expand_hostgroups (xodtemplate.c:10434)
>>
> 
> This shouldn't be the longstanding problem though, since NSCORE doesn't 
> use xodtemplate_expand_hostgroups() on a regular basis. I'm leaning 
> towards a very small and subtle in-struct leak in base/checks.c or 
> common/statusdata.c (and their underlying functions, naturally). 
> Particularly since the problem seems to present itself more rapidly when 
> hosts and services change status a lot (or possibly just change their 
> plugin output).
> 
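
As an illustration of why a doubled reset call, like the one Matthew describes in the quoted text, can show up as a small leak per HUP, here is a minimal sketch. It assumes the reset routine assigns freshly allocated default strings to global pointers without freeing the previous values, which is an assumption about the code and not something shown in this thread. `config_sketch`, `DEFAULT_LOG_FILE_SKETCH`, `dup_string()`, `reset_leaky()` and `reset_safe()` are all hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical default value, for illustration only. */
#define DEFAULT_LOG_FILE_SKETCH "nagios.log"

/* Hypothetical stand-in for the daemon's global settings. */
typedef struct {
    char *log_file;
} config_sketch;

/* Portable replacement for strdup(); returns NULL on allocation failure. */
char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Leaky variant: the old buffer is overwritten without being freed, so
 * every call after the first loses one allocation. */
int reset_leaky(config_sketch *cfg)
{
    cfg->log_file = dup_string(DEFAULT_LOG_FILE_SKETCH);
    return cfg->log_file != NULL;
}

/* Safe variant: releases the previous value first, so calling it twice
 * in a row (as after a SIGHUP) does not accumulate memory. */
int reset_safe(config_sketch *cfg)
{
    char *fresh = dup_string(DEFAULT_LOG_FILE_SKETCH);
    if (fresh == NULL)
        return 0;
    free(cfg->log_file);   /* free(NULL) is a no-op */
    cfg->log_file = fresh;
    return 1;
}

/* Releases everything the sketch allocated. */
void config_free(config_sketch *cfg)
{
    free(cfg->log_file);
    cfg->log_file = NULL;
}
```

Removing the redundant call, as Matthew did, and making the reset itself idempotent are complementary: the first fixes the known caller, the second protects against any other path that resets twice.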

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Lead Developer
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: nagios-cvs-HUP-crash.diff
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20041130/81cd76bf/attachment.ksh>

