Memory leak in Nagios head

Ethan Galstad nagios at nagios.org
Tue Nov 30 18:23:18 CET 2004


Looks like the problem here was actually in add_hostextinfo(),
where a dangling pointer was causing the crash.  Fix is in CVS now.
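
For readers unfamiliar with this class of bug, here is a minimal sketch of how a dangling pointer can bite on a config reload. It is not the actual add_hostextinfo() code or the fix committed to CVS: the struct, the list pointers, and the function names (extinfo, add_extinfo, free_extinfo_list, count_extinfo) are all hypothetical stand-ins, and the cached tail pointer is an assumption made purely for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an extended-info record (not the Nagios struct). */
typedef struct extinfo {
    char *host_name;
    struct extinfo *next;
} extinfo;

static extinfo *list_head = NULL;
static extinfo *list_tail = NULL;   /* cached tail so appends are O(1) */

/* Append a record to the list. Returns 0 on success, -1 on error. */
int add_extinfo(const char *host_name)
{
    extinfo *node;

    if (host_name == NULL)
        return -1;

    node = malloc(sizeof(*node));
    if (node == NULL)
        return -1;

    node->host_name = malloc(strlen(host_name) + 1);
    if (node->host_name == NULL) {
        free(node);
        return -1;
    }
    strcpy(node->host_name, host_name);
    node->next = NULL;

    if (list_head == NULL) {
        list_head = node;
    } else {
        /* A non-empty list must have a tail; this write goes through it. */
        assert(list_tail != NULL);
        list_tail->next = node;
    }
    list_tail = node;
    return 0;
}

/* Free every record, as a reload (e.g. on SIGHUP) would do. */
void free_extinfo_list(void)
{
    extinfo *cur = list_head;
    extinfo *next;

    while (cur != NULL) {
        next = cur->next;
        free(cur->host_name);
        free(cur);
        cur = next;
    }

    /* Without these two resets both pointers would dangle, and the next
     * add_extinfo() after a reload would write into freed memory. */
    list_head = NULL;
    list_tail = NULL;
}

/* Count the records currently in the list. */
int count_extinfo(void)
{
    int n = 0;
    extinfo *cur;

    for (cur = list_head; cur != NULL; cur = cur->next)
        n++;
    return n;
}
```

In this sketch the free routine is safe to run on every reload because it leaves the list in the same state as at startup. A bug of this kind often survives the first reload and only crashes on a later one, once the freed memory has been reused.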


On 30 Nov 2004 at 13:45, Andreas Ericsson wrote:

> The "repeated SIGHUP" crash occurs in 
> find_host(temp_hostextinfo->host_name), called from
> pre_flight_check().
> 
> The attached patch makes nagios at least survive the HUPs (even though
> the memory leak is still there, so it should crash eventually when it
> hits the memory limit). I haven't tested whether this affects the GUI
> or not.
> 
> Note that it's only tested using Matthew's patch as well (which didn't
> fix the problem), so I don't know if it will work solo or if both of
> them have to be combined to do the trick.
> 
> Andreas Ericsson wrote:
> > Matthew Kent wrote:
> > 
> >> On Mon, 2004-11-29 at 15:34, Andreas Ericsson wrote:
> >>
> >>> Matthew Kent wrote:
> >>>
> >>>> Forwarding this on in case anyone else has seen this behaviour
> >>>> and has some suggestions. I'll give it a run through valgrind and
> >>>> see if I can spot anything this evening.
> >>>>
> >>>
> >>> Thanks, Matt.
> >>>
> >>> A small update;
> >>>
> >>> After having run the daemon for about 10 hours on a test system,
> >>> memory consumption has escalated from roughly 1MB to around 24MB.
> >>> Not very nice figures. It seems that sending a HUP makes memory
> >>> consumption jump slightly (usually by around 20K).
> >>
> >>
> >>
> >> Well, I may have trapped the HUP problem after some passes through
> >> valgrind. It seems reset_variables() was getting called twice:
> >> right after receiving a SIGHUP, and again immediately afterwards
> >> at the start of the main do() loop in nagios.c.
> > 
> > 
> > I'll get to testing right away.
> > 
> >> I've removed the call to it from cleanup() as it's only called when
> >> erroring out anyway, and resetting the variables at this point is a
> >> bit of a lost cause ;)
> >>
> >> I also fixed a couple other minor items reported by valgrind.
> >> Although I couldn't figure out this last one
> >>
> >> 64 bytes in 8 blocks are definitely lost in loss record 66 of 118
> >>    at 0x1B904EDD: malloc (vg_replace_malloc.c:131)
> >>    by 0x808F4D4: xodtemplate_add_host_to_hostlist (xodtemplate.c:10665)
> >>    by 0x808F456: xodtemplate_add_hostgroup_members_to_hostlist (xodtemplate.c:10640)
> >>    by 0x808EF0E: xodtemplate_expand_hostgroups (xodtemplate.c:10434)
> >>
> > 
> > This shouldn't be the longstanding problem though, since NSCORE
> > doesn't use xodtemplate_expand_hostgroups() on a regular basis. I'm
> > leaning towards a very small and subtle in-struct leak in
> > base/checks.c or common/statusdata.c (and their underlying
> > functions, naturally). Particularly since the problem seems to
> > present itself more rapidly when hosts and services change status a
> > lot (or possibly just change their plugin output).
> > 
> 
> -- 
> Andreas Ericsson                   andreas.ericsson at op5.se
> OP5 AB                             www.op5.se
> Lead Developer
> 



Ethan Galstad,
Nagios Developer
---
Email: nagios at nagios.org
Website: http://www.nagios.org


