Memory leak in Nagios head
Ethan Galstad
nagios at nagios.org
Tue Nov 30 18:23:18 CET 2004
It looks like the cause of this was actually in add_hostextinfo(),
where a dangling pointer was causing problems. The fix is in CVS now.
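
For anyone unfamiliar with this class of bug, here is a minimal sketch of how a dangling pointer can survive a reload and how resetting the list head avoids it. This is not the actual Nagios code or the CVS fix: the demo_* names and the struct layout are hypothetical, and the sketch only assumes that extended host info records live in a linked list that gets freed and rebuilt on each SIGHUP.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an extended host info record. */
typedef struct demo_hostextinfo {
    char *host_name;                  /* copy owned by this record */
    struct demo_hostextinfo *next;
} demo_hostextinfo;

/* Head of the list; must never point at freed memory. */
static demo_hostextinfo *demo_list = NULL;

/* Add a record. The name is copied so the record does not depend on
 * a buffer that the config parser may free later. */
demo_hostextinfo *demo_add(const char *host_name) {
    demo_hostextinfo *e;

    if (host_name == NULL)
        return NULL;
    e = malloc(sizeof(*e));
    if (e == NULL)
        return NULL;
    e->host_name = malloc(strlen(host_name) + 1);
    if (e->host_name == NULL) {
        free(e);
        return NULL;
    }
    strcpy(e->host_name, host_name);
    e->next = demo_list;
    demo_list = e;
    return e;
}

/* Look up a record by name; returns NULL if it is not in the list. */
demo_hostextinfo *demo_find(const char *host_name) {
    demo_hostextinfo *e;

    if (host_name == NULL)
        return NULL;
    for (e = demo_list; e != NULL; e = e->next)
        if (strcmp(e->host_name, host_name) == 0)
            return e;
    return NULL;
}

/* Free every record and reset the head. Leaving the head untouched
 * here is what would produce a dangling pointer on the next reload. */
void demo_free_all(void) {
    demo_hostextinfo *e = demo_list, *next;

    while (e != NULL) {
        next = e->next;
        free(e->host_name);
        free(e);
        e = next;
    }
    demo_list = NULL;
}
```

With the head reset in demo_free_all(), a lookup after a reload walks an empty list instead of freed memory, which is the kind of access that a later find_host()-style call would otherwise trip over.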
On 30 Nov 2004 at 13:45, Andreas Ericsson wrote:
> The "repeated SIGHUP" crash occurs in
> find_host(temp_hostextinfo->host_name), called from
> pre_flight_check().
>
> The attached patch makes nagios at least survive the HUPs (even though
> the memory leak is still there, so it should crash eventually when it
> hits the memory limit). I haven't tested whether this affects the GUI
> or not.
>
> Note that it's only tested together with Matthew's patch (which didn't
> fix the problem), so I don't know if it will work solo or if both of
> them have to be combined to do the trick.
>
> Andreas Ericsson wrote:
> > Matthew Kent wrote:
> >
> >> On Mon, 2004-11-29 at 15:34, Andreas Ericsson wrote:
> >>
> >>> Matthew Kent wrote:
> >>>
> >>>> Forwarding this on in case anyone else has seen this behaviour
> >>>> and has some suggestions. I'll give it a run through valgrind and
> >>>> see if I can spot anything this evening.
> >>>>
> >>>
> >>> Thanks, Matt.
> >>>
> >>> A small update;
> >>>
> >>> After running the daemon for about 10 hours on a test system,
> >>> memory consumption has escalated from roughly 1MB to around 24MB.
> >>> Not very nice figures. It seems that sending a HUP makes memory
> >>> consumption jump slightly (usually by around 20K).
> >>
> >>
> >>
> >> Well I may have trapped the HUP problem after some passes through
> >> valgrind. Seems reset_variables was getting called twice, right
> >> after receiving a sighup and immediately after at the start of the
> >> main do() loop in nagios.c
> >
> >
> > I'll get to testing right away.
> >
> >> I've removed the call to it from cleanup() as it's only called when
> >> erroring out anyway, and resetting the variables at this point is a
> >> bit of a lost cause ;)
> >>
> >> I also fixed a couple of other minor items reported by valgrind,
> >> although I couldn't figure out this last one:
> >>
> >> 64 bytes in 8 blocks are definitely lost in loss record 66 of 118
> >>    at 0x1B904EDD: malloc (vg_replace_malloc.c:131)
> >>    by 0x808F4D4: xodtemplate_add_host_to_hostlist (xodtemplate.c:10665)
> >>    by 0x808F456: xodtemplate_add_hostgroup_members_to_hostlist (xodtemplate.c:10640)
> >>    by 0x808EF0E: xodtemplate_expand_hostgroups (xodtemplate.c:10434)
> >>
> >
> > This shouldn't be the longstanding problem though, since NSCORE
> > doesn't use xodtemplate_expand_hostgroups() on a regular basis. I'm
> > leaning towards a very small and subtle in-struct leak in
> > base/checks.c or common/statusdata.c (and their underlying
> > functions, naturally). Particularly since the problem seems to
> > present itself more rapidly when hosts and services change status a
> > lot (or possibly just change their plugin output).
> >
>
> --
> Andreas Ericsson andreas.ericsson at op5.se
> OP5 AB www.op5.se
> Lead Developer
>
Ethan Galstad,
Nagios Developer
---
Email: nagios at nagios.org
Website: http://www.nagios.org