Nagios3.0b4 : shutdown after a few minutes

hindrek murdsalu hindrek.murdsalu at tallink.ee
Mon Oct 1 10:13:14 CEST 2007


Sorry for the delay; I didn't feel like going in to work over the weekend.

Running the unstripped Nagios binary in the foreground:

Nagios 3.0b4 starting... (PID=18291)
...
<some warnings>
...
*** glibc detected *** /home/admin/nagios-3.0b4/base/nagios: realloc(): invalid next size: 0x00000000006df450 ***

======= Backtrace: =========
/lib64/libc.so.6[0x2ad5dc80f8fe]
/lib64/libc.so.6[0x2ad5dc81260d]
/lib64/libc.so.6(realloc+0x128)[0x2ad5dc8138b8]
/home/admin/nagios-3.0b4/base/nagios(grab_servicegroup_macros+0x92)[0x430192]
/home/admin/nagios-3.0b4/base/nagios(grab_service_macros+0x1a1)[0x4304d1]
/home/admin/nagios-3.0b4/base/nagios(run_async_service_check+0x21c)[0x4175cc]
/home/admin/nagios-3.0b4/base/nagios(run_scheduled_service_check+0xb8)[0x4192b8]
/home/admin/nagios-3.0b4/base/nagios(handle_timed_event+0x139)[0x427379]
/home/admin/nagios-3.0b4/base/nagios(event_execution_loop+0x57e)[0x427b8e]
/home/admin/nagios-3.0b4/base/nagios(main+0x3ec)[0x41059c]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x2ad5dc7c0ae4]
/home/admin/nagios-3.0b4/base/nagios[0x410119]
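
For context: glibc typically prints "realloc(): invalid next size" when the
bookkeeping data of the heap chunk following the one being resized has been
corrupted, which usually means an earlier write ran past the end of an
allocated buffer. The sketch below is not the code from common/macros.c. It
only illustrates, assuming a comma-separated "host,service" member list is
grown with realloc(), how the requested size has to cover both separators and
the terminating NUL. The function name append_member is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not taken from Nagios: appends "host,svc" to a
 * heap-allocated comma-separated list. Returns the (possibly moved) buffer,
 * or NULL on allocation failure, in which case the old buffer is freed. */
char *append_member(char *list, const char *host, const char *svc)
{
    assert(host != NULL && svc != NULL);

    size_t old_len = list ? strlen(list) : 0;
    size_t sep = old_len ? 1 : 0; /* leading ',' only if the list is non-empty */

    /* old text + separator + host + ',' + svc + terminating NUL */
    size_t need = old_len + sep + strlen(host) + 1 + strlen(svc) + 1;

    /* realloc(NULL, n) behaves like malloc(n), so the first call works too. */
    char *tmp = realloc(list, need);
    if (tmp == NULL) {
        free(list);
        return NULL;
    }

    /* Write at the old end; the size limit keeps the write inside the block. */
    snprintf(tmp + old_len, need - old_len, "%s%s,%s",
             sep ? "," : "", host, svc);
    return tmp;
}
```

Calling append_member(NULL, "citrixserver115", "1 min Load") starts a new
list, and passing the returned pointer back in appends further members.
Under-counting even one byte in the size calculation would corrupt the next
chunk's header and could surface later as the error shown above.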

Last debug log entries for host/service checks:

[1191224192.104436] [016.0] [pid=18291] Attempting to run scheduled check of service '1 min Load' on host 'citrixserver115': check options=0, latency=0.104000
[1191224192.104553] [016.0] [pid=18291] Checking service '1 min Load' on host 'TSEETLLSTS115'...
[1191224193.293978] [016.2] [pid=18321] Moving temp check result file '/usr/local/nagios/var/spool/checkresults/checkX0rcks' to queue file '/usr/local/nagios/var/spool/checkresults/cuFFVSV'...
[1191224193.298331] [016.2] [pid=18315] Moving temp check result file '/usr/local/nagios/var/spool/checkresults/checkJFr3dS' to queue file '/usr/local/nagios/var/spool/checkresults/cRjGyNl'...
[1191224193.300104] [016.2] [pid=18318] Moving temp check result file '/usr/local/nagios/var/spool/checkresults/checko0yKfa' to queue file '/usr/local/nagios/var/spool/checkresults/cGDhzPD'...
[1191224199.233582] [016.2] [pid=18312] Moving temp check result file '/usr/local/nagios/var/spool/checkresults/checkvwsReA' to queue file '/usr/local/nagios/var/spool/checkresults/cYwUzJk'...
[1191224200.224156] [016.2] [pid=18302] Moving temp check result file '/usr/local/nagios/var/spool/checkresults/checkcDkz6W' to queue file '/usr/local/nagios/var/spool/checkresults/c6KwoqK'...

There were other service checks before it; this was the last one, and then
the debug log ends.

Command, service and servicegroup definitions associated with it:

define command{
        command_name    check_snmp_load
        command_line    $USER1$/check_snmp_load.pl -H $HOSTADDRESS$ $USER7$ -T $ARG1$ -w $ARG2$ -c $ARG3$ $ARG4$
        }
############################################################################
#CPU LOAD
############################################################################
define service{
        use                     generic-service
        host_name               !citrixserver100 <-- (cause of the bug, was fixed in 0b4)
        hostgroup_name          !citrix-servers,windows-servers
        service_description     1 min Load
        notifications_enabled   1
        check_command           check_snmp_load!stand!75!90!-C rocommunity -f
        }
define service{
        use                     inventory-service
        host_name               !SIN_Dummy_Cluster
        hostgroup_name          citrix-servers (citrixserver115 is part of this group)
        service_description     1 min Load
        notifications_enabled   1
        normal_check_interval   5
        check_command           check_snmp_load!stand!60!90!-C rocommunity -f
        notification_options    w,c,r
        }

define servicegroup{
        servicegroup_name       win_cpu_load
        alias                   win_cpu_load
        members         <pretty much every windows server, I wish you could also add hostgroups for this ../ ,citrixserver115,1 min Load, /..>
        }

Note: server names and snmp community name altered.

 
-----Original Message-----
From: nagios-devel-bounces at lists.sourceforge.net
[mailto:nagios-devel-bounces at lists.sourceforge.net] On Behalf Of Ethan Galstad
Sent: 28 September 2007, 19:44
To: Nagios-Devel
Subject: Re: [Nagios-devel] Nagios3.0b4 : shutdown after a few minutes

It looks like the problem is on line 581 of common/macros.c.  Most likely
the pointer is getting set to an invalid address at some point.

I've been running the b4 code with the new servicegroup macros for about
two weeks without problems thus far.  I'll try running it under a
debugger with various service/servicegroup variations to see if I can
get it to segfault.

Can you run Nagios with the debug log enabled for service checks to find
out what service is being checked immediately before the segfault?  If
you could send (offline if necessary) the corresponding service
definition, as well as defs for any servicegroups it's a member of, that
would be helpful as well.
