Nagios 3.2.0 process dies silently - help!

Tony Johansson tony.johansson at svenskakyrkan.se
Fri Feb 5 19:39:47 CET 2010



Marc Powell wrote:
> On Feb 5, 2010, at 10:41 AM, Tony Johansson wrote:
>
>   
>> Hello,
>>  
>> Our nagios 3.2.0 installation is having major problems.
>> The nagios process dies silently about 10-60 seconds after being started. There is no record of why in any log file.
>>  
>> I have tried setting maximum debugging (debug_level=-1 and debug_verbosity=2) in nagios.cfg, but it shows nothing.
>>  
>> System is a CentOS release 5.4 which has been running fine for months.
>>  
>> Any ideas on how to troubleshoot this or what is going on?
>>     
>
>
> Try running it in the foreground (without -d). If you don't see anything interesting when it dies, run it in the foreground through strace (strace -fFs512 /path/to/nagios -c /path/to/nagios.cfg).
>
> Verify you haven't run out of disk space or anything simple like that. If you're running SElinux, verify that there are no errors related to it in /var/log/messages.
>
> Is there anything special about the install or the machine it's running on? Are any of the nagios directories mounted from remote machines?
>
> --
> Marc
>
>   
Hello all,

Nothing special about the install; everything is on the same machine.
I ran strace as suggested:
strace -fFs512 /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg

[pid 32731] write(3, "[1265393566.503713] [016.2] [pid=32731] Processed service performance data file output: 1265393559||AHS||C: Drive Space||c:\\ - total: 15.86 Gb - used: 7.60 Gb (48%) - free 8.26 Gb (52%)||c:\\ Used Space=7.60Gb;14.27;15.54;0.00;15.86\n", 232) = 232
[pid 32731] _llseek(3, 0, [657557], SEEK_CUR) = 0
[pid 32731] write(6, "1265393559||AHS||C: Drive Space||c:\\ - total: 15.86 Gb - used: 7.60 Gb (48%) - free 8.26 Gb (52%)||c:\\ Used Space=7.60Gb;14.27;15.54;0.00;15.86\n", 144) = -1 EFBIG (File too large)
[pid 32731] --- SIGXFSZ (File size limit exceeded) @ 0 (0) ---
[pid 32732] +++ killed by SIGXFSZ +++

"File size limit exceeded" seems to be the cause
Disk space is plenty:
df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      68G   28G   38G  43% /
/dev/sda1              99M   30M   65M  32% /boot
tmpfs                 506M     0  506M   0% /dev/shm
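
Free disk space does not rule this error out. EFBIG and SIGXFSZ are raised when a write would push a single file past a size limit, either the process's file size limit (see `ulimit -f`) or, on a build without large-file support, the 2 GiB offset limit. The `_llseek` call in the trace suggests a 32-bit build, but that is an inference, not something the trace proves. A minimal sketch for spotting candidate files follows; the function name `files_larger_than` is hypothetical and not part of Nagios.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios): list regular files under DIR
# whose size is strictly greater than MIN_BYTES.
# usage: files_larger_than DIR MIN_BYTES
files_larger_than() {
    # "-size +Nc" matches files larger than N bytes
    find "$1" -type f -size +"$2"c
}
```

For example, `files_larger_than /var/spool/nagios 2000000000` would list anything in that directory close to or above the 2 GiB mark.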

Earlier I also tried renaming retention.dat and status.dat and moving files
out of checkresults, with no effect.

Seems like /var/spool/nagios/perfdata.log is 2G while 
/var/spool/nagios/perfdata.log is a mere 11K
I renamed the file and started nagios, which now seems to run OK.
It looks like I need to set up log rotation, or what is the best way to
handle perfdata.log?

Many thanks, Tony
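
One way to keep the file from reaching the limit again is to move it aside once it passes a size threshold. The sketch below is an illustration only: the function name `rotate_if_large` and the timestamp suffix are assumptions, not anything Nagios provides. A process that already has the file open keeps writing to the renamed file until it reopens it, so this assumes Nagios is stopped or reopens the file afterwards. Nagios 3 also has `service_perfdata_file_processing_interval` and `service_perfdata_file_processing_command` settings in nagios.cfg for handing the file to an external command on a schedule, which may be a better fit.

```shell
#!/bin/sh
# Hypothetical helper: move FILE aside and start a fresh empty one
# once FILE is larger than MAX_BYTES.
# usage: rotate_if_large FILE MAX_BYTES
rotate_if_large() {
    file=$1
    max_bytes=$2
    # Nothing to do if the file does not exist
    [ -f "$file" ] || return 0
    # Arithmetic expansion strips any padding some wc versions print
    size=$(( $(wc -c < "$file") ))
    if [ "$size" -gt "$max_bytes" ]; then
        # Keep the old data under a timestamped name (assumed naming scheme)
        mv "$file" "$file.$(date +%Y%m%d%H%M%S)"
        # Recreate an empty file at the original path
        : > "$file"
    fi
}
```

For example, `rotate_if_large /var/spool/nagios/perfdata.log 1000000000` run from cron as the nagios user would rotate the file at roughly 1 GB, well before the 2 GiB limit.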





_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users