Nagios hangs on startup

Eric Cables ecables at gmail.com
Fri Jul 2 00:24:30 CEST 2010


I found an obscure "su hangs" message board posting that recommended
restarting syslogd.  I am running syslog-ng, and after restarting that daemon
I was able to start Nagios without any problems.  Local 'su - nagios'
commands also work again without any delay.  I suspect that some interaction
between Nagios and syslog-ng caused Nagios to stop working, and that the
subsequent restart attempts hung because of that same underlying problem.
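
For anyone who wants to test for this condition without locking up a
terminal, here is a minimal sketch of a time-limited probe.  It assumes the
hang comes from a blocked write to the syslog socket, which I suspect but
have not confirmed.  run_with_timeout is a made-up helper name, and the
one-second polling interval is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a time limit using plain sh.
# Usage: run_with_timeout SECONDS command [args...]
# Returns the command's own exit status, or 124 if the limit was reached.
run_with_timeout() {
    limit=$1
    shift
    "$@" &
    cmd_pid=$!
    waited=0
    # Poll once per second for as long as the command is still around.
    while kill -0 "$cmd_pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            # Do not wait for the process after signalling it; a process
            # that is truly wedged might never exit.
            kill -TERM "$cmd_pid" 2>/dev/null
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # The command finished on its own; hand back its exit status.
    wait "$cmd_pid"
}
```

For example, 'run_with_timeout 5 logger -t hangtest probe' should return
almost immediately on a healthy box.  A return value of 124 would suggest
that writes to syslog are blocking, which would be consistent with su
hanging as well.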

I'll probably move this problem to the syslog-ng mailing list, but has
anyone ever seen this before?  At the very least maybe this will provide
someone who has this problem in the future with more information.
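
The file checks Jim suggested earlier in the thread (quoted below) could be
scripted along these lines.  This is only a sketch: prestart_check is a
hypothetical helper, and the paths are passed in as arguments because they
differ between installs (this FreeBSD box has no /var/lock at all).

```shell
#!/bin/sh
# Hypothetical helper: report leftovers from a previous Nagios run.
# Usage: prestart_check COMMAND_FILE LOCK_FILE
# Prints one line per leftover and returns non-zero if any were found.
prestart_check() {
    cmd_file=$1
    lock_file=$2
    problems=0
    # -e matches any file type, so a leftover named pipe is caught too.
    if [ -e "$cmd_file" ]; then
        echo "stale command file: $cmd_file"
        problems=$((problems + 1))
    fi
    if [ -e "$lock_file" ]; then
        echo "stale lock file: $lock_file"
        problems=$((problems + 1))
    fi
    # Succeed only when nothing was left behind.
    [ "$problems" -eq 0 ]
}
```

With the paths mentioned in this thread the call would look something like
'prestart_check /usr/local/nagios/var/rw/nagios.cmd /var/lock/subsys/nagios'.
A non-zero return means something was left behind by the previous run.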

-- Eric Cables


On Thu, Jul 1, 2010 at 2:36 PM, Eric Cables <ecables at gmail.com> wrote:

> Well, I tried to duplicate the command that is showing up in the 'ps -xw'
> output, and it just hangs.
>
> [nagios at psdbsd01 (~)]$ whoami
> nagios
> [nagios at psdbsd01 (~)]$ su - nagios -c touch /usr/local/nagios/var/nagios.log /usr/local/nagios/var/retention.dat
>
> ^^ hangs here.
>
> In fact, if I just try to 'su - nagios' the process hangs as well.
>
> Using su with other parameters works, however, so the binary seems to
> function:
> [nagios at psdbsd01 (~)]$ su -
> Password:
> [root at psdbsd01 (~)]#
>
> And su - nagios from the root user appears to work fine.
> [root at psdbsd01 (~)]# su - nagios
> [nagios at psdbsd01 (~)]$
>
> But su - nagios does not (as the nagios user):
> [nagios at psdbsd01 (~)]$ su - nagios
>
> ^^ hangs
>
> Sorry for all the noise.
>
> -- Eric Cables
>
>
>
> On Thu, Jul 1, 2010 at 2:15 PM, Eric Cables <ecables at gmail.com> wrote:
>
>> Here are a few more details I've been able to gather.
>>
>> Here's the output of a truss on the init script w/ the start statement:
>> Starting nagios:write(1,"Starting nagios:",16)                   = 16 (0x10)
>> fork(0x90,0xbfbfe9f8,0xa,0x8062a35,0x0,0x0)      = 55445 (0xd895)
>> getpgrp(0x0,0x0,0xd895,0x0,0x2831c0c0,0x0)       = 55444 (0xd894)
>> wait4(0xffffffff,0xbfbfe9d8,0x2,0x0,0x213,0x1)   = 55445 (0xd895)
>> stat("/sbin/su",0xbfbfe6f8)                      ERR#2 'No such file or directory'
>> stat("/bin/su",0xbfbfe6f8)                       ERR#2 'No such file or directory'
>> stat("/usr/sbin/su",0xbfbfe6f8)                  ERR#2 'No such file or directory'
>> stat("/usr/bin/su",{ mode=-r-sr-xr-x ,inode=14512669,size=14496,blksize=4096 }) = 0 (0x0)
>> fork(0x0,0x0,0x4b156e10,0x0,0x0,0x0)             = 55446 (0xd896)
>> getpgrp(0x0,0x0,0xd896,0x0,0x2831c0c0,0x0)       = 55444 (0xd894)
>>
>> ^^^ This is where it hangs.
>>
>> ps -ax | grep nagios shows the following:
>> 55443   6  I+     0:00.02 truss /usr/local/etc/rc.d/nagios.sh start
>> 55444   6  IX     0:00.01 /bin/sh /usr/local/etc/rc.d/nagios.sh start
>> 55447   6  S      0:00.07 su - nagios -c touch /usr/local/nagios/var/nagios.log /usr/local/nagios/var/retention.dat
>>
>> Here is retention.dat (not sure why it would hang here):
>> -rw-------  1 nagios  nagios  2008435 Jul  1 12:26 retention.dat
>>
>> These are really the only clues I'm able to find at this point.
>>
>> -- Eric Cables
>>
>>
>>
>> On Thu, Jul 1, 2010 at 2:09 PM, Eric Cables <ecables at gmail.com> wrote:
>>
>>> Thanks for the reply.  I ended up rebooting the box, which fixed the
>>> problem temporarily, but it has resurfaced again.  When I drill down into a
>>> service check it says that the next check will be processed at a time that
>>> has already passed.
>>>
>>> For example:
>>> Last Check: 13:09
>>> Next Check: 13:11
>>>
>>> The current time, however, is 14:02...
>>>
>>> When I try to stop the process via the init script I get the following:
>>> [nagios at psdbsd01 (~/var)]$ /usr/local/etc/rc.d/nagios.sh stop
>>> Stopping nagios: ..........
>>> Warning - nagios did not exit in a timely manner
>>>
>>> After stopping, the cmd file does not exist before I attempt to start,
>>> but I am back to the problem where Nagios will not start and instead hangs
>>> indefinitely when asked to start.
>>>
>>> [nagios at psdbsd01 (~/var)]$ /usr/local/etc/rc.d/nagios.sh start
>>> Starting nagios: <-- hangs here
>>>
>>> I'm not sure about the lock file; this is a FreeBSD install from source,
>>> and I don't see a /var/lock directory at all.  Everything Nagios related is
>>> installed in /usr/local/nagios as far as I can tell.
>>>
>>> There doesn't seem to be anything of interest in nagios.log, as the last
>>> entry just reports a notification that was sent out prior to Nagios losing
>>> its functionality.
>>>
>>> Any other tips?  I'm not exactly sure why a reboot fixed this before, but
>>> any speculation is appreciated.
>>>
>>> -- Eric Cables
>>>
>>>
>>>
>>> On Thu, Jul 1, 2010 at 6:05 AM, Jim Avery <jim at jimavery.me.uk> wrote:
>>>
>>>> On 1 July 2010 01:18, Eric Cables <ecables at gmail.com> wrote:
>>>> > Sorry to bug the list, but my 3.2.1 installation of Nagios has all of a
>>>> > sudden stopped starting.  I noticed a lack of alerts over the last day, and
>>>> > when I checked the GUI it indicated that the "next" scheduled check for a
>>>> > service was in the past.  I proceeded to stop/start Nagios, but both have
>>>> > failed.
>>>> >
>>>> > Currently when I try to start Nagios using the init script it just hangs:
>>>> > [nagios at psdbsd01 (~/etc)]$ /usr/local/etc/rc.d/nagios.sh start
>>>> > Starting nagios:
>>>> >
>>>> > I've enabled debug logging (-1 level, 2 verbosity), but this is all that
>>>> > shows up in nagios.debug when I issue the above start request (uid 1003 =
>>>> > nagios):
>>>> > [1277942532.270096] [001.0] [pid=46503] drop_privileges() start
>>>> > [1277942532.270194] [004.0] [pid=46503] Original UID/GID: 1003/1003
>>>> >
>>>> > I can run nagios -v nagios.cfg, and it reports no errors.
>>>> >
>>>> > Here's the output if I run nagios nagios.cfg manually, without invoking
>>>> > daemon mode:
>>>> > [nagios at psdbsd01 (~/etc)]$ ../bin/nagios ./nagios.cfg
>>>> >
>>>> > Nagios Core 3.2.1
>>>> > Copyright (c) 2009-2010 Nagios Core Development Team and Community Contributors
>>>> > Copyright (c) 1999-2009 Ethan Galstad
>>>> > Last Modified: 03-09-2010
>>>> > License: GPL
>>>> >
>>>> > Website: http://www.nagios.org
>>>> >
>>>> > Any tips?  I am not sure what the next steps are since both logging and
>>>> > debugging aren't producing output, and Nagios has never taken more than a
>>>> > few seconds to start in the past.
>>>>
>>>> What, if anything, shows up in your nagios.log file?
>>>>
>>>> Check you don't already have a nagios daemon running (ps -ef | grep
>>>> nagios) before you start it again.
>>>>
>>>> Check that the lock file isn't there from the previous invocation (if
>>>> you did a standard install from source tarballs the file is
>>>> /var/lock/subsys/nagios).
>>>>
>>>> Check that the Nagios command file /usr/local/nagios/var/rw/nagios.cmd
>>>> doesn't exist before you start nagios.
>>>>
>>>> Use full pathnames when attempting to verify your config, for example:
>>>>
>>>> /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
>>>>
>>>>
>>>> _______________________________________________
>>>> Nagios-users mailing list
>>>> Nagios-users at lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/nagios-users
>>>> ::: Please include Nagios version, plugin version (-v) and OS when
>>>> reporting any issue.
>>>> ::: Messages without supporting info will risk being sent to /dev/null
>>>>
>>>
>>>
>>
>