2.0 stable stops checking

Eli Stair estair at ilm.com
Fri Mar 17 20:34:21 CET 2006


So you're seeing the scenario where nagios stops _all_ checks 
altogether?  I've had this happen when the nagios parent process dies 
and logs a message like this to nagios.log: "[1139362901] Caught SIGSEGV, 
shutting down...".  I was getting these very frequently once I went 
above some apparent host/service threshold (they went away recently when 
I removed about 128 nodes).  In these cases the CGIs still 
respond for some reason, which seemed inappropriate...
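To tell this crash case apart from the others, it helps to scan nagios.log for the shutdown message. The Python sketch below assumes every log line has the `[epoch] message` form of the entry quoted above; `find_crashes` is a hypothetical helper, not something shipped with Nagios.

```python
import re
from datetime import datetime, timezone

# Assumed line format, taken from the entry quoted above:
# "[1139362901] Caught SIGSEGV, shutting down..."
LOG_LINE = re.compile(r"^\[(\d+)\]\s+(.*)$")


def find_crashes(lines):
    """Return (UTC datetime, message) for every 'Caught SIG...' log entry."""
    crashes = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip blank or malformed lines
        stamp, message = match.groups()
        if message.startswith("Caught SIG"):
            when = datetime.fromtimestamp(int(stamp), tz=timezone.utc)
            crashes.append((when, message))
    return crashes
```

For example, `with open(path) as log: print(find_crashes(log))`, where `path` is whatever the `log_file` directive in your nagios.cfg points to. An empty result while checks have stopped suggests one of the other failure modes rather than a logged crash.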

I've also seen the same symptom without a well-advertised nagios 
failure: the process is still present in memory, but checks aren't 
executed and the CGIs are functional.
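For the case where the process is still present, a check along these lines can help narrow things down. It is a sketch that parses `ps -ef` output (like Matthias's listing quoted further down) and reports the PIDs of Nagios daemons started with `-d`, plus any `<defunct>` children they have left behind. The function names and the reliance on `-d` in the command line are assumptions about how the daemon was started.

```python
def parse_ps_ef(text):
    """Split `ps -ef` output into (pid, ppid, cmd) tuples, skipping the header."""
    rows = []
    for line in text.splitlines():
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) < 8 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # header row or a line that does not look like a process
        rows.append((int(fields[1]), int(fields[2]), fields[7]))
    return rows


def nagios_process_report(text):
    """Return (daemon_pids, defunct_children) found in `ps -ef` output.

    A daemon is assumed to be any command mentioning nagios that was started
    with -d; a defunct child is a "<defunct>" entry whose parent is a daemon.
    """
    rows = parse_ps_ef(text)
    daemons = [pid for pid, _, cmd in rows
               if "nagios" in cmd and "-d" in cmd.split()]
    defunct = [(pid, ppid) for pid, ppid, cmd in rows
               if ppid in daemons and cmd.endswith("<defunct>")]
    return daemons, defunct
```

You could feed it `subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout`. More than one daemon PID would point at the extra-process problem discussed in the quoted messages, while a defunct child under a live parent matches the "still in memory but not checking" symptom; neither is proof of the underlying bug.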

The third related (and my current bane...) issue is where most 
checks still occur, but some (sometimes large) groups of unrelated actions 
no longer happen.  Host/service checks as a whole seem to be working, but 
I'll notice that I haven't gotten an alert for something that failed, 
and then see that a whole class of service checks on one hostgroup isn't 
running anymore... and then I start to see the same issue with other 
checks/actions as well.
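One way to spot this kind of partial stall is to look for services whose scheduled check time is well in the past. The sketch below assumes the Nagios 2.x status.dat layout of `service {` blocks holding `key=value` lines, including `host_name`, `service_description`, `active_checks_enabled` and an epoch `next_check`; verify those names against your own status.dat. The 600-second grace period and the function names are hypothetical.

```python
import time


def parse_status_blocks(text):
    """Parse 'name {' ... '}' blocks of key=value lines into (name, dict) pairs."""
    blocks, name, fields = [], None, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if name is None and line.endswith("{"):
            name, fields = line[:-1].strip(), {}  # start of a new block
        elif name is not None and line == "}":
            blocks.append((name, fields))  # end of the current block
            name = None
        elif name is not None and "=" in line:
            key, _, value = line.partition("=")  # split on the first '=' only
            fields[key] = value
    return blocks


def overdue_services(text, grace=600, now=None):
    """Return (host, service, seconds_overdue) for active service checks whose
    next_check is more than `grace` seconds in the past."""
    if now is None:
        now = time.time()
    late = []
    for name, f in parse_status_blocks(text):
        # 'service' per the assumed 2.x layout; 'servicestatus' is accepted too
        if name not in ("service", "servicestatus"):
            continue
        if f.get("active_checks_enabled", "1") != "1":
            continue  # checks deliberately disabled, not stuck
        next_check = int(f.get("next_check", "0"))
        if next_check and now - next_check > grace:
            late.append((f.get("host_name"), f.get("service_description"),
                         int(now - next_check)))
    return late
```

Grouping the returned tuples by host_name or service_description should make it obvious when a whole class of checks has stopped being scheduled, rather than a single service.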

I'd sure love to just have nagios start working again, as I'm strongly 
against having to write an external framework that checks various parts 
of Nagios and alerts me when it's broken!  That said, I've always kept 
up to date on other OS monitoring/alerting frameworks, and still nothing is as 
extensible as Nagios is (yet).

/eli


Terry wrote:
> Just from looking at the logs, status.log keeps being updated as
> normal, but when checks stop, nagios.log stops getting new entries
> as well.
> 
> On 3/17/06, Eli Stair <estair at ilm.com> wrote:
> 
>>I've been seeing this continuously in 2.0beta/rc/releases.  For details
>>on my situation, check my posts in the devel/users archives; I'm curious
>>whether any similarities exist.  I haven't gotten acknowledgement/resolution
>>on this either; the only thing I've determined is that (in my case) stopping
>>nagios and restarting with the retention file zeroed resolves the issue
>>100%.
>>
>>Having an extra nagios process running can definitely cause this and
>>other issues, but in my case that has never been present, so it's not
>>the cause...
>>
>>/eli
>>
>>Terry wrote:
>>
>>>I am seeing this as well.  I have services that do not get checked
>>>when they are scheduled:
>>>
>>>Last Check Type:      ACTIVE
>>>Last Check Time:      03-17-2006 08:50:47
>>>Status Data Age:      0d 1h 37m 51s
>>>Next Scheduled Active Check:          03-17-2006 10:09:01
>>>Latency:      342.408 seconds
>>>Check Duration:       10.015 seconds
>>>Last State Change:    03-16-2006 11:55:02
>>>Current State Duration:       0d 22h 33m 36s
>>>
>>>It is currently 10:29 and it still hasn't been checked.  This is one of
>>>many examples.
>>>
>>>On 3/15/06, Matthias Eble
>>><matthias.eble at mailing.kaufland-informationssysteme.com> wrote:
>>>
>>>
>>>>hi all!
>>>>
>>>>we are experiencing occasional problems with nagios 2.0 stable. The
>>>>main process was reloaded due to configuration changes yesterday (Mar
>>>>14th). Since then, ps -ef looks like this:
>>>>
>>>>nagios    1078     1 12 Mar09 ?        16:49:43 /opt/nagios/bin/nagios -d /opt/nagios/etc/nagios.cfg
>>>>nagios    9431  1078  0 Mar14 ?        00:00:00 [nagios] <defunct>
>>>>
>>>>and nagios stopped checking. Does anyone have an idea what could have
>>>>happened? The nagios.log and status.dat files have not been updated since then.
>>>>
>>>>thanks
>>>>matthias
>>>>
>>>>
>>>>
>>>>-------------------------------------------------------
>>>>This SF.Net email is sponsored by xPML, a groundbreaking scripting language
>>>>that extends applications into web and mobile media. Attend the live webcast
>>>>and join the prime developer group breaking into this new coding territory!
>>>>http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
>>>>_______________________________________________
>>>>Nagios-users mailing list
>>>>Nagios-users at lists.sourceforge.net
>>>>https://lists.sourceforge.net/lists/listinfo/nagios-users
>>>>::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
>>>>::: Messages without supporting info will risk being sent to /dev/null
>>>>
>>>
>>
>>
> 


