2.0 stable stops checking

Terry td3201 at gmail.com
Fri Mar 17 20:39:59 CET 2006


No, not all checks.  I see check_ping processes still firing up:

[root@plaut08 etc]# ps xauwwww -H| grep nagios  | grep -v grep
nagios   26676 11.0  0.1 28620 3852 ?        Ssl  13:35   0:11 /usr/bin/nagios -d /etc/nagios/nagios.cfg
nagios   26814  0.0  0.1 28624 3852 ?        S    13:36   0:00   /usr/bin/nagios -d /etc/nagios/nagios.cfg
nagios   26815  0.0  0.0  4684  640 ?        S    13:36   0:00     /usr/lib/nagios/plugins/check_ping -H 172.28.7.59 -w 3000.0,80%% -c 5000.0,100%% -p 15 -t 30
nagios   26816  0.0  0.0  2580  528 ?        S    13:36   0:00       /bin/ping -n -U -w 90 -c 15 172.28.7.59


I am seeing the same thing as you: only certain hosts/hostgroups are
being checked, and then all of a sudden everything stops except the
pings shown above, and the results of those checks are not being
written to nagios.log.  Very weird.
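
For anyone who wants to catch this condition from outside Nagios, here is
a minimal sketch of a staleness check on nagios.log. It relies only on the
bracketed Unix timestamp that starts each log entry (as in the
"[1139362901] Caught SIGSEGV" line quoted below). The function names and
the 600-second threshold are my own assumptions for illustration, not
anything Nagios ships.

```python
import re

# Log entries begin with a bracketed Unix timestamp, for example:
#   [1139362901] Caught SIGSEGV, shutting down...
LOG_LINE = re.compile(r"^\[(\d+)\]\s")


def last_log_timestamp(lines):
    """Return the newest timestamp found in the log lines, or None."""
    newest = None
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            ts = int(match.group(1))
            if newest is None or ts > newest:
                newest = ts
    return newest


def log_is_stale(lines, now, max_age_seconds=600):
    """True when no entry is newer than max_age_seconds before `now`.

    max_age_seconds is a hypothetical threshold; pick one that is longer
    than the quietest period you expect on a healthy install.
    """
    newest = last_log_timestamp(lines)
    return newest is None or (now - newest) > max_age_seconds
```

To use it, open your nagios.log (the path depends on your install), pass
the file object as `lines` and `time.time()` as `now`, and alert from cron
when the result is True. Comparing timestamps inside the file rather than
the file's mtime is a deliberate choice, since it ignores writes that do
not add new entries.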

On 3/17/06, Eli Stair <estair at ilm.com> wrote:
>
> So you're seeing the scenario where nagios stops _all_ checks
> altogether?  I've had this happen when the nagios parent process dies,
> and logs to nagios.log to this effect "[1139362901] Caught SIGSEGV,
> shutting down... ".  I was getting these very frequently when I went
> above some apparent host/service threshold (went away when I removed
> about 128 nodes at one point recently).  In these cases the CGIs still
> respond for some reason, which seemed inappropriate...
>
> I've also seen the same symptom, but without a well-advertised nagios
> failure, where the process is still present in memory but checks aren't
> executed and the CGIs are functional.
>
> The third related (and my current bane...) issue is where MOST all
> checks occur, but some (sometimes large) groups of unrelated actions no
> longer occur.  Host/service checks as a whole seem to be working, but
> I'll notice that I haven't gotten an alert for something that failed,
> and then see that whole class of service checks on one hostgroup aren't
> running anymore... and then start to see the same issue with other
> checks/actions as well.
>
> I'd sure love to just have nagios start working again, as I'm strongly
> against having to write an external framework for checking various parts
> of Nagios and alerting me when it's broken!  Alternately, I've always kept
> up to date on other OS monitor/alert frameworks and still nothing is as
> extensible as Nagios is (yet).
>
> /eli
>
>
> Terry wrote:
> > In just looking at the logs, the status.log is being continuously
> > updated as normal but when checks stop, the nagios.log stops gathering
> > entries as well.
> >
> > On 3/17/06, Eli Stair <estair at ilm.com> wrote:
> >
> >>I've been seeing this continuously in 2.0beta/rc/releases.  For details
> >>on my situation/posts check the devel/users archives, I'm curious if any
> >>similarities exist.  I haven't gotten acknowledgement/resolution on this
> >>either, the only thing I've determined is that (in my case) stopping
> >>nagios and restarting with the retention file zeroed resolves the issue
> >>100%.
> >>
> >>Having an extra nagios process running can definitely cause this and
> >>other issues.  In my case that's never been present and thus not the
> >>cause...
> >>
> >>/eli
> >>
> >>Terry wrote:
> >>
> >>>I am seeing this as well.  I have services that do not get checked
> >>>when they are scheduled:
> >>>
> >>>Last Check Type:      ACTIVE
> >>>Last Check Time:      03-17-2006 08:50:47
> >>>Status Data Age:      0d 1h 37m 51s
> >>>Next Scheduled Active Check:          03-17-2006 10:09:01
> >>>Latency:      342.408 seconds
> >>>Check Duration:       10.015 seconds
> >>>Last State Change:    03-16-2006 11:55:02
> >>>Current State Duration:       0d 22h 33m 36s
> >>>
> >>>It is currently 10:29 and it still hasn't been checked.  This is one of
> >>>many examples.
> >>>
> >>>On 3/15/06, Matthias Eble
> >>><matthias.eble at mailing.kaufland-informationssysteme.com> wrote:
> >>>
> >>>
> >>>>hi all!
> >>>>
> >>>>we are experiencing occasional problems with nagios 2.0 stable. The
> >>>>main process was reloaded due to configuration changes yesterday (Mar
> >>>>14th). since then ps -ef looks like this:
> >>>>
> >>>>nagios    1078     1 12 Mar09 ?        16:49:43 /opt/nagios/bin/nagios
> >>>>-d /opt/nagios/etc/nagios.cfg
> >>>>nagios    9431  1078  0 Mar14 ?        00:00:00 [nagios] <defunct>
> >>>>
> >>>>and nagios stopped checking. Does anyone have an idea what could have
> >>>>happened? The nagios.log and status.dat files have not been updated since then.
> >>>>
> >>>>thanks
> >>>>matthias
> >>>>
> >>>>
> >>>>
> >>>>-------------------------------------------------------
> >>>>This SF.Net email is sponsored by xPML, a groundbreaking scripting language
> >>>>that extends applications into web and mobile media. Attend the live webcast
> >>>>and join the prime developer group breaking into this new coding territory!
> >>>>http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
> >>>>_______________________________________________
> >>>>Nagios-users mailing list
> >>>>Nagios-users at lists.sourceforge.net
> >>>>https://lists.sourceforge.net/lists/listinfo/nagios-users
> >>>>::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
> >>>>::: Messages without supporting info will risk being sent to /dev/null
> >>>>


