Nagios stopped checking most of my services!

Casey Allen Shobe casey at shobe.info
Tue Nov 3 03:44:47 CET 2009


Was your Nagios interface still showing the service statuses from the  
last time the checks ran, as though they were current?

I just joined this list over the same problem with 3.2.0.  Nagios was  
showing all green (got 95 services across 12 hosts), then when I went  
to demo it to my boss who just got back from vacation, I killed a box,  
and was surprised to see Nagios continue showing green.  When I looked  
in detail I saw that the checks had actually not been running in  
days...  I restarted Nagios a number of times to no avail.  I finally  
echoed a force check into the nagios.cmd file for a host, and got a  
lot of messages about how the services seemed orphaned.  I do have the  
check for orphans config option enabled (99% of my nagios.cfg is  
default).
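
For anyone who wants to try the same kind of forced check, here is a  
minimal sketch in Python. It assumes the SCHEDULE_FORCED_HOST_SVC_CHECKS  
external command (which re-checks every service on one host; it is not  
necessarily the exact command used above), and the command file path and  
the host name "web01" are placeholders, so check command_file in your own  
nagios.cfg before using it.

```python
import time


def forced_check_command(host_name, now=None):
    """Build one external-command line asking Nagios to re-check
    every service on the given host immediately."""
    now = int(time.time()) if now is None else int(now)
    # External command format: [submission_time] COMMAND;host_name;check_time
    return "[{0}] SCHEDULE_FORCED_HOST_SVC_CHECKS;{1};{0}\n".format(now, host_name)


def submit(command_line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write the command to the Nagios command file (a named pipe).

    The default path is an assumption; it varies between installs.
    """
    with open(command_file, "w") as pipe:
        pipe.write(command_line)


if __name__ == "__main__":
    # "web01" is a hypothetical host name used only for illustration.
    print(forced_check_command("web01"), end="")
```

Calling submit(forced_check_command("web01")) on the Nagios box would  
queue the checks; running the script as-is only prints the command line.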

I was finally able to get Nagios fixed by stopping it, removing  
retention.dat, and starting it again.  But I don't really want to  
disable retention unless I have to...
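
Before deleting retention.dat it may be worth looking at what it holds.  
The sketch below is only an illustration: it assumes the file consists of  
"service { ... }" blocks of key=value lines that include host_name,  
service_description and last_check, and it lists services whose last  
check is older than a chosen age. The sample data and the 600-second  
threshold are made up.

```python
import time

# Made-up sample in the assumed retention.dat block layout.
SAMPLE = """\
service {
host_name=web01
service_description=HTTP
last_check=1000
next_check=1300
}
service {
host_name=web01
service_description=SSH
last_check=9000
next_check=9300
}
"""


def parse_blocks(text, block_type="service"):
    """Yield a dict of key=value pairs for each '<block_type> {' block."""
    current = None
    for raw in text.splitlines():
        line = raw.strip()  # real files may indent the key=value lines
        if line == block_type + " {":
            current = {}
        elif line == "}":
            if current is not None:
                yield current
            current = None
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key] = value


def stale_services(text, max_age_seconds, now=None, block_type="service"):
    """Return (host, service, last_check) for checks older than max_age_seconds."""
    now = time.time() if now is None else now
    stale = []
    for block in parse_blocks(text, block_type):
        last_check = int(block.get("last_check", "0"))
        if now - last_check > max_age_seconds:
            stale.append(
                (block.get("host_name"), block.get("service_description"), last_check)
            )
    return stale


if __name__ == "__main__":
    for host, service, last_check in stale_services(SAMPLE, 600, now=9100):
        print(host, service, last_check)
```

To run it against a real file, read the file contents and pass them in  
place of SAMPLE; the block_type argument is there in case the block name  
differs in your file.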

I was thinking this might have something to do with my failover setup  
(though I don't see why).  I have two boxes in a Heartbeat+DRBD  
configuration, with Nagios (and all its configuration, /var files,  
etc.) on the DRBD partition.  It /seems/ to failover just fine.  After  
seeing this, I tested failing over back and forth about a dozen times  
and Nagios did not seem to get hung up in the same way, so I don't  
understand what caused this.

Maybe we have the same problem?

On Nov 2, 2009, at 9:08 PM, Les Fenison wrote:

> I had nagios working great.  Checking 6 hosts and about 85  
> services.  Then suddenly, all services on all hosts except one  
> stopped checking.  The next scheduled check is about 24 hours from  
> the last check.  I had been checking every 5 minutes.
>
> Restarting nagios didn't help.  I am using a GUI, NagioSQL, to edit  
> my configuration files so I suspect it did something to me but I  
> have no clue where to look except where I have already looked.
>
> What can cause nagios to just stop checking everything like that or  
> to randomly switch to every 24 hours rather than the configured  
> every 5 minutes?
>
> I am having to manually do force checks to get it to check.
>
> Here are some things I have checked...
>
> Hosts  check_interval is 5, retry_interval is 1
> Services  check_interval is 10, retry_interval is 2
>
> So where could Nagios be getting the idea that it is supposed to be  
> every 24 hours?

-- 
Casey Allen Shobe
casey at shobe.info


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users