Service check delays in distributed monitor setup

Fred f1216 at yahoo.com
Thu Sep 8 03:20:59 CEST 2005


Unfortunately, setting the increment to a small number only made the
pending state look reasonable; the services still never get
scheduled.
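
A back-of-the-envelope sketch of the arithmetic behind the increment,
assuming the startup spread is simply the number of scheduled services
multiplied by the inter-check delay.  That model is an assumption, not
something taken from the Nagios source, and the function names are made up
for illustration.

```python
def startup_spread_seconds(num_services: int, inter_check_delay: float) -> float:
    """Approximate time between the first and last initial check, in seconds.

    Assumes every service is spaced exactly inter_check_delay seconds apart.
    """
    return num_services * inter_check_delay


def implied_delay_seconds(num_services: int, spread_seconds: float) -> float:
    """Per-service delay that would produce the given total spread."""
    return spread_seconds / num_services


if __name__ == "__main__":
    services = 10350  # service count quoted in the earlier message
    # Spread for a fixed 0.05 second delay.
    print(startup_spread_seconds(services, 0.05))
    # Delay implied by a 3 hour spread.
    print(implied_delay_seconds(services, 3 * 3600))
```

Under this model, 10350 services at 0.05 seconds come out to a bit under
nine minutes, which does not match the roughly 3 minutes reported further
down, so treat it as a rough guide only.  A 3 hour spread corresponds to
about 1.04 seconds per service.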

My configuration *was* working at one point.  I tweaked something, and
now no matter what I do I can't get it to start monitoring again.  The
passive checks received from other monitor nodes all seem to get registered;
it's just the active checks that run on the master (head) node that never see
the light of day any more.  If I regenerate the configuration to not use
distributed monitoring, it works just fine, but that puts way too much
pressure on a single node.  I removed the status.sav, but as I type
this I'm thinking I should nuke all the cache files that nagios builds; maybe
something got munged in there ...
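
A minimal sketch of moving the generated state files out of the way before
a restart, rather than deleting them outright.  Only status.sav is named
above; the directory and any other file names passed in are placeholders to
be replaced with whatever your own nagios.cfg points at, and the helper name
is hypothetical.  Nagios should be stopped first.

```python
import shutil
import time
from pathlib import Path


def set_aside(var_dir, names, suffix=None):
    """Rename each existing file in var_dir to <name>.<suffix>.

    Files that do not exist are skipped.  Returns the list of new paths.
    """
    suffix = suffix or time.strftime("%Y%m%d-%H%M%S")
    moved = []
    for name in names:
        src = Path(var_dir) / name
        if src.is_file():
            dst = src.with_name(f"{src.name}.{suffix}")
            shutil.move(str(src), str(dst))
            moved.append(dst)
    return moved


if __name__ == "__main__":
    # Hypothetical location; adjust to your installation.
    for path in set_aside("/usr/local/nagios/var", ["status.sav"]):
        print("moved to", path)
```

Renaming keeps the old files around for comparison if the restart does not
change anything.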

We've used both Nagios 1.2 and now 2.0b3 (testing 2.0b4) and I have yet
to need to crack open the source and make any mods ... looks like that time
is coming ;-) 

-FredC

--- misc at viceconsulting.co.nz wrote:

> Hi Fred,
> 
> I have encountered the exact same problem with my central Nagios server.
> It has about 1000 passive services, but only about 10 active services (the
> active services being used for the central Nagios server to monitor
> itself).  The 1000 passive services receive their results from the 5
> distributed servers.
> 
> When I restart the Central Nagios server, the active checks get scheduled
> for 3 hours+ into the future, but they never actually seem to run.  For
> days the active checks have not actually been checking themselves.
> 
> I tried changing the service_inter_check_delay_method to d for dumb, which
> appeared to schedule them when I expected (i.e. within about 5 minutes after
> the restart), but it still didn't run them.
> 
> Your idea of setting service_inter_check_delay_method=0.05 sounds good.  I
> haven't had any luck getting the 10 or so active services checking on my
> central Nagios server.
> 
> Is anyone able to confirm that this is a known problem in Nagios?  Is there
> a better workaround, and is this to be fixed in 2.0 final?
> 
> Fred, keep the list posted if you make further breakthroughs.
> 
> Cheers
> Alex
> 
> On 7 Sep 2005 at 11:03, Fred wrote:
> 
> > I think I have found the source of my issue with distributed monitoring and
> > service checks.
> >
> > It turns out that if you enable distributed monitoring, even passive
> > service check definitions seem to get scheduled to run when nagios starts
> > up.  If you have say 10350 services (give or take one) and use smart
> > scheduling of services, you could easily see 3+ hours between the time
> > that the first service is scheduled and the last one.  Changing the smart
> > scheduling to "n" for no delay causes the services to not be scheduled in
> > the future, but by the time nagios processes the entire configuration
> > file, the start time is in the past and I think nagios forgets about the
> > service so it is never scheduled again.
> >
> > I'm currently trying a service_inter_check_delay_method=0.05 which puts me
> > at about 3 minutes for 10,000+ services, which seems to be enough time for
> > nagios to startup and still have its first pending service scheduled in the
> > near future rather than the near past ...
> >
> > Does this make sense to anyone who has been messing with these
> > configuration settings?
> >
> > Is there a better way to do this?  I.e., I would like for nagios to *not*
> > consider the passive checks in any scheduling.  I actually only have a
> > small number of active checks which, when run, will populate the rest of
> > the passive checks for the entire cluster.  The problem is that the node
> > I run these checks on is alphabetically *after* all of the other nodes,
> > so it seems to be scheduled last and has services starting the furthest
> > out.
> >
> > Thanks.
> > -FredC
> >
> > -------------------------------------------------------
> > SF.Net email is Sponsored by the Better Software Conference & EXPO
> > September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
> > Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
> > Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
> > _______________________________________________
> > Nagios-users mailing list
> > Nagios-users at lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/nagios-users
> > ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
> > ::: Messages without supporting info will risk being sent to /dev/null