Service check delays in distributed monitor setup

misc at viceconsulting.co.nz misc at viceconsulting.co.nz
Wed Sep 7 23:05:00 CEST 2005


Hi Fred,

I have encountered the exact same problem with my central Nagios server.
It has about 1000 passive services, but only about 10 active services (the
active services are used by the central Nagios server to monitor itself).
The 1000 passive services receive their results from the 5 distributed
servers.

When I restart the central Nagios server, the active checks get scheduled
for 3 hours+ into the future, but they never actually seem to run.  For
days the active checks have not run at all.

I tried changing the service_inter_check_delay_method to d for dumb, which
appeared to schedule them when I expected (i.e. within about 5 mins after
the restart), but it still didn't run them.

Your idea of setting service_inter_check_delay_method=0.05 sounds good.  I
haven't had any luck getting the 10 or so active services checking on my
central Nagios server.
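
For what it's worth, the arithmetic behind these settings can be sketched
as below.  This is only a back-of-envelope model, assuming the startup
spread is roughly (number of services) x (inter-check delay); the function
name is made up for illustration and is not part of Nagios.  The delay
values follow the documented meanings of service_inter_check_delay_method:
n is no delay, d is a fixed 1 second, and x.xx is a user-supplied number of
seconds.

```python
# Back-of-envelope estimate only. Assumption: checks are scheduled one
# after another, each offset from the previous by a fixed delay, and
# interleaving is ignored. This is NOT how Nagios computes its schedule.

def startup_spread_seconds(num_services: int, delay_seconds: float) -> float:
    """Estimated time between the first and the last initially scheduled check."""
    if num_services < 1:
        return 0.0
    # The first check has no offset, so there are (num_services - 1) gaps.
    return (num_services - 1) * delay_seconds


if __name__ == "__main__":
    services = 10350  # figure quoted from Fred's mail below

    # "n" = no delay, "d" = dumb 1-second delay, 0.05 = user-supplied delay
    for label, delay in (("n", 0.0), ("d", 1.0), ("0.05", 0.05)):
        spread = startup_spread_seconds(services, delay)
        print(f"{label:>4}: {spread:9.1f} s  ({spread / 60:6.1f} min)")
```

With 10350 services this gives roughly 2.9 hours for d and a bit under 9
minutes for 0.05.  Fred reports about 3 minutes at 0.05, so the real
scheduler evidently differs from this naive product; treat it as a rough
estimate only.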

Is anyone able to confirm that this is a known problem in Nagios?  Is there
a better workaround, and will this be fixed in 2.0 final?

Fred, keep the list posted if you make further breakthroughs.

Cheers
Alex

On 7 Sep 2005 at 11:03, Fred wrote:

> I think I have found the source of my issue with distributed monitoring and
> service checks.
>
> It turns out that if you enable distributed monitoring, even passive service
> check definitions seem to get scheduled to run when Nagios starts up.  If
> you have say 10350 services (give or take one) and use smart scheduling of
> services, you could easily see 3+ hours between the time that the first
> service is scheduled and the last one.  Changing the smart scheduling to "n"
> for no delay causes the services to not be scheduled in the future, but by
> the time Nagios processes the entire configuration file, the start time is
> in the past and I think Nagios forgets about the service so it is never
> scheduled again.
>
> I'm currently trying a service_inter_check_delay_method=0.05 which puts me
> at about 3 minutes for 10,000+ services, which seems to be enough time for
> Nagios to start up and still have its first pending service scheduled in the
> near future rather than the near past ...
>
> Does this make sense to anyone who has been messing with these
> configuration settings?
>
> Is there a better way to do this?  I.e., I would like for Nagios to *not*
> consider the passive checks in any scheduling.  I actually only have a small
> number of active checks which, when run, will populate the rest of the
> passive checks for the entire cluster.  The problem seems to be that the
> node I run these checks on is alphabetically *after* all of the other nodes,
> so it gets scheduled last and has services starting the furthest out.
>
> Thanks.
> -FredC
>
> -------------------------------------------------------
> SF.Net email is Sponsored by the Better Software Conference & EXPO
> September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
> Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
> Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
>
>









