Active host check scheduling in a distributed environment

Marc Powell marc at ena.com
Tue Jul 14 17:30:59 CEST 2009


On Jul 14, 2009, at 9:46 AM, Paul Corcoran wrote:

> Hi,
>
> I run a distributed Nagios environment consisting of 1 parent server  
> and 2 child servers.
>
> The child servers perform all the service checking while the parent  
> server should be performing active service checks.

Both the child server and the central server are performing active  
service checks?

> The host definitions are configured to perform host checks every 5  
> minutes. The retry interval is 1 minute and the max attempts is set  
> to 5.

On both? Are you submitting passive host checks, or are you expecting
the central machine to initiate its own active checks of hosts?

> We are monitoring 580 hosts and approx 4000 services.
>
> I noticed when a host down was detected the parent server did not  
> perform any retries of the host. This led to the status of the host  
> being stuck in a SOFT state and therefore no alerts were sent out as  
> required. I noticed that the child server performed the host checks  
> without any problem and the host was logged as being in a HARD down  
> state after 5 failed attempts.

I'm not sure what configuration you could have that would lead to  
this. Can you post the host{} definition and any relevant log entries?  
Are you only sending a single passive host result and have  
'passive_host_checks_are_soft' set in nagios.cfg?
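
For reference, these are the nagios.cfg settings on the central server
that I have in mind. This is only a sketch with assumed values, not
your actual config. With passive_host_checks_are_soft=1, a single
passive DOWN result can leave the host sitting in a SOFT state.

```cfg
# nagios.cfg on the central server (illustrative values, not taken from your setup)

# Accept host check results submitted by the child servers.
accept_passive_host_checks=1

# 0 (the default): passive host results are treated as HARD states.
# 1: passive host results are treated as SOFT states.
passive_host_checks_are_soft=0
```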

> Is there a specific variable in nagios.cfg that explicitly tells the  
> server to perform active checks?

There are a few:
	- in nagios.cfg: execute_host_checks=<0/1>
	- in your host definition: active_checks_enabled [0/1], plus an
	  appropriate check_period, check_interval, retry_interval and
	  check_command.
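
A minimal sketch of how those pieces fit together, using the intervals
from your description. The host name, the address, the check-host-alive
command and the 24x7 timeperiod are assumptions (they follow the sample
config shipped with Nagios, but yours may well differ), and the other
required host directives are left out:

```cfg
# nagios.cfg
execute_host_checks=1

# Host definition on the server that should run the active checks.
define host{
	host_name		example-host		; hypothetical name
	address			192.0.2.10		; placeholder address
	check_command		check-host-alive	; assumed command name
	check_period		24x7			; assumed timeperiod name
	active_checks_enabled	1
	check_interval		5			; minutes, with the default interval_length=60
	retry_interval		1
	max_check_attempts	5
	; alias, contacts and notification_* directives omitted from this sketch
	}
```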

> Is it best practice to have the 2 child servers perform passive host  
> checks?

I have no opinion on this other than to say that if you trust the
remote Nagios instances to correctly report on services, they can
usually be trusted to correctly report on hosts.

> Is it possible that processing all the passive service check info is
> causing the parent server to lag behind in its own process queue?

Not likely, IMHO, assuming you're using somewhat modern hardware. You
can see for sure under Performance Info, though. Look for high
latencies (minutes). Latency is a measure of how long after its
scheduled time a check actually ran.

--
Marc

