Distributed Monitoring woes and performance issues.

Rob Moss robmossrm at aol.com
Wed Nov 9 10:53:00 CET 2005


Jason Rojas wrote:

> Here is a good one for you guys.
> I am currently monitoring roughly 4357 services on 700 hosts.
> Now this is not all the hosts/services I need to be monitoring.
> From the output of nagios -s -c nagios.cfg
> it tells me that one complete run checking all mentioned 
> services/hosts will take roughly 885 seconds (14.7 minutes)
> That's bad.

Yep, that's pretty bad.  There's no reason why you shouldn't be able to 
do that within 5 minutes, provided the hardware is fast enough.
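To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The service count and the 885 seconds come from your post; the average check duration of 2 seconds is purely an assumption for illustration, so plug in whatever your plugins really take. The concurrency estimate is just Little's law (checks in flight = rate x duration), not anything nagios computes itself.

```python
import math

SERVICES = 4357          # service count from the original post
REPORTED_RUN_S = 885     # run time quoted from the `nagios -s` output
TARGET_RUN_S = 5 * 60    # the 5-minute goal


def checks_per_second(services, window_s):
    """Average check rate needed to cover all services in the window."""
    return services / window_s


def concurrency_needed(services, window_s, avg_check_s):
    """Little's law: checks in flight = check rate * time per check."""
    return math.ceil(checks_per_second(services, window_s) * avg_check_s)


current_rate = checks_per_second(SERVICES, REPORTED_RUN_S)
target_rate = checks_per_second(SERVICES, TARGET_RUN_S)
print(f"current: {current_rate:.1f} checks/s, target: {target_rate:.1f} checks/s")

# ASSUMPTION: 2 seconds per check is a made-up figure, not a measurement.
ASSUMED_AVG_CHECK_S = 2.0
slots = concurrency_needed(SERVICES, TARGET_RUN_S, ASSUMED_AVG_CHECK_S)
print(f"roughly {slots} checks in flight at once")
```

The point is only that the target rate is about three times the current one, so the server has to be allowed to run noticeably more checks in parallel than it does now.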

> Does anyone have any ideas for a solution to this besides an 
> enterprise grade monitoring system?

You will need a few things:
1. An updated nagios config that allows more concurrent service checks
2. A big, fast server (dual or quad core)
3. Faster plugins than the standard ones, such as replacing check_ping 
with check_icmp (search the archives, there are many references)
4. State retention, so that you aren't rechecking all hosts every time 
you restart nagios
5. A rebuild of nagios version 2.x with optimised compile flags
6. A rebuild of nagios with the Perl interpreter built in and caching 
enabled

You need to change the default nagios configuration, which uses some 
very polite polling settings and makes sure that not too many processes 
run at once.  I suggest that you look at the nagios polling settings, 
such as:

host_inter_check_delay_method
service_inter_check_delay_method
max_host_check_spread
max_service_check_spread
max_concurrent_checks
service_interleave_factor
state_retention_file
retain_state_information
use_retained_program_state
sleep_time

All of these settings (and maybe more?) affect how fast nagios can 
check hosts and services.
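If it helps to see what your config currently has for these, a small Python sketch like the one below pulls the directives listed above out of a nagios.cfg. The helper names (read_tuning, report) are made up for this example, and the sample values are only there so the script runs on its own; they are not recommendations.

```python
# Directive names taken from the list above.
TUNING_KEYS = [
    "host_inter_check_delay_method",
    "service_inter_check_delay_method",
    "max_host_check_spread",
    "max_service_check_spread",
    "max_concurrent_checks",
    "service_interleave_factor",
    "state_retention_file",
    "retain_state_information",
    "use_retained_program_state",
    "sleep_time",
]


def read_tuning(cfg_text):
    """Return {directive: value} for the tuning directives found in cfg_text."""
    found = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key in TUNING_KEYS:
            found[key] = value.strip()
    return found


def report(cfg_text):
    """One 'key=value' line per directive, flagging the ones not set."""
    found = read_tuning(cfg_text)
    return [f"{key}={found.get(key, '(not set)')}" for key in TUNING_KEYS]


# ASSUMPTION: sample values for illustration only, not a suggested config.
SAMPLE = """\
# illustrative fragment
max_concurrent_checks=0
service_interleave_factor=s
sleep_time=1
"""

for line in report(SAMPLE):
    print(line)
```

To run it against a real file, pass the file's contents to report() instead of SAMPLE, using the same nagios.cfg you give to nagios -c.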

Check out the doc page at
http://nagios.sourceforge.net/docs/2_0/configmain.html
for detailed explanations of the settings.

To clear up some confusion, nagios first pings all hosts to make sure 
they are up, then begins checking services.  If the ping checks take 
forever then your service checks will be delayed.  Try building nagios 
with debug level 3 to see exactly what it's doing in terms of polling, 
and from the CGI you can run the Scheduling Queue to see what service 
checks are delayed or taking a long time.


To help further, we'll need to know what version you are using, what 
your standard service checks are, what platform you're running on, what 
compile settings you used, and a brief rundown of how you concluded that 
service checks take 14 minutes.


Cheers
rob.


