Tweaking Nagios Performance (Checks/Notifications)

Morris, Patrick patrick.morris at hp.com
Wed Oct 7 05:12:12 CEST 2009


Mirza Dedic wrote:
>
> I recently finished moving Nagios from a virtual machine to bare-metal
> hardware, a retired PowerEdge machine (dual-core, 4 GB RAM, RAID-5
> 10k RPM HDs). My goal is to have a 1-minute window between when a
> host/service goes down and the time that I receive a message that it
> is down.
>
> We are monitoring a total of 347 services and 82 hosts, mainly
> using the plug-ins below:
>
> - check_by_ssh
> - check_nt (NSClient++ for Win32)
> - check_http
> - check_ping
> - check_esx3
> - check_mysql
>
> Below is my “performance info” for the current setup:
>
>   Time Frame            Services Checked
>   <= 1 minute           65 (18.7%)
>   <= 5 minutes          300 (86.5%)
>   <= 15 minutes         347 (100.0%)
>   <= 1 hour             347 (100.0%)
>   Since program start   347 (100.0%)
>
>   Metric                 Min.       Max.        Average
>   Check Execution Time   0.01 sec   21.91 sec   1.603 sec
>   Check Latency          0.00 sec   0.00 sec    0.164 sec
>   Percent State Change   0.00%      0.00%       0.00%
>
> Services Passively Checked
>
>   Time Frame            Services Checked
>   <= 1 minute           0 (0.0%)
>   <= 5 minutes          0 (0.0%)
>   <= 15 minutes         0 (0.0%)
>   <= 1 hour             0 (0.0%)
>   Since program start   0 (0.0%)
>
>   Metric                 Min.    Max.    Average
>   Percent State Change   0.00%   0.00%   0.00%
>
> Hosts Actively Checked
>
>   Time Frame            Hosts Checked
>   <= 1 minute           0 (0.0%)
>   <= 5 minutes          78 (95.1%)
>   <= 15 minutes         82 (100.0%)
>   <= 1 hour             82 (100.0%)
>   Since program start   82 (100.0%)
>
>   Metric                 Min.       Max.       Average
>   Check Execution Time   0.29 sec   4.03 sec   2.483 sec
>   Check Latency          0.15 sec   0.78 sec   0.565 sec
>   Percent State Change   0.00%      0.00%      0.00%
>
> Hosts Passively Checked
>
>   Time Frame            Hosts Checked
>   <= 1 minute           0 (0.0%)
>   <= 5 minutes          0 (0.0%)
>   <= 15 minutes         0 (0.0%)
>   <= 1 hour             0 (0.0%)
>   Since program start   0 (0.0%)
>
>   Metric                 Min.    Max.    Average
>   Percent State Change   0.00%   0.00%   0.00%

Oops, just realized you said "host/service," and not just "host."

> # MAXIMUM SERVICE CHECK SPREAD
>
> max_service_check_spread=5
Here you're telling Nagios to spread the service checks out over a
5-minute window. If you're shooting for 1 minute, this should be 1.

> # MAXIMUM SERVICE CHECK SPREAD
>
> max_service_check_spread=5
5 at a time probably won't get you there in a minute. I've had good luck 
setting this to the number of hosts I have.
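
As a back-of-the-envelope check, reading "5 at a time" as a cap on how
many checks run concurrently (that reading is an assumption), the
average execution times from the quoted stats give an estimate of how
many checks need to be in flight to get through everything once a
minute. This is only a sketch: it assumes every host and service is
checked once per 60 seconds and that the averages hold, and the
function name is made up for illustration.

```python
# Figures taken from the performance info quoted above.
SERVICES, SVC_AVG_EXEC = 347, 1.603   # count, average execution time (s)
HOSTS, HOST_AVG_EXEC = 82, 2.483
WINDOW = 60.0                         # target: every check once per minute


def required_concurrency(window_s: float) -> float:
    """Average number of checks that must be running at once
    (total seconds of check execution divided by the window length)."""
    busy_seconds = SERVICES * SVC_AVG_EXEC + HOSTS * HOST_AVG_EXEC
    return busy_seconds / window_s


print(f"{required_concurrency(WINDOW):.1f} checks in flight on average")
```

That works out to roughly a dozen checks running at any moment on
average, before allowing for slow outliers like the 21.91-second
maximum, so 5 is well short under these assumptions.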

> # MAXIMUM HOST CHECK SPREAD
>
> max_host_check_spread=3
Ditto.

> # HOST AND SERVICE CHECK REAPER FREQUENCY
>
> check_result_reaper_frequency=10
>
> # MAX CHECK RESULT REAPER TIME
>
> max_check_result_reaper_time=30
Set these much lower. I use 2.

> # SLEEP TIME
>
> sleep_time=0.25
Setting this to 0.1 will cram your checks closer together.

Also, you may want to look at the tuning recommendations in the Nagios 
docs. It looks like you're not doing the basic recommended stuff like 
using RAM disks, etc., all of which help and are very well documented.





