Slow scheduled service checks

Jeff Engstrom jeff.engstrom at fortix.net
Tue Sep 21 01:13:14 CEST 2004


That fixed my recheck times!  Thanks!!

I find it strange, however, that the "smart" setting was not working.
I am running Nagios 1.1; I wonder whether 1.2 fixes this issue.

On Mon, 2004-09-20 at 14:30, Tedman Eng wrote:
> Check latency is indeed very high on your system.  It is the time between
> when a check is supposed to run and when it actually gets run.  For
> comparison, it should normally be between 1 and 30 seconds, depending on
> network conditions and Nagios load.
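To make the definition concrete, here is a minimal sketch of how latency figures like the min/max/average in the performance table could be derived. It assumes you already have, for each check, the time it was due and the time it actually started. Nagios computes this internally; the function names and timestamps below are hypothetical and for illustration only.

```python
from datetime import datetime, timedelta


def check_latency(scheduled: datetime, started: datetime) -> float:
    """Seconds between when a check was due and when it actually started."""
    return (started - scheduled).total_seconds()


def latency_summary(pairs):
    """Return (min, max, average) latency in seconds for (scheduled, started) pairs."""
    values = [check_latency(s, a) for s, a in pairs]
    return min(values), max(values), sum(values) / len(values)


# Hypothetical example: two checks that started 2 s and 5 s late.
due = datetime(2004, 9, 20, 14, 0, 0)
pairs = [
    (due, due + timedelta(seconds=2)),
    (due + timedelta(seconds=30), due + timedelta(seconds=35)),
]
print(latency_summary(pairs))  # (2.0, 5.0, 3.5)
```

By this measure, an average latency of about 415 seconds means a typical check starts roughly seven minutes after it was due.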
> 
> If you have a very large number of down hosts, this can also affect your
> latency, since Nagios "pauses" to check a host and thus skews the scheduling
> queue when this happens.  It can usually catch up though if the other checks
> have enough headroom in the scheduling queue.
> 
> Look at your scheduling queue (best done right after a restart).  The checks
> should be spaced out evenly.  If the normal check interval for most of your
> services is 5 minutes, make sure all of your services are scheduled to
> complete within those 5 minutes.
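The queue check described above can be sketched as follows. This assumes you have copied the next-check times from the scheduling queue view and converted them to seconds after the restart; the function name and the 600-service example are hypothetical. It reports whether every check falls inside one interval and how large the biggest gap between consecutive checks is.

```python
def schedule_fits(offsets, interval_seconds):
    """Return (fits, largest_gap) for check start offsets in seconds after restart.

    fits is True when every scheduled check starts before the interval ends;
    largest_gap is the biggest spacing between consecutive checks.
    """
    ordered = sorted(offsets)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    fits = bool(ordered) and ordered[-1] < interval_seconds
    return fits, max(gaps, default=0.0)


# Hypothetical queue: 600 services spaced 0.5 s apart on a 5-minute (300 s) interval.
offsets = [i * 0.5 for i in range(600)]
print(schedule_fits(offsets, 300))  # (True, 0.5)
```

If the last offset lands past 300 s, the checks cannot all run once per interval and latency will keep growing.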
> 
> Try setting your inter-check delay manually.  If you have 600 actively
> checked services on a 5-minute interval, the value should be just below 0.5
> (one check every half second, since 300 s / 600 services = 0.5 s).
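The arithmetic behind that 0.5 figure can be written out as a small helper. This is a sketch, not Nagios code: the function name is hypothetical, and it uses interval_length=60 from the posted config to convert interval units to seconds. As far as we know, Nagios 1.x accepts a fixed delay as a number in the same inter_check_delay_method directive, but check the main config documentation for your version.

```python
def inter_check_delay(avg_interval_minutes: float, num_services: int,
                      interval_length: int = 60) -> float:
    """Seconds between starting successive service checks so that every
    service is started once within the average check interval."""
    return (avg_interval_minutes * interval_length) / num_services


print(inter_check_delay(5, 600))  # 0.5
```

Choosing a value slightly below the computed one, as suggested above, leaves some headroom for host checks and other delays.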
> 
> -----Original Message-----
> From: Jeff Engstrom [mailto:jeff.engstrom at fortix.net]
> Sent: Monday, September 20, 2004 2:01 PM
> To: Nagios-Users
> Cc: teng at dataway.com
> Subject: RE: [Nagios-users] Slow scheduled service checks
> 
> 
> Here are the server's performance metrics...
> 
> Time Frame		Checks Completed 
> <= 1 minute:		35 (5.3%)
> <= 5 minutes:		249 (37.5%)
> <= 15 minutes:		664 (100.0%)
> <= 1 hour:		664 (100.0%)
> Since program start:	664 (100.0%)
> 
> Metric			Min.		Max.		Average
> Check Execution Time:	< 1 sec		5 sec		0.396 sec 
> Check Latency:		359 sec		476 sec		415.349 sec 
> Percent State Change:	0.00%		17.04%		0.03%
> 
> I don't have any excessively long check intervals, as you might notice
> from the data above.  The check latency seems high to me, but I don't
> have a complete understanding of what the value represents.
> 
> Thanks again!
> Jeff
> 
> 
> On Mon, 2004-09-20 at 13:24, Tedman Eng wrote:
> > Please let us know your performance metrics:
> > 
> > Check execution times and check latency (table in the top right).
> > It would also be helpful to see the active check completion rate (table
> > in the top left).
> > 
> > These should help pinpoint where the slowdown is.
> > 
> > 
> > Also, to optimize: if you have some long-interval checks (ones that run
> > only once a day, etc.), you should consider calculating the inter-check
> > delay by hand rather than using the 's' (smart) method.  Use the formula
> > from the documentation, but leave out any long-interval checks, since
> > they will skew the calculation.
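A minimal sketch of that hand calculation, assuming the documented smart formula is the average service check interval divided by the total number of services. The 60-minute cutoff for "long-interval" checks and the example service mix are hypothetical choices for illustration.

```python
def smart_delay(intervals_minutes, max_interval_minutes=60, interval_length=60):
    """Average check interval (seconds) divided by the number of services,
    after dropping checks whose interval exceeds max_interval_minutes."""
    kept = [m for m in intervals_minutes if m <= max_interval_minutes]
    if not kept:
        raise ValueError("no services left after filtering")
    avg_seconds = sum(kept) / len(kept) * interval_length
    return avg_seconds / len(kept)


# Hypothetical mix: 600 five-minute checks plus 10 once-a-day (1440-minute) checks.
intervals = [5] * 600 + [1440] * 10
print(smart_delay(intervals))                                     # 0.5
print(round(smart_delay(intervals, max_interval_minutes=10**6), 2))  # 2.81
```

With the daily checks included, the average interval is inflated and the computed delay jumps from 0.5 s to about 2.8 s, which would spread the 600 five-minute checks over roughly 28 minutes.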
> > 
> > 
> > -----Original Message-----
> > From: Jeff Engstrom [mailto:jeff.engstrom at fortix.net]
> > Sent: Monday, September 20, 2004 10:41 AM
> > To: nagios-users at lists.sourceforge.net
> > Subject: [Nagios-users] Slow scheduled service checks
> > 
> > 
> > Hello all,
> > 
> > I have a server monitoring some 1500 points, and for the most part it
> > seems to run quite well. However, for one reason or another the "Last
> > Check" times are off when a service is down. That is not the only
> > problem, actually... it can take some 15 minutes after the service is
> > restored for the update to reach the interface.
> > 
> > The main cfg is detailed below...
> > 
> > check_external_commands=1
> > command_check_interval=-1
> > log_rotation_method=d
> > use_syslog=1
> > log_notifications=1
> > log_service_retries=1
> > log_host_retries=1
> > log_event_handlers=1
> > log_initial_states=1
> > log_external_commands=1
> > log_passive_service_checks=1
> > inter_check_delay_method=s
> > service_interleave_factor=s
> > max_concurrent_checks=18
> > service_reaper_frequency=3
> > sleep_time=1
> > service_check_timeout=60
> > host_check_timeout=60
> > event_handler_timeout=30
> > notification_timeout=30
> > ocsp_timeout=5
> > perfdata_timeout=5
> > retain_state_information=1
> > retention_update_interval=60
> > use_retained_program_state=0
> > interval_length=60
> > use_agressive_host_checking=0
> > execute_service_checks=1
> > accept_passive_service_checks=1
> > enable_notifications=1
> > enable_event_handlers=1
> > process_performance_data=0
> > obsess_over_services=1
> > ocsp_command=submit_check_result
> > check_for_orphaned_services=1
> > check_service_freshness=1
> > freshness_check_interval=60
> > aggregate_status_updates=1
> > status_update_interval=15
> > enable_flap_detection=1
> > low_service_flap_threshold=5.0
> > high_service_flap_threshold=20.0
> > low_host_flap_threshold=5.0
> > high_host_flap_threshold=20.0
> > 
> > Thanks for any help on this!
> > 
> > 
> > -------------------------------------------------------
> > This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
> > Project Admins to receive an Apple iPod Mini FREE for your judgement on
> > who ports your project to Linux PPC the best. Sponsored by IBM.
> > Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
> > _______________________________________________
> > Nagios-users mailing list
> > Nagios-users at lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/nagios-users
> > ::: Please include Nagios version, plugin version (-v) and OS when reporting
> > any issue. 
> > ::: Messages without supporting info will risk being sent to /dev/null
