Performance issues, too

Tobias Klausmann klausman at schwarzvogel.de
Thu Dec 21 11:33:46 CET 2006


Hi! 

On Tue, 19 Dec 2006, Andreas Ericsson wrote:
> >>> SERVICE SCHEDULING INFORMATION
> >>> -------------------------------
> >>> Total services:                     2836
> >>> Total scheduled services:           2836
> >>> Service inter-check delay method:   SMART
> >>> Average service check interval:     2225.56 sec
> >> This is, as you point out below, quite odd. What's your _longest_ 
> >> normal_check_interval for services?
> > 
> > The longest check_interval is 86400 seconds. It's an SSL cert
> > freshness check. I figured it wasn't necessary to check that
> > more often than once a day. I also have check_intervals of 3, 5,
> > 15, 20, 30 and 1440 seconds. The latter is also a cert freshness
> > check which is lower because the customer wanted it to be that
> > short.
> > 
> 
> Try changing the really long intervals to something shorter or 
> commenting them out completely and see what happens. Checking a 
> certificate is not a particularly heavy operation so it doesn't matter 
> much if you run it every 5 minutes. On the server side it just gets 
> handed out from cache, so it's not heavy there either.

Actually, I was horribly wrong with that statement up there.

As it turned out, the check_interval was set to 86400. From that
I jumped to the conclusion "ah, one day" - familiar numbers do
that to you. But the base unit of check_interval isn't 1 second, it's 1
minute (the default interval_length of 60 seconds in nagios.cfg).
So the check_interval was 60 days. Fortunately, it was
only one such check which we quickly eliminated before producing
the second set of graphs I mentioned elsewhere in the thread.
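For anyone else bitten by this, here is a minimal sketch of the unit conversion. It assumes interval_length in nagios.cfg is left at its default of 60 seconds; the helper name interval_seconds is purely illustrative, not part of Nagios.

```python
# Sketch: converting a Nagios check_interval to wall-clock time.
# Assumption: interval_length in nagios.cfg is at its default of 60 s,
# so check_interval values count minutes, not seconds.
INTERVAL_LENGTH = 60      # seconds per Nagios "time unit" (assumed default)
SECONDS_PER_DAY = 86400


def interval_seconds(check_interval, interval_length=INTERVAL_LENGTH):
    """Hypothetical helper: real check interval in seconds."""
    return check_interval * interval_length


for value in (86400, 1440, 5):
    secs = interval_seconds(value)
    print(f"check_interval {value:>5} -> {secs:>8} s = {secs / SECONDS_PER_DAY:g} days")
```

With the default interval_length, 86400 works out to 60 days and 1440 to exactly one day, which is what was intended in the first place.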

Now, the longest check_interval truly is one day, 1440 minutes.
The average service check interval reported by nagios -s is now 419
seconds. Still not terribly short, but it proves that the
86400-minute-monster was to blame for the 2200+ seconds.
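A quick back-of-the-envelope check on how much a single check can skew that figure. This assumes the "Average service check interval" from nagios -s is a plain arithmetic mean over all scheduled services, which the post does not confirm. The inputs are the numbers quoted above.

```python
# Sketch: how much one 60-day check inflates the reported average.
# Assumption: the -s figure is a simple arithmetic mean, in seconds,
# over every scheduled service.
TOTAL_SERVICES = 2836
REPORTED_AVG = 2225.56        # seconds, from the earlier -s output
OUTLIER = 86400 * 60          # the 86400-minute check, in seconds

total = REPORTED_AVG * TOTAL_SERVICES                  # implied sum of all intervals
share = OUTLIER / total                                # fraction owed to the outlier
avg_without = (total - OUTLIER) / (TOTAL_SERVICES - 1)

print(f"implied sum of intervals: {total:,.0f} s")
print(f"outlier's share of that sum: {share:.1%}")
print(f"average without the outlier: {avg_without:.1f} s")
```

Under that assumption the one check accounts for roughly four fifths of the total, and removing it brings the mean down to about 398 s. That is in the same ballpark as the 419 s now reported; the remaining gap presumably comes from the replacement check and other changes not detailed here.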

Changing those once-a-day checks to 5 minutes is an option, but
I'd rather wait a little to give everybody on the list some time
to look at the graphs and come up with nifty ideas.

I suspect our check latency might converge on 419 seconds, but
I'd rather not test it: we'd be well beyond the 300 s interval
most of our checks are designed for.

> > Oops, forgot to mention that. Yes, a server farm is being rebuilt
> > currently. As I didn't want all the host check timeouts to make
> > matters much, much worse, I disabled them entirely.
> > 
> 
> Ah, that explains it then. It shouldn't matter, but unless the 
> experiment I suggested above turns up anything useful, would you mind 
> commenting them out and testing that?

I'll do that if removing the day-spaced checks doesn't help.


Regards & Thanks,
Tobias
-- 
Never touch a burning system.

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users