high latency

Andreas Ericsson ae at op5.se
Fri Dec 3 17:52:52 CET 2010


On 12/03/2010 04:31 PM, Daniel Wittenberg wrote:
> Pagefaults - 20-30k.  This seems to be the source of most of the cpu
> system time (understandably), which sits about 40-50%.  So if I could
> reduce the pagefaults I think we could gain quite a bit of performance
> back.
> 

Over what period of time? Here's the output from a program that ran
for a mere 1.22s and still racked up 13k pagefaults. The majority of
that time is *not* spent loading swapped-out mmap regions, but in delta
chain lookups inside the program logic:

$ time git repack
Counting objects: 397, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (397/397), done.
Writing objects: 100% (397/397), done.
Total 397 (delta 238), reused 0 (delta 0)
0.28user 0.09system 0:01.22elapsed 30%CPU (0avgtext+0avgdata 20544maxresident)k
6368inputs+464outputs (297major+12959minor)pagefaults 0swaps

I really think you're misunderstanding what pagefaults are and how they
work. Starting an X-server or openoffice.org is likely to generate somewhere
around a million pagefaults each, simply because they use a lot of libraries,
read a lot of config files, invoke a lot of helper programs and attempt to
access various devices. 20-30k pagefaults is *nothing* for a cpu capable of
executing a couple of billion instructions per second.
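
To put a number like that in context, it can help to watch the counters
move. The following is a minimal Python sketch, not anything taken from
Nagios or git: it reads the same minor/major fault counters that time(1)
prints, via the standard resource module, before and after touching
freshly mapped anonymous memory. The helper names (fault_counts,
touch_fresh_pages) and the page count are made up for illustration, and
the numbers you see will depend on the kernel and on whether transparent
huge pages are in use.

```python
import mmap
import resource
import time


def fault_counts():
    """Return (minor, major) pagefault counts for the current process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt


def touch_fresh_pages(n_pages):
    """Map n_pages of anonymous memory, write one byte to each page, and
    return (minor_delta, major_delta, elapsed_seconds)."""
    page = mmap.PAGESIZE
    minor_before, major_before = fault_counts()
    start = time.perf_counter()

    buf = mmap.mmap(-1, n_pages * page)  # anonymous mapping, not file-backed
    try:
        for offset in range(0, n_pages * page, page):
            buf[offset] = 1  # first write to a page makes the kernel map it
    finally:
        elapsed = time.perf_counter() - start
        minor_after, major_after = fault_counts()
        buf.close()

    return minor_after - minor_before, major_after - major_before, elapsed


if __name__ == "__main__":
    # 8192 pages is an arbitrary illustrative size (32 MiB with 4 KiB pages).
    minor, major, elapsed = touch_fresh_pages(8192)
    print(f"minor faults: {minor}, major faults: {major}, elapsed: {elapsed:.4f}s")
```

The first touch of a fresh anonymous page is normally serviced as a minor
fault, with no disk I/O involved. Only major faults have to wait for the
disk, and the repack run above had just 297 of those.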


> I found one other huge issue...somehow in the generic service check, the
> check_interval was set to 5 minutes...however, normal_check_interval
> wasn't set at all and appeared to be checking every minute. I deleted
> check_interval and added normal_check_interval and that helped a ton,
> latency went down to 0.5-1.5 seconds.  That was only running 2 active
> checks and about a dozen passive on 700 hosts.  I then added back in the
> other 9 active checks and latency once again shot back up to about 2000
> *sigh*.
> 

You're doing something weird. I'm 100% certain that this isn't Nagios'
fault. Any chance you could share your config off-list? Remove passwords
and addresses first if you like.

> I grabbed another vm and made it a dnx client and that seemed to help,
> but wish I could get the main server to handle more.  Right now it has
> about 700 hosts and 12,100 service checks, of which about 7000 are
> active and rest are passive.
> 

Umm... First you said you added 9 checks and that made the entire thing
just blow up, and now you're running 7000 active checks. What checks are
you running? If you sort by cpu usage in top, does any process stand out
as really prominent?

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
