high latency

Daniel Wittenberg daniel.wittenberg.r0ko at statefarm.com
Fri Dec 3 16:31:31 CET 2010


Pagefaults: 20-30k.  This seems to be the source of most of the CPU
system time (understandably), which sits at about 40-50%.  So if I could
reduce the pagefaults, I think we could gain back quite a bit of
performance.
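
For anyone wanting to put a number on the fault rate of a running
process, here is a minimal sketch. It assumes Linux and the
/proc/<pid>/stat layout from proc(5), where minflt is field 10 and
majflt is field 12. The function names are made up for illustration and
are not part of Nagios.

```python
import time


def parse_faults(stat_line):
    """Return (minflt, majflt) from one /proc/<pid>/stat line."""
    # The comm field may contain spaces or parentheses, so split
    # after the last ')' instead of splitting the whole line.
    _, _, rest = stat_line.rpartition(")")
    fields = rest.split()
    # rest starts at field 3 (state), so field n sits at index n - 3.
    return int(fields[7]), int(fields[9])


def read_faults(pid):
    """Read the current fault counters for pid (Linux only)."""
    with open(f"/proc/{pid}/stat") as fh:
        return parse_faults(fh.read())


def fault_rate(pid, seconds=5.0):
    """Sample twice and return (minor, major) faults per second."""
    min0, maj0 = read_faults(pid)
    time.sleep(seconds)
    min1, maj1 = read_faults(pid)
    return (min1 - min0) / seconds, (maj1 - maj0) / seconds
```

Calling fault_rate(pid) with the pid of the main daemon gives its own
rate only. Faults in forked check processes are not included until the
children are reaped, and then they show up in the cminflt and cmajflt
fields instead.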

I found one other huge issue: somehow, in the generic service check,
check_interval was set to 5 minutes. However, normal_check_interval
wasn't set at all, and the checks appeared to be running every minute. I
deleted check_interval and added normal_check_interval, and that helped
a ton: latency went down to 0.5-1.5 seconds.  That was with only 2
active checks and about a dozen passive checks on 700 hosts.  I then
added back the other 9 active checks and latency once again shot back up
to about 2000 *sigh*.

I grabbed another VM and made it a DNX client, and that seemed to help,
but I wish I could get the main server to handle more.  Right now it has
about 700 hosts and 12,100 service checks, of which about 7000 are
active and the rest are passive.
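
To see why the check interval matters so much at this size, here is a
rough back-of-the-envelope sketch. It assumes, purely for illustration,
that all 7000 active checks share a single interval and are spread
evenly across it. The function name is hypothetical.

```python
def checks_per_second(active_checks, interval_minutes):
    """Average number of checks that must start each second."""
    return active_checks / (interval_minutes * 60.0)


if __name__ == "__main__":
    # Compare a 1-minute interval with a 5-minute interval.
    for interval in (1, 5):
        rate = checks_per_second(7000, interval)
        print(f"{interval}-minute interval: {rate:.1f} checks/second")
```

Under those assumptions the server has to start about 117 checks per
second at a 1-minute interval, versus about 23 per second at 5 minutes.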

Oh, and we do have obsessing turned off.  I've also gone through as many
configs as I could and removed the macros, until I can write a caching
mechanism for the macro statements.

Any more ideas? 

-----Original Message-----
From: Andreas Ericsson [mailto:ae at op5.se] 
Sent: Friday, December 03, 2010 5:39 AM
To: Nagios Users List
Cc: Daniel Wittenberg
Subject: Re: [Nagios-users] high latency

On 12/02/2010 08:38 PM, Daniel Wittenberg wrote:
> Someone else noticed that nagios is generating a ton of minor page
> faults, and curious if that's normal and if that could be causing some
> of the latency in the checks?

Define "a ton".

$ /usr/bin/time php -r 'echo "marsipulami\n";'
marsipulami
0.01user 0.01system 0:00.09elapsed 34%CPU (0avgtext+0avgdata 29104maxresident)k
10208inputs+0outputs (70major+1962minor)pagefaults 0swaps

That's with a reasonably simple program, and it generates 70 major and
1962 minor pagefaults.
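
The same kind of measurement can be scripted. This is a minimal sketch
that assumes a Unix system where Python's resource module fills in
ru_majflt and ru_minflt (Linux does). The helper name is made up for
illustration.

```python
import resource
import subprocess
import sys


def child_pagefaults(argv):
    """Run argv to completion and return its (major, minor) faults."""
    # RUSAGE_CHILDREN accumulates over all waited-for children, so take
    # the difference around this one command.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return (after.ru_majflt - before.ru_majflt,
            after.ru_minflt - before.ru_minflt)


if __name__ == "__main__":
    major, minor = child_pagefaults([sys.executable, "-c", "pass"])
    print(f"({major}major+{minor}minor)pagefaults")
```

Running it against a plugin command line instead of the Python
interpreter would show how many faults a single check costs.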

> I've also got a tmpfs set up for status.dat and the checkresults
> directory to ease some of the disk I/O, since we're on a SAN-backed
> VM host.
> 

That's good, although if you're using a virtual system you'll never know
for sure if you're really using a ramdisk or not, since the host system
might well use swap to store the ramdisk anyway.

> I turned off embedded Perl this morning and our latency has been
> holding at < 10 seconds so far, so that seemed to help a lot.
> 

Neat. Did it affect your pagefaults? If so, how?

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null