high latency

Frost, Mark {PBC} mark.frost1 at pepsico.com
Mon Dec 6 21:12:04 CET 2010


> -----Original Message-----
> From: Andreas Ericsson [mailto:ae at op5.se] 
> Sent: Monday, December 06, 2010 6:06 AM
> To: Nagios Users List
> Cc: Frost, Mark {PBC}
> Subject: Re: [Nagios-users] high latency
> 
> On 12/03/2010 08:14 PM, Frost, Mark {PBC} wrote:
> > 
> > I too struggle with them and I'm running on lightly-loaded physical hardware.
> > We have 2 servers doing the checks sending back to a central server.  Both
> > distributed nodes use ocsp/ochp, but they do nothing more than append results
> > to a file (i.e. it exits quickly).  Results are handled outside of Nagios.
> > 
>
> Try getting rid of the oc[sh]p commands and use Merlin or google for "pnsca" or
> "persistent nsca". There's one available from op5's repositories that may or may
> not work, and there's one from somewhere else that they're apparently using to
> great effect.
> 
> Even if it exits quickly, it's still executed serially, so checking halts a
> small period of time for each and every check that runs.

Hmm.  So then I'm curious why the 2 distservers, which both use oc[sh]p
commands the same way, have such radically different latencies.
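
To compare the two nodes on the same footing, something like the following
could recompute the min / max / average latency on each one. It is only a
rough sketch: it assumes the status file stores each service in a
"servicestatus { ... }" block containing a "check_latency=<seconds>" line,
and the sample text and function name are made up for illustration.

```python
import re

def service_latency_summary(status_text):
    """Return (min, max, avg) of check_latency over servicestatus blocks.

    Assumes no '}' appears inside a block before its closing brace
    (plugin output containing '}' would cut a block short).
    """
    blocks = re.findall(r"servicestatus\s*\{(.*?)\}", status_text, re.DOTALL)
    values = []
    for block in blocks:
        match = re.search(r"^\s*check_latency=([0-9.]+)\s*$", block, re.MULTILINE)
        if match:
            values.append(float(match.group(1)))
    if not values:
        return None
    return (min(values), max(values), sum(values) / len(values))

# Hypothetical sample data, not taken from a real status file.
sample_status = """hoststatus {
\thost_name=web01
\tcheck_latency=9.000
\t}
servicestatus {
\thost_name=web01
\tcheck_latency=0.250
\t}
servicestatus {
\thost_name=web02
\tcheck_latency=0.750
\t}
"""

print(service_latency_summary(sample_status))
```

Running the same summary on both distservers would show whether the high
latency is spread across all services or concentrated in a few.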

Either way, you're suggesting that having a NEB module handle the
post-check work will eliminate the serialization.
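
For a feel of how much the serialization could cost, here is a
back-of-the-envelope sketch. The service count comes from the distserver2
figures quoted below; the 0.05 s per command and the 300 s check window are
pure assumptions on my part, not measurements.

```python
def serial_overhead_fraction(checks_per_window, seconds_per_command, window_seconds):
    """Fraction of a scheduling window spent blocked in serially run commands."""
    blocked_seconds = checks_per_window * seconds_per_command
    return blocked_seconds / window_seconds

# 4289 services is from distserver2; 0.05 s and 300 s are assumed values.
fraction = serial_overhead_fraction(4289, 0.05, 300)
print(f"{fraction:.1%} of each window spent in serial commands")
```

Under those assumed numbers the scheduler would be blocked for roughly 71% of
every window, so even a command that exits quickly adds up across thousands
of checks.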

> > What's odd is that distserver1 and distserver2 are configured the same:
> > 
> > distserver1:
> > Hosts Checked:      675
> > Services Checked:  4179
> > Active Service Latency:         0.000 / 3.155 / 0.382 sec
> > Active Service Execution Time:  0.000 / 60.038 / 0.145 sec
> > 
> > distserver2:
> > Hosts Checked:      261
> > Services Checked:  4289
> > Active Service Latency:         0.000 / 169.977 / 81.300 sec
> > Active Service Execution Time:  0.000 / 15.270 / 0.211 sec
> > 
> > yet as you can see, distserver2's latency is much higher and always has been.
> > I tried turning off EPN yesterday on distserver2 and it had no discernable effect.
> > We added 400 new service checks yesterday on distserver2 (just more of the same
> > checks we already do but on 26 new hosts) and the latency went from 35 to over 80.
> > 
> 
> What kind of checks are you running? Some plugins draw a lot of cpu.
> Are any of the checks set to run in serial? (grep for parallelize_check in your
> objects.cache file)

parallelize_check is set to 1 everywhere.
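
For anyone wanting to verify the same thing without eyeballing grep output, a
small tally like this could do it. It is a sketch that assumes objects.cache
lists the setting as "parallelize_check" followed by whitespace and a value
on its own line; the sample text and function name are hypothetical.

```python
import re
from collections import Counter

def count_parallelize_values(cache_text):
    """Count how often each parallelize_check value appears in the text."""
    pattern = re.compile(r"^\s*parallelize_check\s+(\S+)\s*$", re.MULTILINE)
    return Counter(pattern.findall(cache_text))

# Hypothetical sample data, not taken from a real objects.cache file.
sample_cache = """define service {
\thost_name\tweb01
\tparallelize_check\t1
\t}
define service {
\thost_name\tweb02
\tparallelize_check\t0
\t}
"""

for value, count in sorted(count_parallelize_values(sample_cache).items()):
    print(f"parallelize_check={value}: {count} definition(s)")
```

Any count reported for a value of 0 would point at checks that are being
forced to run serially.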

Most things are NRPE checks (including NRPE to NSClient++).  Some are locally
running Perl scripts and others are locally running things like check_http.


> What version of Nagios are you running?
> 

3.2.1

> > The checks we do are very different (Windows, Linux, Unix, many are app-centric) so
> > it's difficult to compare exactly what runs on distserver1 and distserver2, but given
> > the jump we saw yesterday, I wonder whether the fact that the checks on these
> > new hosts are all built on dependencies has something to do with it.  These
> > hosts (Windows) have a basic check for NRPE
> > and all other checks on the host are dependent on the NRPE check succeeding.
> > 
> > I have to move to all new Nagios servers very soon.  I'm interested in Merlin, but
> > given its non-production nature just yet, I'm hesitant to commit and I'm not sure if
> > it will help me here.
> > 
> It's been running at our 400+ customers with very few problems for the past month.
> 0.9.1, released just yesterday, solves the known issues our customers have
> encountered. You might want to take a look at it again. There are some issues on
> FreeBSD though (was that you reporting them?). I just recently got a new laptop
> with better support for running virtual systems, so I'm downloading a FreeBSD 8.1
> install dvd as we speak. Hopefully I'll have those issues sorted out before the
> end of the week.
> 
> -- 
> Andreas Ericsson                   andreas.ericsson at op5.se

Thanks, Andreas.  I'm hoping to allocate sufficient resources on the new servers
to be able to play with Merlin more there.  Will I be able to have the performance
data from a poller be sent up to a NOC for digestion by pnp4nagios?  It may have
been a long time ago, but I thought I remembered seeing that performance data was
not yet implemented.

No, the FreeBSD reports weren't from me; we'd be using some flavor of SLES.

Thanks

Mark
