high latency

Andreas Ericsson ae at op5.se
Tue Dec 7 23:56:40 CET 2010


On 12/07/2010 08:20 PM, Frost, Mark {PBC} wrote:
> 
>> -----Original Message-----
>> From: Andreas Ericsson [mailto:ae at op5.se]
>> Sent: Tuesday, December 07, 2010 9:44 AM
>>
>>> Hmm.  So then I'd be so curious why the 2 distservers which are both using
>>> oc[sh]p commands the same way have such radically different latencies.
>>>
>>
>> Agreed. There must be other differences too. Perhaps there's trouble resolving
>> from one of the nodes? That usually makes checks run a helluva lot longer than
>> they normally have to.
> 
> I had another look.  While I found a test host that I'd made that was
> deliberately unreachable, I found that when I removed it it made no
> difference.  Execution times are significantly lower (min/max/avg) on
> the host with the high latencies than for the one with low latencies.
> I don't see any unresolvable hosts or, now, any unreachable hosts.
> Puzzling.
> 

Not necessarily unresolvable, but if you've configured a faulty primary
DNS server, so that lookups only tick over to the secondary one after the
10-15 second (or whatever) timeout, that would obviously cause much higher
execution times. If load is very low and latency is very high on one
system but not the other, it's very nearly always down to configuration
differences.
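
One quick way to test the resolver theory is to time name lookups on both
nodes and compare. The following is only a rough sketch: the function names
and the 1-second threshold are made up for illustration, and it times plain
lookups through Python's standard resolver call rather than anything
Nagios-specific.

```python
import socket
import time


def time_lookups(hostnames, resolve=socket.gethostbyname, clock=time.perf_counter):
    """Return {hostname: seconds spent resolving it}.

    Failed lookups are timed too, since a lookup that eventually fails
    (or falls back to a secondary server) is exactly the slow case.
    """
    timings = {}
    for name in hostnames:
        start = clock()
        try:
            resolve(name)
        except OSError:
            # socket.gaierror is a subclass of OSError; ignore the failure
            # itself and keep only how long it took.
            pass
        timings[name] = clock() - start
    return timings


def slow_lookups(timings, threshold=1.0):
    """Return the hostnames whose lookup took at least `threshold` seconds."""
    return sorted(name for name, secs in timings.items() if secs >= threshold)
```

Run `slow_lookups(time_lookups([...]))` with the same list of host names on
each node. If one node reports lookups in the multi-second range and the
other does not, the resolver configuration is the first place to look.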

> I've always wished there was an easy way to see which processes had
> high latencies from the web interface without having to view the status.dat
> file...
> 

You won't like it, but... install merlin and enable database writing. Then
you can do 'select * from service where latency >= 10.0' and get a complete
list of them, although you probably want to grab only some of the fields,
such as host_name, service_description and check_command.
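
Without a database, roughly the same list can be pulled straight out of
status.dat. The sketch below is not part of Nagios or merlin; it assumes
the Nagios 3 layout where each service is a "servicestatus {" block of
key=value lines with a check_latency field, and the function names are
invented for the example.

```python
def parse_status_blocks(text, block_type="servicestatus"):
    """Parse 'block_type { key=value ... }' blocks into a list of dicts."""
    blocks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if current is None:
            # Only start collecting when the wanted block type opens.
            if line == block_type + " {":
                current = {}
        elif line == "}":
            blocks.append(current)
            current = None
        elif "=" in line:
            # Split on the first '=' only; values may contain '=' themselves.
            key, _, value = line.partition("=")
            current[key] = value
    return blocks


def high_latency(text, threshold=10.0):
    """Return (latency, host, service, command) tuples, worst first."""
    rows = []
    for svc in parse_status_blocks(text):
        try:
            latency = float(svc.get("check_latency", "0"))
        except ValueError:
            continue  # skip blocks with a malformed latency value
        if latency >= threshold:
            rows.append((
                latency,
                svc.get("host_name", "?"),
                svc.get("service_description", "?"),
                svc.get("check_command", "?"),
            ))
    return sorted(rows, reverse=True)
```

Feed it the contents of the status file, for example
`high_latency(open(path_to_status_dat).read())`, where the path is whatever
status_file points to in your nagios.cfg.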

>>> Either way, you're suggesting that having a NEB module handle the
>>> post-check work will eliminate the serialization.
> 
>> Yes. Sneaking a peek at what's needed in order for an event to get sent to
>> master via an eventbroker compared to running an oc[sh]p command renders
>> this, more or less:
> 
>> [ good stuff snipped...]
> 
> Wow.
> 

Indeed. http://blogs.op5.org

The relevant post is still the topmost one.

>>>
>>> parallelize_check is set to 1 everywhere.
>>
>> Does one server have a lot of random service failures? On-demand hostchecks are
>> still run in parallel.
> 
> I don't think so.  Intermittent you mean?  Not as far as I know or can see.
> 

Check top alert producers and include soft states in the report and you should
see if there are gargantuan differences.

>>>> What version of Nagios are you running?
>>>
>>> 3.2.1
>>
>> I take it upgrading makes no difference?
> 
> To 3.2.3?   I'll probably try that on the new servers, but if things work out I may
> just move to Merlin + 3.2.4.  I wasn't sure I saw anything in the 3.2.3 release that
> I found compelling for us at the time.  As I say, this system now has fairly high
> visibility so just trying something like that would involve a rather painful
> internal change process.  It's like piloting the QE2 -- I can't change
> course very quickly :-)
> 

I quite understand. Let me know if you want me to hook you up with a sales rep.
We'll do the migration in half a day, if that.

>>> Thanks, Andreas.  I'm hoping to allocate sufficient resources on the new servers
>>> to be able to play with Merlin more there.
>>
>> It's quite resource-friendly actually. Well, compared to what you're running now
>> it's positively feather-light.
> 
> I meant more like installing MySQL everywhere, building filesystems to hold the
> MySQL data, etc.  Not so much like I need more memory or more CPUs.  I don't
> remember seeing anything in the Merlin docs (maybe I missed it), but how
> large would the MySQL database need to be?  Pretty small on each box, right?
> Like 500MB or less?
> 

You don't need to use a database at all if you don't want to. You can use merlin
for load balancing and redundancy and still use the old CGIs or whatever for
watching current status. I think reports will be a bit buggy though, but that
should be easy to patch in Merlin tbh.

>>>   Will I be able to have the performance
>>> data from a poller be sent up to a NOC for digestion by pnp4nagios?
>>
>> Yes, but you'll need the threadsafe version of Nagios you can obtain from either
>> CVS or git://git.op5.org/nagios.git for performance-data to work. Actually, you
>> need that for Merlin to work.
> 
> That's part of the plan.  Any chance that the OP5 site will eventually be
> configured to allow git through a proxy?  It's of course less convenient to
> use snapshot tarballs, but still workable, of course.
> 

You mean through http? Doesn't it already? I think it's supposed to. I can check
up on that later. The gitweb page has links for grabbing latest master as a
tarball though. That might work as an interim solution.

> 
>>> No we'd be using some flavor of SLES.
>>>
>>
>> Should work marvellously then.
> 
> Thanks as always for your help, Andreas.
> 

You're welcome.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users