distributed host checks: freshness checking issues

Andreas Ericsson ae at op5.se
Tue Jun 7 15:35:55 CEST 2011


On 06/07/2011 02:28 PM, Pascal Vandeputte wrote:
> On Tuesday 07 June 2011 10:12:25 Andreas Ericsson wrote:
>> On 06/01/2011 05:51 PM, Pascal Vandeputte wrote:
>>> Can anyone confirm that my reasoning is correct? That the master will
>>> *always* keep on doing *some* host checks no matter what you configure?
>>
>> More or less, yes. It will at least schedule them even if it gets results
>> for them, but eventbroker modules can block even forced host checks. I'd
>> look into using Merlin, DNX or mod_gearman if I were you. It will do what
>> you want with far better performance than NSCA will ever be able to.
> 
> Thank you for the confirmation and the tips!
> 
> I've done some quick reading on Merlin, DNX and mod_gearman, and while they
> look very interesting, our Nagios setup is probably a little too complex to
> distribute the checks automatically the way we're doing it now.
> 

I doubt it. The largest setup running mod_gearman is checking something
like 35000 services. The largest merlin install is running close to 48000.
I'm sorry to say I have no figures for DNX.

> We have Nagios slaves ("workers") in multiple, independent locations, and
> we're exploiting that to the fullest by running different service checks of the
> same host from different locations: checks for "public" services are effectively
> checked from a remote location, while NRPE checks to the same host are done
> from within the same datacenter. The configs for each master & slave are
> generated from the same host/service database.
> 

For this you'd have to use two host definitions if you use Merlin, since
checks are split to workers based on hostgroup affiliation. Apart from
that, everything should work just fine.
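
Since the configs are generated from a database anyway, the two-definition approach could be sketched roughly as below. This is only an illustration: the function names, the "-public"/"-internal" suffixes, the hostgroup names and the "generic-host" template are all hypothetical. Only the `define host` syntax and the `use`, `host_name`, `address` and `hostgroups` directives are standard Nagios object configuration.

```python
def render_host(host_name, address, hostgroup, template="generic-host"):
    """Render one Nagios host object definition as text.

    The template name is an assumption; use whatever your setup defines.
    """
    lines = [
        "define host {",
        f"    use         {template}",
        f"    host_name   {host_name}",
        f"    address     {address}",
        f"    hostgroups  {hostgroup}",
        "}",
    ]
    return "\n".join(lines)


def split_host(name, address, remote_group, local_group):
    """Emit two definitions for one machine, one per checking location.

    The suffixes are hypothetical; the point is only that each definition
    lands in a different hostgroup, so each can go to a different worker.
    """
    return [
        render_host(f"{name}-public", address, remote_group),
        render_host(f"{name}-internal", address, local_group),
    ]


if __name__ == "__main__":
    # Example values are made up for illustration.
    for block in split_host("web01", "192.0.2.10", "remote-checks", "dc1-checks"):
        print(block)
        print()
```

The "public" service checks would then be attached to the first host object and the NRPE checks to the second.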

> Apparently, send_gearman can be used as a send_nsca replacement, which I'll
> have to check up on later. It would be cool if it can send multi-line output
> and performance data. NSCA cannot do multi-line and doesn't seem 100% reliable
> either.
> 

I'm fairly sure it can, but submitting passive checkresults with multiline
output is not very well tested and may not work well with Nagios core. It
will be particularly problematic if the check output is larger than 4096
bytes, since that's the size of the command pipe. It works flawlessly with
Merlin though, as does external command forwarding and a lot of other nifty
things that NSCA-based setups simply don't handle.
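
To make the size issue concrete, here is a minimal sketch that builds a PROCESS_SERVICE_CHECK_RESULT external command line and checks it against the 4096-byte figure above. The helper names are hypothetical, and escaping newlines as a literal "\n" is an assumption: whether the receiving core turns that back into multi-line output depends on the version, so test it against yours.

```python
import time

# Figure quoted above; treated here as a hard cap on one command line.
COMMAND_LIMIT = 4096


def build_passive_result(host, service, return_code, output, now=None):
    """Format a passive service check result as an external command line."""
    timestamp = int(time.time()) if now is None else int(now)
    # Assumption: real newlines are replaced by a literal backslash-n so the
    # whole command stays on a single line.
    escaped = output.replace("\n", "\\n")
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{escaped}\n"
    )


def fits_command_pipe(line, limit=COMMAND_LIMIT):
    """Return True if the encoded command line is within the byte limit."""
    return len(line.encode("utf-8")) <= limit


if __name__ == "__main__":
    # Host and service names are made up for illustration.
    line = build_passive_result("web01", "HTTP", 0, "OK - all good\nsecond line")
    print(line, end="")
    print("fits:", fits_command_pipe(line))
```

A submitter could run a check like this before writing to the command file and truncate or reject oversized output instead of sending it.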

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
