Core 4 Remote Workers

Andreas Ericsson ae at op5.se
Sun Feb 3 13:12:44 CET 2013


On 02/03/2013 02:37 AM, Jochen Bern wrote:
> On 02.02.2013 15:12, Eric Stanley wrote:
>> The host key should be allowed to specify one or more IP addresses, IP
>> subnets, contiguous IP address ranges, host names and host name
>> patterns/wildcards (e.g. *.example.com). If multiple workers register
>> for the same host, some sort of distribution mechanism should be used to
>> load balance the workers.
> 
> First off, I'm even more firmly opposed to the assumption that
> $HOSTADDRESS$ == IP address than Andreas. I've set up Nagios instances
> for customers where $HOSTADDRESS$ actually happened to be
> -- router management address plus SNMP index of customer-facing
>     interface (for a carrier who considered it verboten to snoop *into*
>     the network of CPE-less business customers to determine whether the
>     links are "up" in an SLA-relevant way)
> -- IP address enriched with VLAN tags (which code for the path through
>     the WDM multiplexers to the CPE's management interface)
> -- IP plus SSH port, plus optionally ssh options (server admin insists
>     on login banner, Nagios admin throws a "-q" into the gearbox ...)
> -- *CHAINS* of IP / SSH port pairs (software supplier also supplies the
>     monitoring, but some of his customers insist on burying the server
>     *several* SSH hops deep within his own network)
> and I suppose I've been lucky not to have had to deal with a mixed
> IPv4/IPv6 shop yet.
> 
> Having said that: From your description, I'm under the impression that
> you're picturing a scenario of a complex network where the central
> Nagios actually cannot reach the "leaf" hosts itself, whereas the worker
> concept seems to be oriented towards load distribution IIUC.
> 
> These two scenarios aren't 100% compatible in their technical needs. For
> example, when the central Nagios winds up with no suitable worker for
> certain target hosts, the disjoint-nets scenario likely would leave it
> no choice but to mark the upcoming checks UNREACHABLE/UNKNOWN, while the
> load-distrib scenario would call for it to run the checks itself (and
> try harder to push *other* checks to workers to rid itself of the
> increased load). Also, responsibility and, thus, configuration tends to
> follow the segregation of the networks.
> 

Scenario 1: Load balancing, using remote workers to extend the CPU and
memory resources available to us.
When workers go offline (for whatever reason), their load is distributed
among remaining workers.


Scenario 2: "Passing" firewalls, using remote workers to run checks the
master can't run itself due to access restrictions.
When workers go offline, checks are either not executed (requires more
configuration), or marked as UNKNOWN. An internal check for the worker
itself will be a parent service of the services it's supposed to handle.
This requires being able to add services on-the-fly from inside Nagios,
which is halfway planned anyway, but will require additional bookkeeping
variables inside the objects and has to be left for 4.1.


Scenario 3: Remote view of inside services, using remote workers to see
the network from a particular point of view, such as a field office using
services inside the main office.
When workers go offline, checks can be executed locally, but a check
to see that the worker is up and running should trigger an alert. The
local check can be set up to return whatever the user wants, and parenting
can be handled either as in scenario 1 or as in scenario 2.
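
To make the difference between the three scenarios concrete, here is a
rough sketch in Python of what the master could decide when it has a
check to hand out. None of this is existing Nagios code: the names
(Scenario, plan_check, skip_when_offline) are made up for illustration,
and running checks locally when *no* worker is left in scenario 1 is an
assumption, taken from Jochen's remark that the load-distribution case
calls for the master to run the checks itself.

```python
from enum import Enum


class Scenario(Enum):
    LOAD_BALANCE = 1  # scenario 1: workers add CPU and memory
    FIREWALL = 2      # scenario 2: master cannot reach the targets
    REMOTE_VIEW = 3   # scenario 3: worker provides a point of view


def plan_check(scenario, workers, online, seq=0, skip_when_offline=False):
    """Decide where a check runs (hypothetical helper, not a Nagios API).

    workers: names of the workers registered for this check
    online:  set of worker names currently connected
    seq:     running counter used for simple round-robin distribution
    Returns an (action, worker_name) tuple.
    """
    alive = [w for w in workers if w in online]
    if alive:
        # Load of offline workers is spread over the remaining ones.
        return ("worker", alive[seq % len(alive)])
    if scenario is Scenario.FIREWALL:
        # The master cannot run the check; skipping needs extra config.
        return ("skip", None) if skip_when_offline else ("unknown", None)
    # Scenario 3 (and, by assumption, scenario 1): run the check locally.
    # In scenario 3 a separate check on the worker itself raises the alert.
    return ("local", None)
```

For example, plan_check(Scenario.FIREWALL, ["w1"], set()) yields
("unknown", None), while the same call with Scenario.REMOTE_VIEW yields
("local", None).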

> To sum it up, what I would imagine as Nagios' long-term development for
> *your* scenario wouldn't be a Nagios/worker "tasks go downstream"
> interface but one that allows a local Nagios to push "local" status data
> (from config to current check results) to an upstream "integration and
> oversight" Nagios.
> 
> (And yes, pinpointing how exactly you can and want to do access control,
> formation of host/service *groups*, notifications for local/global
> users, yadda yadda, with such a configuration brain split *will* be a bear.)
> 

Yup. It *is* useful though. mod_gearman has the same issue, really, and
people use that.

> On 03.02.2013 00:57, Andreas Ericsson wrote:
>> libssh2, most likely, with preshared keys the same way you use keys
>> to do password-less logins via ssh
> 
> Dear ***God*** don't call it that! That's *not* a PSK (symmetric crypto,
> no auth, as *all* participants know the secret) but a pubkey
> (asymmetric, only holder of matching *private* key can answer the
> challenge, hence proper auth).
> 

I stand corrected.

>> (although using password protected
>> keys will probably be quite a large pain)
> 
> (FWIW, a look at the libssh2 API suggests that it also supports the
> communication with an ssh-agent. IOW, it should be possible to run
> Nagios as a child of ssh-agent, load the privkey into the agent at
> startup, with someone manually entering the passphrase, if need be, and
> Nagios and its subprocesses can delegate the SSH auth to the agent.)
> 

That could work, although most people will use keys without passphrases
so that startup works unattended from init scripts. If properly
designed, that can be done reasonably safely. A master daemon has root
privileges and does nothing but read the key and then spawn a child,
which immediately drops its root privileges and then proceeds to do the
work it's supposed to do. Access to the query handler socket can also be
restricted with TCP Wrappers rules or firewall settings, so only certain
hosts can even attempt to initiate a connection.

> (There doesn't seem to be support for the newfangled (Open)SSH
> mechanisms, though. I guess that the Nagios---worker communication
> doesn't need ControlMaster, but the (non-X.509) certificates for auth
> might have been of interest.)
> 

Well, if we use a library for it, we'll get that for free when the lib
gets updated to include it.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
