Dependencies in redundant networks and services: an idea for Nagios 4

Matt Simmons standalone.sysadmin at gmail.com
Mon May 23 14:30:00 CEST 2011


I like this idea. Could it be named something like "service_cluster",
since the vanilla term "cluster" has many connotations?

--Matt


On Mon, May 23, 2011 at 3:34 AM, Andreas Ericsson <ae at op5.se> wrote:
> On 05/22/2011 10:12 PM, Matthew Pounsett wrote:
>>
>> Searching back through the archives it seems that the issue of
>> handling service and host dependencies on redundant services or hosts
>> comes up from time to time (actually, far less often than I would
>> have expected) and nobody seems to have a really good solution to the
>> problem.
>>
>> Imagine a web service (call it W) which depends on two separate
>> databases (call them database A and database B), where both databases
>> have redundant backups and the web service can contact either the
>> primary or backup for each database and still do its job (A1, A2, B1,
>> B2).  Doing this without a more flexible dependency system requires
>> either some very complicated combinatorial setup where we have W1
>> dependent on A1,B1, W2 dependent on A1,B2, W3 on A2,B1, etc.  or one
>> very complicated custom check script which implements the
>> dependencies itself.
>>
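
To make the combinatorial setup described above concrete, here is a minimal Python sketch that enumerates the W variants. The names A1, A2, B1, B2 and W1..Wn come from the example; the function name and data layout are assumptions made purely for illustration, not anything Nagios provides.

```python
from itertools import product

# Redundant instances from the example: two databases, each with a
# primary and a backup.
redundant_sets = {
    "A": ["A1", "A2"],
    "B": ["B1", "B2"],
}

def combinatorial_checks(sets):
    """Return one (W-variant, masters) pair per combination of instances."""
    combos = product(*sets.values())
    return [(f"W{i}", combo) for i, combo in enumerate(combos, start=1)]

checks = combinatorial_checks(redundant_sets)
for name, masters in checks:
    print(name, "depends on", ",".join(masters))
```

The number of variants is the product of the sizes of the redundant sets, so a third redundant pair would double it again.
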
>> I've been thinking about this a fair bit over the last couple of
>> weeks since I manage a network and suite of services where nearly
>> everything is redundant, and almost no single outage of any component
>> results in an 'unreachable' state for any other component.  I'd very
>> much like to avoid having to run all kinds of duplicate checks and
>> train the rest of my staff to ignore alerts unless they arrive in
>> pairs.
>>
>> I think I've hit upon an idea, but it's a fairly significant change
>> to the way service and host dependencies work today, and so I don't
>> think it's reasonable to pursue it any earlier than Nagios 4.0, but
>> I'd like to get some feedback to see if others think this might be
>> the right way to go (and I'm hoping I don't get too many TL;DRs).
>>
>> In a nutshell, my idea is to separate the definition of the master
>> service/host from the association to it by the dependent
>> service/host, and make the association by reference from the
>> dependent service or host definition... much the same way as a
>> service is associated to a host by reference.
>>
>> There are two big wins from doing this: 1) If the dependency is
>> created by reference from the service or host definition, that opens
>> the door to using a boolean syntax in that reference, allowing both
>> simple *and* complex dependencies. 2) Moving the dependency
>> association into the service or host definition also allows the
>> association to be applied to services or hosts by
>> servicegroup/hostgroup which simplifies configuration file
>> authoring.
>>
>> Here's one example where using a hostgroup for the master service (or
>> a list of hosts) contains the implicit assumption that all of the
>> services referenced in a single servicedependency definition are
>> redundancies of each other.  I don't like doing anything by
>> implication, but this provides a match to the current implication
>> that all master services referenced by a dependent are not
>> redundancies of each other, and keeps the configuration very simple.
>>
>>
>> define service {
>>     host_name                       web-host
>>     service_description             Web Service W
>>     dependencies                    db-a-dependency,db-b-dependency
>> }
>>
>> define hostgroup {
>>     hostgroup_name                  database-hosts
>>     members                         db-host-1,db-host-2
>> }
>>
>> define service {
>>     hostgroup_name                  database-hosts
>>     service_description             Database A
>> }
>>
>> define service {
>>     hostgroup_name                  database-hosts
>>     service_description             Database B
>> }
>>
>> define servicedependency {
>>     servicedependency_name          db-a-dependency
>>     hostgroup_name                  database-hosts
>>     service_description             Database A
>>     notification_failure_criteria   w,u,c,p
>>     dependency_period               24x7
>> }
>>
>> define servicedependency {
>>     servicedependency_name          db-b-dependency
>>     hostgroup_name                  database-hosts
>>     service_description             Database B
>>     notification_failure_criteria   w,u,c,p
>>     dependency_period               24x7
>> }
>>
>> Since the implication by using a hostgroup_name or a list of hosts in
>> the servicedependency definition is that the referenced services are
>> redundant, the servicedependency doesn't 'fail' until all of the
>> referenced services meet *any* of the notification_failure_criteria
>> (e.g. one being w, and another being u means the servicedependency
>> fails).  Match that with the implication in the 'dependencies' directive
>> in W's service definition that those listed dependencies are not
>> redundancies of each other, and you have the following boolean
>> statement about database failures that determines whether W gets
>> notifications:
>>
>> (db-host-1:Database A && db-host-2:Database A) ||
>> (db-host-1:Database B && db-host-2:Database B)
>>
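
The failure rule above can be sketched in Python as a rough model of the proposal. This is not Nagios code: the function names and the dictionary layout are hypothetical, and the state letters follow the notification_failure_criteria values in the example, with "o" standing for OK.

```python
# Failure criteria from the example: w(arning), u(nknown), c(ritical), p(ending).
FAILURE_CRITERIA = {"w", "u", "c", "p"}

def dependency_failed(member_states, criteria=FAILURE_CRITERIA):
    """A redundant group fails only when every member is in a failure state."""
    return all(state in criteria for state in member_states)

def suppress_notifications(dependencies, states):
    """W's notifications are suppressed when any listed dependency has failed."""
    return any(
        dependency_failed([states[member] for member in members])
        for members in dependencies.values()
    )

# Each servicedependency lists the redundant services it covers.
dependencies = {
    "db-a-dependency": ["db-host-1:Database A", "db-host-2:Database A"],
    "db-b-dependency": ["db-host-1:Database B", "db-host-2:Database B"],
}

# Both copies of Database A are unhealthy (one w, one u): the dependency fails.
both_a_down = {
    "db-host-1:Database A": "w",
    "db-host-2:Database A": "u",
    "db-host-1:Database B": "o",
    "db-host-2:Database B": "o",
}

# One copy of each database is down: neither dependency fails.
one_of_each_down = {
    "db-host-1:Database A": "c",
    "db-host-2:Database A": "o",
    "db-host-1:Database B": "o",
    "db-host-2:Database B": "c",
}

print(suppress_notifications(dependencies, both_a_down))
print(suppress_notifications(dependencies, one_of_each_down))
```

The inner all() is the AND across redundant members and the outer any() is the OR across the non-redundant dependencies, which mirrors the boolean statement above.
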
>> But as I said I don't like the idea of doing anything by
>> implication... I'd like the relationships to be explicit, and so I'm
>> working on a way that the boolean statement about dependencies could
>> be written out in the dependencies directive in any host or service
>> definition.  I have a few ideas, but none are quite as clean as the
>> above example so I'll exclude them from this email for now (it's
>> already too long).  But if people are supportive of the general
>> concept I can keep working on it until I come up with a syntax that
>> is both flexible *and* manageable.
>>
>> Does this seem like a direction people would like to pursue?
>>
>
> Well... no actually. Changing how servicedependencies work is not a
> good idea. It would be far better (for Nagios 4) to implement a
> cluster-object and be able to set cluster-objects as parents for
> services (and hosts). That way we get something similar to how the
> various business process addons work today, but implemented in-core
> and without breaking servicedependencies for everyone.
>
> I agree that dependencies should have been specified somewhat like
> you mentioned if it had been done that way from the start, but
> right now it's too late to change how they work and what they do,
> as people find good use for them the way they work already.
>
> --
> Andreas Ericsson                   andreas.ericsson at op5.se
> OP5 AB                             www.op5.se
> Tel: +46 8-230225                  Fax: +46 8-230231
>
> Considering the successes of the wars on alcohol, poverty, drugs and
> terror, I think we should give some serious thought to declaring war
> on peace.
>
> ------------------------------------------------------------------------------
> What Every C/C++ and Fortran developer Should Know!
> Read this article and learn how Intel has extended the reach of its
> next-generation tools to help Windows* and Linux* C/C++ and Fortran
> developers boost performance applications - including clusters.
> http://p.sf.net/sfu/intel-dev2devmay
> _______________________________________________
> Nagios-devel mailing list
> Nagios-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-devel
>



-- 
LITTLE GIRL: But which cookie will you eat FIRST?
COOKIE MONSTER: Me think you have misconception of cookie-eating process.


