Dependencies in redundant networks and services: an idea for Nagios 4

Matthew Pounsett matt at conundrum.com
Sun May 22 22:12:58 CEST 2011


Searching back through the archives, it seems that the issue of handling service and host dependencies on redundant services or hosts comes up from time to time (actually, far less often than I would have expected) and nobody seems to have a really good solution to the problem.

Imagine a web service (call it W) which depends on two separate databases (call them database A and database B), where both databases have redundant backups and the web service can contact either the primary or backup for each database and still do its job (A1, A2, B1, B2).  Doing this without a more flexible dependency system requires either some very complicated combinatorial setup where we have W1 dependent on A1,B1, W2 dependent on A1,B2, W3 on A2,B1, etc., or one very complicated custom check script which implements the dependencies itself.
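
To make the combinatorial option concrete, here is a rough Python sketch (purely illustrative, not Nagios code) that enumerates the duplicate W checks for the A1/A2/B1/B2 example.  The `replicas` table and the `duplicate_checks` helper are names I've made up for the illustration.

```python
from itertools import product

# Redundant instances of each database, named as in the example above.
replicas = {
    "A": ["A1", "A2"],
    "B": ["B1", "B2"],
}

def duplicate_checks(replicas):
    """Return one (check_name, masters) pair per combination of replicas."""
    combos = product(*replicas.values())
    return [(f"W{i}", list(combo)) for i, combo in enumerate(combos, start=1)]

checks = duplicate_checks(replicas)
for name, masters in checks:
    print(f"{name} depends on {', '.join(masters)}")
```

The number of duplicate checks is the product of the replica counts, so a third redundant database pair would already mean eight copies of W.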

I've been thinking about this a fair bit over the last couple of weeks since I manage a network and suite of services where nearly everything is redundant, and almost no single outage of any component results in an 'unreachable' state for any other component.  I'd very much like to avoid having to run all kinds of duplicate checks and train the rest of my staff to ignore alerts unless they arrive in pairs.

I think I've hit upon an idea, but it's a fairly significant change to the way service and host dependencies work today, so I don't think it's reasonable to pursue it any earlier than Nagios 4.0.  I'd like to get some feedback to see if others think this might be the right way to go (and I'm hoping I don't get too many TL;DRs).

In a nutshell, my idea is to separate the definition of the master service/host from the association to it by the dependent service/host, and make the association by reference from the dependent service or host definition... much the same way as a service is associated to a host by reference.  

There are two big wins from doing this:
1) If the dependency is created by reference from the service or host definition, that opens the door to using a boolean syntax in that reference, allowing both simple *and* complex dependencies.
2) Moving the dependency association into the service or host definition also allows the association to be applied to services or hosts by servicegroup/hostgroup which simplifies configuration file authoring.

Here's one example where using a hostgroup for the master service (or a list of hosts) contains the implicit assumption that all of the services referenced in a single servicedependency definition are redundancies of each other.  I don't like doing anything by implication, but this provides a match to the current implication that all master services referenced by a dependent are not redundancies of each other, and keeps the configuration very simple.


define service {
    host_name           web-host
    service_description Web Service W
    dependencies        db-a-dependency,db-b-dependency
}

define hostgroup {
    hostgroup_name      database-hosts
    members             db-host-1,db-host-2
}

define service {
    hostgroup_name      database-hosts
    service_description Database A
}

define service {
    hostgroup_name      database-hosts
    service_description Database B
}

define servicedependency {
    servicedependency_name          db-a-dependency
    hostgroup_name                  database-hosts
    service_description             Database A
    notification_failure_criteria   w,u,c,p
    dependency_period               24x7
}

define servicedependency {
    servicedependency_name          db-b-dependency
    hostgroup_name                  database-hosts
    service_description             Database B
    notification_failure_criteria   w,u,c,p
    dependency_period               24x7
}

Since the implication of using a hostgroup_name or a list of hosts in the servicedependency definition is that the referenced services are redundant, the servicedependency doesn't 'fail' until all of the referenced services meet *any* of the notification_failure_criteria (e.g. one being w, and another being u means the servicedependency fails).  Match that with the implication in the 'dependencies' directive in W's service definition that those listed dependencies are not redundancies of each other, and you have the following boolean statement about database failures that determines whether W gets notifications:

(db-host-1:Database A && db-host-2:Database A) || (db-host-1:Database B && db-host-2:Database B)
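
As a sanity check on that logic, here is a minimal Python sketch of how the evaluation might work, assuming each term is meant to cover both database hosts.  It is a hypothetical model of the proposal, not existing Nagios code: the `DEPENDENCIES` table, the function names, and the "o" state code for OK are all assumptions made for the illustration.

```python
# Hypothetical model of the proposal above; not existing Nagios code.
FAILURE_CRITERIA = {"w", "u", "c", "p"}  # notification_failure_criteria w,u,c,p

# Each servicedependency lists its (host, service) masters, which are
# treated as redundancies of each other.
DEPENDENCIES = {
    "db-a-dependency": [("db-host-1", "Database A"), ("db-host-2", "Database A")],
    "db-b-dependency": [("db-host-1", "Database B"), ("db-host-2", "Database B")],
}

def dependency_failed(masters, states):
    """A dependency fails only when every redundant master is in a failure state."""
    return all(states[m] in FAILURE_CRITERIA for m in masters)

def notifications_suppressed(dependency_names, states):
    """Listed dependencies are not redundant: any one failing is enough."""
    return any(dependency_failed(DEPENDENCIES[name], states)
               for name in dependency_names)

# The 'dependencies' directive from W's service definition.
W_DEPENDENCIES = ["db-a-dependency", "db-b-dependency"]

# "o" stands for OK here (an assumed state code for this sketch).
states = {
    ("db-host-1", "Database A"): "c",
    ("db-host-2", "Database A"): "o",
    ("db-host-1", "Database B"): "o",
    ("db-host-2", "Database B"): "o",
}
one_down = notifications_suppressed(W_DEPENDENCIES, states)

# Now the second Database A instance degrades as well.
states[("db-host-2", "Database A")] = "w"
both_down = notifications_suppressed(W_DEPENDENCIES, states)

print(one_down, both_down)
```

With only one Database A instance critical, W's notifications still go out; once the other instance hits warning as well, db-a-dependency fails and W's notifications are suppressed.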

But as I said I don't like the idea of doing anything by implication... I'd like the relationships to be explicit, and so I'm working on a way that the boolean statement about dependencies could be written out in the dependencies directive in any host or service definition.  I have a few ideas, but none are quite as clean as the above example so I'll exclude them from this email for now (it's already too long).  But if people are supportive of the general concept I can keep working on it until I come up with a syntax that is both flexible *and* manageable.

Does this seem like a direction people would like to pursue?






