Notification configuration (Was RFC/RFP: Service parents)

Max Schubert maxs at webwizarddesign.com
Wed May 18 14:12:15 CEST 2011


Andreas,

On Tue, May 17, 2011 at 7:57 AM, Andreas Ericsson <ae at op5.se> wrote:
>> Any plans to detach notification attributes from service / host
>> definitions in 4.x and make them their own top-level configuration
>> class like escalations to make it easier to scale notification
>> definitions for large projects?
>>
>
> Not really. What would such an object look like? How would it add
> additional benefit compared to using templates for hosts and services?
> I think if I could just see some sort of example definition of it I'd
> get an inkling of why some seem to think it's such a great idea. Right
> now, I see no additional benefit to it.

It would look just like an escalation.  Embedding notification
policies in host and service objects does not work well for large
configurations in the following scenario (which is the one we are in
at work by design):
* Multiple configuration editors who own various parts of the Nagios
configuration tree - in our case this used to be one big tree, but we
have now set up separate trees for separate projects - we have about
20-30 people who can edit their project-specific configurations.
* A set of services that are global in nature - service -> hostgroup
-> host - baseline monitoring required by all projects, using
standards established by multiple organizations in our company.  For
example, base host monitoring with an SNMP agent (6 services across
every host).  We have other global services as well, and a core team
who develop, maintain and augment both our distributed Nagios
software and these global services and configurations.
* A set of services that are specific to each project using our
distributed variant of Nagios - managed by subject matter experts on
each team.

With this scenario, how do we let each group that is responsible for
hosts carrying these global services create individually tailored
notification policies, given that there is only one notification
policy per service?  This is what we do:
* We configure our base service and host to 'notify' on every state
change using the command name do_nothing.
* We created a custom patch so that when the string 'do_nothing' is
seen in the command name, the state change only increments the
notification count - it does not trigger any external command to run.
* We created a patch (partial - no serialization to disk) for
escalation logic that tracks in memory when a fault escalation was
sent, so that OK escalations are only sent in response to something
that was in a fault state.  We are working on completing this patch so
that the state is saved across restarts.
* We have all groups use escalations to define their notification
policies - the service and host notification commands then trigger our
distributed pollers to send escalation requests to a network-based
notification service we have, which then lets the notification
requests trigger email, SMS, SNMP traps, etc. without having to
re-configure Nagios for every notification transport / method change.
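
As a rough illustration of the first two points above, the base
definitions might look something like the sketch below.  Only the
command name do_nothing comes from our setup as described; the contact
name, the template name, the /bin/true path, the 24x7 timeperiod and
the interval value are placeholders assumed for the example.  Note
that the "count only, run nothing" behaviour depends on the patch;
stock Nagios would simply execute the command.

```
# Placeholder command.  With the patch described above, Nagios only
# increments the notification count and never runs this.  Stock
# Nagios would execute the command line (path is an assumption).
define command{
        command_name    do_nothing
        command_line    /bin/true
        }

# Hypothetical contact whose only job is to receive the placeholder
# notifications.  Assumes a timeperiod named 24x7 is defined.
define contact{
        contact_name                    nobody
        host_notification_period        24x7
        service_notification_period     24x7
        host_notification_options       d,u,r
        service_notification_options    w,u,c,r
        host_notification_commands      do_nothing
        service_notification_commands   do_nothing
        }

# Hypothetical template for the global services: 'notify' on every
# state change, but only to the placeholder contact.
define service{
        name                    global-service
        register                0
        contacts                nobody
        notification_options    w,u,c,r
        notification_interval   5       ; arbitrary example value
        notification_period     24x7
        }
```

The point of the design is that the base objects carry no real
notification policy at all, which leaves the escalation objects free
to express it per project.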

Yeah, it is very ugly, and why?  Because having one notification
policy per service doesn't scale well when taking advantage of
service -> hostgroup -> host mappings, which is a critical pattern to
use when scaling a configuration.

We have over 9000 hosts being monitored by our distributed framework
(and growing) with around 30 configuration editors and 120+ users.
Our distributed framework used to be centralized and "one project for
all", but is now a cluster of distributed setups, one distributed
setup per project, which is scaling nicely.  Our largest distributed
installations have 3900 and 5100 hosts in them respectively - we have
4 other distributed instances that are just getting ramped up and only
have a few dozen hosts apiece at this point.

So while this is ugly, it works!  All editors can define escalation
objects that take into account both their individual needs for global
service notifications and any project-specific notifications.  By
putting project-specific hosts in project-specific host groups, most
groups need only two escalation policy definitions per project - one
for hosts, one for services.
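
For illustration, such a pair of per-project definitions might look
roughly like this.  The host group and contact group names are made up
for the example, the interval values are arbitrary, and combining
hostgroup_name with the '*' service wildcard is assumed to behave as
it does with host_name.

```
# Hypothetical project "proj-a": one escalation covering every
# service on every host in the project's host group.
define serviceescalation{
        hostgroup_name          proj-a-hosts
        service_description     *
        first_notification      1       ; start with the first notification
        last_notification       0       ; 0 = never stop escalating
        notification_interval   30
        escalation_options      w,u,c,r
        contact_groups          proj-a-oncall
        }

# ...and one escalation covering the hosts themselves.
define hostescalation{
        hostgroup_name          proj-a-hosts
        first_notification      1
        last_notification       0
        notification_interval   30
        escalation_options      d,u,r
        contact_groups          proj-a-oncall
        }
```

The contacts in the (hypothetical) proj-a-oncall group would carry
whatever notification commands hand the request off to the
notification service.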

If all notifications were just done through an escalation-like
configuration object, life for a big project would be much easier.
1) Having notifications clearly separated as their own configuration
template in the Nagios DSL makes it much less confusing for people new
to Nagios to understand 'where to configure notifications'.
2) The configuration flexibility of the escalation template makes it
very easy to work with for a large configuration.
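
To make the idea concrete, here is a purely hypothetical sketch of
what such an object could look like.  No object type named
servicenotification exists in Nagios; the type name, and the choice of
directives borrowed from the service and serviceescalation
definitions, are invented here only to show the shape of the
proposal.

```
# HYPOTHETICAL - not valid in any released Nagios version.
# A notification policy attached by host group and service, the way
# an escalation is, instead of living inside the service object.
define servicenotification{
        hostgroup_name          proj-a-hosts    ; made-up host group
        service_description     *
        contact_groups          proj-a-oncall   ; made-up contact group
        notification_options    w,u,c,r
        notification_interval   30
        notification_period     24x7
        }
```

Unlike a template, several such objects could match the same global
service on different host groups, which is what would let each project
own its policy without touching the shared service definition.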

Our global and project-specific setup and all the notification
changes we made are also serving us very well as we grow.

Notifications as separate objects would let us back out a number of
patches, would really simplify our configuration, and would let our
pollers run hotter.

- Max
