escalation rule creep

Stephen Carpenter scarpe01 at usg.tufts.edu
Wed Feb 12 20:21:48 CET 2003


So, we are in the process of rolling out our Nagios system into
production here. It seems like a good time to sit back and look at
what I see as good areas for improvement.

Nagios has a very flexible way of specifying who to contact, and
it's great. However, I think it has some limitations that are hard
to get around as is. For example...

We monitor NTP on all hosts. On every single host we run we care if
NTP stops working or is way off. However, we don't care so much that we
want to be paged and maybe woken up about it. Email would be just
dandy.

On EVERY other service (OK, there are one or two like NTP in this regard,
but NTP is the ubiquitous one) we want to be paged if it's down, and
to escalate up through the manager to a last-ditch "page the
world" if it's not taken care of within some period of time.

That's easy to do, of course: we just make our rules, and for the
escalations we use "service_description *" and it works great.
The problem is that this also matches NTP and escalates it.

Getting around this has been interesting. It basically seems to
mean that we need a huge set of escalation rules to handle
each service individually for each hostgroup that has different
contacts associated, because we can't list groups of services in the
service description.

(i.e., if "foogroup" has services PING, NTP and SMTP, we need a service
escalation definition for each of PING and SMTP. That's fine for 3,
but I count about 12 checks that are on most every host, and no fewer
than 3 or 4 groups of hosts that have different escalation paths,
which means several tens of rules for something that could easily be
defined in 4 or 5 rules if there were just a way to group
services better.)

And of course, going forward, as we add more checks and bring more
groups on board to have their machines monitored, we expect these
numbers to grow.

Heh, if this continues, I may have to start generating the escalations
config with m4 macros :/
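
For illustration, here is a rough sketch of that generation idea. It
uses Python instead of m4, which is a substitution and not what this
post proposes. Every hostgroup, service and contact group name in it
is made up, as are the first_notification and notification_interval
values. It also assumes your Nagios version accepts hostgroup_name in
a serviceescalation definition; if it does not, emit one block per
host_name instead.

```python
# Hypothetical inventory: hostgroup -> services checked on it and the
# contact groups its escalations should notify. All names are invented.
HOSTGROUPS = {
    "foogroup": {
        "services": ["PING", "NTP", "SMTP"],
        "contact_groups": ["foo-admins", "foo-managers"],
    },
    "bargroup": {
        "services": ["PING", "NTP", "HTTP"],
        "contact_groups": ["bar-admins", "bar-managers"],
    },
}

# Services that should stay email-only and never get an escalation rule.
EMAIL_ONLY = {"NTP"}

# Doubled braces are literal braces in str.format templates.
TEMPLATE = """define serviceescalation{{
    hostgroup_name          {hostgroup}
    service_description     {service}
    first_notification      {first}
    last_notification       0
    notification_interval   {interval}
    contact_groups          {contacts}
    }}
"""


def render_escalations(hostgroups, skip=EMAIL_ONLY, first=3, interval=30):
    """Return one escalation block per (hostgroup, service) pair not in skip."""
    blocks = []
    for name in sorted(hostgroups):
        group = hostgroups[name]
        for service in group["services"]:
            if service in skip:
                continue  # e.g. NTP: leave it on plain email notification
            blocks.append(
                TEMPLATE.format(
                    hostgroup=name,
                    service=service,
                    first=first,
                    interval=interval,
                    contacts=",".join(group["contact_groups"]),
                )
            )
    return blocks


if __name__ == "__main__":
    print("\n".join(render_escalations(HOSTGROUPS)))
```

The rule count still grows with hostgroups times services, but the
part maintained by hand shrinks to the inventory and the skip list.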

Does anyone have any ideas on how to get around this? Any chance we
might see some way of grouping services like this in the future?

-Steve
-- 
"The Creation of the Universe was made possible by a grant from Texas 
Instruments. "
                -- PBS 

