aggregated status messages

Carroll, Jim P [Contractor] jcarro10 at sprintspectrum.com
Mon Jan 20 16:58:43 CET 2003


Using the 'parents' directive, one host will effectively become dependent on
another.  Plus, the status map will reflect this hierarchy.
 
Creating a host dependency won't do anything to the status map.  If one host
is dependent on another, I suppose you could say that the services of the
dependent host depend on the depended-on host, but the services of the
dependent host don't depend on any services of the depended-on host, unless
you explicitly define that dependency.  In my case, I have multiple NRPE
services dependent on one particular NRPE service on the same host, but I
could just as easily have created the definition such that they were
dependent on some arbitrary other service on some arbitrary host.  (Ideally,
the choice would make sense, and not be arbitrary. ;)
 
If you can honestly say to yourself something like, "If I define
dependencies such that hosts B, C, D and E are all dependent on A, and I
really don't care what the status of hosts B, C, D and E is if the ping RTA
time of host A goes through the roof", then you may be a good candidate to
define this dependency.  BUT, if host A is acting up or in a scheduled outage
or similar, and any one of hosts B/C/D/E suffers the ping time problem (or
worse, e.g., the host died a horrible death), then you won't get a
notification.  Sorry, you created the definition, so that's how it is.
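
To make that trade-off concrete, here is a toy Python model of the behaviour
described above.  It is not Nagios code and does not claim to mirror Nagios
internals; the names should_notify, states and master_of are made up for
illustration, and it assumes a dependent host is silenced whenever its
master is in any non-UP state.

```python
from typing import Dict


def should_notify(host: str, states: Dict[str, str], master_of: Dict[str, str]) -> bool:
    """Toy rule: a failed host notifies unless the host it depends on is also failing."""
    if states.get(host, "UP") == "UP":
        return False                      # nothing to report
    master = master_of.get(host)          # None means the host depends on nobody
    return master is None or states.get(master, "UP") == "UP"


# Hosts B..E all depend on A, as in the example above.
master_of = {"B": "A", "C": "A", "D": "A", "E": "A"}

# A is healthy and B dies: B is reported.
print(should_notify("B", {"A": "UP", "B": "DOWN"}, master_of))    # True

# A is acting up and B dies: B stays silent, only A is reported.
print(should_notify("B", {"A": "DOWN", "B": "DOWN"}, master_of))  # False
print(should_notify("A", {"A": "DOWN", "B": "DOWN"}, master_of))  # True
```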
 
If this isn't acceptable, then I'm not sure what to say.  You can put in the
request for consolidating notifications, but I see some problems right away:
 
How would you define the consolidation logic?  Wait for a predefined
threshold of hosts to squawk before sending the single notification?  Wait
for a quorum of hosts to complain?  Wait for a percentage failure?  What if
that threshold isn't reached?  Do you want it to wait a fixed amount of time
before sending out the consolidated message?  Do you want it to wait for a
timeout of sorts (e.g., several hosts send notifications; wait 5 mins after
the last one before sending out the consolidated message)?
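
Just to show how many knobs that implies, here is a rough Python sketch of
one possible answer.  Nothing like this exists in Nagios as far as this
thread is concerned; NotificationBatcher and its parameters (quiet_period,
max_wait, threshold) are hypothetical names, and the policy (flush on a
count threshold, on a quiet period after the last alert, or on a hard
timeout after the first) is only one of the choices listed above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NotificationBatcher:
    """Hypothetical alert batcher; timestamps are passed in as plain seconds."""
    quiet_period: float = 300.0  # flush once this long has passed since the last alert
    max_wait: float = 900.0      # never hold the first alert longer than this
    threshold: int = 5           # flush at once when this many alerts are queued
    _pending: List[str] = field(default_factory=list)
    _first_seen: float = 0.0
    _last_seen: float = 0.0

    def add(self, alert: str, now: float) -> Optional[str]:
        """Queue an alert; return a consolidated message if one is due."""
        if not self._pending:
            self._first_seen = now
        self._pending.append(alert)
        self._last_seen = now
        return self.poll(now)

    def poll(self, now: float) -> Optional[str]:
        """Return the consolidated message if any flush condition holds, else None."""
        if not self._pending:
            return None
        due = (
            len(self._pending) >= self.threshold
            or now - self._last_seen >= self.quiet_period
            or now - self._first_seen >= self.max_wait
        )
        if not due:
            return None
        message = f"{len(self._pending)} alert(s): " + "; ".join(self._pending)
        self._pending = []
        return message


batcher = NotificationBatcher(quiet_period=300, max_wait=900, threshold=3)
batcher.add("hostB ping CRITICAL", now=0)
batcher.add("hostC ping CRITICAL", now=60)
print(batcher.poll(now=200))  # None: still inside the 5 minute quiet period
print(batcher.poll(now=360))  # 2 alert(s): hostB ping CRITICAL; hostC ping CRITICAL
```

Even this small version has three tunables, and each one answers a different
question from the list above.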
 
I can't help but think that defining some sort of dependency might do what
you need to do.  Take a look at the docs regarding host/service
dependencies.  I know I ended up going over them several times in order to
get a clear picture of what it can do.  I don't think I'm even using the
full capabilities; mine is a fairly simple config by comparison (in terms of
logic, anyway).
 
Perhaps instead of having all the hosts funnel their dependencies into one
particular host, divide the dependencies so that you end up with 2 'master'
hosts.  This would let you schedule downtime on one without ending up
'flying blind' on every other host.  If one of these hosts complains, you
may not be in bad shape, but if both of the hosts complain, you could have a
major outage on your hands.  But you'll still only receive the 2
notifications.
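
As a back-of-the-envelope illustration in Python (again a toy model, not
Nagios behaviour; blind_hosts and the host names A, A1, A2 and B through E
are invented for the example), compare how many hosts go silent when a
master is down under each layout:

```python
from typing import Dict, List


def blind_hosts(states: Dict[str, str], master_of: Dict[str, str]) -> List[str]:
    """Hosts whose notifications would be suppressed because their master is not UP."""
    return sorted(h for h, m in master_of.items() if states.get(m, "UP") != "UP")


# Everything funnels into one master, versus the same hosts split across two.
one_master = {"B": "A", "C": "A", "D": "A", "E": "A"}
two_masters = {"B": "A1", "C": "A1", "D": "A2", "E": "A2"}

print(blind_hosts({"A": "DOWN"}, one_master))               # ['B', 'C', 'D', 'E']
print(blind_hosts({"A1": "DOWN", "A2": "UP"}, two_masters))  # ['B', 'C']
```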
 
Food for thought.
 
jc

-----Original Message-----
From: Shayne Lebrun [mailto:slebrun at muskoka.com]
Sent: Saturday, January 18, 2003 4:48 AM
To: Carroll, Jim P [Contractor]
Subject: RE: [Nagios-users] aggregated status messages


Service dependencies....perhaps.  But aren't services automatically
considered dependent if hosts are?
 
I kind of like the idea of squelching 'unreachable' messages and being done
with it, but....

-----Original Message-----
From: Carroll, Jim P [Contractor] [mailto:jcarro10 at sprintspectrum.com]
Sent: Friday, January 17, 2003 6:47 PM
To: 'Shayne Lebrun'; nagios-users at lists.sourceforge.net
Subject: RE: [Nagios-users] aggregated status messages


Perhaps host or service dependencies would be more to your liking.
 
jc

-----Original Message-----
From: Shayne Lebrun [mailto:slebrun at muskoka.com]
Sent: Friday, January 17, 2003 3:28 PM
To: Carroll, Jim P [Contractor]; nagios-users at lists.sourceforge.net
Subject: RE: [Nagios-users] aggregated status messages


Look at it this way.  One of my wireless subnets just went to hell; tripped
'service critical' messages for five or six servers, as the ping RTA time
suddenly shot up.  So I got five emails.  I'd rather have gotten one email
with five entries.

-----Original Message-----
From: nagios-users-admin at lists.sourceforge.net
[mailto:nagios-users-admin at lists.sourceforge.net]On Behalf Of Carroll, Jim P
[Contractor]
Sent: Friday, January 17, 2003 2:48 PM
To: 'Shayne Lebrun'; nagios-users at lists.sourceforge.net
Subject: RE: [Nagios-users] aggregated status messages


Read up on the 'parents' directive.  Sounds like this will do exactly what
you're looking for.  (This was one of the main reasons we switched from BB
to Nagios.)
 
http://nagios.webdev.sprintspectrum.com/nagios/docs/xodtemplate.html#host
 
jc

-----Original Message-----
From: Shayne Lebrun [mailto:slebrun at muskoka.com]
Sent: Friday, January 17, 2003 11:33 AM
To: nagios-users at lists.sourceforge.net
Subject: [Nagios-users] aggregated status messages


One thing I'd love to see from Nagios would be the ability to have
aggregated status messages; I'd like to be able to tell it that when it
feels the need to send a status message, it should tell me what (if
anything) is already (still) down, what (if anything) has just come back up,
and what (if anything) has just gone down.
 
That way, if a satellite router goes down, I get one email saying 'routerX
down, hosts a,b,c,d,e,f,g now unreachable' instead of getting eight or nine
emails.  Also, if a subnet goes down, but things are coming back up one by
one, I can tell which is still down and which is back up.
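
A minimal Python sketch of the kind of message meant here, assuming two
status snapshots (host name mapped to a state string) taken at consecutive
notification times and covering the same hosts.  The function name
summarize and the snapshot format are invented for illustration; this is
not an existing Nagios feature.

```python
from typing import Dict


def summarize(previous: Dict[str, str], current: Dict[str, str]) -> str:
    """Build one message listing what is still down, what recovered, what just failed."""
    was_down = {h for h, s in previous.items() if s != "UP"}
    is_down = {h for h, s in current.items() if s != "UP"}
    sections = [
        ("Still down", sorted(was_down & is_down)),
        ("Recovered", sorted(was_down - is_down)),
        ("Newly down", sorted(is_down - was_down)),
    ]
    # Skip empty sections, matching the "(if anything)" wording above.
    lines = [f"{title}: {', '.join(hosts)}" for title, hosts in sections if hosts]
    return "\n".join(lines) if lines else "All hosts up"


previous = {"routerX": "DOWN", "a": "UNREACHABLE", "b": "UP", "c": "UP"}
current = {"routerX": "DOWN", "a": "UP", "b": "DOWN", "c": "UP"}
print(summarize(previous, current))
# Still down: routerX
# Recovered: a
# Newly down: b
```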
 
Makes it more efficient for mobile devices receiving said emails, too.
 
Muskoka.com
115 Manitoba Street
Bracebridge, Ontario
P1L 2B6
(705)645-6097


