service alert aggregation?

Tedman Eng teng at dataway.com
Tue Sep 30 17:47:56 CEST 2003


Perhaps you can group the services into a cluster and monitor it with the
service cluster plugin:
http://nagios.sourceforge.net/docs/1_0/clusters.html

Set the threshold to 1 or more, and you'll get paged if any of the services
fail.  Multiple individual notifications are eliminated (provided
notifications for the individual services are turned off), because alerts
are sent only according to the notify/renotify settings of the cluster
definition.  Once your admins know that "One or more of HTTP, FTP, SMTP, or
IMAP" has failed, they can look at the Nagios screen to find out exactly
which one it is.
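
To make the threshold idea concrete, here is a minimal sketch of the
logic in Python.  This is not the actual cluster plugin or its options;
the function name, the state constants and the message format are all
made up for illustration.  It only shows how "N or more members not OK"
collapses several service states into a single cluster state.

```python
from typing import Dict, Tuple

# Illustrative state codes only (hypothetical constants for this sketch).
OK, WARNING, CRITICAL = 0, 1, 2


def cluster_state(service_states: Dict[str, int],
                  critical_threshold: int = 1) -> Tuple[int, str]:
    """Collapse several service states into one cluster state.

    The cluster is CRITICAL once at least `critical_threshold` member
    services are in any non-OK state; otherwise it is OK.
    """
    failed = sorted(name for name, state in service_states.items()
                    if state != OK)
    total = len(service_states)
    if len(failed) >= critical_threshold:
        detail = ", ".join(failed)
        return CRITICAL, (f"CLUSTER CRITICAL: {len(failed)} of {total} "
                          f"services not OK ({detail})")
    return OK, f"CLUSTER OK: {len(failed)} of {total} services not OK"


if __name__ == "__main__":
    states = {"HTTP": CRITICAL, "FTP": OK, "SMTP": OK, "IMAP": OK}
    print(cluster_state(states, critical_threshold=1)[1])
```

With a threshold of 1, a single failing member is enough to flip the
cluster, so only one notification goes out no matter how many of the
four services are down.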

HTH


"Joshua Barratt" <jbarratt at serialized.net> wrote in message
news:3F791451.5080902 at serialized.net...
>
> Ah. I'm afraid I've been unclear. I very much do want to know about
> cases in which I have multiple services failing!
>
> However, assuming I have a sudden-onset catastrophic failure, such as
> what happened last night (the system started swapping at an insane
> rate), this is the alert sequence that was generated:
> 4:09 AM  "HTTP is CRITICAL"
> 4:09 AM  "IMAP is CRITICAL"
> 4:10 AM "FTP is CRITICAL"
> 4:12 AM "SMTP is CRITICAL"
> ...
> and then the 4 corresponding "... is OK" pages.
>
> I'm fairly certain that, in this case, the services all went critical at
> about the same time; it's just that the way the checks were scheduled,
> nagios wasn't sure (3/3) until :10 and :12 that FTP and SMTP were
> actually down.
>
> What I would much rather have is 2 pages instead of 8:
>
> 4:10 AM "HTTP,IMAP,FTP,SMTP are CRITICAL"
> ...
> 4:21 AM "HTTP,IMAP,FTP,SMTP are OK"
>
> So if a script simply trapped the first alert that would have been
> generated (HTTP is CRITICAL) and, because of that, scheduled service
> checks for "now" on that host, then waited (say 30 seconds) for any
> further alerts to come through for that host, an alert like the above
> could be created, instead of the flurry we otherwise have been getting.
>
> My goal is not to reduce the amount of information flowing to the
> admins, just to turn the volume down. Legitimate pages need to get
> through, and as soon as the problem is known about, but the fewer the
> better!
>
> Sorry for my initial lack of clarity, and thanks for the response...
> Josh
>
>
>
>
> -------------------------------------------------------
> This sf.net email is sponsored by:ThinkGeek
> Welcome to geek heaven.
> http://thinkgeek.com/sf
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
>
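
For reference, the batching idea described in the quoted message (hold
the first alert for a host, wait a short window, then send one combined
page) could be sketched roughly as below.  This is only an illustration
under assumptions: the alert tuple layout, the function name, the host
name "web1" and the message format are all hypothetical, and the step
of forcing immediate re-checks of the host's other services is left out.
It works on a list of already-collected alerts rather than hooking into
Nagios.

```python
from typing import Dict, List, Tuple

# Hypothetical alert record: (epoch seconds, host, service, state)
Alert = Tuple[float, str, str, str]


def batch_alerts(alerts: List[Alert], window: float = 30.0) -> List[str]:
    """Merge alerts for the same host and state that arrive within
    `window` seconds of the first alert in a batch."""
    batches: List[dict] = []
    open_batches: Dict[Tuple[str, str], dict] = {}
    for ts, host, service, state in sorted(alerts):
        key = (host, state)
        batch = open_batches.get(key)
        # Start a new batch if none is open or the window has expired.
        if batch is None or ts - batch["start"] > window:
            batch = {"start": ts, "host": host, "state": state,
                     "services": []}
            open_batches[key] = batch
            batches.append(batch)
        if service not in batch["services"]:
            batch["services"].append(service)
    messages = []
    for b in batches:
        verb = "is" if len(b["services"]) == 1 else "are"
        names = ",".join(b["services"])
        messages.append(f"{b['host']}: {names} {verb} {b['state']}")
    return messages


if __name__ == "__main__":
    sample = [
        (0.0, "web1", "HTTP", "CRITICAL"),
        (5.0, "web1", "IMAP", "CRITICAL"),
        (10.0, "web1", "FTP", "CRITICAL"),
        (20.0, "web1", "SMTP", "CRITICAL"),
    ]
    for line in batch_alerts(sample):
        print(line)
```

The window only helps if the remaining checks actually run inside it,
which is why the original proposal pairs it with scheduling the host's
service checks for "now" as soon as the first alert is trapped.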
