service alert aggregation?

Demetri Mouratis dmourati at cm.math.uiuc.edu
Tue Sep 30 07:04:17 CEST 2003


On Mon, 29 Sep 2003, Joshua Barratt wrote:

> Basically, quite often if there is a problem with a host, many of its
> services will be down, but it will still be pingable. (The TCP/IP stack
> is a hardy beast.) Possible causes: disk filling up, ram+swap filling
> up, very heavy load, etc (even some kernel panics!) -- all of these can
> cause more than one service to become unreachable, and in many cases,
> *all* services unreachable -- but still the host check will not fail.
> This causes the admins to get a flurry of service down alerts, and, when
> the problem is corrected, a flurry of service up alerts.

Well, by what you describe, you have a series of service critical events
taking place and do *not* want to be notified about them.

You can solve this in any number of ways:

1.	Increase max_check_attempts for service checks
2.	Remove critical notifications for service checks
3.	Remove recovery notifications for service checks
4.	Increase notification_interval for service checks
5.	Use a global event handler (you mentioned this one)
6.	Use service dependencies (you mentioned this one)
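
As a rough sketch of how options 1 through 4 and 6 might look in the
object config files (not a tested setup): the host name "web1", the
service names "SSH" and "HTTP", the check commands, and the
"generic-service" template are all assumptions made up for
illustration, and the values are arbitrary starting points.

```
# Options 1-4: tune the service definition itself.
define service{
    use                     generic-service
    host_name               web1
    service_description     HTTP
    check_command           check_http
    # Option 1: require more failed checks before a hard state.
    max_check_attempts      5
    # Option 4: wait longer (in time units) before re-notifying.
    notification_interval   240
    # Options 2/3: list only the states you want to hear about.
    # Here warning and critical are kept; recovery (r) and
    # unknown (u) notifications are dropped.
    notification_options    w,c
}

# Option 6: suppress HTTP notifications while SSH on the same
# host is in a non-OK state. SSH acts as the master service.
define servicedependency{
    host_name                       web1
    service_description             SSH
    dependent_host_name             web1
    dependent_service_description   HTTP
    # n = never skip the dependent check itself
    execution_failure_criteria      n
    # no HTTP notifications while SSH is warning/unknown/critical
    notification_failure_criteria   w,u,c
}
```

With a dependency like this you would still get the alert for the
master service, so a host that is pingable but otherwise dead produces
one page rather than one per service.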

I would argue you *do* want to be notified about situations where your
servers are failing multiple service checks.  After all, isn't that the
point of monitoring?  Moreover, you want to be notified when these
services reach a warning state so things don't reach the critical point.

In any event, you can reduce the notifications, or turn them off to your
heart's content.  Just be sure you don't go too far and miss something
because you wanted to reduce the number of pages/emails.

---------------------------------------------------------------------
Demetri Mouratis
dmourati at linfactory.com


