service alert aggregation?

Joshua Barratt jbarratt at serialized.net
Tue Sep 30 07:27:45 CEST 2003


>I would argue you *do* want to be notified about situations where your
>servers are failing multiple service checks.  After all, isn't that the
>point of monitoring?  Moreover, you want to be notified when these
>services reach a warning state so things don't reach the critical point.

Ah. I'm afraid I've been unclear. I very much do want to know about 
cases in which I have multiple services failing!

However, assuming I have a sudden-onset catastrophic failure, such as 
what happened last night (the system started swapping at an insane 
rate), this is the alert sequence that was generated:
4:09 AM "HTTP is CRITICAL"
4:09 AM "IMAP is CRITICAL"
4:10 AM "FTP is CRITICAL"
4:12 AM "SMTP is CRITICAL"
...
and then the 4 corresponding "... is OK" pages.

I'm fairly certain that, in this case, the services all went critical at 
about the same time; it's just that, because of the way the checks were 
scheduled, Nagios wasn't sure (3/3) until 4:10 and 4:12 that FTP and SMTP 
were actually down.

What I would much rather have is 2 pages instead of 8:

4:10 AM "HTTP,IMAP,FTP,SMTP are CRITICAL"
...
4:21 AM "HTTP,IMAP,FTP,SMTP are OK"

So if a script simply trapped the first alert that would have been 
generated (HTTP is CRITICAL) and, because of that, scheduled service 
checks for "now" on that host, then waited (say 30 seconds) for any 
further alerts to come through for that host, an alert like the above 
could be created, instead of the flurry we otherwise have been getting.
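
A minimal sketch of just the batching step, in Python, might look like 
the following. It is an illustration under stated assumptions, not an 
existing Nagios feature: the names Alert, aggregate and format_page are 
made up here, the 30-second window is the figure suggested above, and 
the step of forcing immediate service checks on the host is left out 
entirely. It also assumes that, once checks are forced, the related 
alerts really do arrive within the window.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    # Hypothetical record for one notification; not a Nagios data structure.
    host: str
    service: str
    state: str   # e.g. "CRITICAL" or "OK"
    time: float  # seconds, on any consistent clock


def aggregate(alerts, window=30.0):
    """Group alerts by (host, state).

    A batch opens with the first alert for a host/state pair and collects
    every later alert for that pair arriving within `window` seconds of it.
    """
    batches = []
    open_batches = {}  # (host, state) -> most recent batch for that pair
    for alert in sorted(alerts, key=lambda a: a.time):
        key = (alert.host, alert.state)
        batch = open_batches.get(key)
        if batch is None or alert.time - batch[0].time > window:
            batch = []
            open_batches[key] = batch
            batches.append(batch)
        batch.append(alert)
    return batches


def format_page(batch):
    """Render one batch as a single page, e.g. 'HTTP,IMAP are CRITICAL'."""
    services = ",".join(a.service for a in batch)
    verb = "is" if len(batch) == 1 else "are"
    return f"{services} {verb} {batch[0].state}"


if __name__ == "__main__":
    # Made-up timings: four failures within 30 s, recoveries later on.
    names = ["HTTP", "IMAP", "FTP", "SMTP"]
    alerts = [Alert("web1", s, "CRITICAL", 5.0 * i) for i, s in enumerate(names)]
    alerts += [Alert("web1", s, "OK", 660.0 + 5.0 * i) for i, s in enumerate(names)]
    for batch in aggregate(alerts):
        print(format_page(batch))
```

With the made-up timings in the example, this yields one CRITICAL page 
and one OK page instead of eight. A real version would sit between 
Nagios and the pager as the notification command and would have to hold 
the first alert until the window closes, which delays that first page 
by up to the window length.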

My goal is not to reduce the amount of information flowing to the 
admins, just the number of pages it takes to deliver it. Legitimate 
pages need to get through, and as soon as the problem is known, but the 
fewer pages the better!

Sorry for my initial lack of clarity, and thanks for the response...
Josh



