service alert aggregation?

Roy Sigurd Karlsbakk roy at karlsbakk.net
Tue Sep 30 10:37:21 CEST 2003


Change misccommands.conf to pipe to a script, and you're in :)
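
As a rough sketch of what such a script could do with the alerts it receives (untested against a live setup): hold each host's alerts for a fixed window, then release them as one message. The `host;service;state;output` input format, the `AlertBatcher` name, and the 120-second window are all assumptions made for illustration; the notification command would have to be written to emit that format.

```python
import time
from collections import defaultdict

WINDOW_SECONDS = 120  # assumed batching window; tune to your check interval


def parse_alert(line):
    """Parse 'host;service;state;output' (a made-up format for this sketch)."""
    host, service, state, output = line.rstrip("\n").split(";", 3)
    return {"host": host, "service": service, "state": state, "output": output}


class AlertBatcher:
    """Hold alerts per host and release them as one digest after a window."""

    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.pending = defaultdict(list)  # host -> alerts received so far
        self.first_seen = {}              # host -> time of the first alert

    def add(self, alert, now=None):
        now = time.time() if now is None else now
        # The window starts at the first alert for a host and is not extended.
        self.first_seen.setdefault(alert["host"], now)
        self.pending[alert["host"]].append(alert)

    def flush_due(self, now=None):
        """Return one digest string per host whose window has expired."""
        now = time.time() if now is None else now
        digests = []
        for host in sorted(self.first_seen):
            if now - self.first_seen[host] < self.window:
                continue  # still collecting alerts for this host
            alerts = self.pending.pop(host)
            del self.first_seen[host]
            lines = [f"{host}: {len(alerts)} service alert(s)"]
            lines += [
                f"  {a['service']} is {a['state']}: {a['output']}" for a in alerts
            ]
            digests.append("\n".join(lines))
        return digests
```

This keeps its state in memory, so it would need to run as a long-lived process that the notification command feeds (for example through a FIFO), or be changed to spool alerts to disk between invocations. Mailing the digests is left out on purpose.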

On Tue, 2003-09-30 at 04:43, Joshua Barratt wrote:
> I just spent a very interesting afternoon reading through the last few 
> months of list archives, but was unable to come up with an answer to my 
> question. I apologize if this has been dealt with to death.
> 
> Basically, quite often if there is a problem with a host, many of its 
> services will be down, but it will still be pingable. (The TCP/IP stack 
> is a hardy beast.) Possible causes: disk filling up, ram+swap filling 
> up, very heavy load, etc (even some kernel panics!) -- all of these can 
> cause more than one service to become unreachable, and in many cases, 
> *all* services unreachable -- but still the host check will not fail. 
> This causes the admins to get a flurry of service down alerts, and, when 
> the problem is corrected, a flurry of service up alerts.
> 
> I tried doing the service dependency route, but the basic problem is 
> still that because of the nagios scheduler, it may decide that the SMTP 
> server is critical, say, 2 minutes before deciding that the service that 
> SMTP depends on is critical, and thus you get paged for both.
> 
> Is it possible to configure things so you don't have that problem? I 
> understand escalations, but that still doesn't really solve things, 
> unless I'm missing something. I'll still get individual pages for every 
> individual service that is experiencing a problem.
> 
> My idea (if simple configuration is not the solution) is to do something 
> like this:
> When a service alert is generated, instead of being emailed directly, it 
> is emailed (or piped) to a script. That script then communicates with 
> the nagios daemon and schedules immediate checks for all the services on 
> the affected server. It waits some suitable time period, and then 
> packages all the alerts received within that window into a single 
> message which it then sends to the admins. (The same process would 
> happen with the service up alerts.)
> 
> This might not be foolproof, but I think it would cut down on a lot of 
> spurious paging.
> 
> Has anyone else solved this problem?
> 
> Thanks for any input,
> 
> Joshua Barratt
> 
> 
> 
> 
> -------------------------------------------------------
> This sf.net email is sponsored by:ThinkGeek
> Welcome to geek heaven.
> http://thinkgeek.com/sf
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
> ::: Messages without supporting info will risk being sent to /dev/null
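
For the "schedule immediate checks for all the services on the affected server" step, one possible route is the Nagios external command file. The sketch below assumes that external commands are enabled (`check_external_commands=1`), that the command file lives at the path shown (the real location is whatever `command_file` is set to in nagios.cfg), and that your Nagios version accepts `SCHEDULE_HOST_SVC_CHECKS`; check the external command list for your release before relying on it. The function names are made up for illustration.

```python
import time

# Assumed location; use the command_file value from your own nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def recheck_command(host, when=None):
    """Build the external command line asking for checks of all services on host."""
    when = int(time.time() if when is None else when)
    # Format: [submission time] COMMAND;host_name;check_time
    return f"[{when}] SCHEDULE_HOST_SVC_CHECKS;{host};{when}\n"


def request_recheck(host, command_file=COMMAND_FILE):
    """Write the command to the Nagios command file (a named pipe)."""
    with open(command_file, "w") as fifo:
        fifo.write(recheck_command(host))
```

The script would call `request_recheck("web1")` (hypothetical host name) when the first alert for a host arrives, then wait out its collection window before mailing anything.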


