Soft fail alerts in web interface

Sean Carley scarley at gmi-mr.com
Thu Apr 19 23:42:54 CEST 2012


Being able to ACK before alerts go out (without guessing at downtime) would be convenient for anyone, whether it's one random thing you want to catch before it alerts, or you're responding to the first alert of a major outage and want to handle the rest of the services while they are still in SOFT state. We send a subset of Critical alerts to an on-call pager escalation system, so catching them before they get there is a Good Thing.
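For anyone who wants to try acking outside the stock GUI, here is a minimal sketch of building an ACKNOWLEDGE_SVC_PROBLEM line for the Nagios external command file. The helper names, the example host and service, and the command file path are assumptions for illustration; check command_file in your own nagios.cfg, and test whether your Nagios version accepts the ack while the service is still in SOFT state.

```python
import time


def build_ack_command(host, service, author, comment,
                      sticky=True, notify=False, persistent=False, now=None):
    """Format an ACKNOWLEDGE_SVC_PROBLEM external command line.

    Field order: host;service;sticky;notify;persistent;author;comment.
    sticky=2 keeps the ack until the service returns to OK.
    """
    for name, value in (("host", host), ("service", service), ("author", author)):
        if ";" in value or "\n" in value:
            raise ValueError("%s must not contain ';' or a newline" % name)
    if "\n" in comment:
        raise ValueError("comment must not contain a newline")
    ts = int(time.time()) if now is None else int(now)
    fields = [
        "ACKNOWLEDGE_SVC_PROBLEM",
        host,
        service,
        "2" if sticky else "1",
        str(int(notify)),      # 0 = do not send an acknowledgement notification
        str(int(persistent)),  # 0 = ack comment is dropped on restart
        author,
        comment,
    ]
    return "[%d] %s\n" % (ts, ";".join(fields))


def send_command(line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write one command line to the command pipe (path is an assumed default)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

Usage would look something like send_command(build_ack_command("web01", "HTTP", "scarley", "looking into it")), run as a user with write access to the command pipe. Leaving notify off is what keeps the ack itself from paging anyone.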

Disabling notifications per-service is a fantastic way to make your entire department look bad when the service fails again days, weeks, months or years from now and you've forgotten to re-enable notifications for it. For one service out of 4000+, that icon can go unnoticed for quite a while. As you mention, if the notifications-disabled state IS noticed the next time the service fails unexpectedly, staff might assume that the problem was expected and ignore it instead of responding. That would actually be a Bad Thing.
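A rough sketch of how one could check for forgotten disabled notifications instead of hunting for the icon: scan the status file for servicestatus blocks with notifications_enabled=0. This assumes the usual status.dat layout of "blocktype {" followed by key=value lines and a closing "}"; the function name and the status file path are my own placeholders, not anything Nagios ships.

```python
def services_with_notifications_disabled(status_text):
    """Return (host_name, service_description) pairs whose notifications are off."""
    disabled = []
    block_type = None  # name of the block we are currently inside, if any
    fields = {}
    for raw in status_text.splitlines():
        line = raw.strip()
        if block_type is None:
            # Only look for a block header when we are outside a block,
            # so plugin output ending in "{" is not mistaken for one.
            if line.endswith("{"):
                block_type = line[:-1].strip()
                fields = {}
        elif line == "}":
            if (block_type == "servicestatus"
                    and fields.get("notifications_enabled") == "0"):
                disabled.append((fields.get("host_name", "?"),
                                 fields.get("service_description", "?")))
            block_type = None
        elif "=" in line:
            key, _, value = line.partition("=")
            fields[key] = value
    return disabled


if __name__ == "__main__":
    # Assumed default location; see status_file in nagios.cfg.
    with open("/usr/local/nagios/var/status.dat") as handle:
        for host, service in services_with_notifications_disabled(handle.read()):
            print("%s / %s has notifications disabled" % (host, service))
```

Run from cron and mailed to the team, something like this would at least catch the service that was silenced months ago and never re-enabled.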


-----Original Message-----
From: Paul Dubuc [mailto:work at paul.dubuc.org] 
Sent: Thursday, April 19, 2012 11:25 AM
To: Nagios Users List
Cc: Sean Carley; Corcoran Smith
Subject: Re: [Nagios-users] Soft fail alerts in web interface

I wouldn't argue against this feature.  It would be convenient to have, but your argument depends on your definition of "a problem".  A SOFT state error indicates a potential problem, or possibly a transient one.  That's why SOFT states exist: so Nagios can retry the test to determine whether there is a real problem.  If your staff is alert enough to notice SOFT state problems before anyone is notified, surely they would notice from the display that notifications are disabled.  It probably means the "problem" was expected.
Scheduling downtime would be a better solution than a SOFT state acknowledgement if the check is flapping between OK and error states.

Sean Carley wrote:
> Hi Corcoran, I agree this is a needed feature. One of my guys was 
> saying it's only the default Nagios GUI that prevents this; you can 
> try something like Nagstamon to get a jump on acking SOFT failures.
>
> One really should not have to wait until the HARD state pages people 
> to acknowledge a problem. That is silly, and so is the confused user 
> counter-argument. Surely your staff would soon learn that acks can 
> come without alerts (assuming they can't be trained to uncheck the 
> notification box). It would be even better if Nagios could make that 
> notification box checked by default for HARD state, and unchecked for SOFT.
>
> We found disabling notifications to be a dangerous thing, and not a 
> solution to this problem. People never remember to re-enable them and 
> alerts get missed. I got tired of checking for disabled notifications 
> regularly, so I put in an apache rewrite rule to discourage disabling 
> them in the first place. We insist the user either acknowledge a 
> problem or schedule downtime.
>
> -Sean
>
> -----Original Message-----
> From: Andreas Ericsson [mailto:ae at op5.se]
> Sent: Thursday, April 19, 2012 6:00 AM
> To: Nagios Users List
> Cc: Corcoran Smith
> Subject: Re: [Nagios-users] Soft fail alerts in web interface
>
> On 04/19/2012 09:41 AM, Corcoran Smith wrote:
>>
>> Hi no we don't want to disable notifications entirely, we just want 
>> to be able to faster acknowledge SOFT FAILS or disable them entirely?
>>
>
> That's not a question, so the question mark at the end is a bit odd.
>
>>
>> Fact: All of our technical staff are Microsoft Certified
>>
>
> That might explain it and even gives a hint to why you left out the 
> email you responded to with your non-question-masking-as-a-question, 
> which I presume gave you some sort of response to some initial question.
>
