Acknowledgement Escalations

Mathieu Gagné mgagne at iweb.com
Thu Jan 22 06:26:31 CET 2009


Hi,

First, thanks for your time and input.

RijilV wrote:
> 2009/1/21 Mathieu Gagné <mgagne at iweb.com>
>
>     Here is the situation:
>     Somebody acknowledges a problem and forgets about it.
>     How would you implement an acknowledgement escalation?
>
> Mmmm, there are a couple of technology things you could do for this, but 
> the root of this problem is people, not computers.

Yeah. I know it, you know it, our managers know it. However, we can't 
exactly beat them for making mistakes or for being busy with other problems. :)

> You need to work out
> a process where people aren't ack'ing things just so they can fall back 
> asleep.  I personally suggest having nagios create a ticket with 
> whatever ticketing system you use (you use one right?!) so you can track 
> that issue.  That and having a 24x7 NOC helps :)

Yes. We use Request Tracker (RT), and I personally spent about 7 days 
integrating Nagios with RT and with our internal customer database.

So basically:
1) Problem: a new ticket is created
2) Acknowledgement: a new comment is added to the ticket
3) Recovery: a comment is added and the status is set to Resolved
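To illustrate, here is a minimal sketch of that mapping, assuming a notification script that receives Nagios' $NOTIFICATIONTYPE$ macro. The function rt_action_for and the action dictionaries it returns are hypothetical; the actual calls into RT (mail gateway, REST interface, etc.) are deliberately left out.

```python
from typing import Optional


def rt_action_for(notification_type: str, output: str) -> Optional[dict]:
    """Translate a Nagios notification type into a hypothetical RT action.

    notification_type is the value of the $NOTIFICATIONTYPE$ macro, e.g.
    "PROBLEM", "ACKNOWLEDGEMENT" or "RECOVERY"; output is the plugin output.
    """
    kind = notification_type.upper()
    if kind == "PROBLEM":
        # 1) Problem: open a new ticket.
        return {"op": "create", "body": output}
    if kind == "ACKNOWLEDGEMENT":
        # 2) Acknowledgement: add a comment to the existing ticket.
        return {"op": "comment", "body": output}
    if kind == "RECOVERY":
        # 3) Recovery: comment and mark the ticket resolved.
        return {"op": "comment", "body": output, "status": "resolved"}
    # Other types (FLAPPINGSTART, DOWNTIMESTART, ...) are ignored in this sketch.
    return None
```

The ticket lookup (which RT ticket belongs to which host/service) is left out on purpose; it depends entirely on how the integration stores that link.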

We also implemented a separate escalation system within RT:

1) No update for x minutes/hours: the manager is notified.
2) No answer from the manager: his manager is notified, and so on, until 
the Pope is notified.
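The escalation chain above can be sketched as a function of how long a ticket has gone without an update. This assumes a fixed interval per escalation level; escalation_target, chain and step_minutes are hypothetical names, not part of RT.

```python
from typing import List, Optional


def escalation_target(minutes_idle: int, chain: List[str], step_minutes: int) -> Optional[str]:
    """Return who should be notified about a ticket idle for minutes_idle.

    chain lists the people to escalate to, in order (direct manager first).
    Each further step_minutes without an update moves one level up.
    """
    if step_minutes <= 0:
        raise ValueError("step_minutes must be positive")
    if not chain:
        return None
    level = minutes_idle // step_minutes  # 0 = still within the grace period
    if level == 0:
        return None
    # Stay at the top of the chain instead of running off the end.
    return chain[min(level, len(chain)) - 1]
```

For example, with chain ["manager", "director", "pope"] and a 30-minute step, a ticket idle for 65 minutes goes to "director", and anything past 90 minutes stays with "pope".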

And if they (the ones who forgot) try to close the ticket while the 
problem is still not solved from Nagios' perspective, a comment is added 
saying so and the ticket is reopened.
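The close-and-reopen check boils down to comparing the ticket action with the current Nagios state. A minimal sketch, assuming the current service state (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) is available when the close is attempted; on_ticket_close and the comment text are hypothetical.

```python
from typing import Optional, Tuple

# Nagios service states as returned by plugins.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def on_ticket_close(nagios_state: int) -> Tuple[str, Optional[str]]:
    """Decide what to do when someone tries to close a monitoring ticket."""
    if nagios_state == OK:
        # Nagios agrees the problem is gone: let the ticket be resolved.
        return ("resolve", None)
    # Problem still active: reopen the ticket and explain why in a comment.
    return ("reopen", "Nagios still reports this problem as unresolved (state %d)." % nagios_state)
```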

> I would probably write that program to un-acknowledge things as well as
> alarming.

We thought about it. However, our customer would then start receiving 
problem alerts again, which is bad. I mean, we told him we acknowledged 
the problem; we can't then tell him after 1h that the problem is still 
going on. :)
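For reference, the un-acknowledge idea could look roughly like this: find acknowledgements older than a threshold and submit Nagios' REMOVE_SVC_ACKNOWLEDGEMENT external command for each, so normal notifications can resume. This sketch assumes the acknowledgement times are recorded elsewhere (for example by the notification script when the ACKNOWLEDGEMENT notification arrives); stale_ack_commands and submit are hypothetical names, and the command file path depends on the installation.

```python
import time
from typing import Iterable, List, Optional, Tuple

# (host_name, service_description, acknowledgement time as a Unix timestamp)
Ack = Tuple[str, str, int]


def stale_ack_commands(acks: Iterable[Ack], max_age_s: int, now: Optional[int] = None) -> List[str]:
    """Build REMOVE_SVC_ACKNOWLEDGEMENT external commands for old acknowledgements."""
    now = int(time.time()) if now is None else now
    commands = []
    for host, service, ack_time in acks:
        if now - ack_time >= max_age_s:
            # External command format: [timestamp] COMMAND;arg1;arg2
            commands.append("[%d] REMOVE_SVC_ACKNOWLEDGEMENT;%s;%s\n" % (now, host, service))
    return commands


def submit(commands: List[str], command_file: str) -> None:
    """Write commands to the Nagios external command file (a named pipe).

    The path (often something like .../var/rw/nagios.cmd) varies per install.
    """
    with open(command_file, "w") as f:
        f.writelines(commands)
```

As noted above, the catch is that the customer then starts receiving problem alerts again.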


Anyway, I thought there was a better way to deal with this within Nagios. 
But relying on an external ticketing system, as you suggested, is 
probably the best solution.

Should we be able to define a hostescalation/serviceescalation that 
still applies once the problem is acknowledged? But on the other hand, 
where would it end? :)

Any other ideas or opinions?

--
Mathieu


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users