Automatically acknowledge services of an acknowledged host

Andreas Ericsson ae at op5.se
Tue Dec 14 10:43:29 CET 2010


On 12/14/2010 08:16 AM, Matthieu Kermagoret wrote:
> Hi list,
> 
> Sorry for my late answer but thanks to all of you who replied. I'll
> try to explain our issue a bit better.
> 
> 2010/12/9 Mathieu Gagné<mgagne at iweb.com>:
>> On 12/8/10 5:08 PM, Julien Mathis wrote:
>> That said, I still do not fully understand what you want to achieve or
>> what you really need. We do agree that you are proposing a "solution" to
>> an unknown/unclear problem. (to us)
>>
>> When the host is DOWN, service problems are silenced and NO
>> notifications are sent, they are "muted". Why would you want to
>> acknowledge a service problem if there aren't any notifications sent to
>> contacts?
>>
>> Is there any particular issue you are encountering? What is the course
>> of events and what is the expected behavior?
>>
> 
> The main problem we are trying to fix with this patch concerns
> notifications. Services can be configured so that notifications are
> sent when their state is UNKNOWN, and that's what we do, because the
> UNKNOWN state can be triggered by a host problem, a service dependency
> issue, or an UNKNOWN return value from a plugin (I don't know whether
> the last is definitely wrong or not). So some of our customers want to
> stop notifications for the services associated with a host when they
> acknowledge it.
> 

It seems you would be better off with a microscopic eventbroker module
that prevents sending UNKNOWN notifications if the host is down and
acknowledged. It could use custom variables to tweak the default
behaviour, and something similar could be integrated into a later
Nagios release so that it's supported out of the box.
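
A rough sketch of the decision such a module would make, written in
Python for brevity rather than the C a real eventbroker module would
use. The Host and Service classes, the custom variable name
_MUTE_UNKNOWN_ON_ACK and the function name are all made up for
illustration; none of them are part of Nagios.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for Nagios objects. A real module would read
# the equivalent fields from the core's host and service structures.
@dataclass
class Host:
    name: str
    is_down: bool = False
    acknowledged: bool = False
    custom_vars: dict = field(default_factory=dict)

@dataclass
class Service:
    host: Host
    description: str
    state: str = "OK"  # "OK", "WARNING", "CRITICAL" or "UNKNOWN"

# Assumed custom variable that lets a single host opt out of the filter.
OPT_OUT_VAR = "_MUTE_UNKNOWN_ON_ACK"

def should_block_notification(service: Service) -> bool:
    """Return True if the service notification should be suppressed."""
    host = service.host
    if host.custom_vars.get(OPT_OUT_VAR, "1") == "0":
        return False  # filter switched off for this host
    # Only UNKNOWN services on a host that is both down and acked are muted.
    return service.state == "UNKNOWN" and host.is_down and host.acknowledged

web = Host("web01", is_down=True, acknowledged=True)
print(should_block_notification(Service(web, "load", "UNKNOWN")))   # True
print(should_block_notification(Service(web, "load", "CRITICAL")))  # False
```

Anything other than UNKNOWN still notifies as usual, so the filter
doesn't hide real service problems that were detected independently of
the host outage.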

>> Are service notifications sent to contacts when the host is back UP? Do
>> you want to acknowledge service problems for display purposes only?
>>
> 
> No, they're not. Andreas is right when he says that the patch is
> "poorly thought out", because it's only a part of the solution we
> wanted to create. The original way we wanted to do it is to keep a
> state about the acknowledgement (whether automatically generated or
> not) and remove it when the host is back up if it was automatically
> generated. However, this change would require modifying the host and
> service structures, which is AFAIK forbidden for the 3.x branch.

It is forbidden, but it's not forbidden to add extra object info in
separate hashlists. Internal state for various things can be kept
there, and we'll mark them as "subject to change" so module authors
know not to use them for anything that's supposed to work for a long
period of time. That being said, such a design still leaves me asking
why the entire thing isn't in a broker module from the start.
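
To illustrate the side-table idea, here is a minimal sketch in Python
(a real implementation would be a C hash table inside the core or a
module). The table layout, the key format and both function names are
assumptions made for this example only.

```python
from typing import Dict, List, Tuple

# Side table keyed by (host_name, service_description). The value says
# whether the acknowledgement was generated automatically. The host and
# service structures themselves are left untouched.
auto_acks: Dict[Tuple[str, str], bool] = {}

def record_ack(host_name: str, service_description: str, automatic: bool) -> None:
    """Remember how the acknowledgement on this service came about."""
    auto_acks[(host_name, service_description)] = automatic

def clear_auto_acks(host_name: str) -> List[str]:
    """Drop automatic acks for a host that is back up.

    Returns the affected service descriptions so the caller can remove
    the matching acknowledgements. Manual acks are kept.
    """
    removed = [key for key, automatic in auto_acks.items()
               if key[0] == host_name and automatic]
    for key in removed:
        del auto_acks[key]
    return sorted(description for _, description in removed)

record_ack("web01", "load", automatic=True)
record_ack("web01", "disk", automatic=False)  # acked by a human
print(clear_auto_acks("web01"))  # ['load']
```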

I'd take a patch to block notifications from eventbroker modules in
the blink of an eye if that's the case. NEBERROR_CALLBACKOVERRIDE is
meant for things like that, but it's currently only supported for
host and service checks. Such a patch would make it positively
trivial to write an eventbroker module that does what you want.
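
Roughly, the override mechanism for notifications could behave like the
following Python sketch. It only models the control flow; the OVERRIDE
and OK values are placeholders standing in for NEBERROR_CALLBACKOVERRIDE
and a normal return, the function names are invented, and stopping at
the first override is an assumption of this sketch.

```python
# Placeholder return codes; the real constants are defined in the
# Nagios headers and their values are not reproduced here.
OK = "ok"
OVERRIDE = "override"

def run_callbacks(callbacks, event):
    """Call each registered callback, stopping at the first override."""
    for callback in callbacks:
        if callback(event) == OVERRIDE:
            return OVERRIDE
    return OK

def send_notification(callbacks, event, outbox):
    """Deliver the event unless a module asked the core to skip it."""
    if run_callbacks(callbacks, event) == OVERRIDE:
        return False
    outbox.append(event)
    return True

def mute_unknown_on_acked_host(event):
    """Example module callback: block UNKNOWN while the host is down and acked."""
    if event["state"] == "UNKNOWN" and event["host_down"] and event["host_acked"]:
        return OVERRIDE
    return OK

outbox = []
blocked = {"state": "UNKNOWN", "host_down": True, "host_acked": True}
passed = {"state": "CRITICAL", "host_down": False, "host_acked": False}
print(send_notification([mute_unknown_on_acked_host], blocked, outbox))  # False
print(send_notification([mute_unknown_on_acked_host], passed, outbox))   # True
```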

> So we
> went with the "try to get it into upstream" way. I agree that this
> patch as it stands only fills our customers' needs, but does the
> whole solution seem more appealing?
> 

I'm not sure. I don't think I've fully understood the problem, tbh.

The workflow, afaiu, is this:
1. Some work is scheduled for a host, but the host isn't put into
   scheduled downtime.
2. The host goes down, causing a DOWN state for the host and an UNKNOWN
   state for agent-based service checks.
3. Someone acks the host with "working on it. It'll be up soon".

Currently, no service notifications should be sent for the unknown
states, since the host is down. If that's not the case, it's a bug
and "patches welcome", or I'll fix it myself when I have time. Or
it could be that notifications are sent to service-contacts who
aren't also host-contacts, since they won't be marked as "already
notified". Hmm. I'm not sure if that's a bug or not, but it seems
an unlikely scenario tbh.

If service notifications still go out in spite of the host being
down (I know this can happen sometimes, although it can be worked
around using check timings), you want the acknowledgement for the
host to also filter down to the services. Presumably because there
are people who are contacts for the services on the host, but not
for the host itself.

But the result of sending the ACK notifications to the people who
get service notifications but not host notifications is that they
all of a sudden get *more* notifications, not fewer.

Or did I misunderstand something?

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
