Children "unreachable" on soft down?

Christopher Burke cburke at utc.net
Wed Apr 8 20:49:15 CEST 2009


I wonder if there is something you can do with notification escalations?
I know you can control how the notifications are sent out, but I don't
know if a state change from down to unreachable to down will cause the
escalation to reset.

From: Israel Brewster [mailto:israel at frontierflying.com] 
Sent: Wednesday, April 08, 2009 2:32 PM
To: Marc Powell
Cc: nagios-users at lists.sourceforge.net Users
Subject: Re: [Nagios-users] Children "unreachable" on soft down?

On Apr 8, 2009, at 9:28 AM, Marc Powell wrote:

>
> On Apr 8, 2009, at 11:44 AM, Israel Brewster wrote:
>
>> So is this just something I'll have to live with? I don't seem to be
>> getting much feedback on the subject. :(
>
> Well, my response would be to fix the problem that's causing the
> outages in the first place or adjust the way you're monitoring the
> parents so that the plugin used recognizes when this temporary event
> is occurring.

Ok, fair enough. There is nothing we can do about the outages (as I
explained in one of my e-mails, they are an artifact of the connection
type), so that leaves us with adjusting the monitoring. Now I thought
that the recheck options were there exactly for this reason: to catch
brief outages and not alert. And for the parent host that seems to be
the case, but apparently that logic doesn't carry over to the child
hosts. As such, things would somehow need to be adjusted so that Nagios
never even sees the outages, not even enough to go into a soft down state.
Anyone have any suggestions for how I can accomplish this? Adjusting
the timeout or using, say, an ssh check rather than icmp won't do it:
the packets are still lost, and the ssh check would still time out.
Perhaps I could send more pings at longer intervals (so that if it
doesn't get a response, the single check retries at 15 second intervals
or so before returning a result), but then the check would start
taking several seconds or more to complete, and that wouldn't be a
good thing. That is assuming Nagios even allows a check to run that long:
doesn't it have a mechanism to kill a check that doesn't return in a
given time frame? I'm a little stumped as to how I can adjust things.
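
For illustration only, the "retry inside a single check" idea might look
roughly like the Python sketch below. Everything in it is an assumption
rather than an existing plugin: the names check_with_retries and ping_once
are made up, the ping flags are the Linux ones, and the attempt count and
delay are placeholders. Nagios does kill checks that overrun a configured
limit (as far as I know, host_check_timeout in the main config file for
host checks), so attempts times delay would have to stay below that limit.

```python
import subprocess
import time

OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def ping_once(host, timeout_s=2):
    """Send one ICMP echo; True if a reply came back.

    Assumption: Linux-style ping flags (-c count, -W timeout in seconds).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_with_retries(probe, attempts=3, delay_s=5, sleep=time.sleep):
    """Run probe() up to `attempts` times, pausing between failures.

    Returns (exit_code, message). The first successful probe ends the
    check, so a healthy host still returns almost immediately.
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return OK, f"OK - reply on attempt {attempt}/{attempts}"
        if attempt < attempts:
            sleep(delay_s)  # wait out the brief outage before retrying
    return CRITICAL, f"CRITICAL - no reply in {attempts} attempts"


def main(argv):
    """Hypothetical plugin entry point: prints one status line, returns the exit code."""
    host = argv[1]
    code, message = check_with_retries(lambda: ping_once(host))
    print(message)
    return code
```

A real plugin would end with sys.exit(main(sys.argv)). The cost is the one
already noted above: a host that really is down takes roughly
attempts times delay seconds to report, instead of failing fast.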

> What you're asking for is that nagios track that the
> child went from down->unreachable->down without an intermediate OK
> state and suppress notifications in that case. That would appear to be
> a code change and would be better discussed on nagios-devel but I
> would encourage the check plugin approach first.

Ok. I know there is code in there that knows who it sent down messages
to and doesn't send up messages to people who didn't get a down
(primarily dealing with escalations), so I was hoping that maybe there
would be something similar for this, i.e. seeing that the last
notification sent was a down notification, and as such there is no
need to send another. But if not, so be it. Thanks for the response!
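
To make the "last notification sent" idea concrete, here is a rough
Python sketch of the rule being asked for. It is not Nagios code and does
not reflect how Nagios stores notification history; should_notify and
replay are hypothetical names, and the sketch assumes unreachable states
are not notified on at all.

```python
def should_notify(last_notified, new_state):
    """Decide whether a new hard state warrants a notification.

    last_notified is the state named in the last notification sent
    ("UP" or "DOWN"), or None if nothing has been sent yet.
    """
    if last_notified is None:
        # No recovery notice unless a problem was reported first.
        return new_state != "UP"
    return new_state != last_notified


def replay(states):
    """Feed a sequence of hard states through the rule; return what gets sent."""
    last_notified = None
    sent = []
    for state in states:
        if state == "UNREACHABLE":
            # Assumption: unreachable is treated as transitional and never notified.
            continue
        if should_notify(last_notified, state):
            sent.append(state)
            last_notified = state
    return sent
```

With this rule, replay(["DOWN", "UNREACHABLE", "DOWN", "UP"]) sends one
DOWN and one UP: the second DOWN is suppressed because the last
notification sent was already a down notification.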

-----------------------------------------------
Israel Brewster
Computer Support Technician II
Frontier Flying Service Inc.
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x293
-----------------------------------------------
>
> --
> Marc


