Host down, still doing active checks, causing multiple unwanted service failures

Toussaint OTTAVI t.ottavi at medi.fr
Fri Dec 12 16:43:43 CET 2008


Hi,

Marc Powell wrote:
> Our ideas of accuracy would seem to differ ;)
>   

Sometimes, in life, it's necessary to be able to say: "I don't know".
When a host is simply powered off, or unreachable due to a network/WAN
failure, Nagios displays each service check with a result that depends
on how the plugin is written and on when the latest service check
occurred. Some results may be UNKNOWN, others may be CRITICAL, and the
rest may still be OK (if dependencies are used).

This really bothers me; I do think this is inaccurate. In such a
situation, I would expect all the services to be in the "UNKNOWN" state.


>> We do not use email notifications, because we are only 2 guys, and  
>> this would generate too much messages.
>>     
>
> It shouldn't. In your scenario of 1 host down with X number of  
> services on it, you should only receive 1 down message and 1 recovery  
> message per host event (unless you want more).
>   

Nagios is smart enough, and its notifications are tunable enough, to
avoid email notification floods. But other products, such as routers,
firewalls or security software, are not. They used to fill our mailboxes
with useless messages. That's why I don't like email notifications, at
least for general-purpose problems. I use them only for very critical
events.

Moreover, the parent/child system was designed exactly to handle the
situation where a host is unreachable. It allows notifications to be
disabled for all services, which would necessarily fail or return
wrong results if the host is unreachable. I would like to be able to use
this system to also disable the "incorrect" service status display and,
when a host is unreachable, have the display say "UNKNOWN" for all its
services (just as hosts are displayed as UNREACHABLE).
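
To make the wish concrete, here is a minimal Python sketch of such a
display rule. It is not Nagios code: the function name is hypothetical,
and treating both DOWN and UNREACHABLE hosts this way is an assumption
made only for illustration.

```python
# Hypothetical display-side rule; this is NOT part of Nagios.
# State names follow the usual Nagios host and service states.

def displayed_service_state(host_state, last_service_state):
    """Return the state to show for a service, given its host's state."""
    # Assumption: if the host cannot be reached, the last check result
    # says nothing about the service right now, so show UNKNOWN.
    if host_state in ("DOWN", "UNREACHABLE"):
        return "UNKNOWN"
    return last_service_state


# Last known results for three services on one host.
services = {"HTTP": "OK", "SMTP": "CRITICAL", "Disk": "UNKNOWN"}

# What the display would show once the host becomes unreachable.
shown = {name: displayed_service_state("UNREACHABLE", state)
         for name, state in services.items()}
print(shown)
```

With a rule like this, the stored check results are left untouched; only
what is displayed changes while the host is unreachable.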

This is the way I would like to see my results. It may not be the way
other users would want to see them. But no two users are the same, or
have the same configuration or the same needs. I would just like to find
a solution that displays my results in the way that is most usable and
valuable for me.


> Possibly but with an additional requirement that regularly scheduled  
> host checks are enabled for those hosts. Those are still considered  
> optional and have been undesirable for all prior versions of nagios  
> before current. If someone were to code the patch they would need to  
> ensure they were enabled for the hosts with this new feature enabled  
> otherwise the host would never be checked and return out of it's  
> critical state.
>   

I agree with you. Checks should be for services, and hosts should only
be "containers" for services. Having to enable checks for the hosts as
well is a little confusing for beginners. I also consider host checks
"undesirable".

But, if I understand correctly, host checks are there to determine
parent/child reachability, which in turn makes it possible to determine
the UNREACHABLE status and then to disable useless service failure
notifications. So why not create a parent/child relationship between
services? This would remove the need for host checks, and it would allow
services to be displayed as UNREACHABLE or UNKNOWN if their parent
service check fails.

Dependencies already exist for both hosts and services. Why not a
parent/child/unreachable relationship?
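
A rough Python sketch of what such a service-level parent/child
relationship could mean is given here. Nothing in it exists in Nagios:
the `parents` mapping, the function name and the service names are all
hypothetical, and the parent chain is assumed to contain no cycles.

```python
# Hypothetical model of parent/child relationships between services.
# This is an illustration of the suggestion, NOT an existing Nagios feature.

def effective_states(raw_states, parents):
    """Map each service to the state that would be displayed.

    raw_states: service name -> result of its last check
    parents:    service name -> name of its parent service (if any)
    """
    def blocked(service):
        # Walk up the parent chain (assumed acyclic); any ancestor that
        # is not OK makes this service unreachable.
        parent = parents.get(service)
        while parent is not None:
            if raw_states[parent] != "OK":
                return True
            parent = parents.get(parent)
        return False

    return {service: ("UNREACHABLE" if blocked(service) else state)
            for service, state in raw_states.items()}


raw = {"WAN-ping": "CRITICAL", "HTTP": "OK", "SMTP": "CRITICAL"}
parents = {"HTTP": "WAN-ping", "SMTP": "WAN-ping"}
print(effective_states(raw, parents))
```

In this sketch the failing parent keeps its own CRITICAL result, and
only its children are shown as UNREACHABLE, whatever their last check
returned.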

Of course, this is only a feature suggestion, everybody should be free 
to use it or not. But I'll be happy to use it ;-)


> This is promising. http://nagios.sourceforge.net/docs/3_0/objecttricks.html#same_host_dependency 
>   will help with the config if you haven't seen it.
>   

It works fine. The ability to use wildcards is a great feature. Services
no longer fail when a host is unreachable, but some problems (for me)
remain:
- All services keep their previous status, which is usually OK. As I
said previously, in such a situation I would prefer UNKNOWN.
- A "latency" problem: some service checks are sometimes scheduled AFTER
the WAN failure but BEFORE the dependency service check, so they fail.
Using a "soft dependency" and scheduling the dependency service check
more often helps to reduce this. But it still happens from time to
time.
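
The "latency" window can be illustrated with a small Python sketch. It
is a simplified model, not Nagios scheduling logic: it assumes strictly
periodic checks, assumes the dependency check notices the failure on its
first run after the outage (no retries), and all names and numbers are
hypothetical.

```python
import math

# Simplified timing model of the "latency" problem; NOT Nagios code.
# All times are in seconds.

def next_check_at_or_after(t, interval, offset=0.0):
    """Time of the first periodic check that runs at or after time t."""
    k = math.ceil((t - offset) / interval)
    return offset + max(k, 0) * interval


def exposed_checks(failure_time, master_interval, dependent_check_times,
                   master_offset=0.0):
    """Dependent checks that run after the failure but before the
    dependency (master) check has had a chance to notice it."""
    detected_at = next_check_at_or_after(failure_time, master_interval,
                                         master_offset)
    return [t for t in dependent_check_times
            if failure_time <= t < detected_at]


# WAN fails at t=130; dependent service checks run at these times.
dependent = [100, 200, 290, 400]
print(exposed_checks(130, 300, dependent))  # master checked every 5 min
print(exposed_checks(130, 60, dependent))   # master checked every minute
```

Checking the dependency service more often shrinks the window, but any
dependent check that happens to fall inside it will still fail, which
matches what I observe.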

>>  Am I the only one having this problem ?
>>     
>
> I don't consider it a problem myself, just that nagios doesn't work as  
> you want it to in your environment. I personally prefer the current  
> behavior since it provides more accurate information over a wider  
> variety of outage scenarios.
>   

Let's be clear. Nagios has no problem; it behaves exactly as it is
intended to. The one who has a problem is ME. I need to present the
results in a different way when a host is unreachable, and I'm looking
for a solution to do that.

I would just like to know whether I am the only one who thinks the
results of service checks for unreachable hosts should be displayable
differently.

Kind regards,
-- 

*Toussaint OTTAVI*

*MEDI INFORMATIQUE*
*Mail:* t.ottavi at medi.fr
