Dependency clarification

Israel Brewster israel at frontierflying.com
Mon Jan 11 18:45:48 CET 2010


On Jan 8, 2010, at 2:55 PM, gmartin wrote:

> Israel,
> I believe you are correct.  I'll be interested to hear what others
> have to say on the inner workings.  In the meantime, can the
> problem be solved if the event handler for Service B is written to
> restart svc A if it is down? (Perhaps it calls the same nagios
> check from the command line and acts on the results.)

Yeah, that should work, at least for my specific situation. Of course,
doing so greatly reduces the utility of having the dependency in the
first place. The situations under which it would be triggered
(given nagios restarting service A as soon as it detects it as down)
would be somewhat rare, and even when triggered it would no longer be
needed, since the service B event handler does its own dependency
checking.
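
For reference, here is a rough sketch of what such an event handler for
service B might look like. It assumes the command definition passes
$SERVICESTATE$ and $SERVICESTATETYPE$ as the two arguments; the plugin
path and the init script paths are hypothetical placeholders, not
anything from my actual setup.

```python
#!/usr/bin/env python
import subprocess
import sys

# Hypothetical commands; substitute the real plugin and init scripts.
CHECK_A = ["/usr/local/nagios/libexec/check_service_a"]
RESTART_A = ["/etc/init.d/service_a", "restart"]
RESTART_B = ["/etc/init.d/service_b", "restart"]


def run(cmd):
    """Run a command and return its exit code (0 means OK for a plugin)."""
    return subprocess.call(cmd)


def handle_b(state, state_type, run=run):
    """Event handler logic for service B; returns the restarts it issued."""
    actions = []
    # Only act once B has reached a hard critical state.
    if state != "CRITICAL" or state_type != "HARD":
        return actions
    # Do our own dependency check: is A actually up right now?
    if run(CHECK_A) != 0:
        run(RESTART_A)
        actions.append(RESTART_A)
        if run(CHECK_A) != 0:
            # A is still down, so restarting B would fail anyway.
            return actions
    run(RESTART_B)
    actions.append(RESTART_B)
    return actions


if __name__ == "__main__":
    if len(sys.argv) >= 3:
        handle_b(sys.argv[1], sys.argv[2])
```

The command runner is passed in as a parameter so the logic can be
exercised without touching real services.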

The only time the dependency would apply (assuming our understanding
is right) is when Nagios detects A as down and then tries to run a
check on B before verifying that A is back up. Even then it wouldn't
matter, since a) nagios should have restarted service A immediately
(so a straight restart of B would be fine), and b) even if nagios
didn't, the new event handler for service B would. At that point there
is no need for the dependency at all, since the event handler takes
care of the dependencies. In short, if the dependency only applies
when nagios ALREADY knows service A is down, then it is essentially
useless, at least in this situation. Of course, if this is just the
way dependencies work, then there may be no other option. Thanks for
the feedback.
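
To make the timing window concrete, here is a toy simulation of the
scenario from my original message. It assumes, as we have been
guessing in this thread, that the dependency test looks only at the
last recorded state of A. This models our understanding, not the
actual Nagios internals, and all the names in it are made up.

```python
CHECK_INTERVAL = 5  # minutes between scheduled checks of service A


def b_check_suppressed(cached_a_state):
    """Dependency test as we understand it: only the cached state of A."""
    return cached_a_state == "CRITICAL"


def timeline():
    """Replay minutes 0..5 and return a log of (minute, event, result)."""
    a_actual = "OK"   # what service A is really doing
    a_cached = None   # what the last check of A reported
    log = []
    for minute in range(0, 6):
        if minute == 1:
            a_actual = "CRITICAL"  # A crashes between checks
        if minute % CHECK_INTERVAL == 0:
            a_cached = a_actual    # scheduled check of A
            log.append((minute, "check A", a_cached))
        if minute == 2:            # staggered check of B
            if b_check_suppressed(a_cached):
                log.append((minute, "check B", "suppressed"))
            else:
                # B is down whenever A is really down.
                log.append((minute, "check B", a_actual))
    return log


for entry in timeline():
    print(entry)
```

Under that assumption the check of B at minute 2 is not suppressed,
because the cached state of A is still OK, and B is reported critical
three minutes before the next check notices that A is down.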

>
> \\Greg
>
>
>
> On Fri, Jan 8, 2010 at 6:07 PM, Israel Brewster
> <israel at frontierflying.com> wrote:
> Here's the situation: running nagios 3.2.0, I have two services,
> we'll call them A and B. Both have event handlers such that if they
> register a hard critical state, Nagios attempts to restart them.
> Service B depends on service A, such that when service A goes down,
> service B does as well, so both need to be restarted, with A
> restarted first. I have a servicedependency set up in
> nagios specifying service B's dependency on service A.
>
> My understanding is that the way this works is that when nagios goes  
> to check service B, it first looks at the "current" state (as  
> defined by the last nagios check) of service A, and, if the  
> execution_failure_criteria matches (i.e. if service A is down)  
> nagios does not run the check on service B, thus not running the  
> event handler to attempt to restart B until A is back up. This is  
> good. But what happens in the following scenario?
>
> Service A is scheduled to check every 5 minutes.
> 1) Nagios does a normally scheduled check of service A, finding it  
> to be OK.
> 2) One minute later, Service A crashes
> 3) One minute after that (three minutes prior to the next regular  
> check of service A), thanks to nagios staggering checks, Nagios goes  
> to do a normal check of service B
>
> Now, to my understanding of this scenario, the check on service B
> would run normally, since the last check on A was OK, and nagios
> uses cached results for dependency checks. Since service A is
> actually critical, service B will be critical as well. The problem
> with this is that Nagios will respond by attempting to restart  
> service B, which will invariably fail since service A is still down.  
> Once the next regular check time for service A is reached, Nagios  
> will detect service A as down and restart it, but service B will  
> never get restarted successfully, since nagios already tried and  
> failed.
>
> Is this correct? If so, what can be done about it? Or is nagios  
> smart enough to schedule its service checks to avoid this scenario?  
> It seems that the most logical solution (if possible) would be to  
> mirror the service/host check logic. That is, when a check of  
> service B comes back as critical, immediately check service A. If  
> service A is critical, then don't declare service B to be critical  
> until service A is OK, at which point B would enter a hard down  
> state and run the event handler. Alternately, if I could say  
> something like always check service A immediately before checking  
> service B to make sure our data is current, that would work as well.  
> Although I could see it resulting in excessive checking of service  
> A, which may be less desirable. What do you guys think?
> -----------------------------------------------
> Israel Brewster
> Computer Support Technician II
> Frontier Flying Service Inc.
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7250 x293
> -----------------------------------------------
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when  
> reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
>
