Dependent service checks don't fail when depended-on service check fails

Andreas Ericsson ae at op5.se
Tue Mar 31 11:05:43 CEST 2009


Jarrod Moore wrote:
> On Mon, Mar 30, 2009 at 10:13 PM, Andreas Ericsson <ae at op5.se> wrote:
>> Jarrod Moore wrote:
>>> On Thu, Mar 26, 2009 at 7:57 PM, Andreas Ericsson <ae at op5.se> wrote:
>>>> Jarrod Moore wrote:
>>>>> Hello everyone,
>>>>>
>>>>> I have a couple of related questions regarding service dependencies in
>>>>> Nagios and their limitations. I have two service checks (let's call
>>>>> them A and B) and service A depends on service B to function
>>>>> correctly. I want to set Nagios up so that if service B crashes then
>>>>> both services A and B are put into the critical state in Nagios. I've
>>>>> tried using service dependencies in Nagios to represent this behaviour
>>>>> but have yet to be successful. I can only get it to suppress
>>>>> notifications of service A if both services go down.
>>>>>
>>>> This is expected behaviour. If A is truly dependent on B, then A will
>>>> turn into a non-ok state of its own volition rather than as a result
>>>> of any dependency magic. Dependencies are designed as a means of
>>>> suppressing notifications. Otherwise, you would *always* get a
>>>> notification for B first, and a minute or so later from A (actually,
>>>> without the dependency you could get from A first).
>>>>
>>>>> Is there a way to do what I'm trying to do here? I'd have thought it
>>>>> would be logical that if a service depends on another service and the
>>>>> service depended on dies then all services depending on it would fail
>>>>> their checks as well, but there's probably some scenario where it
>>>>> doesn't work so well. I've had a look through the mailing list
>>>>> archives and found someone had asked a similar question to the
>>>>> nagios-devel list about 2.5 years ago and didn't end up getting an
>>>>> answer, so I thought I might ask whether solutions to this type of
>>>>> problem had been developed since then.
>>>>>
>>>> They haven't. You're using dependencies the wrong way, really. If
>>>> A is truly dependent on B and doesn't go into a non-ok state after
>>>> B has crashed, then your check isn't doing what it's supposed to do,
>>>> or you've misunderstood the relationship somehow.
>>>>
>>>> If you were to explain what the two services actually are, it would
>>>> be easier to point you to a solution that works.
>>>>
>>>> --
>>>> Andreas Ericsson                   andreas.ericsson at op5.se
>>>> OP5 AB                             www.op5.se
>>>> Tel: +46 8-230225                  Fax: +46 8-230231
>>>>
>>>> Considering the successes of the wars on alcohol, poverty, drugs and
>>>> terror, I think we should give some serious thought to declaring war
>>>> on peace.
>>>>
>>> Well basically I have a map (similar to Google Maps) embedded in a
>>> website, which hits a URL to retrieve maps. So I have one check using
>>> check_http to check that the website itself is up and another check on
>>> that URL to make sure that the map service is available. Now if the
>>> map service goes down, the website is still up but the maps won't
>>> appear, which means the website's functionality is significantly
>>> affected. However, it is still up and viewable so doing a check on the
>>> website URL still passes.
>>>
>> It sounds to me like you'd want to make the map-check dependent on
>> the webserver-check. That would suppress notifications from the
>> map-check when it's the webserver that's bombing out. Do you really
>> need two notifications when the map-service goes offline?
> 
> Sorry, I didn't explain that very well. I have a website check that I
> want to have depend on the result of a map service check. The thing is
> that I would like two notifications to be sent to my email - one for
> the service check that is failing and one for each site that is
> affected by the crashed service. That way I would know what is
> affected and what needs fixing. Now I should mention at this point (if
> it wasn't already blindingly obvious) that I'm by no means a Nagios
> master. However, my idea was to have a chain of service dependencies
> and then not send notifications for service dependencies in between
> that I don't want emails about. There's probably a better way of doing
> what I want and in that case, I'm all ... eyes.
> 
>>> Now of course I could just write a script or something to check both
>>> URLs and set that as the check command. There is a problem for me with
>>> this approach, however, because I have some other instances where a
>>> web service depends on other web services.
>> Define "depend". As I understand the definition, carbon-based lifeforms
>> on our fine planet depend on water and sunlight; life cannot function
>> properly without them.
>> It sounds like you want to make sunlight depend on carbon-based lifeforms,
>> because without the life, the sun is rather pointless.
>>
>> Instead of trying to coerce dependencies to work backwards, I'd sit
>> down and think what you want your Nagios installation to do for you,
>> and why you would want two services to go critical when one of them
>> does. Isn't one notification and one red blob in the UI enough? If
>> it isn't, what do you hope to gain from having two notifications and
>> two red blobs?
> 
> I'd say that a service "depends" on another when it requires the other
> service to provide 100% of its functionality. What I'm trying to say
> is that the two services that I'm providing are merely a subset of the
> entire dependency chain. The map service depends on data being in a
> PostgreSQL database. If the data isn't there, I want two emails - one
> saying the website doesn't work and one saying that the data is
> missing. That check depends on the database server being available. If
> it isn't, I want two emails - one saying that the website is affected
> and one saying that the database is down.
> 
>>> When I want to use these
>>> services in websites, I'd then have to write a check for each script,
>>> each containing every service in the chain that is needed to display
>>> the website correctly. This way of doing things just seems a bit
>>> repetitive to me, especially when I have a check for these web
>>> services already.
>> I'm sorry, but I still fail to see the point. Perhaps you'd be better
>> off defining each website as a servicegroup with all of the services
>> that make up the entire visitor-experience parts of a particular
>> servicegroup. That would make it possible for you to get some sort of
>> visualization of what (Nagios-)services affect which customer-services,
>> while at the same time keeping configuration work to a minimum.
>>
> 
> Service groups would be enough if I was primarily using the Nagios web
> UI but, unfortunately, I'm after email notifications and (as far as I
> am aware) you can't define contacts for service groups. I could settle
> for a notification saying "Service <name> from the <name> group is
> down" or something similar.

There's a $SERVICEGROUPNAME$ macro you can use for notifications. I'm
not sure it'll do what you want though. The docs will tell you more
about which macros are available where.
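
As a rough, untested sketch of how the macro could be used: the command
below is modelled on the stock notify-service-by-email command from the
sample config. The command name, the printf and mail paths, and the
message wording are all placeholders of mine, so adjust them to your
installation.

```cfg
# Hypothetical notification command; name, paths and message text are assumptions.
define command {
    command_name    notify-service-by-email-with-group
    command_line    /usr/bin/printf "%b" "Service $SERVICEDESC$ from the $SERVICEGROUPNAME$ group is $SERVICESTATE$\n\nHost: $HOSTNAME$\nInfo: $SERVICEOUTPUT$\nTime: $LONGDATETIME$\n" | /bin/mail -s "$NOTIFICATIONTYPE$: $SERVICEDESC$ ($SERVICEGROUPNAME$) is $SERVICESTATE$" $CONTACTEMAIL$
}
```

If a service sits in more than one servicegroup, the macro will, as far
as I recall, give you only one of them; $SERVICEGROUPNAMES$ should list
them all. Check the macro list before relying on either.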

http://www.nagios.org/docs/
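
For completeness, here is roughly what the servicegroup and dependency
suggestions from earlier in the thread would look like as object
definitions. The host names (web1, maps1, db1) and the service
descriptions are made up, so treat this as a sketch under those
assumptions rather than something to paste in as-is.

```cfg
# One servicegroup per website, listing everything the site needs.
# "members" is a flat list of host,service pairs.
define servicegroup {
    servicegroup_name   example-website
    alias               Example website and the services behind it
    members             web1,HTTP,maps1,Map service,db1,PostgreSQL
}

# The map check depends on the webserver check: while HTTP on web1 is
# WARNING, UNKNOWN or CRITICAL, notifications for the map check are
# suppressed. The state of the map check itself is not changed.
define servicedependency {
    host_name                       web1
    service_description             HTTP
    dependent_host_name             maps1
    dependent_service_description   Map service
    notification_failure_criteria   w,u,c
    execution_failure_criteria      n       ; keep running the map check regardless
}
```

Note that the dependency only ever suppresses notifications (and,
optionally, check execution) for the dependent service. It never forces
the dependent service into a non-ok state.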

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
