Dependent service checks don't fail when depended-on service check fails

Jarrod Moore masternayru at gmail.com
Tue Mar 31 07:05:55 CEST 2009
On Mon, Mar 30, 2009 at 10:13 PM, Andreas Ericsson <ae at op5.se> wrote:
> Jarrod Moore wrote:
>>
>> On Thu, Mar 26, 2009 at 7:57 PM, Andreas Ericsson <ae at op5.se> wrote:
>>>
>>> Jarrod Moore wrote:
>>>>
>>>> Hello everyone,
>>>>
>>>> I have a couple of related questions regarding service dependencies in
>>>> Nagios and their limitations. I have two service checks (let's call
>>>> them A and B) and service A depends on service B to function
>>>> correctly. I want to set Nagios up so that if service B crashes then
>>>> both services A and B are put into the critical state in Nagios. I've
>>>> tried using service dependencies in Nagios to represent this behaviour
>>>> but have yet to be successful. I can only get it to suppress
>>>> notifications of service A if both services go down.
>>>>
>>> This is expected behaviour. If A is truly dependent on B, then A will
>>> turn into a non-ok state of its own volition rather than as a result
>>> of any dependency magic. Dependencies are designed as a means of
>>> suppressing notifications. Otherwise, you would *always* get a
>>> notification for B first, and a minute or so later one for A (actually,
>>> without the dependency you could get the one for A first).
>>>
>>>> Is there a way to do what I'm trying to do here? I'd have thought it
>>>> would be logical that if a service depends on another service and the
>>>> service depended on dies then all services depending on it would fail
>>>> their checks as well, but there's probably some scenario where it
>>>> doesn't work so well. I've had a look through the mailing list
>>>> archives and found someone had asked a similar question to the
>>>> nagios-devel list about 2.5 years ago and didn't end up getting an
>>>> answer, so I thought I might ask whether solutions to this type of
>>>> problem had been developed since then.
>>>>
>>> They haven't. You're using dependencies the wrong way, really. If
>>> A is truly dependent on B and doesn't go into a non-ok state after
>>> B has crashed, then your check isn't doing what it's supposed to do,
>>> or you've misunderstood the relationship somehow.
>>>
>>> If you were to explain what the two services actually are, it would
>>> be easier to point you to a solution that works.
>>>
>>> --
>>> Andreas Ericsson                   andreas.ericsson at op5.se
>>> OP5 AB                             www.op5.se
>>> Tel: +46 8-230225                  Fax: +46 8-230231
>>>
>>> Considering the successes of the wars on alcohol, poverty, drugs and
>>> terror, I think we should give some serious thought to declaring war
>>> on peace.
>>>
>>
>> Well, basically I have a map (similar to Google Maps) embedded in a
>> website, which hits a URL to retrieve maps. So I have one check using
>> check_http to check that the website itself is up and another check on
>> that URL to make sure that the map service is available. Now if the
>> map service goes down, the website is still up but the maps won't
>> appear, which means the website's functionality is significantly
>> affected. However, it is still up and viewable so doing a check on the
>> website URL still passes.
>>
>
> It sounds to me like you'd want to make the map-check dependent on
> the webserver-check. That would suppress notifications from the
> map-check when it's the webserver that's bombing out. Do you really
> need two notifications when the map-service goes offline?

Sorry, I didn't explain that very well. I have a website check that I
want to have depend on the result of a map service check. The thing is
that I would like two notifications to be sent to my email - one for
the service check that is failing and one for each site that is
affected by the crashed service. That way I would know what is
affected and what needs fixing. Now I should mention at this point (if
it wasn't already blindingly obvious) that I'm by no means a Nagios
master. However, my idea was to have a chain of service dependencies
and then not send notifications for service dependencies in between
that I don't want emails about. There's probably a better way of doing
what I want and in that case, I'm all ... eyes.

>> Now of course I could just write a script or something to check both
>> URLs and set that as the check command. There is a problem for me with
>> this approach, however, because I have some other instances where a
>> web service depends on other web services.
>
> Define "depend". As I understand the definition, carbon-based lifeforms
> on our fine planet depend on water and sunlight; life cannot function
> properly without them.
> It sounds like you want to make sunlight depend on carbon-based lifeforms,
> because without the life, the sun is rather pointless.
>
> Instead of trying to coerce dependencies to work backwards, I'd sit
> down and think what you want your Nagios installation to do for you,
> and why you would want two services to go critical when one of them
> does. Isn't one notification and one red blob in the UI enough? If
> it isn't, what do you hope to gain from having two notifications and
> two red blobs?

I'd say that a service "depends" on another when it requires the other
service to provide 100% of its functionality. What I'm trying to say
is that the two services that I'm providing are merely a subset of the
entire dependency chain. The map service depends on data being in a
PostgreSQL database. If the data isn't there, I want two emails - one
saying the website doesn't work and one saying that the data is
missing. That check depends on the database server being available. If
it isn't, I want two emails - one saying that the website is affected
and one saying that the database is down.

>> When I want to use these
>> services in websites, I'd then have to write a check for each script,
>> each containing every service in the chain that is needed to display
>> the website correctly. This way of doing things just seems a bit
>> repetitive to me, especially when I have a check for these web
>> services already.
>
> I'm sorry, but I still fail to see the point. Perhaps you'd be better
> off defining each website as a servicegroup containing all of the
> services that make up the visitor experience for that site. That
> would make it possible for you to get some sort of
> visualization of what (Nagios-)services affect which customer-services,
> while at the same time keeping configuration work to a minimum.
>

Service groups would be enough if I was primarily using the Nagios web
UI but, unfortunately, I'm after email notifications and (as far as I
am aware) you can't define contacts for service groups. I could settle
for a notification saying "Service <name> from the <name> group is
down" or something similar.
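
To make the single-script approach from earlier in this thread more
concrete, here is a minimal sketch of one check command that probes every
URL a website relies on and names the failing ones in its status line.
It is only an illustration: the URLs and the names CHAIN, probe and
summarize are hypothetical, and the only Nagios convention it relies on
is the standard plugin exit codes (0 for OK, 2 for CRITICAL).

```python
#!/usr/bin/env python3
"""Sketch of a combined check: probe every URL a website relies on."""
import sys
import urllib.request

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2

# Hypothetical URLs; the real ones are not given in this thread.
CHAIN = [
    ("website", "http://www.example.com/"),
    ("map service", "http://maps.example.com/tiles"),
]


def probe(url, timeout=10):
    """Return True if the URL answers with an HTTP 2xx/3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:
        # Covers URLError, HTTPError and socket timeouts.
        return False


def summarize(results):
    """Map [(name, is_up), ...] to a Nagios exit code and status line."""
    failed = [name for name, is_up in results if not is_up]
    if failed:
        return CRITICAL, "CRITICAL - failing: " + ", ".join(failed)
    return OK, "OK - all %d components responding" % len(results)


if __name__ == "__main__":
    code, message = summarize([(name, probe(url)) for name, url in CHAIN])
    print(message)
    sys.exit(code)
```

With the map service down and the website up, this would print
"CRITICAL - failing: map service" and exit with status 2. Note that it
produces one notification per website check that names the broken
component, rather than the two separate emails described above, and it
still has to list every service in the chain for each website.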
