FW: Using service dependencies

Greg Vickers g.vickers at qut.edu.au
Tue Nov 15 01:23:52 CET 2005


Hi Deborah,

(I assume you are using v2.x)

Deborah Martin wrote:
> I'm trying to use service dependencies as part of my nagios config.
>  
> Basically, I want to do 2 checks, the primary check connects to a 
> database. The secondary check is for RAM usage.
> However, if the primary check fails I don't want nagios to try and 
> attempt the secondary check as there is no point if
> a system is down. I still want to see Nagios alert via the web front-end 
> but not be notified via email when the primary check
> is in a Unknown or Critical state.

I would ask: Why bother with this dependency at all?

Sequential scenario:
0) Your DB check runs, all OK.
1) The DB crashes.
2) The RAM check's dependency is evaluated; the DB service last reported OK.
3) The RAM check is executed and returns OK (unless your RAM check 
really does depend on the DB somehow??)
4) Your DB is checked later and returns UNKNOWN or CRITICAL.
5) The NEXT time your RAM check is due, it will not run because the DB 
service is UNKNOWN or CRITICAL.

Normal operation of Nagios would go like this:
0) Check services on a host.
1) As soon as ANY service on a host changes to a non-OK state, *that 
host status is checked* and *no other service checks on any hosts are 
executed* until the state of that host is determined. This is why host 
checks have to execute quickly - no other service checks will be 
processed/executed until host status is determined.
2) If the host is in a non-OK state, *no notifications for services on 
that host are sent out until that host comes back up*.

(So if all services fail but the IP still responds to your check_host 
command (ping or ICMP) you will receive many service notifications.)

<quote> ... I don't want nagios to try and attempt the secondary check 
as there is no point if a system is down.</quote>
Fair enough. If you leave the RAM check non-dependent, *your host 
downtime will decrease* (according to Nagios) because there are more 
service checks to run on the host, and a host state change from DOWN 
to UP will be detected quicker when fewer services are dependent.

Since the RAM and DB checks aren't really dependent, having this 
dependency doesn't bring any value (IMHO) to your monitoring and will 
increase the amount of time a host spends in a DOWN state. Nagios still 
schedules and executes service checks for a DOWN host while suppressing 
notifications, and every check that runs is another chance to discover 
a host state change; a check skipped because of a dependency is one 
chance fewer.

Service dependencies are usually used for truly dependent services, e.g. 
a db that serves content for a webserver. If the db goes down, don't 
check the webserver, as the content will not be accessible by the webserver.
Of course you may still get a situation like the first example where the 
checks are executed in such an order that will not catch the db failure 
before the webserver is checked, and you get a notification about the 
webserver failure before the db failure.
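
As a rough illustration of that db/webserver case, a dependency 
definition might look like the sketch below. The host names (dbhost, 
webhost) and the service descriptions are made up for the example and 
are not from anyone's real config. host_name/service_description name 
the service being depended upon; the dependent_* directives name the 
service that depends on it.

```cfg
# Hypothetical example: host names and service descriptions are placeholders.
# 'HTTP Check' on webhost depends on 'DB Check' on dbhost.
define servicedependency{
        host_name                       dbhost
        service_description             DB Check
        dependent_host_name             webhost
        dependent_service_description   HTTP Check
        execution_failure_criteria      u,c
        notification_failure_criteria   u,c
        }
```

Here the webserver check would be neither executed nor notified on 
while the db check is UNKNOWN or CRITICAL.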

If you have this dependency, you generate extra config for you or others 
to manage. (This may be fine :))

I would not implement this dependency as RAM status doesn't really 
depend on the db availability (really the other way around, the db 
depends on the amount of RAM available, depending on your situation.)

> (The plugins on their own all work fine thru nagios as do they via the 
> command-line. )
>  
> I've defined below what I understand to be the way to configure 
> dependencies but i'm not convinced its right even though
> it works fine. Could someone take a look and just sanity check this for 
> me and let me know if i'm doing the right/wrong thing ?

<snip>

> # device1 dependency checks
> define servicedependency{
>         host_name                       ngcp4
>         service_description             DB RAM checks for device1
>         dependent_host_name             ngcp4
>         dependent_service_description   device1 DB Check
>         execution_failure_criteria      u,c
>         notification_failure_criteria   u,c
>         }

You've got it back-to-front; the above config makes the 'device1 DB 
Check' service /depend on/ the 'DB RAM checks for device1' service.
The DB service check will not run, nor will notifications be sent out 
for this service, when the RAM service check is UNKNOWN or CRITICAL.
Your first paragraph states that you want the RAM check to depend on the 
DB check.
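
For reference, if you did want the RAM check to depend on the DB check, 
swapping the two pairs of directives should do it. This is an untested 
sketch that only reuses the host and service names from your config 
above:

```cfg
# Sketch, not tested: 'DB RAM checks for device1' depends on 'device1 DB Check'.
define servicedependency{
        host_name                       ngcp4
        service_description             device1 DB Check
        dependent_host_name             ngcp4
        dependent_service_description   DB RAM checks for device1
        execution_failure_criteria      u,c
        notification_failure_criteria   u,c
        }
```

With this ordering, the RAM check is the one that is skipped, and whose 
notifications are suppressed, while the DB check is UNKNOWN or CRITICAL.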

We use dependencies on our mail server as there are many dependent 
services (IMAP, POP, SMTP) that depend on many other components of the 
mail server - this is the only situation where we use dependencies. (I 
needed a large beer after figuring out that config...)

In conclusion, I would remove your dependency. But that's me :)

HTH,
-- 
Greg Vickers
Project Manager, IT Security
Information Technology Services
Queensland University of Technology
L12, 126 Margaret St, Brisbane

Phone: (07) 3864 9536
Mobile: 0410 434 734
Email: g.vickers at qut.edu.au
IT Security web site: http://www.its.qut.edu.au/itsecurity/

CRICOS No. 00213J

