eventhandlers running when a dependent service dependency is not satisfied

John P. Rouillard rouilj at cs.umb.edu
Fri Dec 9 09:52:01 CET 2005


Hi Eli:

You didn't say what version of nagios you are running, so I'll assume
2.0.

In message <439912BC.5020000 at ilm.com>,
Eli Stair writes:
>The question comes down to this:
>
>   Should a failed service check for a dependent trigger a check of its 
>parent before continuing?

IIRC from the code it does not force a check of the parent service. I
can see arguments for and against forcing a poll of the parent. Also
the documentation:

  http://nagios.sourceforge.net/docs/2_0/dependencies.html

in the "How Service Dependencies Are Tested" section, says:

  Nagios gets the current status of the service that is being depended upon.

It does not say that nagios repolls the service being depended upon.
A footnote says:

  by default, Nagios will use the most current hard state of the
  service(s) that is/are being depended upon

but an option in the config file allows it to use the current soft
state instead. I use the soft state of the service being depended upon
myself.

>If this is not the case, or default, is there _ANY_ way to implement this?

Sort of. The event handler for the child can send a
SCHEDULE_FORCED_SVC_CHECK external command for the parent specifying
the current time in seconds. See

 http://www.nagios.org/developerinfo/externalcommands/commandinfo.php?command_id=129

for details. The command will be acted upon immediately, since nagios
reads the external command file after an event handler runs. Use this
to force an update of the current service status for the parent. To
find the service dependencies that match your host/service, parse the
objects.cache file (probably /var/log/nagios/objects.cache) for the
expanded servicedependency objects.
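
As a rough illustration of the approach above, here is a minimal
Python sketch. It assumes objects.cache contains
"define servicedependency { ... }" blocks with one "directive value"
pair per line, and that the external command file is the usual named
pipe. The function names and the two path arguments are hypothetical;
adjust them to your installation.

```python
import re
import time

# Assumed layout of objects.cache: blocks such as
#   define servicedependency {
#       host_name                      <parent host>
#       service_description            <parent service>
#       dependent_host_name            <dependent host>
#       dependent_service_description  <dependent service>
#       }
BLOCK_RE = re.compile(r"define\s+servicedependency\s*\{(.*?)\}", re.DOTALL)


def parse_service_dependencies(text):
    """Return one dict of directive -> value per servicedependency block."""
    deps = []
    for body in BLOCK_RE.findall(text):
        entry = {}
        for line in body.splitlines():
            parts = line.strip().split(None, 1)
            if len(parts) == 2:
                entry[parts[0]] = parts[1].strip()
        deps.append(entry)
    return deps


def parents_of(deps, host, service):
    """List the (host, service) pairs that host/service depends on."""
    return [
        (d["host_name"], d["service_description"])
        for d in deps
        if d.get("dependent_host_name") == host
        and d.get("dependent_service_description") == service
        and "host_name" in d
        and "service_description" in d
    ]


def forced_check_command(host, service, now=None):
    """Format a SCHEDULE_FORCED_SVC_CHECK line with the check time set to now."""
    now = int(time.time()) if now is None else int(now)
    return "[%d] SCHEDULE_FORCED_SVC_CHECK;%s;%s;%d\n" % (now, host, service, now)


def force_parent_checks(cache_path, command_file, host, service):
    """Write one forced-check command per parent of host/service."""
    with open(cache_path) as fh:
        deps = parse_service_dependencies(fh.read())
    # The command file is expected to be a named pipe read by nagios.
    with open(command_file, "w") as cmd:
        for parent_host, parent_service in parents_of(deps, host, service):
            cmd.write(forced_check_command(parent_host, parent_service))
```

The child's event handler would call force_parent_checks() with its
own host name and service description.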

I set my nagios options so that:
  
	max_check_attempts(dependent)*retry_check_interval(dependent) >
	normal_check_interval(parent)

This way the parent service will be checked at least once during the
soft error interval of the dependent service.
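
The inequality above can be checked mechanically. This is only a
sketch: the function name is hypothetical, and it assumes all three
values are expressed in the same time units.

```python
def parent_checked_in_soft_window(max_check_attempts, retry_check_interval,
                                  parent_normal_check_interval):
    """True if the dependent's soft-error window is longer than the
    parent's normal check interval."""
    soft_window = max_check_attempts * retry_check_interval
    return soft_window > parent_normal_check_interval
```

For example, a dependent with max_check_attempts 4 and
retry_check_interval 2 has a soft window of 8, which covers a parent
with normal_check_interval 5; with max_check_attempts 3 and
retry_check_interval 1 the window is 3 and it does not.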

>I want to avoid at all costs having an every-minute check of the parent
>processes on many thousand hosts just to keep from having the child
>process checks and event handlers going hay-wire.

You need to use max_check_attempts to provide a buffer during which
the parent service will be checked. You can have your event handler
submit an external command on the first soft error and try to fix the
problem on a subsequent soft or hard error. Your sample config doesn't
show any of those check-attempt or interval directives.
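
One way to structure that decision inside an event handler is sketched
below. It assumes the handler's command line passes the service state,
the state type (SOFT or HARD) and the current attempt number as
arguments; the function name and the returned action labels are
hypothetical.

```python
def choose_action(state, state_type, attempt):
    """Pick what the event handler should do for this state change."""
    if state == "OK":
        # Service recovered (or never failed): nothing to do.
        return "none"
    if state_type == "SOFT" and attempt == 1:
        # First soft error: only ask nagios to re-check the parent.
        return "force_parent_check"
    # A later soft error or a hard error: try to fix the problem.
    return "repair"
```

The handler would then submit the external command for
"force_parent_check" and run its restart logic for "repair".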

>I want a dependency chain like this:
>
>   SSH -- SNMP --\
>                  - Ganglia
>                  - NTP

Just a note, I wouldn't have ssh in the dependency chain unless you
are accessing snmp over ssh (e.g. running check_snmp via
check_by_ssh). I can't tell if that is the case or not. Just because
your event handler runs over ssh doesn't add it to the dependency
chain IMO. If ssh is down, it means none of the other services will be
checked and you won't recognize them as down.

>I believe I have this set up so that a service check for SNMP is
>dependent on the SSH service running.

Did you verify in the web interface or objects.cache?

>In turn, the service checks for
>other processes that use SNMP are dependent on SNMP running.  My intent 
>is that service checks for NTP,etc will not be attempted if its parent 
>SNMP process is not in an OK state (as I have an event handler that will 
>restart SNMP if it is dead).  If the parent SNMP _IS_ running, then the 
>child process checks (Ganglia, NTP, etc) will be checked and if dead 
>their own event handler will activate.

It looks like the config is ok on that score with one possible
exception noted below.

>The problem is that in this case, if I kill off SNMP the child process 
>checks STILL execute and return a CRITICAL.  As a result, nagios fires 
>off the event handler for all these checks which results in an SSH out 
>to the nodes in question and restarting a bunch of services that are 
>probably still running.  It SHOULD NOT schedule the child checks and 
>thus not run their event handlers until AFTER a new parent check has 
>returned executed and returned successfully, correct?

Nope, nagios doesn't re-run the parent or parents. If you are in a
soft failure mode, you can write your event handler to wait until you
are in a hard failure mode.

>I've included a dependency example below, and a snip from the nagios log 
>showing it sequentially hammering out checks of all the child processes 
>at the same time it already knows the parent is dead.
>[...]
>###################################################
>### snip of this host/group definition include:
>define host{
>         use                     linux-node-production
>         host_name               HOSTNAME1
>         address                 IP
>}
>
>define servicedependency{
>         host_name                       HOSTNAME1
>         service_description             SSH
>         dependent_host_name             HOSTNAME1
>         dependent_service_description   SNMP
>         execution_failure_criteria      w,p,u,c
>         notification_failure_criteria   w,p,u,c
>         inherits_parent                 1
>}
>
>define servicedependency{
>         host_name                       HOSTNAME1
>         service_description             SNMP
>         dependent_host_name             HOSTNAME1
>         dependent_service_description   SNMP--*

Not sure if SNMP--* does what you think (and I hope) it does.  Have you
looked at the view config web page and verified that nagios is seeing
the appropriate service dependencies?

				-- rouilj
John Rouillard
===========================================================================
My employers don't acknowledge my existence much less my opinions.

