eventhandlers running when a dependent service dependency is not satisfied

Eli Stair estair at ilm.com
Fri Dec 9 22:57:04 CET 2005


Thanks a million for pointing out 'SCHEDULE_FORCED_SVC_CHECK'; I'm 
now rewriting and testing the event handlers to take care of this.  If 
only there were a macro/variable for the master service... I'm looking 
for a lightweight way to determine the <service_description> of the 
direct parent of the check that just failed, so I can pass it to the 
command.
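
One possible way to look up the direct parent, following the objects.cache suggestion quoted further down, is sketched here. It is only a sketch: the block layout (one "key value" pair per line inside "define servicedependency { ... }") is an assumption about how objects.cache is written, and the function names and SAMPLE text are made up for illustration.

```python
import re

# Matches one "define servicedependency { ... }" block. The layout is an
# assumption about objects.cache: one "key<whitespace>value" pair per line.
_BLOCK = re.compile(r"define\s+servicedependency\s*\{(.*?)\}", re.S)


def parse_dependencies(text):
    """Return a list of dicts, one per servicedependency block."""
    deps = []
    for body in _BLOCK.findall(text):
        entry = {}
        for line in body.splitlines():
            parts = line.strip().split(None, 1)
            if len(parts) == 2:
                entry[parts[0]] = parts[1].strip()
        deps.append(entry)
    return deps


def master_services(text, host, service):
    """Return sorted (host, service) pairs that host/service depends on."""
    found = set()
    for dep in parse_dependencies(text):
        if (dep.get("dependent_host_name") == host
                and dep.get("dependent_service_description") == service):
            found.add((dep.get("host_name", ""),
                       dep.get("service_description", "")))
    return sorted(found)


# Hypothetical excerpt, modelled on the dependencies listed in this thread.
SAMPLE = """
define servicedependency {
    host_name                       deathstar1001
    service_description             SNMP
    dependent_host_name             deathstar1001
    dependent_service_description   SNMP-- Ganglia running
    }
define servicedependency {
    host_name                       deathstar1001
    service_description             SSH
    dependent_host_name             deathstar1001
    dependent_service_description   SNMP
    }
"""

print(master_services(SAMPLE, "deathstar1001", "SNMP-- Ganglia running"))
# prints [('deathstar1001', 'SNMP')]
```

An event handler could call master_services() with its own $HOSTNAME$ and $SERVICEDESC$ values to find which service to re-check. Only direct parents are returned; walking further up the chain would need a loop.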

WRT the SSH/SNMP dependency issue, I have a feeling that I'm missing 
something here altogether, or didn't include enough info in my initial 
report, as both you and Hugo mentioned a possible issue with this.

To be clear, I'm doing this only so that if a dependent service IS down 
(Ganglia) and SNMP has been shown to be up (after a 
'SCHEDULE_FORCED_SVC_CHECK'), I can make sure that SSH is running 
before attempting to connect.  Enough failure modes kill SSH at the 
same time as other services that I want to avoid running a bunch of 
high-latency/timeout/CPU-heavy event handlers that are bound to fail.
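
As an illustration of that guard, here is a minimal sketch of a pre-flight test an event handler could run before it tries to SSH out. This is not what the servicedependency config does; it is a separate, handler-side approximation. The function names are hypothetical, and a successful TCP connect only shows that something is listening on the port, not that sshd is healthy.

```python
import socket


def ssh_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def should_run_handler(snmp_state, ssh_reachable):
    """Only attempt a remote fix when SNMP is OK and SSH answers."""
    return snmp_state == "OK" and bool(ssh_reachable)
```

The short timeout is the point: a handler that gives up after a few seconds is much cheaper than one that waits for a full SSH connection timeout on every affected node.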

Thanks for the accurate pointer to that command,

Cheers,

/eli


Here's the View Config output showing that it is configured the way I 
think it is... I'm just not sure whether that is something I want to do :)


Dependent Host  Dependent Service                       Master Host    Master Service  Dependency Type  Dependency Failure Options
deathstar1001   SNMP-- Ganglia running                  deathstar1001  SNMP            Notification     Warning, Unknown, Critical, Pending
deathstar1001   SNMP-- Ganglia running                  deathstar1001  SNMP            Check Execution  Warning, Unknown, Critical, Pending
deathstar1001   SNMP-- NTP running                      deathstar1001  SNMP            Notification     Warning, Unknown, Critical, Pending
deathstar1001   SNMP-- NTP running                      deathstar1001  SNMP            Check Execution  Warning, Unknown, Critical, Pending
deathstar1001   SNMP-- cron running                     deathstar1001  SNMP            Notification     Warning, Unknown, Critical, Pending
deathstar1001   SNMP-- cron running                     deathstar1001  SNMP            Check Execution  Warning, Unknown, Critical, Pending
deathstar1001   SNMP-- automounter running 4 instances  deathstar1001  SNMP            Notification     Warning, Unknown, Critical, Pending
deathstar1001   SNMP-- automounter running 4 instances  deathstar1001  SNMP            Check Execution  Warning, Unknown, Critical, Pending
deathstar1001   SNMP-- load -lt 4                       deathstar1001  SNMP            Notification     Warning, Unknown, Critical, Pending
deathstar1001   SNMP-- load -lt 4                       deathstar1001  SNMP            Check Execution  Warning, Unknown, Critical, Pending
deathstar1001   SNMP                                    deathstar1001  SSH             Notification     Warning, Unknown, Critical, Pending
deathstar1001   SNMP                                    deathstar1001  SSH             Check Execution  Warning, Unknown, Critical, Pending

John P. Rouillard wrote:
> Hi Eli:
> 
> You didn't say what version of nagios you are running so I'll assume
> 2.0.
> 
> In message <439912BC.5020000 at ilm.com>,
> Eli Stair writes:
> 
>>The question comes down to this:
>>
>>  Should a failed service check for a dependent trigger a check of its 
>>parent before continuing?
> 
> 
> IIRC from the code it does not force a check of the parent service. I
> can see arguments for and against forcing a poll of the parent. Also
> the documentation:
> 
>   http://nagios.sourceforge.net/docs/2_0/dependencies.html
> 
> in the "How Service Dependencies Are Tested" section, says:
> 
>   Nagios gets the current status of the service that is being depended upon.
> 
> not "Nagios repolls the service being depended upon." A footnote
> says:
> 
>   by default, Nagios will use the most current hard state of the
>   service(s) that is/are being depended upon
> 
> an option in the config file will allow it to use the current soft
> state instead. I use the soft state of the service being depended upon
> myself.
> 
> 
>>If this is not the case, or default, is there _ANY_ way to implement this?
> 
> 
> Sort of. The event handler for the child can send a
> SCHEDULE_FORCED_SVC_CHECK external command for the parent specifying
> the current time in seconds. See
> 
>  http://www.nagios.org/developerinfo/externalcommands/commandinfo.php?command_id=129
> 
> for details. The command will be acted upon immediately since nagios
> reads the external command file after an event handler runs. Use this
> to force an update of the current service status for the parent. Parse
> through the objects.cache (probably in /var/log/nagios/objects.cache)
> file for the expanded servicedependency objects to find the service
> dependencies that match your host/service.
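
For reference, a minimal sketch of building and submitting that external command from an event handler. The line format ("[time] SCHEDULE_FORCED_SVC_CHECK;host;service;check_time") follows the external command documentation linked above; the function names are hypothetical, and the command file path in the usage comment is an assumption that must match command_file in your nagios.cfg.

```python
import time


def forced_svc_check_line(host, service, now=None):
    """Build one SCHEDULE_FORCED_SVC_CHECK line for the external command file."""
    ts = int(time.time()) if now is None else int(now)
    # The first timestamp is when the command was submitted, the last one is
    # when the check should run; using the same value asks for "right now".
    return "[%d] SCHEDULE_FORCED_SVC_CHECK;%s;%s;%d\n" % (ts, host, service, ts)


def submit(line, command_file):
    """Write the line to the Nagios command file (normally a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)


# Usage (path is an assumption, check command_file in nagios.cfg):
# submit(forced_svc_check_line("deathstar1001", "SNMP"),
#        "/usr/local/nagios/var/rw/nagios.cmd")
```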
> 
> I set my nagios options so that:
>   
> 	max_check_attempts(dependent)*retry_check_interval(dependent) >
> 	normal_check_interval(parent)
> 
> This way the parent service will be checked at least once during the
> soft error interval of the dependent service.
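
A small worked check of that inequality, exactly as written above. The numbers are made-up examples, and all three values are assumed to be in the same time units (normally minutes with the default interval_length).

```python
def parent_checked_during_soft_window(max_check_attempts,
                                      retry_check_interval,
                                      parent_normal_check_interval):
    """Evaluate max_check_attempts * retry_check_interval > normal_check_interval.

    The first two values belong to the dependent service, the third to the
    parent service.
    """
    soft_window = max_check_attempts * retry_check_interval
    return soft_window > parent_normal_check_interval


# Dependent retried 4 times, 2 minutes apart; parent polled every 5 minutes.
print(parent_checked_during_soft_window(4, 2, 5))   # prints True  (8 > 5)

# Dependent retried 3 times, 1 minute apart; parent polled every 5 minutes.
print(parent_checked_during_soft_window(3, 1, 5))   # prints False (3 > 5 fails)
```

In the second case the dependent service can reach a hard state before the parent has been polled again, which is the situation this thread is about.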
> 
> 
>>I want to avoid at all costs having an every-minute check of the parent
>>processes on many thousand hosts just to keep from having the child
>>process checks and event handlers going haywire.
> 
> 
> You need to use the max_check_attempts to provide a buffer in which
> the parent service will be checked. You can have your event handler
> submit an external command on the first soft error and try to fix the
> problem on a subsequent soft, or hard error. You don't have any of
> those directives in your sample config.
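
The staging described above could be sketched as a small decision function. It assumes the handler is invoked with the $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$ macros as arguments; the function name and the returned action labels are hypothetical.

```python
def handler_action(state, state_type, attempt):
    """Decide what a dependent service's event handler should do.

    state      -- $SERVICESTATE$: "OK", "WARNING", "UNKNOWN" or "CRITICAL"
    state_type -- $SERVICESTATETYPE$: "SOFT" or "HARD"
    attempt    -- $SERVICEATTEMPT$: number of the current check attempt
    """
    if state == "OK":
        return "none"
    if state_type == "SOFT" and int(attempt) == 1:
        # First soft error: only ask Nagios to re-check the parent.
        return "force_parent_check"
    # Later soft errors and hard errors: try to fix the problem.
    return "repair"
```

Before acting on "repair", the handler would still want to confirm that the parent service came back OK from the forced check; that lookup is left out of this sketch.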
> 
> 
>>I want a dependency chain like this:
>>
>>  SSH -- SNMP --\
>>                 - Ganglia
>>                 - NTP
> 
> 
> Just a note, I wouldn't have ssh in the dependency chain unless you
> are accessing snmp over ssh (e.g. running check_snmp via
> check_by_ssh). I can't tell if that is the case or not. Just because
> your event handler runs over ssh doesn't add it to the dependency
> chain IMO. If ssh is down, it means none of the other services will be
> checked and you won't recognize them as down.
> 
> 
>>I believe I have this set up so that a service check for SNMP is
>>dependent on the SSH service running.
> 
> 
> Did you verify in the web interface or objects.cache?
> 
> 
>>In turn, the service checks for
>>other processes that use SNMP are dependent on SNMP running.  My intent 
>>is that service checks for NTP,etc will not be attempted if its parent 
>>SNMP process is not in an OK state (as I have an event handler that will 
>>restart SNMP if it is dead).  If the parent SNMP _IS_ running, then the 
>>child process checks (Ganglia, NTP, etc) will be checked and if dead 
>>their own event handler will activate.
> 
> 
> It looks like the config is ok on that score with one possible
> exception noted below.
> 
> 
>>The problem is that in this case, if I kill off SNMP the child process 
>>checks STILL execute and return a CRITICAL.  As a result, nagios fires 
>>off the event handler for all these checks which results in an SSH out 
>>to the nodes in question and restarting a bunch of services that are 
>>probably still running.  It SHOULD NOT schedule the child checks and 
>>thus not run their event handlers until AFTER a new parent check has 
>>executed and returned successfully, correct?
> 
> 
> Nope, nagios doesn't re-run the parent or parents. If you are in a
> soft failure mode, you can write your event handler to wait until you
> are in a hard failure mode.
> 
> 
>>I've included a dependency example below, and a snip from the nagios log 
>>showing it sequentially hammering out checks of all the child processes 
>>at the same time it already knows the parent is dead.
>>[...]
>>###################################################
>>### snip of this host/group definition include:
>>define host{
>>        use                     linux-node-production
>>        host_name               HOSTNAME1
>>        address                 IP
>>}
>>
>>define servicedependency{
>>        host_name                       HOSTNAME1
>>        service_description             SSH
>>        dependent_host_name             HOSTNAME1
>>        dependent_service_description   SNMP
>>        execution_failure_criteria      w,p,u,c
>>        notification_failure_criteria   w,p,u,c
>>        inherits_parent                 1
>>}
>>
>>define servicedependency{
>>        host_name                       HOSTNAME1
>>        service_description             SNMP
>>        dependent_host_name             HOSTNAME1
>>        dependent_service_description   SNMP--*
> 
> 
> Not sure if SNMP--* does what you think (and I hope) it does.  Have you
> looked at the view config web page and verified that nagios is seeing
> the appropriate service dependencies?
> 
> 				-- rouilj
> John Rouillard
> ===========================================================================
> My employers don't acknowledge my existence much less my opinions.
> 
> 
> -------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
> for problems?  Stop!  Download the new AJAX search engine that makes
> searching your log files as easy as surfing the  web.  DOWNLOAD SPLUNK!
> http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
> ::: Messages without supporting info will risk being sent to /dev/null
> 


