selectively disable notifications for services inheriting a specific template?

Owen LaGarde owen.m.lagarde at usace.army.mil
Fri Jan 23 22:03:31 CET 2009


Here's a moderately satisfactory fix:  with a test already in place to
check the availability of a TGT, and the cache maintained by an event
broker module (i.e., such that all nagios child processes see the same
cache), you can add an event handler to the service template inherited by
the services for which you want to disable notifications when the TGT
isn't available.  The event handler script then runs the TGT check
plugin as an external process and enables/disables service notifications
for "this" service (i.e., the one executing the event handler) by
including the NAGIOS_HOSTNAME and NAGIOS_SERVICEDISPLAYNAME in the
external command.
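
A minimal sketch of such an event handler follows, offered as an
illustration rather than the script actually in use.  It assumes
environment macros are enabled, that the TGT plugin lives at the
hypothetical path shown and exits 0 when the TGT is usable, and that the
command file is found via NAGIOS_COMMANDFILE or the assumed default.
The function names are invented for the example.  It prefers
NAGIOS_SERVICEDESC because external commands identify a service by its
description; the display name is used only as a fallback, which works
when the two are identical.

```python
#!/usr/bin/env python3
"""Hypothetical event handler: toggle notifications for the calling
service according to the result of the TGT check plugin."""
import os
import subprocess
import sys
import time

# Assumed locations; adjust for the local install.
TGT_CHECK = "/usr/local/nagios/libexec/check_krbtgt"
DEFAULT_CMD_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def build_command(tgt_ok, host, service, now=None):
    """Return one external-command line for the given service."""
    verb = "ENABLE" if tgt_ok else "DISABLE"
    ts = int(time.time() if now is None else now)
    return "[%d] %s_SVC_NOTIFICATIONS;%s;%s\n" % (ts, verb, host, service)


def tgt_available(plugin=TGT_CHECK):
    """Run the TGT plugin as an external process.

    Returns True on exit status 0, False on any other status, and None
    if the plugin could not be started at all.
    """
    try:
        status = subprocess.call(
            [plugin], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError:
        return None
    return status == 0


def main(env=os.environ):
    host = env.get("NAGIOS_HOSTNAME")
    service = env.get("NAGIOS_SERVICEDESC") or env.get("NAGIOS_SERVICEDISPLAYNAME")
    if not host or not service:
        return 1  # not called with the expected macros; do nothing
    tgt_ok = tgt_available()
    if tgt_ok is None:
        return 2  # TGT state unknown; leave notifications as they are
    cmd_file = env.get("NAGIOS_COMMANDFILE", DEFAULT_CMD_FILE)
    with open(cmd_file, "w") as fh:
        fh.write(build_command(tgt_ok, host, service))
    return 0


# Only act when invoked by nagios with environment macros present.
if __name__ == "__main__" and "NAGIOS_HOSTNAME" in os.environ:
    sys.exit(main())
```

Leaving notifications untouched when the plugin cannot be run is a
design choice in this sketch; it avoids silencing 800+ services because
of a broken path.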

Note:

- The TGT check itself must *not* inherit this template -- if it does,
notifications for the TGT failure itself will be blocked along with
those for the failures it causes.

- A race condition exists in that the event handler fires after the
parent service check declares a state change.  That state change,
depending on the applicable retries value and delay between retries, can
potentially cause a notification to fire prior to processing of any
external commands generated by the event handler.  Total time between
first soft state transition and first hard state transition must be at
least twice the interval at which nagios processes the external command
queue.

- This does *nothing* about the failures themselves -- the assumption is
that you want the failure to occur but with select notifications
blocked.
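
The timing constraint in the race-condition note can be checked with a
little arithmetic.  The sketch below is only an approximation under
stated assumptions: it takes the soft-to-hard window to be
(max_check_attempts - 1) retry intervals, ignores scheduling latency and
check execution time, and expects the external command processing
interval to be supplied in seconds.  The function names are invented
for the example.

```python
def soft_to_hard_seconds(max_check_attempts, retry_interval, interval_length=60):
    """Approximate seconds between the first soft state and the hard state.

    retry_interval is in nagios time units; interval_length is the
    number of seconds per unit (60 unless changed in the main config).
    """
    return (max_check_attempts - 1) * retry_interval * interval_length


def handler_wins_race(max_check_attempts, retry_interval,
                      command_check_seconds, interval_length=60):
    """True if the soft-to-hard window is at least twice the interval at
    which the external command queue is processed."""
    window = soft_to_hard_seconds(max_check_attempts, retry_interval, interval_length)
    return window >= 2 * command_check_seconds


if __name__ == "__main__":
    # Three attempts, one-minute retries, command queue read every 60 s.
    print(handler_wins_race(3, 1, 60))
    # A single attempt goes hard immediately, so the handler cannot win.
    print(handler_wins_race(1, 1, 60))
```

With max_check_attempts set to 1 the window is zero, so the first
failure can notify before any external command from the handler is
processed.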


On Wed, 2009-01-21 at 20:21 -0600, Owen LaGarde wrote:
> I essentially want to disable only notifications, only when a specific
> service check fails, only for services inheriting a specific template.
> And I'm lazy, and don't want to double the size of my config for this
> one function.
> 
> I'm using nagios 3.0.2 and nagios-plugins 1.4.12 with a large (>1000
> hosts/services) configuration.  Most (800+) of the services use a
> "check_remote" custom plugin to tunnel network calls [i.e., to plugins on
> other hosts] within kerberos authenticated and encrypted sessions.  That
> kerberos activity requires a ticket cache containing a TGT;  said cache
> is maintained by a custom event broker module and said TGT's presence is
> monitored by a service definition referencing a custom check_krbtgt
> plugin.  This has worked great so far -- no race conditions, clean
> start/restart/refresh cycles for the cache and TGT, scales well, etc.
> For a number of policy reasons all service definitions use active
> checks.  When the TGT check fails it logically follows that all service
> checks using the check_remote plugin have failed or are about to fail.  This is
> desirable behavior -- depending on the nature of the kerberos TGT
> problem there are a number of check_remote failure messages and these
> need to be captured in the event log, so I *don't* want to block *any*
> checks from running.
> 
> But...
> 
> If it's the TGT check that's failed then the notifications for
> everything except the TGT failure are inappropriate and should be
> blocked.  In effect I want to cause a specific service check's failure
> to disable notifications for a large (800+) number of other service
> checks but I don't want to nearly double the size of the config tree and
> have that much more text to wade through when maintaining the nagios
> configuration.  Remember, I need the checks that are about to fail
> because of the TGT failure to go on and run, and fail, and log their
> events and perfdata.  It's just the notifications that should be
> stopped, and only then if the originating service "use"-es the
> remote-active-service template.
> 
> Anybody else doing this?
> 
> 
-- 
Sincerely,

    Owen LaGarde
    Senior Systems Administrator
    Owen.M.LaGarde at usace.army.mil
    1-800-522-6937 x4879

Engineering Research and Development Center
attn: CEERD-IH-C (Owen LaGarde)
3909 Halls Ferry Road
Vicksburg, MS 39180-6199
