Distributed failover enhancements

Jason Martin jhmartin at toger.us
Wed Aug 11 17:45:03 CEST 2004


In reading the documentation on distributed / failover
situations, I see that there is a need for two additional
features to make it complete.

One is that there should be an external command that will
disable the OCSP (the obsessive compulsive service processor
command, ocsp_command). Imagine you have three machines: central
server C and distributed servers D1 and D2. D2 is the backup
for D1 and is configured for failover monitoring. The
documentation says that D1's OCSP should report to both C and D2
so that D2 is up to date on current conditions. The problem is
that even though notifications, checks and event handlers are
all disabled on D2, the OCSP can't be disabled from within
Nagios. So, when D1 sees that a particular service is down,
it'll report it via the OCSP to C and D2, and D2 will go on to
use its OCSP to report it again to C.
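
To make the duplication concrete, here is a small sketch in Python
that only models the forwarding described above. It is not Nagios
code; the OCSP_TARGETS table and the deliveries() helper are names
made up for illustration, and the walk assumes the forwarding
topology has no loops.

```python
from collections import Counter

# Which hosts each machine's OCSP forwards results to, per the
# three-machine setup in this message (illustrative names only).
OCSP_TARGETS = {
    "D1": ["C", "D2"],  # D1 reports to the central server and its backup
    "D2": ["C"],        # D2's OCSP cannot be switched off from within Nagios
    "C": [],            # the central server forwards nothing
}


def deliveries(origin, targets=OCSP_TARGETS):
    """Count how many copies of one check result each host receives.

    Assumes the forwarding graph is acyclic, as it is in this setup.
    """
    received = Counter()
    pending = [origin]
    while pending:
        host = pending.pop()
        for dest in targets[host]:
            received[dest] += 1
            pending.append(dest)  # the receiver runs its own OCSP in turn
    return received
```

With the table above, deliveries("D1")["C"] is 2. If D2's entry is
changed to an empty list, which is what disabling its OCSP would
amount to, the count for C drops to 1.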

This means that the central server will get two copies of all
reports, making a mess of the max_check_attempts and hard / soft
logic.  I propose that an external command be added that
controls whether or not the OCSP is enabled. I realize that
the OCSP command could have a kill switch in it, but since many
of the other failover properties are controlled within Nagios it
seems proper that this be included as well.
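
For reference, the kill-switch workaround could look roughly like
the sketch below, written in Python. Everything in it is an
assumption for illustration: the flag file path, the function
names, and the idea that the real submission command (whatever the
site uses to send results to C) is passed in as an argument list.

```python
import os
import subprocess

# Hypothetical flag file: if it exists, this host's OCSP is "off".
KILL_SWITCH = "/usr/local/nagios/var/ocsp.disabled"


def ocsp_enabled(flag_path=KILL_SWITCH):
    """Return True unless the kill-switch file is present."""
    return not os.path.exists(flag_path)


def run_ocsp(argv, flag_path=KILL_SWITCH, forward=subprocess.call):
    """Run the real submission command unless the kill switch is set.

    argv is the site-specific submission command plus its arguments.
    Returns 0 when the result is dropped or there is nothing to run,
    otherwise whatever the forwarded command returns.
    """
    if not argv or not ocsp_enabled(flag_path):
        return 0  # standby host: drop the result quietly
    return forward(argv)
```

The failover script on D2 would then remove the flag file when it
takes over from D1 and recreate it when D1 comes back. That works,
but it keeps one piece of failover state outside Nagios, which is
why an external command seems cleaner.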

I also propose that when the global 'perform service checks'
option is disabled, the forced checks from freshness checking
be disabled as well. This way the 'idle' instance is truly
idle, other than updating its table of status from the incoming
passive checks.
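
The rule I have in mind is sketched below in Python. This is not
taken from the Nagios source; the function and parameter names are
made up to stand for the global 'perform service checks' setting,
the freshness-checking setting, and whether the last result is
older than its freshness threshold.

```python
def should_force_freshness_check(execute_service_checks,
                                 check_freshness,
                                 result_is_stale):
    """Proposed rule: never force a check while checks are disabled."""
    if not execute_service_checks:
        # Idle/standby instance: only accept incoming passive results.
        return False
    return check_freshness and result_is_stale
```

On the standby D2, execute_service_checks would be off, so a stale
result would no longer trigger an active check there.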

Thanks,
-Jason Martin
-- 
This message is PGP/MIME signed.