distributed monitoring - slave server not that intelligent

Andreas Ericsson ae at op5.se
Fri Feb 15 09:52:35 CET 2008


mark redding wrote:
> Hi all,
> 
> I currently have Nagios 2.10 installed on a couple of machines, one of
> which is configured as a master and the other as a slave.
> 
> I have a script running on the slave which rsyncs the configs from
> the master and performs health checks of the master to see that it is
> running (and if it is not then it enables service checks/notifications
> on the slave until such time as it detects that the master is back up
> and running). I also use nsca to pass passive checks to the slave to
> ensure that it has up to date information about services. The slave
> does not perform any active service checks, nor are notifications
> enabled unless the master is down.
> 
> I do however still have one problem and that is that the slave has no
> way of knowing when we've ack'ed a critical, scheduled downtime, or
> disabled/enabled notifications/event handlers/checks for a service/host
> on the master. What this means is that if we schedule downtime on a
> host, then the master goes down, the slave starts bitching about the
> host that is down (because it does not know that it's in downtime). A
> similar problem occurs if we disable an event handler on the master,
> because unless the slave also knows to disable the event handler it
> will fire it (regardless of whether or not it is active) as soon as
> the passive check result returns a critical.
> 
> At present I am getting round this by tailing the nagios log file
> through a perl script that looks for specific 'EXTERNAL COMMAND'
> entries and then flushes those through to the slave by ssh'ing to the
> slave and writing the command string to the nagios pipe file on the
> slave.
> 
> Is there a better way of doing this ?
> 
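
The log-tailing workaround described above essentially turns each logged `EXTERNAL COMMAND` entry back into a command-file line for the slave. Below is a minimal Python sketch of that translation step (not the poster's Perl script). It assumes Nagios logs external commands as `[timestamp] EXTERNAL COMMAND: NAME;args` (as it does with `log_external_commands=1`) and that the slave accepts the standard `[timestamp] NAME;args` command-file format. The `FORWARD` selection and the `to_command_line` name are illustrative choices, not part of Nagios.

```python
import re

# Illustrative (assumed) selection of real Nagios external commands worth
# mirroring. DEL_*_DOWNTIME is left out because it takes a downtime ID,
# and each Nagios instance numbers its own downtime entries.
FORWARD = {
    "ACKNOWLEDGE_HOST_PROBLEM",
    "ACKNOWLEDGE_SVC_PROBLEM",
    "SCHEDULE_HOST_DOWNTIME",
    "SCHEDULE_SVC_DOWNTIME",
    "ENABLE_HOST_NOTIFICATIONS",
    "DISABLE_HOST_NOTIFICATIONS",
    "ENABLE_SVC_NOTIFICATIONS",
    "DISABLE_SVC_NOTIFICATIONS",
    "ENABLE_HOST_EVENT_HANDLER",
    "DISABLE_HOST_EVENT_HANDLER",
    "ENABLE_SVC_EVENT_HANDLER",
    "DISABLE_SVC_EVENT_HANDLER",
}

# Matches log lines such as: [1203065555] EXTERNAL COMMAND: NAME;arg1;arg2
LOG_RE = re.compile(r"^\[(\d+)\] EXTERNAL COMMAND: ([A-Z0-9_]+)(;.*)?$")


def to_command_line(log_line):
    """Return a command-file line for the slave, or None if the log line
    is not an external command we want to forward."""
    m = LOG_RE.match(log_line.rstrip("\n"))
    if not m or m.group(2) not in FORWARD:
        return None
    # Reuse the original timestamp; the arguments are passed through unchanged.
    return "[%s] %s%s\n" % (m.group(1), m.group(2), m.group(3) or "")
```

Each non-None result would then be written to the slave's command pipe, for instance over ssh as the original script does.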

You might get lucky using the attached NEB-module. It's not well
documented, and it's not very well tested. It will do what you're
after though. Contact me off-list if you run into problems. I've
been looking for someone to test this for quite some time now, so
I'll be happy to help.

It's written to make the two servers load-balanced, so the slave
and the master will help each other out doing checks and then
transmit the results to one another. External commands are also copied
from one to the other, so scheduled/cancelled downtime etc. will
instantly show up on both servers as soon as it's parsed in one.
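
Copying commands in both directions raises one practical question: a command the slave receives from the master is then parsed on the slave as well, so a naive relay could send it straight back. How mrm deals with this isn't described here. The following is a hypothetical Python sketch of one way to suppress such echoes, by briefly remembering commands that arrived from the peer. The `EchoSuppressor` name and the 60-second window are assumptions.

```python
import time


class EchoSuppressor:
    """Hypothetical helper for a two-way command relay: remembers commands
    received from the peer so they are not forwarded back to it."""

    def __init__(self, ttl=60.0):
        self.ttl = ttl      # seconds to remember a peer command
        self._seen = {}     # command text (without timestamp) -> time received

    def _expire(self, now):
        # Forget peer commands older than the ttl window.
        for cmd, t in list(self._seen.items()):
            if now - t > self.ttl:
                del self._seen[cmd]

    def note_from_peer(self, cmd, now=None):
        """Call before writing a peer's command into the local command file."""
        now = time.time() if now is None else now
        self._expire(now)
        self._seen[cmd] = now

    def should_forward(self, cmd, now=None):
        """Return False exactly once for the local echo of a peer command."""
        now = time.time() if now is None else now
        self._expire(now)
        if cmd in self._seen:
            del self._seen[cmd]  # consume it: this is the echo
            return False
        return True
```

A relay would call `note_from_peer()` for every command it receives, and consult `should_forward()` for every command it sees parsed locally.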

If you don't want the host/service check syncing you'll have to
either get clever with the config or manually hack that out of
the module.

Like I said, feel free to contact me off-list if you're having
any problems with it.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mrm-0.1.tar.gz
Type: application/x-gzip
Size: 27970 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/users/attachments/20080215/40369ab9/attachment.bin>
-------------- next part --------------
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null