nagios server redundancy

Mike Lindsey mike-nagios at 5dninja.net
Fri Feb 11 22:35:26 CET 2011


On 2/11/11 10:26 AM, Morty wrote:
> I'm looking to implement redundant nagios servers, with the backup
> server in a different location than the prime server.  This is nagios
> 3.2.3, with the default web interface.  I'm synchronizing
> configurations by rsyncing /usr/local/nagios/etc/ between systems.
> I'm doing active/active (i.e. I want the backup server monitoring at
> the same time as the prime server.)  So far so good.
>
> Problem: acknowledgements on the prime are not being synced to the
> backup.
>
> Is there a (clean) way to sync the prime's acknowledgements to the
> backup, as well?  I'm tempted to shut down the backup, rsync the
> prime's var directory to the backup, and then bring the backup back
> online.  But the docs have various warnings about not messing with the
> var files, so figured I'd ask about possible hidden gotchas.
>
> I've read http://nagios.sourceforge.net/docs/3_0/redundancy.html, but
> scenario one doesn't discuss syncing acknowledgements, and scenario 2
> is active/passive.
What I do with my backup master is leave it off, and frequently rsync 
both the config and the status files in var over to it.

Both the active master and the backup master are sitting behind a load 
balanced vip, with the nsca and http/https ports managed by the load 
balancer.  There's a cronjob running on the backup master that, if it 
determines an error on the active master, starts up nsca, nagios, and 
apache.  That causes the vip to fail over to the backup master, giving 
automatic recovery with no more than five minutes of downtime (the 
frequency of the cronjob).
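
For illustration, here is a rough Python sketch of what the cronjob on 
the backup master could look like. It is not my actual script. The 
hostname, the TCP connect used as the health check, the service names, 
and the use of the `service` command are all assumptions, so adjust 
them for your own environment.

```python
import socket
import subprocess

# All of these values are hypothetical placeholders.
PRIMARY_HOST = "nagios-primary.example.com"
PRIMARY_PORT = 80
SERVICES = ["nsca", "nagios", "apache2"]


def primary_is_healthy(host=PRIMARY_HOST, port=PRIMARY_PORT, timeout=10.0):
    """Assumed health check: can we open a TCP connection to the primary?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def commands_to_run(primary_healthy, services=SERVICES):
    """Return the commands the backup should run on this cron pass."""
    if primary_healthy:
        # Primary looks fine: the backup stays off and does nothing.
        return []
    # Primary looks broken: bring up everything the vip depends on.
    return [["service", name, "start"] for name in services]


def main():
    for cmd in commands_to_run(primary_is_healthy()):
        subprocess.call(cmd)
```

Run `main()` from cron every five minutes. Note that the sketch never 
stops the services on the backup by itself, which matches manual 
fail-back.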

The active master does not have apache, nsca, or nagios configured to 
start on boot; instead, those are also managed by a cronjob that does a 
check of the backup master.  If the backup master is running 
apache/nagios/nsca, then the active master doesn't start up (and if 
they're already running, say from an intermittent error, they shut down) 
and the rsyncs also don't happen.  This allows me to do automatic 
failover, and manual fail-back, after whatever issue triggered the 
failover has been verified and resolved.
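
The active master's side could be sketched along these lines, again in 
Python and again only as an illustration. The backup hostname, service 
names, and `service` command are assumptions. The etc path is the one 
from your mail, the var path is the usual default next to it, and how 
you detect that the backup is serving is left to you.

```python
import subprocess

# All of these values are hypothetical placeholders.
SERVICES = ["nsca", "nagios", "apache2"]
BACKUP_HOST = "nagios-backup.example.com"
SYNC_DIRS = ["/usr/local/nagios/etc/", "/usr/local/nagios/var/"]


def plan_actions(backup_is_serving, services=SERVICES,
                 backup=BACKUP_HOST, sync_dirs=SYNC_DIRS):
    """Return the commands the active master should run on this cron pass."""
    if backup_is_serving:
        # The backup has taken over: stay down (or shut down) and skip
        # the rsyncs so the backup's state is not overwritten.
        return [["service", name, "stop"] for name in services]
    # Normal operation: make sure the services are up, then push config
    # and status files to the (stopped) backup.
    start = [["service", name, "start"] for name in services]
    sync = [["rsync", "-a", d, "%s:%s" % (backup, d)] for d in sync_dirs]
    return start + sync


def run(commands):
    for cmd in commands:
        subprocess.call(cmd)
```

Because the stop branch is taken for as long as the backup is serving, 
fail-back only happens once someone shuts the backup's services down by 
hand.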

You cannot - to the best of my knowledge - sync acknowledgments to a 
backup server while it's actively running, unless you want to write 
something that checks for new acks and dumps them into the command 
pipe.  So, if you want to maintain acks and downtime, you'll need to 
have your backup disabled for the syncs.
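
If you did want to go the command pipe route, the writing half might 
look roughly like the Python sketch below. It uses the standard 
ACKNOWLEDGE_HOST_PROBLEM and ACKNOWLEDGE_SVC_PROBLEM external commands. 
The command file path is the usual default for a /usr/local/nagios 
install and is an assumption here, as are the sticky/notify/persistent 
defaults. Detecting new acks on the prime is not covered.

```python
import time

# Assumed default location of the external command file (a named pipe).
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def format_ack(host, service, author, comment, now=None,
               sticky=2, notify=1, persistent=1):
    """Build one external command line acknowledging a problem.

    Pass service=None to acknowledge a host problem instead.
    """
    ts = int(time.time() if now is None else now)
    # Conservatively keep field separators and newlines out of the comment.
    text = comment.replace(";", ",").replace("\n", " ")
    if service is None:
        return "[%d] ACKNOWLEDGE_HOST_PROBLEM;%s;%d;%d;%d;%s;%s\n" % (
            ts, host, sticky, notify, persistent, author, text)
    return "[%d] ACKNOWLEDGE_SVC_PROBLEM;%s;%s;%d;%d;%d;%s;%s\n" % (
        ts, host, service, sticky, notify, persistent, author, text)


def submit(line, command_file=COMMAND_FILE):
    """Write one command line into the running Nagios command pipe."""
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

For example, `submit(format_ack("web01", "HTTP", "mike", "known issue"))` 
run on the backup would acknowledge that service there, provided the 
backup currently sees the same problem.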

-- 
Mike Lindsey

