Distributed Monitoring - Redundancy

Steve Shipway s.shipway at auckland.ac.nz
Sun Jun 25 23:58:17 CEST 2006


> > I'm running Nagios in a distributed environment which is working very
> > well. I would like to add a little redundancy to the
> > picture now that I have everything working. ;-)
...
> > It seems that a secondary "cold spare" might be the best solution.
> > Then there are maintenance issues with keeping software up to date,
> > etc.
> No - look at linux HA (heartbeat) and drbd.
> > So many problems, so little beer.
> The linux HA/drbd setup is well understood and quite easy.

We use linux-HA here to run a redundant pair of servers.  In fact, we
run our Nagios on one and our MRTG on the other, and each provides
failover for the other.  They pass a set of virtual IPs, services, disks
and filesystems between them.  This works very well and is very
reliable.  I use the v1.x linux-HA (rather than the newer, feature-rich
v2.x) as we only have a 2-machine failover cluster and simplicity makes
things easier.
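
For anyone trying to picture the mutual failover, here is a minimal Python
sketch of the placement rule described above.  It is not linux-HA code; the
node names (node1, node2), the group names and the function name are
assumptions made up for illustration.  Each resource group has a preferred
node and moves to the survivor when that node is down.

```python
# Hypothetical preferred placement: Nagios on one node, MRTG on the other.
PREFERRED = {"nagios": "node1", "mrtg": "node2"}


def placement(alive):
    """Map each resource group to the node that should run it.

    alive is the set of node names currently up.  A group stays on its
    preferred node when that node is up; otherwise it moves to a surviving
    node, or to None if no node is left.
    """
    result = {}
    for group, home in PREFERRED.items():
        if home in alive:
            result[group] = home
        else:
            survivors = sorted(alive)
            result[group] = survivors[0] if survivors else None
    return result
```

With both nodes up each service stays at home; with only node2 up, both
groups (and their virtual IPs and filesystems) end up on node2.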

We have an external SCSI disk pack connected to two Adaptec ServeRAID cards
(these helpfully have locking capabilities for just this setup).  The
external pack holds two LUNs, which are passed between the servers.

Heartbeat goes via serial cable, crossover network cable, and the main
network.  
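
The reason for using three independent heartbeat paths can be sketched in a
few lines of Python.  This is only an illustration of the decision rule, not
how heartbeat itself is implemented; the link names, the 30-second dead time
and the function name are assumptions.  The peer is treated as dead only
when every path has gone silent, so a single pulled cable does not trigger a
takeover.

```python
# Hypothetical link names standing in for the serial cable, the crossover
# network cable and the main network.
LINKS = ("serial", "crossover", "lan")


def peer_is_dead(last_seen, now, deadtime=30.0):
    """Return True only if every link has been silent longer than deadtime.

    last_seen maps a link name to the timestamp (in seconds) of the last
    heartbeat received on that link.  A link with no entry is treated as
    never having delivered a heartbeat.
    """
    return all(
        now - last_seen.get(link, float("-inf")) > deadtime
        for link in LINKS
    )
```

For example, if the serial link delivered a heartbeat two seconds ago while
both network paths have been quiet for minutes, peer_is_dead() still returns
False and no failover happens.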

For people who are really paranoid, I also have a little linux-ha plugin
which uses a tiny raw partition on the disk to effect an additional lock
before mounting the filesystem.
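
To give an idea of what such a disk lock could look like, here is a rough
Python sketch.  It is not the plugin itself: the record size, the function
names and the on-disk format are all assumptions.  Note also that a plain
read-then-write like this is not atomic between two machines, and on a real
raw partition the reads would need to bypass the page cache, so treat it as
an outline of the idea only.

```python
import os

RECORD_SIZE = 64  # bytes reserved at the start of the partition (assumption)


def read_owner(path):
    """Return the node name stored in the lock record, or '' if it is free."""
    with open(path, "rb") as dev:
        record = dev.read(RECORD_SIZE)
    return record.rstrip(b"\0").decode("ascii")


def write_owner(path, node):
    """Overwrite the lock record with node (use '' to release the lock)."""
    data = node.encode("ascii")
    if len(data) > RECORD_SIZE:
        raise ValueError("node name too long for lock record")
    with open(path, "r+b") as dev:
        dev.write(data.ljust(RECORD_SIZE, b"\0"))
        dev.flush()
        os.fsync(dev.fileno())  # push the record to disk before mounting


def try_lock(path, node):
    """Take the lock if it is free or already ours; return True on success."""
    owner = read_owner(path)
    if owner not in ("", node):
        return False  # the other node holds the lock: do not mount
    write_owner(path, node)
    return True
```

The resource script would call try_lock() before mounting the filesystem and
write_owner(path, "") after unmounting it.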

In a failover situation, we lose only about 30 seconds and everything is
fine.  Nagios (since it uses text files) is very stable - however, I also
run mysql on the Nagios server to hold archives and summarised logs, and
this passes back and forth with no difficulty as well.

If anyone would like detailed instructions, please contact me directly.

Steve



_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null




