Nagios - High Availability

Steve Shipway s.shipway at auckland.ac.nz
Thu Jul 14 06:00:14 CEST 2005


I'm just at the final stages of implementing our High Availability
Nagios/MRTG setup.  If you are interested in how it was achieved, then read
on.

We have two identical servers with ServeRAID-6M SCSI cards (important to use
this type of SCSI card) connected to the same bus of an external SCSI disk
unit.  They each have two network interfaces, one on the LAN and one joined
by a crossover cable.

The two servers run RedHat Linux AS4 with Linux-HA installed.  Both the
crossover Ethernet link and the LAN are used for heartbeat.

I have defined a resource group for Nagios, consisting of a virtual IP, the
SCSI RAID logical volume, the filesystem, an instance of Apache, and Nagios.
For Nagios, I defined a special resource.d file which changes settings in
the xinetd.d/nsca file and starts/stops Nagios.
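
A minimal sketch of the xinetd toggle described above, assuming the nsca
service file carries a standard "disable = yes/no" line.  The function name
and the sample file contents are illustrative and are not taken from the
actual script; xinetd would also need to be told to reload its configuration
after the file is rewritten.

```python
import re

# Hypothetical contents of an xinetd service file for nsca.
SAMPLE = """service nsca
{
    socket_type = stream
    disable     = yes
}
"""

def set_xinetd_disable(config_text, disabled):
    """Return config_text with its 'disable' line set to yes or no."""
    value = "yes" if disabled else "no"
    # Keep the original indentation and spacing around '='.
    pattern = re.compile(r"^([ \t]*disable[ \t]*=[ \t]*)\S+", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + value, config_text)
    if count == 0:
        raise ValueError("no 'disable' line found")
    return new_text

# On "start" the service would be enabled, on "stop" disabled again.
enabled = set_xinetd_disable(SAMPLE, False)
```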

All Nagios files are installed on the /u02 filesystem, which resides on the
external disk unit.  The filesystem is identified by a disk label and is NOT
defined in fstab.

For MRTG, a similar setup is used with a separate RAID LV, IP address, and
filesystem (/u01).  Its resource script adds and removes crontab entries to
enable and disable the MRTG data collection.
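
One way to add and remove crontab entries cleanly is to keep them between
two marker comments, so the whole block can be dropped on "stop" and
re-added on "start".  This is only a sketch under that assumption: the
marker text, the MRTG command line, and the function names are hypothetical.
The text would typically come from "crontab -l" and be written back with
"crontab -".

```python
BEGIN = "# BEGIN ha-mrtg"   # hypothetical marker lines
END = "# END ha-mrtg"

def remove_block(crontab_text):
    """Return the crontab text without the marked MRTG block."""
    kept = []
    inside = False
    for line in crontab_text.splitlines():
        if line.strip() == BEGIN:
            inside = True
        elif line.strip() == END:
            inside = False
        elif not inside:
            kept.append(line)
    return "".join(l + "\n" for l in kept)

def add_block(crontab_text, entries):
    """Return the crontab text with exactly one marked MRTG block."""
    base = remove_block(crontab_text)      # makes the operation idempotent
    block = [BEGIN] + list(entries) + [END]
    return base + "".join(l + "\n" for l in block)

# Hypothetical MRTG entry; the real path and schedule are not in the post.
MRTG_ENTRY = "*/5 * * * * /usr/bin/mrtg /u01/mrtg/mrtg.cfg"
```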

One server is the primary for MRTG, and the other for Nagios.  If one goes
down, then the other can take over within 2 minutes, enough to have constant
monitoring coverage although the web interface experiences a slight outage
during the changeover.

At the moment, this is working very well.  There were a couple of places
where error messages needed to be binned if a machine was running cron or
xinetd entries for a system not currently mounted, but it was minor.  The
web servers' httpd.cfg files had to keep all data on the HA filesystems and
use different filenames in /var, particularly for /var/run/http.pid.
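
As an alternative to discarding the error output, a cron or xinetd job could
first check whether its HA filesystem is mounted on this node and exit
quietly if not.  This is a different technique from the one described above,
sketched under the assumption that the kernel's mount table is readable in
the usual /proc/mounts format; the function names are illustrative.

```python
def is_mounted(mount_point, mounts_text):
    """Return True if mount_point appears as a mount target in mounts_text.

    mounts_text is expected in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mount_point:
            return True
    return False

def should_run(mount_point, mounts_path="/proc/mounts"):
    """Read the mount table and report whether the job should proceed."""
    with open(mounts_path) as handle:
        return is_mounted(mount_point, handle.read())

# Hypothetical mount table for a node currently holding only /u02.
SAMPLE_MOUNTS = "/dev/sda1 / ext3 rw 0 0\n/dev/sdb1 /u02 ext3 rw 0 0\n"
```

A wrapper around the MRTG or nsca command would call should_run("/u01") or
should_run("/u02") and return immediately when the result is False.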

The ServeRAID card helpfully locks the other card out of accessing the
disks, so we cannot get both machines mounting the filesystem at once.  This
obviously protects data integrity.

All SNMP and firewall rulesets are set to allow access from both machines'
static IP addresses.  Any nsca processes send to the Nagios virtual IP, and
web access is to the virtual IPs (the web servers bind to just the virtual
IP address, so we can have two separate instances).

If anyone would like more details on how to set up a similar system, and a
copy of the haresources.d/nagios script, then please contact me.

Steve

---
Steve Shipway: ITSS, University of Auckland
Email: s.shipway at auckland.ac.nz  Web: http://www.steveshipway.org/  
** We can only discover new oceans when we have the **
** courage to lose sight of the shore.              **