Using two nagios servers...

Chris Beattie cbeattie at geninfo.com
Fri Oct 15 21:23:00 CEST 2010


Wow, I completely forgot that I’d responded to this.  This is what I do.  If you use this script, you’ll want to change the notification e-mail address.  The script sends a notification when the failover server decides it needs to take over, and again when it decides to yield to the primary because the primary has come back online.

 

-------------------------------------------------

Failover Configuration

 

On the failover server, install the same OS the same way it's installed on the
primary monitoring server, but set Nagios to not start in runlevels 3 and 5.
Otherwise, the failover checking script will generate e-mail notifications
whenever the failover server is rebooted: Nagios will start at boot, before the
checking script notices that Nagios is running on the primary, and the message
will come when the script shuts the failover Nagios down.

 

On the failover server, generate a public/private key pair.  This is necessary in

order to avoid having to type in a password every time the state of the Nagios

process on the primary server is checked:

 

       # ssh-keygen -t rsa

 

Take the default name and location (/root/.ssh/id_rsa and id_rsa.pub).  Do not enter

a passphrase.

 

Copy id_rsa.pub to the primary server:

 

       # rsync -avzu id_rsa.pub primaryserverhostname:/root/.ssh/

 

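On the primary server, append id_rsa.pub to authorized_keys2 (if your sshd is
configured to read only authorized_keys, append to that file instead):

 

       cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys2

       chmod 0600 $HOME/.ssh/authorized_keys2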

 

Download, compile, and install Nagios on the failover server the same way it's

installed on the primary server.

 

Create a script named nagios_check.sh in /root/:

-----------------------

 

       #!/bin/bash
       # Run from cron on the failover server; the argument is the primary's hostname.
       nagiospath='/usr/local/nagios'
       alertaddress='you at yourdomain'
       maxfaillimit='3'

       if [[ -z "${1}" ]]
       then
              echo "Usage: nagios_check hostname"
              exit 1
       fi

       # Count of consecutive failed checks, stored in the current directory
       # (normally root's home directory when started from root's crontab).
       touch failed_nagios_checks
       failedchecks=$(cat failed_nagios_checks)
       failedchecks=${failedchecks:-0}

       # Ask the primary, over SSH, whether its Nagios process is healthy.
       nagiosstatusnow=$(${nagiospath}/libexec/check_by_ssh -H "${1}" --command='/usr/local/nagios/libexec/check_nagios --filename=/usr/local/nagios/var/status.dat --expires=1 --command=nagios')
       nagiosstatus="${nagiosstatusnow%%:*}"
       nagiosrunninglocally=$(/etc/init.d/nagios status)

       if [[ "${nagiosstatus}" = "NAGIOS OK" ]]
       then
              echo -ne "[$(date)] ${nagiosstatus} on ${1}. "
              if [[ "${nagiosrunninglocally%% *}" = "nagios" ]]
              then
                     # The primary is back: yield to it.
                     echo -e "Nagios is currently running on the failover server, and needs to be stopped."
                     /etc/init.d/nagios stop
                     /usr/bin/printf "%b" "[$(date)] Nagios recovery on ${1} detected.  Stopping failover Nagios.\n\n${nagiosstatusnow}" | /bin/mail -s "Nagios recovery on ${1}" "${alertaddress}"
              fi
              echo -e "Failed ${failedchecks} checks: synchronizing files.  Status: ${nagiosstatusnow} "
              echo 0 > failed_nagios_checks
              # Mirror the primary's Nagios tree, skipping transient runtime files.
              rsync --quiet --archive --compress --delete-during --exclude='var/spool/checkresults/*' --exclude='var/archives/*' --exclude='*~' --exclude=nagios.lock --exclude=nagios.cmd "${1}:${nagiospath}" /usr/local
       else
              failedchecks=$((failedchecks + 1))
              echo "${failedchecks}" > failed_nagios_checks
              if [[ "${failedchecks}" -lt "${maxfaillimit}" ]]
              then
                     echo -e "[$(date)] Uh-oh! Failed ${failedchecks} out of ${maxfaillimit} checks.  Status: ${nagiosstatusnow} "
              else
                     echo -ne "[$(date)] ${nagiosstatus} on ${1}. "
                     if [[ "${nagiosrunninglocally%% *}" = "No" ]]
                     then
                           # Too many consecutive failures: take over.
                           echo -e " Failed ${failedchecks} checks, and needs to be started on the failover server. "
                           /etc/init.d/nagios start
                           /usr/bin/printf "%b" "[$(date)] Nagios on ${1} has failed ${failedchecks} checks.  Starting Nagios on failover server.\n\n${nagiosstatusnow}" | /bin/mail -s "Nagios failure on ${1}" "${alertaddress}"
                     else
                           echo -e "Failed ${failedchecks} checks, but is already running on the failover server. "
                     fi
              fi
       fi

 

-----------------------
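
In case the two %% expansions in the script look cryptic: ${nagiosstatusnow%%:*} keeps everything before the first colon, and ${nagiosrunninglocally%% *} keeps only the first word. Here is a minimal sketch of just that parsing. The helper names status_prefix and first_word are made up for illustration, and the sample strings are assumptions about what the plugin and the init script print; the exact wording may differ between versions, so check what your own system prints before relying on the "nagios" / "No" comparison.

```shell
#!/bin/sh
# Keep everything before the first colon
# (%% removes the longest suffix matching ':*').
status_prefix() {
    printf '%s\n' "${1%%:*}"
}

# Keep only the first word
# (%% removes the longest suffix matching ' *').
first_word() {
    printf '%s\n' "${1%% *}"
}

# Sample inputs are illustrative, not captured from a real server.
status_prefix "NAGIOS OK: 3 processes, status log updated 12 seconds ago"
first_word "nagios (pid 1234) is running..."
first_word "No lock file found in /usr/local/nagios/var/nagios.lock"
```

With these sample inputs the three calls print "NAGIOS OK", "nagios" and "No", which are the values the script compares against.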

 

Make it executable by root:

 

       chmod u+x nagios_check.sh

 

Run crontab -e as root and add this line:

 

       * * * * * /root/nagios_check.sh primaryserverhostname >> /var/log/nagios_check.log 2>&1

 

The *s set it to run every minute.  The output is appended to a log file, and the 2>&1 sends STDERR to the same place as STDOUT, so both end up in the log.
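
The order of the two redirections on that crontab line matters. The following is a small sketch that shows why, using a throwaway function and a scratch file; emit and the temporary log are hypothetical and exist only for this demonstration.

```shell
#!/bin/sh
# Emit one line on STDOUT and one on STDERR.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Scratch file used only for this demonstration.
log=$(mktemp)

# Same order as the crontab line: STDOUT is appended to the log first,
# then STDERR is pointed at wherever STDOUT now goes.
emit >> "$log" 2>&1
both=$(grep -c . "$log")

# Reversed order: STDERR is pointed at the old STDOUT (discarded here by
# the outer redirect), and only then is STDOUT appended to the log.
: > "$log"
{ emit 2>&1 >> "$log"; } > /dev/null
only_stdout=$(grep -c . "$log")

echo "crontab order logged ${both} lines; reversed order logged ${only_stdout}"
rm -f "$log"
```

The first form logs both lines and the second logs only the STDOUT line, so keep the 2>&1 at the end.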

 

At the top of every minute, the failover server will now obtain a current replica of the
primary server's Nagios status (comments, acknowledgements, downtime, configuration files, etc.).

 

Add a file in /etc/logrotate.d called nagios_check:

-----------------------

 

       /var/log/nagios_check.log {

           weekly

           missingok

           notifempty

       }

 

-----------------------

 

From: quanta [mailto:quanta.linux at gmail.com] 
Sent: Wednesday, October 13, 2010 7:17 AM
To: nagios-users at lists.sourceforge.net
Subject: Re: [Nagios-users] Using two nagios servers...

 

Try something like this:

#!/bin/sh

RETURN_STATUS=`/usr/local/nagios/libexec/check_nrpe -H <primary_host> -c check_nagios | awk -F: '{ print $1 }' | awk '{ print $2 }'`
if [ "$RETURN_STATUS" != "OK" ]; then
    sed -i 's/enable_notifications=0/enable_notifications=1/' /usr/local/nagios/etc/nagios.cfg
    sed -i 's/execute_service_checks=0/execute_service_checks=1/' /usr/local/nagios/etc/nagios.cfg
else
    sed -i 's/enable_notifications=1/enable_notifications=0/' /usr/local/nagios/etc/nagios.cfg
    sed -i 's/execute_service_checks=1/execute_service_checks=0/' /usr/local/nagios/etc/nagios.cfg
fi
sudo /etc/init.d/nagios reload

Note: you must allow the nagios user to run this command through sudo without a password prompt.
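
The four sed lines above can also be folded into one helper that takes the desired value. This is only a sketch of the same idea, tried against a scratch file rather than the real nagios.cfg; the function name set_nagios_active is made up, and sed -i is assumed to be GNU sed, as in the script above.

```shell
#!/bin/sh
# set_nagios_active FILE 0|1
# Sets enable_notifications and execute_service_checks in FILE to the
# given value, whatever they were before.
set_nagios_active() {
    sed -i \
        -e "s/^enable_notifications=[01]/enable_notifications=$2/" \
        -e "s/^execute_service_checks=[01]/execute_service_checks=$2/" \
        "$1"
}

# Scratch file standing in for nagios.cfg (hypothetical contents).
cfg=$(mktemp)
printf 'enable_notifications=0\nexecute_service_checks=0\n' > "$cfg"

set_nagios_active "$cfg" 1
cat "$cfg"
rm -f "$cfg"
```

Anchoring the patterns with ^ keeps the substitution from touching commented-out copies of the settings.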


On 08/16/2010 02:44 PM, ravishankar.gundlapali at wipro.com wrote: 

Hi,

 

Even I run Nagios on Virtual machines.

 

Please let me know where I can get help with running a cron job on my secondary Nagios server to monitor the Nagios service on the primary Nagios server.

 

Thanks,

Ravi G

 

From: Chris Beattie [mailto:cbeattie at geninfo.com] 
Sent: Monday, August 16, 2010 6:51 PM
To: Nagios Users List
Subject: Re: [Nagios-users] Using two nagios servers...

 

Your servers will probably be fine servicing the extra Nagios polling, unless they are overloaded already.

 

Since I run Nagios on virtual machines, however, I tried to keep the load on my failover Nagios server minimized.  My failover Nagios server runs a cron job that uses the check_nagios plugin to monitor the state of the primary Nagios server.  If the primary server is up and running, the failover server will just rsync the state and configuration files from the primary.  If the primary server becomes unavailable, the cron job will start the Nagios service on the failover server and keep it running until it detects the primary has recovered.
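
The decision the cron job makes on each run can be condensed into a small pure function. This is a rough sketch under stated assumptions, not the actual script: decide_action and its action names are invented for illustration, the threshold of 3 failed checks is just an example value, and the failed-check count passed in is assumed to already include the current failure.

```shell
#!/bin/sh
# decide_action PRIMARY_OK FAILED_CHECKS MAX_FAILS LOCAL_RUNNING
#   PRIMARY_OK and LOCAL_RUNNING are "yes" or "no".
#   FAILED_CHECKS counts consecutive failures, including this one.
# Prints one of: sync, stop-and-sync, wait, start, keep-running.
decide_action() {
    if [ "$1" = "yes" ]; then
        # Primary is healthy: yield if we had taken over, then mirror it.
        if [ "$4" = "yes" ]; then
            echo "stop-and-sync"
        else
            echo "sync"
        fi
    elif [ "$2" -lt "$3" ]; then
        # Not enough consecutive failures yet to justify a takeover.
        echo "wait"
    elif [ "$4" = "no" ]; then
        echo "start"
    else
        echo "keep-running"
    fi
}

decide_action yes 0 3 no    # normal operation
decide_action no  2 3 no    # primary flaky, below the threshold
decide_action no  3 3 no    # threshold reached, take over
decide_action no  7 3 yes   # already running as failover
decide_action yes 0 3 yes   # primary recovered, yield
```

Requiring several consecutive failures before taking over keeps a single missed check from starting a second Nagios.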

 

From: ravishankar.gundlapali at wipro.com [mailto:ravishankar.gundlapali at wipro.com] 
Sent: Monday, August 16, 2010 7:45 AM
To: nagios-users at lists.sourceforge.net
Subject: [Nagios-users] Using two nagios servers...

 

Hi All,

 

I am planning to configure all the servers in my client environment on two Nagios servers (in two different locations) in order to have a backup.

 

Please let me know whether there will be any overload on the servers as two Nagios servers will be polling them.

 

 

Thanks,

Ravi G

 
 
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null