init script must wait for nagios to exit (w/patch)

John Sellens jsellens at syonex.com
Tue May 31 23:52:44 CEST 2005


There's a race condition of sorts in daemon-init.in that can result
in multiple nagios daemons running.

We have a nagios machine that gets a little slow sometimes, and
we were seeing duplicate copies of nagios running (which of course
made the slowness problem worse).  We're using nagmin for config,
which restarts nagios (on our redhat box) by running
    /etc/init.d/nagios restart
which is very much like
    /etc/init.d/nagios stop
    /etc/init.d/nagios start

My conclusion is that the likely cause of the duplicate nagios daemons
was this:
   - busy machine
   - /etc/init.d/nagios stop
     - which kill -TERMs nagios, and then immediately removes the pid lock file
   - /etc/init.d/nagios start
     - new nagios daemon creates new pid lock file
   - original nagios daemon finishes reaping and syncing, and rounds
     the event loop, removes the new pid lock file and exits

So the next "/etc/init.d/nagios restart" didn't know nagios was already
running (since the newest lock file had been removed by the exiting
nagios), and so another daemon got started.
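
The sequence above can be reproduced without nagios at all. What follows
is only a rough simulation, not taken from the init script: the temporary
file stands in for the pid lock file, a background subshell stands in for
the slow-to-exit daemon, and the names and the one-second delay are made
up for illustration.

```shell
#!/bin/sh
# Simulation of the race; no real nagios is involved.
# "lockfile" and the sleep duration are hypothetical stand-ins.
lockfile=$(mktemp)      # plays the role of the pid lock file

# "old daemon": has been sent TERM, but is slow to clean up on a busy
# machine, and removes the lock file only on its way out
( sleep 1; rm -f "$lockfile" ) &
old=$!

# "stop": removes the lock file right away, without waiting
rm -f "$lockfile"

# "start": the new daemon creates a new lock file
echo "new daemon" > "$lockfile"

# the old daemon finally exits and removes the NEW lock file
wait "$old"

if [ -e "$lockfile" ]; then
    result="lock file present"
else
    result="lock file missing"
fi
echo "$result"
```

This ends with the lock file missing even though the "new daemon" is
supposed to be running, which is the state that lets the next restart
start a second copy.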

The (or at least a) fix is to make sure that the exiting nagios daemon
has a chance to clean up after itself.  Suggested patch against CVS
head below.
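
As a rough illustration of the idea (separate from the patch itself, which
polls status_nagios), here is a stand-alone sketch in which "stop" waits
for the exiting daemon to remove its own lock file before "start" is
allowed to create a new one. The helper wait_for_removal, the temporary
file and the timings are all hypothetical and exist only for this example.

```shell
#!/bin/sh
# Simulated stop/start where "stop" waits for the old daemon to clean up.
# wait_for_removal, runfile and the timings are hypothetical stand-ins.

# wait_for_removal FILE [TRIES]: poll once a second until FILE is gone.
# Returns 0 once the file is gone, 1 if it is still there after TRIES polls.
wait_for_removal() {
    file=$1
    tries=${2:-10}
    while [ -e "$file" ] && [ "$tries" -gt 0 ]; do
        sleep 1
        tries=$((tries - 1))
    done
    [ ! -e "$file" ]
}

runfile=$(mktemp)       # plays the role of the pid lock file

# "old daemon": slow to exit, removes its lock file on the way out
( sleep 2; rm -f "$runfile" ) &

# "stop": give the old daemon a bounded amount of time to clean up
if wait_for_removal "$runfile" 5; then
    stopped="clean"
else
    stopped="timed out"
fi

# "start": only now does the new daemon create its lock file
echo "new daemon" > "$runfile"
wait                    # the old daemon is already gone by this point

if [ -e "$runfile" ]; then
    result="lock file present"
else
    result="lock file missing"
fi
echo "$stopped, $result"
rm -f "$runfile"        # tidy up the temporary file
```

With the wait in place the new lock file survives, because the old daemon
has already done its cleanup by the time the new one starts. The bound on
the number of polls keeps "stop" from hanging forever if the daemon never
exits.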

Cheers, and thanks!

John



*** daemon-init.in.old	Tue May 31 17:18:39 2005
--- daemon-init.in	Tue May 31 17:44:25 2005
***************
*** 131,136 ****
--- 131,156 ----
  	stop)
  		echo "Stopping network monitor: nagios"
  		killproc_nagios nagios
+ 		# now we have to wait for nagios to exit and remove its
+ 		# own NagiosRunFile, otherwise a following "start" could
+ 		# happen, and then the exiting nagios will remove the
+ 		# new NagiosRunFile, allowing multiple nagios daemons
+ 		# to (sooner or later) run
+ 		echo -n 'Waiting for nagios to exit .'
+ 		for i in 1 2 3 4 5 6 7 8 9 10 ; do
+ 		    if status_nagios > /dev/null; then
+ 			echo -n ' .'
+ 			sleep 1
+ 		    else
+ 			break
+ 		    fi
+ 		done
+ 		if status_nagios > /dev/null; then
+ 		    echo ''
+ 		    echo 'Warning - running nagios did not exit in time'
+ 		else
+ 		    echo ' done.'
+ 		fi
  		rm -f $NagiosStatusFile $NagiosTempFile $NagiosRunFile $NagiosLockDir/$NagiosLockFile $NagiosCommandFile
  		;;
  


