Nagios kept from restarting after reboot by lockfile

eric.berg at barclayscapital.com
Tue Dec 21 20:57:49 CET 2010


Good stuff, Dan.  I was not aware of the differences between how the reboot and shutdown commands handle the reboot process.

Turns out that we're doing a reboot -f, which explains why I have orphaned PID files lying around.

I'm going to make the call right now that the fight to have 'reboot -f' changed to the plays-more-nicely-with-others "shutdown -r" is already lost, so I'm going to work around that in code.
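
One way such a workaround might look, as a minimal sketch rather than anything posted in this thread: treat the lock file as stale when it is missing, empty, holds PID 0 (as in the error message quoted below), or names a PID that is no longer running, and remove it before starting Nagios. The function names lock_is_stale and start_nagios_clean are hypothetical. Note that `kill -0` only tests for processes the calling user may signal, so a PID reused by an unrelated process of the same user after a reboot could still fool it.

```shell
#!/bin/sh
# Hypothetical helpers for a Nagios start wrapper (illustrative names).

# Returns 0 (stale) when the lock file is missing, empty, non-numeric,
# holds PID 0, or names a PID that does not respond to a signal test.
# Returns 1 when the recorded PID looks alive.
lock_is_stale() {
    _lf=$1
    [ -f "$_lf" ] || return 0
    _pid=$(tr -d '[:space:]' < "$_lf")
    case $_pid in
        ''|*[!0-9]*) return 0 ;;    # empty or not a number
    esac
    [ "$_pid" -gt 0 ] || return 0   # PID 0 can never be a real daemon
    # kill -0 sends no signal; it only checks that the PID can be signalled.
    if kill -0 "$_pid" 2>/dev/null; then
        return 1
    fi
    return 0
}

# Removes a stale lock file, then runs the start command given after it.
# Refuses to touch the lock file if the recorded PID still looks alive.
start_nagios_clean() {
    _lock=$1
    shift
    if lock_is_stale "$_lock"; then
        rm -f "$_lock"
    else
        echo "Nagios appears to be running; leaving $_lock alone" >&2
        return 1
    fi
    "$@"
}

# Example call. The lock file path is the one from the error message;
# the binary and config paths are assumptions:
#   start_nagios_clean /home/nagios/nagios/var/nagios.lock \
#       /home/nagios/nagios/bin/nagios -d /home/nagios/nagios/etc/nagios.cfg
```

Because the start command is passed in by the caller, the same wrapper can be run from a user-level cron job or login script without needing root.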

Thanks for helping clarify this.  

It's weird: when I run nagios and kill it with -9, it leaves the PID file intact, but when I restart it, it zeroes out the PID file and starts just fine.  When I just kill it with the default kill signal, it removes the PID file.
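
That difference is consistent with how signals work in general: the default signal (SIGTERM) can be caught, so a process gets a chance to remove its PID file, while SIGKILL (-9) cannot be caught, so no cleanup code runs. A small stand-in process illustrates this. It is a sketch only; fake_daemon and kill_and_report are hypothetical names, and this is not Nagios's own code.

```shell
#!/bin/sh
# Stand-in for a daemon: writes its PID file and removes it on SIGTERM.
fake_daemon() {
    sh -c '
        pidfile=$1
        cleanup() { rm -f "$pidfile"; exit 0; }
        trap cleanup TERM            # SIGKILL cannot be trapped
        echo "$$" > "$pidfile"
        while :; do sleep 1; done
    ' fake_daemon "$1" &
}

# Starts the stand-in, kills it with the given signal name (TERM or KILL),
# and reports whether the PID file survived.
kill_and_report() {
    sig=$1
    pidfile=$2
    fake_daemon "$pidfile"
    pid=$!
    # Wait for the PID file; by then the TERM handler is installed.
    while [ ! -s "$pidfile" ]; do sleep 1; done
    kill "-$sig" "$pid"
    wait "$pid" 2>/dev/null || true
    if [ -e "$pidfile" ]; then
        echo "$sig: PID file left behind"
        rm -f "$pidfile"
    else
        echo "$sig: PID file removed"
    fi
}

# Example (the path is an assumption):
#   kill_and_report TERM /tmp/demo.pid
#   kill_and_report KILL /tmp/demo.pid
```

With TERM the handler runs and the file is removed; with KILL the file is left behind, which matches the orphaned files seen after a forced reboot.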

In any case, I now know what the issues are and how to address this.  Thanks again very much for your help, guys.  You are a feature of Nagios.

Eric

> -----Original Message-----
> From: Daniel Wittenberg [mailto:daniel.wittenberg.r0ko at statefarm.com] 
> Sent: Tuesday, December 21, 2010 9:23 AM
> To: Nagios Users List
> Subject: Re: [Nagios-users] Nagios kept from restarting after 
> reboot by lockfile
> 
> So are you using the actual "reboot" command rather than "shutdown -r
> now", which is a little friendlier?  The standard nagios shutdown
> script should take care of cleaning those up for you.  Otherwise,
> putting something like:
> rm -f <lockfile>; service nagios start
> in your rc.local would take care of it.  But when you mention the PID
> file, are you saying the PID file is still there, or the lock file?
> They are different things.  Again though, if nagios is shut down
> properly you shouldn't be seeing that.
> 
> Dan
> 
> -----Original Message-----
> From: eric.berg at barclayscapital.com
> [mailto:eric.berg at barclayscapital.com] 
> Sent: Monday, December 20, 2010 6:59 PM
> To: nagios-users at lists.sourceforge.net
> Subject: Re: [Nagios-users] Nagios kept from restarting after reboot by
> lockfile
> 
> We reboot all of our hosts on a weekly basis.  I used to pride myself
> on keeping my boxes up as long as possible, but having spent years now
> supporting mission-critical financial production applications, I'm on
> board with the weekly reboots.  It lets you know early if some system
> or app change is problematic.
> 
> Reboot is being done via a standard reboot command.  
> 
> I've looked around for rc scripts that might address this issue, but
> haven't found any.  Got any pointers?
> 
> Regarding the rc.local solution, a) I'd prefer to solve the problem,
> not just address the symptoms, and b) elsewhere in this thread I've
> described the roadblocks that we have to doing anything at the system
> level.  Yep, that's right, boys, we survive in the app developer layer,
> within which we do not have root on these boxes.  It's a tedious,
> time-consuming, frustrating, productivity-killing endeavor to do just
> about anything you can't do yourself.
> 
> So....got any sample RC scripts, or command line params to nagios to
> make it smart enough to know that the PID that is in its PID file
> isn't an active process?
> 
> Thanks.
> 
> Eric
> 
> > -----Original Message-----
> > From: Daniel Wittenberg [mailto:daniel.wittenberg.r0ko at statefarm.com]
> > Sent: Monday, December 20, 2010 11:56 AM
> > To: Nagios Users List
> > Subject: Re: [Nagios-users] Nagios kept from restarting after 
> > reboot by lockfile
> > 
> > Couple of questions:
> > 1) Why do you have to reboot your monitoring server weekly?
> > 2) How is the reboot being done?
> > 
> > Reason I ask 2) is because the standard rc script will remove the
> > lockfile when nagios is told to stop.  So if you are having this
> > problem, it sounds like you are not doing a clean shutdown and
> > something could be wrong.
> > 
> > Either way, I guess worst case, one way to work around this would be
> > to put something like this in your /etc/rc.d/rc.local:
> > rm -f /var/lock/subsys/nagios
> > 
> > Assuming that's where your lockfile is.
> > 
> > Dan
> > 
> > 
> > -----Original Message-----
> > From: eric.berg at barclayscapital.com
> > [mailto:eric.berg at barclayscapital.com] 
> > Sent: Monday, December 20, 2010 10:16 AM
> > To: eric.berg at barclayscapital.com; nagios-users at lists.sourceforge.net
> > Subject: Re: [Nagios-users] Nagios kept from restarting after 
> > reboot by
> > lockfile
> > 
> > Alternatively, could you recommend a good system/resource monitoring
> > tool that would be able to let me know if nagios is down and restart
> > it automatically?
> > 
> > _____________________________________________
> > From:   Berg, Eric: IT (NYK)
> > Sent:   Monday, December 20, 2010 11:03 AM
> > To:     'nagios-users at lists.sourceforge.net'
> > Subject:        Nagios kept from restarting after reboot by lock file
> > 
> > Gee, this seems like an annoying newbie problem, but if Nagios
> > crashes or is killed (as on system reboot), it leaves a lock file
> > around that prevents it from starting again until the lock file is
> > manually removed.
> > 
> > I see this on Monday mornings after weekend reboots on a Red Hat
> > Linux box:
> > 
> > nagios: Lockfile '/home/nagios/nagios/var/nagios.lock' looks like its
> > already held by another instance of Nagios (PID 0).  Bailing out...
> > 
> > Does anyone know if there's a config option or something else that
> > obviates the need to write a wrapper script to check whether Nagios
> > is really running and remove the lock file (looks like Nagios already
> > knows it's not running, by virtue of the value of the PID in this
> > very message!) so that it can cleanly start up again?
> > 
> > Thanks.
> > 
> > Eric
> > 
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null




