can nagios take some pro-active actions?

Menard, Chris Chris.Menard at Aspect.com
Thu Sep 3 16:20:17 CEST 2009


We use event handlers EXACTLY as you describe: let Nagios restart the service immediately, and THEN figure out why it stopped.

We all agree that root cause analysis is important... but it is often secondary to restoring service.
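A minimal sketch of that kind of handler, in Python and modeled on the restart-httpd sample in the Nagios documentation, is shown below. It assumes the command definition's command_line passes $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$ as arguments, and that the service's max_check_attempts is at least 4 so that a third SOFT attempt exists. The script name and RESTART_CMD are hypothetical placeholders.

```python
#!/usr/bin/env python3
# Sketch of a Nagios event handler, modeled on the restart sample in the
# Nagios documentation. Assumes the command definition's command_line is:
#   .../restart_handler.py $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$
# and the service has "event_handler <that command>" and event_handler_enabled 1.
import subprocess
import sys

RESTART_CMD = ["/etc/init.d/myservice", "restart"]  # hypothetical placeholder


def should_restart(state: str, state_type: str, attempt: int) -> bool:
    """Restart on the 3rd SOFT CRITICAL check, or on a HARD CRITICAL
    in case the earlier soft-state restart did not help."""
    if state != "CRITICAL":
        return False  # OK, WARNING and UNKNOWN: leave the service alone
    if state_type == "SOFT":
        return attempt == 3
    return state_type == "HARD"


def main(args: list[str]) -> int:
    if len(args) != 3:
        print("usage: restart_handler.py STATE STATETYPE ATTEMPT", file=sys.stderr)
        return 1
    state, state_type, attempt = args[0], args[1], int(args[2])
    if not should_restart(state, state_type, attempt):
        return 0
    print(f"Restarting service ({state_type} {state}, attempt {attempt})")
    try:
        subprocess.run(RESTART_CMD, check=False)
    except OSError as exc:  # e.g. the restart command does not exist
        print(f"restart failed: {exc}", file=sys.stderr)
        return 1
    return 0


# Nagios always calls the handler with arguments; do nothing otherwise.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Restarting on the third soft failure usually brings the service back before the state turns HARD (which is when notifications go out); the HARD branch is a second attempt if that restart did not help.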


-----Original Message-----
From: Leonardo Carneiro [mailto:lscarneiro at veltrac.com.br] 
Sent: Thursday, September 03, 2009 10:04 AM
To: nagios-users at lists.sourceforge.net
Subject: Re: [Nagios-users] can nagios take some pro-active actions?

Thanks to everyone. Let me explain the situation. The service in question
is software developed by my own company. It "consumes" files that another
program generates in a defined directory, and the number of files waiting
in that directory is the metric I monitor.

Like any software in constant development, it will eventually crash or
freeze. When that happens, the files in the directory accumulate. If the
number of files crosses the threshold, the WARNING or CRITICAL state is raised.
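A check for this metric can be sketched as a small plugin that counts the files in the directory and maps the count to the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The script name, its arguments and the "alert when the count exceeds the threshold" rule are assumptions for illustration; on the monitored servers it would be run through NRPE like the existing bash checks.

```python
#!/usr/bin/env python3
# Sketch of a Nagios-style check that counts the files waiting in the
# service's input directory. Exit codes follow the plugin convention:
# 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
# Usage (hypothetical script name): check_queue.py DIRECTORY WARN CRIT
import os
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(n: int, warn: int, crit: int) -> int:
    """Map a file count to a state; alert when the count exceeds a threshold."""
    if n > crit:
        return CRITICAL
    if n > warn:
        return WARNING
    return OK


def check_queue(directory: str, warn: int, crit: int) -> tuple[int, str]:
    """Return (exit_code, status_line) for the number of queued files."""
    try:
        with os.scandir(directory) as entries:
            n = sum(1 for e in entries if e.is_file())  # skip subdirectories
    except OSError as exc:
        return UNKNOWN, f"QUEUE UNKNOWN - cannot read {directory}: {exc}"
    code = evaluate(n, warn, crit)
    label = ("OK", "WARNING", "CRITICAL")[code]
    # Text before "|" is the status shown in the web UI; after it is perfdata.
    return code, f"QUEUE {label} - {n} files waiting | files={n};{warn};{crit};0"


if __name__ == "__main__" and len(sys.argv) > 1:
    if len(sys.argv) != 4:
        print("usage: check_queue.py DIRECTORY WARN CRIT")
        sys.exit(UNKNOWN)
    status, line = check_queue(sys.argv[1], int(sys.argv[2]), int(sys.argv[3]))
    print(line)
    sys.exit(status)
```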

We DO check why the service stopped, but the service must be up and
running again as fast as possible, which is why we restart it first.
Later we can check what is going wrong.

Some months ago I also wrote a simple bash script that monitors the number
of files, restarts the service if necessary, and logs this kind of event.
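For comparison, such a standalone script could be sketched in Python as follows. It also covers the "critical for more than 5 minutes" case from the original question: CriticalTimer only reports True once the queue has stayed over the threshold continuously for HOLD_SECONDS. All paths, thresholds and the --run flag are hypothetical. Inside Nagios, a similar delay is roughly obtained through max_check_attempts and retry_interval instead.

```python
#!/usr/bin/env python3
# Sketch of a standalone watchdog: count queued files, restart the service
# only if the queue has stayed over the critical threshold for 5 minutes,
# and log each restart. All constants below are hypothetical placeholders.
import logging
import os
import subprocess
import sys
import time
from typing import Optional

QUEUE_DIR = "/var/spool/myservice"                  # hypothetical
RESTART_CMD = ["/etc/init.d/myservice", "restart"]  # hypothetical
LOG_FILE = "/var/log/myservice-watchdog.log"        # hypothetical
CRIT_FILES = 100     # hypothetical critical threshold
HOLD_SECONDS = 300   # "critical for more than 5 minutes"
POLL_SECONDS = 60


class CriticalTimer:
    """Tracks how long a condition has been continuously critical."""

    def __init__(self, hold_seconds: float) -> None:
        self.hold_seconds = hold_seconds
        self.since: Optional[float] = None  # first critical sample, or None

    def update(self, critical: bool, now: float) -> bool:
        """Record one sample; return True once the condition has been
        critical for at least hold_seconds, then start timing afresh."""
        if not critical:
            self.since = None
            return False
        if self.since is None:
            self.since = now
        if now - self.since >= self.hold_seconds:
            self.since = None  # the next critical sample opens a new window
            return True
        return False


def watch() -> None:
    """Poll forever; restart the service and log when the timer fires."""
    logging.basicConfig(filename=LOG_FILE, level=logging.INFO,
                        format="%(asctime)s %(message)s")
    timer = CriticalTimer(HOLD_SECONDS)
    while True:
        try:
            with os.scandir(QUEUE_DIR) as entries:
                n = sum(1 for e in entries if e.is_file())
        except OSError as exc:
            logging.warning("cannot read %s: %s", QUEUE_DIR, exc)
        else:
            if timer.update(n > CRIT_FILES, time.monotonic()):
                logging.info("queue at %d files for %d s; restarting", n, HOLD_SECONDS)
                try:
                    subprocess.run(RESTART_CMD, check=False)
                except OSError as exc:
                    logging.error("restart failed: %s", exc)
        time.sleep(POLL_SECONDS)


if __name__ == "__main__" and "--run" in sys.argv:
    watch()
```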

What I do not know is whether this is the best approach. Nagios gives me
the visual tools to see in real time, on a big panel, whether everything is
OK with my servers. So I wondered whether it can take proactive actions, and
whether that approach would be better than my simple scripts.

dave stern - e-mail.pluribus.unum wrote:
> OK, everyone agrees an event handler can take action to fix a problem, but bear in
> mind that this comes with caveats. Effectively, a Nagios event handler is treating
> a symptom; the disease merely goes on its way. If a service stops, WHY did
> it stop in the first place? Most good sysadmins would tackle the problem from
> the system end to ensure that the service would never fail again. Furthermore,
> let's say a service failed for a reason, e.g. out of disk space. What good
> would it do to restart the service again? And if you build smarts into the
> event handler to look for and fix such a condition, is that the ONLY condition
> that could occur to stop this service?
>
> Having said all this, event handlers do have their place. We in fact use them
> to shut down hosts if the temperature gets too hot. You can imagine the
> testing we went through before rolling out something like this.
>
>
>
> On Thu, Sep 3, 2009 at 7:44 AM, Leonardo Carneiro <lscarneiro at veltrac.com.br> wrote:
>   
>> Hello everyone.
>>
>> I started to play with Nagios a few days ago and I'm very excited about it.
>> I have a very small setup (2 Linux servers monitored via NRPE by a
>> third Linux server), and I wrote some bash scripts to monitor some of
>> the services that we run on those servers (proprietary services, not
>> standard ones like ssh, apache and that stuff).
>>
>> I know Nagios can send SMS, email and other things to warn
>> administrators about problems, but can Nagios take any action to fix the
>> problem, like restarting the service when it reaches a critical state, or
>> restarting it if it stays critical for more than 5 minutes?
>>
>> If yes, can someone just point me in the direction I should go? :)
>>
>> Thanks in advance, and sorry about my poor English. I'm from Brazil.
>> --
>>
>> *Leonardo de Souza Carneiro*
>> *Veltrac - Tecnologia em Logística.*
>> lscarneiro at veltrac.com.br
>> http://www.veltrac.com.br
>> /Business phone: (43)2105-5601/
>> /Av. Higienópolis 1601 Ed. Eurocenter Sl. 803/
>> /Londrina- PR/
>> /Postal code (CEP): 86015-010/
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
>> trial. Simplify your report design, integration and deployment - and focus on
>> what you do best, core application coding. Discover what's new with
>> Crystal Reports now.  http://p.sf.net/sfu/bobj-july
>> _______________________________________________
>> Nagios-users mailing list
>> Nagios-users at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/nagios-users
>> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
>> ::: Messages without supporting info will risk being sent to /dev/null
>>
>>     
>
>   
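One partial answer to dave's out-of-disk-space example is to have the handler verify a precondition before restarting, and skip the restart (leaving the alert for a human) when it fails. The sketch below checks free space with Python's shutil.disk_usage; the 500 MiB threshold and the restart_is_sensible() name are hypothetical, and, as dave points out, it guards against only one of many possible causes.

```python
#!/usr/bin/env python3
# Precondition check an event handler could run before restarting.
# MIN_FREE_BYTES and restart_is_sensible() are hypothetical names.
import shutil

MIN_FREE_BYTES = 500 * 1024 * 1024  # hypothetical: 500 MiB


def restart_is_sensible(path: str, min_free: int = MIN_FREE_BYTES) -> tuple[bool, str]:
    """Return (ok, reason); ok is False if the filesystem holding path
    cannot be inspected or has less than min_free bytes available."""
    try:
        free = shutil.disk_usage(path).free
    except OSError as exc:
        return False, f"cannot check {path}: {exc}"
    if free < min_free:
        return False, f"only {free} bytes free on {path}; not restarting"
    return True, f"{free} bytes free on {path}"
```

In a handler, this would be called just before the restart command, printing the reason and exiting without restarting when ok is False.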
