can nagios take some pro-active actions?

Leonardo Carneiro lscarneiro at veltrac.com.br
Thu Sep 3 16:26:46 CEST 2009


Yes, I understand that there are situations where an event handler
can't effectively fix anything, but after reading the documentation link
you sent me, it turns out that this is EXACTLY what I'm looking for:
check a few times, restart, check again, and if the service is still
down, notify the admin somehow.
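
The sequence above (check a few times, restart, check again, notify if
still down) maps onto the way Nagios calls an event handler on each
service state change. Here is a minimal sketch in Python, assuming the
handler receives the standard $SERVICESTATE$, $SERVICESTATETYPE$ and
$SERVICEATTEMPT$ macros as its three arguments. The restart command and
the attempt number are hypothetical placeholders, not something taken
from this thread.

```python
#!/usr/bin/env python3
"""Sketch of a Nagios service event handler (names are hypothetical)."""
import subprocess
import sys

# Hypothetical restart command; replace with whatever restarts your service.
RESTART_CMD = ["/etc/init.d/my-consumer", "restart"]

# SOFT check attempt on which a restart is tried, before the state goes HARD.
RESTART_ON_ATTEMPT = 3


def should_restart(state, state_type, attempt):
    """Return True if this state change warrants a restart attempt."""
    if state != "CRITICAL":
        # OK, WARNING and UNKNOWN are left alone in this sketch.
        return False
    if state_type == "SOFT":
        # Nagios is still re-checking: act once, on the chosen attempt.
        return attempt == RESTART_ON_ATTEMPT
    # HARD CRITICAL: the earlier restart did not help, so try once more.
    # Nagios sends its own notifications when the HARD state is reached.
    return state_type == "HARD"


def main(argv):
    """Parse the three macro arguments and restart the service if needed."""
    if len(argv) != 4:
        print("usage: handler STATE STATETYPE ATTEMPT", file=sys.stderr)
        return
    try:
        attempt = int(argv[3])
    except ValueError:
        print("ATTEMPT must be an integer", file=sys.stderr)
        return
    if should_restart(argv[1], argv[2], attempt):
        # check=False: a failed restart should not raise inside the handler.
        subprocess.run(RESTART_CMD, check=False)


if __name__ == "__main__":
    main(sys.argv)
```

With max_check_attempts set to 4 on the service, this restarts the
service on the third failed check. If the fourth check still fails, the
state becomes HARD and Nagios sends its normal notifications. The script
is attached to the service through the event_handler directive.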

Thanks again to everyone for your support.

Menard, Chris wrote:
> We use event_handlers EXACTLY as you describe. Let Nagios restart the service immediately and THEN figure out why it stopped.
>
> We all agree that root cause analysis is important... but it is often secondary to restoring service.
>
>
> -----Original Message-----
> From: Leonardo Carneiro [mailto:lscarneiro at veltrac.com.br] 
> Sent: Thursday, September 03, 2009 10:04 AM
> To: nagios-users at lists.sourceforge.net
> Subject: Re: [Nagios-users] can nagios take some pro-active actions?
>
> Thanks to everyone. Let me explain the situation. The service in
> question is software developed by my own company. This service
> "consumes" files in a defined directory, generated by another program.
> The number of files waiting in that directory is the metric I monitor.
>
> Like any software in constant development, it will eventually crash or
> freeze. When it does, the files in the directory accumulate. If the
> number of files crosses the threshold, the warning or critical state
> is set.
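
A check like the one described here could be sketched as a small Nagios
plugin in Python. This is only an illustration: the function names are
hypothetical, and treating "at or above the threshold" as crossing it is
an assumption. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN)
are the standard plugin return codes.

```python
#!/usr/bin/env python3
"""Sketch of a file-count check in the style of a Nagios plugin."""
import os

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def count_files(directory):
    """Count regular files directly inside directory (no recursion)."""
    with os.scandir(directory) as entries:
        return sum(1 for entry in entries if entry.is_file())


def classify(count, warn, crit):
    """Map a file count to a return code (assumes >= means 'crossed')."""
    if count >= crit:
        return CRITICAL
    if count >= warn:
        return WARNING
    return OK


def check(directory, warn, crit):
    """Return (exit_code, status_line) as a plugin would report them."""
    try:
        count = count_files(directory)
    except OSError as exc:
        # An unreadable or missing directory is reported as UNKNOWN.
        return UNKNOWN, f"UNKNOWN - cannot read {directory}: {exc}"
    code = classify(count, warn, crit)
    return code, f"{LABELS[code]} - {count} files pending in {directory}"
```

A wrapper script would print the status line and exit with the returned
code, which is how Nagios learns the state of the service.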
>
> We DO check why the service stopped, but the service must be up and
> running again as fast as possible, which is why we restart it first.
> Later we can check what went wrong.
>
> Some months ago I also wrote a simple bash script that monitors the
> number of files, restarts the service if necessary, and logs the event.
>
> What I do not know is whether this is the best approach. Nagios gives
> me the visual tools to see in real time, on a big panel, whether
> everything is OK with my servers. So I wondered whether it can take
> proactive actions, and whether that approach would be better than my
> simple scripts.
>
> dave stern - e-mail.pluribus.unum wrote:
>   
>> OK, everyone agrees an event handler can take action to fix a problem, but
>> bear in mind that this comes with caveats. Effectively, a Nagios event
>> handler is treating a symptom; the disease merrily goes on its way. If a
>> service stops, WHY did it stop in the first place? Most good sysadmins
>> would tackle the problem from the system end to ensure that the service
>> never fails again. Furthermore, let's say a service failed for a reason,
>> e.g. out of disk space. What good would it do to restart the service
>> again? And if you build smarts into the event handler to look for and fix
>> such a condition, is that the ONLY condition that could occur to stop
>> this service?
>>
>> Having said all this, event handlers do have their place. We in fact use them
>> to shut down hosts if the temperature gets too hot. You can imagine the
>> testing we went through before rolling out something like this.
>>
>>
>>
>> On Thu, Sep 3, 2009 at 7:44 AM, Leonardo
>> Carneiro <lscarneiro at veltrac.com.br> wrote:
>>   
>>     
>>> Hello everyone.
>>>
>>> I started to play with Nagios a few days ago and I'm very excited about
>>> it. I have a very small setup (2 Linux servers being monitored via NRPE
>>> by a third Linux server) and I wrote some bash scripts to monitor some
>>> of the services that we run on those servers (proprietary services, not
>>> standard ones like ssh or apache).
>>>
>>> I know Nagios can send SMS, email and other things to warn
>>> administrators about problems, but can Nagios take any action to fix the
>>> problem, like restarting the service when it reaches a critical state,
>>> or when it stays critical for more than 5 minutes?
>>>
>>> If yes, can someone just point me in the right direction? :)
>>>
>>> Thanks in advance, and sorry about my poor English. I'm from Brazil.
>>> --
>>>
>>> *Leonardo de Souza Carneiro*
>>> *Veltrac - Tecnologia em Logística.*
>>> lscarneiro at veltrac.com.br <mailto:lscarneiro at veltrac.com.br>
>>> http://www.veltrac.com.br <http://www.veltrac.com.br/>
>>> /Fone Com.: (43)2105-5601/
>>> /Av. Higienópolis 1601 Ed. Eurocenter Sl. 803/
>>> /Londrina- PR/
>>> /Cep: 86015-010/
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Nagios-users mailing list
>>> Nagios-users at lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/nagios-users
>>> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
>>> ::: Messages without supporting info will risk being sent to /dev/null


	






