Forcing a Service State to Recovery

Victor Lanza vicjalan at gmail.com
Tue May 13 21:19:32 CEST 2008


Thank you and Jay for your responses. I think that both of you are focusing on the check_logfiles add-on, which seems far too complicated for me to attempt.

Maybe I can do it with an event-handler? So that it works like this:

Nagios finds an error, goes into a critical state, and triggers the event handler to disable future checks (as this service is volatile) while keeping notifications enabled. This way the state remains critical and notifications keep going out until a user takes action.

The user can then acknowledge the error and manually re-enable active checks on the service (via the web interface), which in turn will return it to an OK state (if no further errors are found, of course).
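
A minimal sketch of such an event handler, written in Python purely for illustration. It relies on the Nagios external command DISABLE_SVC_CHECK, written to the external command file; the function names, the argument order, and the command file path are assumptions, not something taken from this thread. It only looks at the service state and ignores soft/hard state types, which a real handler would probably want to consider.

```python
import time


def build_disable_command(host, service, now=None):
    """Format a DISABLE_SVC_CHECK external command line for Nagios."""
    timestamp = int(time.time() if now is None else now)
    return f"[{timestamp}] DISABLE_SVC_CHECK;{host};{service}\n"


def handle_event(state, host, service, command_file, now=None):
    """Disable active checks when the service reports CRITICAL.

    Returns True if a command was written, False otherwise.
    command_file is the Nagios external command file; its location
    depends on the installation (assumption: passed in by the caller).
    """
    if state != "CRITICAL":
        return False
    # Append one command line; Nagios reads commands from this file.
    with open(command_file, "a") as handle:
        handle.write(build_disable_command(host, service, now))
    return True


# Hypothetical call, with values that a command definition could pass in
# through the $SERVICESTATE$, $HOSTNAME$ and $SERVICEDESC$ macros:
#   handle_event("CRITICAL", "web01", "App Log", "/path/to/nagios.cmd")
```

Re-enabling the checks afterwards would be the manual step described above (the web interface, or the matching ENABLE_SVC_CHECK external command).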




-----Original Message-----
From: nagios-users-bounces at lists.sourceforge.net [mailto:nagios-users-bounces at lists.sourceforge.net] On Behalf Of Jim.Melin at co.hennepin.mn.us
Sent: Tuesday, May 13, 2008 2:51 PM
To: nagios-users at lists.sourceforge.net; Jay R. Ashworth
Subject: Re: [Nagios-users] Forcing a Service State to Recovery

Here's my 2 cents worth.

nagios-users-bounces at lists.sourceforge.net wrote on 05/13/2008 12:33:56 PM:

> On Tue, May 13, 2008 at 01:20:26PM -0400, Victor Lanza wrote:
> >    Basically what I have going on is, using the check_logfiles add-on I check
> >    several application logs for specific errors, however if an error is
> >    detected the service will go into a critical state and then on following
> >    check (if no errors are found) will issue an “OK” state. What I want is for
> >    Nagios to only issue a critical state and remain in this state. This will
> >    force someone to actually look at problem and not wait for it to
> >    automatically send a recovery alert. Once the log has been investigated,
> >    then the user should force the OK state. I’ve looked into the “Disable
> >    active checks of this service” as well as “Acknowledge this service
> >    problem” but I find that these 2 do not satisfy completely what I’m looking
> >    for.
>
> The problem you're really complaining about, as near as I can rephrase
> it, is that check_logfiles is level sensitive, and you want it to be
> edge sensitive -- instead of "logfile contains this string" being an
> error condition, you want "logfile just got this string added to it" to
> be a red trap, and for managers to be able to send the green trap
> manually.
>
> I think this is a failing in the design of c_l, or at least it not
> being designed for what you want -- which seems a reasonable thing to
> want -- but you could also view it as Nagios being iffy on handling
> trap type notices, as well, I suspect.

Perhaps a modified service check that keeps a list of error strings in a table (read from a config file). If a log line matches one of them, it cuts a record into an
error state file along with an expiration time stamp that is more reasonable. The service check would then have to look for unexpired error state
records and, if it finds any, generate a critical alert until the expiration is reached. It would still check the log file for the transient condition, but
add a state preservation that meets your needs. (This assumes that the source for the log check is available.)
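
The idea above could be sketched roughly as follows, in Python for illustration only. The state file format (a JSON list), the function names, and the parameters are all assumptions; this is not how check_logfiles works internally, just one way a wrapper check might keep matches "sticky" until they expire.

```python
import json
import os

OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def load_records(state_path):
    """Return the saved error records, or an empty list if none exist."""
    if not os.path.exists(state_path):
        return []
    with open(state_path) as handle:
        return json.load(handle)


def check_log(new_lines, error_strings, state_path, now, hold_seconds):
    """Scan new log lines and keep matches active until they expire."""
    # Drop records whose expiration time has passed.
    records = [r for r in load_records(state_path) if r["expires"] > now]
    # Add a record for every new line containing a configured error string.
    for line in new_lines:
        if any(text in line for text in error_strings):
            records.append({"line": line.strip(), "expires": now + hold_seconds})
    # Persist the records so the next run still sees them.
    with open(state_path, "w") as handle:
        json.dump(records, handle)
    if records:
        return CRITICAL, f"CRITICAL - {len(records)} unexpired error record(s)"
    return OK, "OK - no unexpired error records"
```

In this sketch a later run with no new errors still returns CRITICAL as long as an unexpired record exists, and removing the state file by hand would clear the condition early.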

Just an idea.
-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft 
Defy all challenges. Microsoft(R) Visual Studio 2008. 
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null



