Forcing a Service State to Recovery

Jim.Melin at co.hennepin.mn.us Jim.Melin at co.hennepin.mn.us
Tue May 13 20:51:24 CEST 2008


Here's my 2 cents worth.

nagios-users-bounces at lists.sourceforge.net wrote on 05/13/2008 12:33:56 PM:

> On Tue, May 13, 2008 at 01:20:26PM -0400, Victor Lanza wrote:
> >    Basically what I have going on is, using the check_logfiles add-on I
> >    check several application logs for specific errors, however if an error
> >    is detected the service will go into a critical state and then on
> >    following check (if no errors are found) will issue an “OK” state. What
> >    I want is for Nagios to only issue a critical state and remain in this
> >    state. This will force someone to actually look at problem and not wait
> >    for it to automatically send a recovery alert. Once the log has been
> >    investigated, then the user should force the OK state. I’ve looked into
> >    the “Disable active checks of this service” as well as “Acknowledge this
> >    service problem” but I find that these 2 do not satisfy completely what
> >    I’m looking for.
>
> The problem you're really complaining about, as near as I can rephrase
> it, is that check_logfiles is level sensitive, and you want it to be
> edge sensitive -- instead of "logfile contains this string" being an
> error condition, you want "logfile just got this string added to it" to
> be a red trap, and for managers to be able to send the green trap
> manually.
>
> I think this is a failing in the design of c_l, or at least it not
> being designed for what you want -- which seems a reasonable thing to
> want -- but you could also view it as Nagios being iffy on handling
> trap type notices, as well, I suspect.

Perhaps a modified service check could keep a list of error strings in a table
(read from a config file). When a log line matches one of them, the check would
write a record into an error state file along with an expiration time stamp
that is more reasonable than a single check interval. On each run the check
would look for unexpired error state records and, if it finds any, generate a
critical alert until they expire. You would still be checking the log file for
the transient condition, but adding state preservation that meets your needs.
(This assumes the source for check_logfiles is available.)
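
To make the idea concrete, here is a rough Python sketch of the state-preserving part. It is not taken from check_logfiles; the function names, the JSON state file layout, and the `hold_seconds` parameter are all made up for illustration. The only thing borrowed from Nagios is the plugin exit code convention (0 for OK, 2 for CRITICAL).

```python
import json
import os
import re

OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def load_state(path):
    """Return saved error records, or an empty list if no state file exists."""
    if not os.path.exists(path):
        return []
    with open(path) as fh:
        return json.load(fh)


def save_state(path, records):
    """Write the current error records back to the state file."""
    with open(path, "w") as fh:
        json.dump(records, fh)


def evaluate(new_lines, patterns, records, now, hold_seconds):
    """Record new matches, drop expired records, return (code, message, records).

    new_lines    -- log lines added since the last run (hypothetical input)
    patterns     -- error strings / regexes read from a config file
    records      -- previously saved error records
    hold_seconds -- how long a match keeps the service CRITICAL (assumption)
    """
    records = list(records)  # do not modify the caller's list
    for line in new_lines:
        for pattern in patterns:
            if re.search(pattern, line):
                records.append({"line": line.strip(),
                                "expires": now + hold_seconds})
                break  # one record per matching line is enough
    # Keep only the records whose expiration time has not passed yet.
    live = [r for r in records if r["expires"] > now]
    if live:
        message = "CRITICAL - %d unexpired error record(s), latest: %s" % (
            len(live), live[-1]["line"])
        return CRITICAL, message, live
    return OK, "OK - no unexpired error records", live
```

A wrapper script would call `load_state`, pass the newly read log lines to `evaluate`, call `save_state` with the returned records, print the message and exit with the returned code. With a very large `hold_seconds`, the service would stay critical until someone empties or deletes the state file by hand, which is roughly the "force the OK state" step asked for.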

Just an idea.