Alternate check interval when state becomes CRITICAL

Justin Pasher justinp at newmediagateway.com
Wed Feb 11 00:54:03 CET 2009


Thomas Guyot-Sionnest wrote:
>> Alrighty. I took the script above as the base and tweaked it to my
>> setup. The theory behind the code is working, but there is still one
>> caveat. When the service goes into a HARD CRITICAL state, the event
>> handler is called and it correctly sends the command to Nagios to update
>> the check interval. The problem is that when the command is sent to
>> Nagios, Nagios has already set the next scheduled check (which defaults
>> to five minutes out). This means the next service check still won't
>> happen for another five minutes. After the next check occurs, if the
>> service is still in a HARD CRITICAL state, the NEXT scheduled check will
>> follow the new check interval that was set by the event handler (one
>> minute). At that time, it will continue to perform checks at one minute
>> intervals until the service is normal again.
>>
>> Once the service is back to a normal state, the event handler is called
>> again, which sends the command to Nagios to change the check interval
>> back to five minutes. However, like before, the next scheduled check has
>> already been set (one minute out), so the check happens again in one
>> minute. If the service is still up, it applies the check interval set by
>> the event handler.
>>
>> In the latter instance, it's not that big of a deal since it just causes
>> another check a little sooner than usual. However, in the first
>> instance, because the next scheduled check is still five minutes out the
>> first time around, it defeats the whole purpose of having the custom
>> event handler.
>>
>> Do you know any way around this? I've attached the service info and
>> event handler for reference.
>>     
>
> Have you tried scheduling a check or forced check? I'm not 100% sure,
> but one of these commands might override the next scheduled check.
>
> See here for the nagios commands:
> http://www.nagios.org/developerinfo/externalcommands/
>   

I actually thought of that shortly after I sent the original message.
When I ran a forced check from the CGI, I noticed the command in the
Nagios log, and that gave me the idea. I tried it out and it does
exactly what I needed. Thanks for your help.

For the list, here is the updated version of the script:
==============================
#!/usr/bin/perl
#

use strict;
use warnings;

# Fork to let Nagios keep on working...
if (fork != 0) {
    # Nobody cares if fork failed...
    warn("Daemonizing... Thanks for calling me.");
    exit(0);
}

# Only the first five arguments are used; any extra arguments are ignored.
die("Usage: $0 <hostname> <service desc> <state> <statetype> <stateattempt>")
    unless (@ARGV >= 5);

my $commandfile     = '/var/lib/nagios3/rw/nagios.cmd';
my $hostname        = $ARGV[0];
my $servicedesc     = $ARGV[1];
my $state           = $ARGV[2];
my $statetype       = $ARGV[3];
my $stateattempt    = $ARGV[4];

# If state becomes HARD CRITICAL, change the check interval to something
# smaller so that recovery is detected sooner.
if ($state eq 'CRITICAL' && $statetype eq 'HARD')
{
    open(my $cmd, '>>', $commandfile)
        or die("Cannot open $commandfile: $!");
    printf {$cmd} "[%lu] CHANGE_NORMAL_SVC_CHECK_INTERVAL;%s;%s;1\n",
        time, $hostname, $servicedesc;

    # The check_interval change above is applied AFTER the next check has
    # already been scheduled (five minutes out). Because of this, we force
    # a check to occur in one minute. After that check, the NEXT scheduled
    # check should use the newly applied check_interval.
    printf {$cmd} "[%lu] SCHEDULE_FORCED_SVC_CHECK;%s;%s;%lu\n",
        time, $hostname, $servicedesc, time + 60;

    close($cmd);
    die("Check interval for $hostname set to 1 minute");
}

# If state becomes HARD OK, revert the check interval to the normal five
# minutes.
if ($state eq 'OK' && $statetype eq 'HARD')
{
    open(my $cmd, '>>', $commandfile)
        or die("Cannot open $commandfile: $!");
    printf {$cmd} "[%lu] CHANGE_NORMAL_SVC_CHECK_INTERVAL;%s;%s;5\n",
        time, $hostname, $servicedesc;

    # The next check is already scheduled one minute out, so force the
    # next check to occur five minutes from now instead.
    printf {$cmd} "[%lu] SCHEDULE_FORCED_SVC_CHECK;%s;%s;%lu\n",
        time, $hostname, $servicedesc, time + 300;

    close($cmd);
    die("Check interval for $hostname set to 5 minutes");
}
==============================
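
To see what the script writes without a running Nagios, the two command
lines can be built by a small pure function and printed. This is only a
sketch, not part of the script above: the helper name svc_commands, the
host "web01", the service "HTTP" and the fixed timestamp are all made up
for illustration. It assumes the forced check should land one new interval
from now, as in both branches of the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the two external command lines for a given
# interval (in minutes) without touching the Nagios command file.
sub svc_commands {
    my ($now, $host, $svc, $interval_min) = @_;
    return (
        # Change the normal check interval for the service.
        sprintf("[%lu] CHANGE_NORMAL_SVC_CHECK_INTERVAL;%s;%s;%d\n",
                $now, $host, $svc, $interval_min),
        # Force the next check so the new interval takes effect right away.
        sprintf("[%lu] SCHEDULE_FORCED_SVC_CHECK;%s;%s;%lu\n",
                $now, $host, $svc, $now + $interval_min * 60),
    );
}

# Example with made-up names and a fixed timestamp.
print svc_commands(1234310043, 'web01', 'HTTP', 1);
```

Printing the lines first makes it easy to compare them with what shows up
in the Nagios log before pointing the real script at the command file.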

Justin Pasher
