Interesting problem while trying to monitor Oracle RAC services

Kumar, Ashish xml.devel at gmail.com
Mon Mar 30 09:50:46 CEST 2009


Hello,

We are facing an interesting but strange issue while trying to monitor
Oracle RAC services.

Oracle RAC is running on AIX 5.3 and Nagios is running on Fedora Core 9.

The scripts we are using to monitor Oracle RAC services on AIX are as follows:

-------------------------
$ cat check_oracle_services.sh

#!/usr/bin/ksh
# found on the Internet
RSC_KEY=$1

/oracle/crs_home/bin/crs_stat -u | awk \
        'BEGIN { FS="="; state = 0; } \
        $1~/NAME/ && $2~/'$RSC_KEY'/ {appname = $2; state=1}; \
        state == 0 {next;} \
        $1~/TARGET/ && state == 1 {apptarget = $2; state=2;} \
        $1~/STATE/ && state == 2 {appstate = $2; state=3;} \
        state == 3 {printf "%-45s %-18s\n", appname, appstate; state=0;}'
-------------------------
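
For anyone following the awk logic, here is a small sketch of the same NAME/STATE matching run against canned input. It is a simplified variant, not the script above. The sample records are hypothetical and only assume the NAME=/TARGET=/STATE= layout that the script's patterns imply. `sample_crs_stat` and `filter_resources` are illustrative names, and the key is passed with `-v` instead of being spliced into the awk program text.

```shell
#!/bin/sh
# Hypothetical records in the KEY=VALUE layout the awk patterns above expect.
sample_crs_stat() {
cat <<'EOF'
NAME=ora.foo.bar.inst
TYPE=application
TARGET=ONLINE
STATE=OFFLINE

NAME=ora.foodb.bardb1.inst
TYPE=application
TARGET=ONLINE
STATE=ONLINE on srv01
EOF
}

# Print "name state" for every resource whose NAME matches the key (a regex).
filter_resources() {
    awk -F= -v key="$1" '
        # Remember the name only when it matches; otherwise forget it.
        $1 == "NAME"  { name = ($2 ~ key) ? $2 : ""; next }
        # Emit one line per remembered resource, then reset.
        $1 == "STATE" && name != "" { printf "%-45s %-18s\n", name, $2; name = "" }
    '
}

sample_crs_stat | filter_resources 'ora.foo.bar.inst'
```

Because the key is used as a regular expression, a short key such as `foo` would match both sample resources here.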

$ cat check_oracle_services.pl

#!/usr/bin/env perl

use strict;
use Getopt::Std;

my %return_value = (
        OK => 0,
        CRIT => 2,
        UNKNOWN => 3
);

my $message = "nagios";
my $exit_status;

my %opt=();
getopts("p:h", \%opt);

sub usage(){
        print "Usage: $0 -p service_name\n";
        exit $return_value{'UNKNOWN'};
}

usage() if defined $opt{'h'};

# -p is mandatory; avoid "my ... if ...", whose behaviour is undefined in Perl
usage() unless defined $opt{'p'};
my $SERVICE = $opt{'p'};

# the following code was added to make sure that nrpe was not getting confused
# with dotted argument
if ($SERVICE eq "foo") {
        $SERVICE = "ora.foo.bar.inst";
}

my $PIPED = qx/ ksh check_oracle_services.sh $SERVICE/;
print $PIPED;

if ($PIPED =~ /OFFLINE/g) {
        $exit_status = $return_value{'CRIT'};
        $message = "Critical: $SERVICE is not running.";
} else {
        $exit_status = $return_value{'OK'};
        $message = "OK: $SERVICE is running.";
}

print "$message\n";
exit $exit_status;
-------------------------

When we try to run this script on AIX (local system) the output is as follows:

[srv01@/home/nagios/nrpe/libexec]$ perl check_oracle_services.pl -p foo
ora.foo.bar.inst                     OFFLINE
Critical: ora.foo.bar.inst is not running.

[srv01@/home/nagios/nrpe/libexec]$ perl check_oracle_services.pl -p ora.foo.bar.inst
ora.foo.bar.inst                     OFFLINE
Critical: ora.foo.bar.inst is not running.

The service is indeed offline.

[srv01@/home/nagios/nrpe/libexec]$ perl check_oracle_services.pl -p ora.foodb.bardb1.inst
ora.foodb.bardb1.inst                         ONLINE on srv01
OK: ora.foodb.bardb1.inst is running.


However, when we run the same check from the Nagios server, it reports
the services as online even when they are not:

[root@nagios libexec]# ./check_nrpe -n -H 10.0.10.20 -c check_oracle_services -a ora.foo.bar.inst
OK: ora.foo.bar.inst is running.

[root@nagios libexec]# ./check_nrpe -n -H 10.0.10.20 -c check_oracle_services -a foo
OK: ora.foo.bar.inst is running.

It is strange that we get the correct status when the scripts are
executed locally but the wrong status when they are executed
remotely.
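
One thing that may be worth ruling out, offered as a guess and not a diagnosis: the remote replies above do not include the crs_stat line that the local runs print, and the Perl wrapper reports OK for any output that lacks OFFLINE, including empty output. The sketch below shows a stricter mapping that returns UNKNOWN when nothing comes back. `classify_output` is a hypothetical helper name, and the exit codes follow the OK/CRIT/UNKNOWN values already used in the Perl script.

```shell
#!/bin/sh
# Map the helper script's output to a Nagios message and exit status.
# classify_output is an illustrative name, not part of the original scripts.
classify_output() {
    service=$1
    output=$2
    case $output in
        '')        echo "UNKNOWN: no output for $service";         return 3 ;;
        *OFFLINE*) echo "Critical: $service is not running.";      return 2 ;;
        *ONLINE*)  echo "OK: $service is running.";                return 0 ;;
        *)         echo "UNKNOWN: unexpected output for $service"; return 3 ;;
    esac
}

# Empty output is reported as UNKNOWN and never as OK.
classify_output ora.foo.bar.inst '' || echo "exit status: $?"
```

With this mapping, a check that silently produced no output on the remote side would show up as UNKNOWN on the Nagios server.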

Has anyone faced a similar issue?  I would appreciate it if someone could
offer some insight into this.

Thanks
