nrpe and NetBackup Plugin

Andreas Ericsson ae at op5.se
Thu Oct 20 18:37:44 CEST 2005


Tao Yaoning wrote:
> check_nb_queque works; its output looks like "OK - Queue is of normal
> size [0]"
> check_nb_jukebox works too; its output looks like "OK - all drives are up."
> 
> but check_nb_errs still doesn't work; the error message looks like
> "CHECK_NRPE: Error receiving data from daemon."

What does the syslog file say on the server where nrpe is running?

Presumably you have two servers involved in this problem.
Let's call one of the servers NAGIOS and the other NRPE. The server NRPE 
is the one that's running the NRPE daemon (the one you want to fetch 
data *FROM*). The server NAGIOS is the one running the NAGIOS daemon 
which calls the check_nrpe program (the one you want to fetch data *TO*).

Here's what I want you to do:
On the NRPE server (not, I repeat *NOT*, on the NAGIOS server) I want 
you to run the command *exactly* as it is specified in the nrpe 
configuration file while logged in as the user the nrpe daemon runs as.

You can do this by running these commands if you have sudo installed, are 
logged in as root (which I assume is what you normally log in as...) 
and the nrpe configuration file is called /etc/nrpe.cfg:

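# set $nrpe_user from the nrpe_user= line in /etc/nrpe.cfg
eval `sed -n '/^nrpe_user=/p' /etc/nrpe.cfg`
# run the check_nb_errs command line as that user, throwing away stderr
sudo -u "$nrpe_user" `sed -n 's/^command.check_nb_errs.=//p' /etc/nrpe.cfg` 2>/dev/null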

If you didn't get any output there, you won't get any output in Nagios 
either.
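If you need to do this more than once, the same steps can be wrapped in a small shell function. This is only a sketch: the name run_nrpe_command is made up, it assumes nrpe.cfg uses the usual nrpe_user= and command[name]= lines, and it assumes the daemon hands the command line to /bin/sh (as popen() does). It discards stderr and keeps the command's exit status, so what you see is roughly what check_nrpe would get back.

```shell
# Hypothetical helper: run_nrpe_command CFG NAME
# Runs the command line defined as command[NAME]= in CFG as the user named
# by nrpe_user=, printing only stdout (stderr is discarded).
run_nrpe_command() {
    cfg=$1
    name=$2

    # First "nrpe_user=" value and first "command[NAME]=" command line.
    user=$(sed -n 's/^nrpe_user=//p' "$cfg" | head -n 1)
    cmdline=$(sed -n "s/^command\\[$name]=//p" "$cfg" | head -n 1)

    if [ -z "$user" ] || [ -z "$cmdline" ]; then
        echo "nrpe_user or command[$name] not found in $cfg" >&2
        return 3
    fi

    # Hand the line to /bin/sh (assumed to match how the daemon runs it)
    # and throw away stderr, so what is left is what check_nrpe would get.
    # The function returns the command's own exit status.
    if [ "$(id -un)" = "$user" ]; then
        sh -c "$cmdline" 2>/dev/null
    else
        sudo -u "$user" sh -c "$cmdline" 2>/dev/null
    fi
}
```

For example, as root on the NRPE server: run_nrpe_command /etc/nrpe.cfg check_nb_errs; echo "exit status: $?"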


> The permissions are fine. I can run this command as nagios on the local
> machine,

Please don't use terms like "local machine". To me, the "local machine" 
is my laptop. I have absolutely no idea which server you're talking 
about when you say "local machine". Since you're mentioning the nagios 
user, I'll assume you're running this on the NAGIOS server, but that 
can't be right because you said earlier that that didn't work.

> and get output like "ERRORS: (59 total) 1.) bptm has error-level
> general error: cannot count up drives, device manager daemon (ltid) may not
> be running. 2.) bptm has error-level general error: cannot count up drives,
> device manager daemon (ltid) may not be running. 3.) bptm has error-level
> general error: cannot count up drives, device manager daemon (ltid) may not
> be running. 4.) bptm has error-level general error: cannot count up drives,
> device manager daemon (ltid) may not be running. 5.) bptm has error-level
> general error: cannot count up drives, device manager daemon (ltid) may not
> be running. 6.) bptm has error-level general error: cannot count up drives,
> device manager daemon (ltid) may not be running.


This definitely looks like stderr output to me. NRPE only reads 
output on stdout.
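One quick way to tell which stream a plugin writes to is to run it twice, once keeping only stdout and once keeping only stderr. A minimal sketch (show_streams is a made-up name; it runs the command twice, so don't use it on anything with side effects):

```shell
# Hypothetical helper: show_streams COMMAND [ARG...]
# Runs the command twice, once keeping only stdout and once keeping only
# stderr, so you can see which stream each piece of output uses.
show_streams() {
    out=$("$@" 2>/dev/null)
    # 2>&1 first points stderr at the capture, then >/dev/null drops stdout.
    err=$("$@" 2>&1 >/dev/null)
    printf 'stdout: %s\nstderr: %s\n' "$out" "$err"
}
```

Run it with the command line from your command[check_nb_errs] definition. Anything that only shows up after "stderr:" will never reach Nagios through NRPE.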

> .............................."
> 
> the script for output is
> if (defined(@errors)) {
> if ($critcount) { $status = CRITICAL; }
> elsif ($warncount) { $status = WARNING; }
> else { $status = UNKNOWN; }
> # print "NETBACKUP ERRORS: (" . ( $critcount + $warncount ) . " total) ";
> print "ERRORS: (" . ( $critcount + $warncount ) . " total) ";
> my $counta = 0;
> foreach my $errorline (@errors) {
> $counta++;
> print "$counta.) $errorline ";
> }
> print "\n";
> } else {
> # print "No Netbackup errors found.\n";
> print "OK: No Netbackup errors found.\n";
> }
> 

This can't possibly be the entire script, so it's useless for 
debugging purposes. It would also help if you attached it as a file 
rather than pasting it inline, since your mail program seems to do 
funny things with the indentation.
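For comparison, here is a shell sketch of what the quoted fragment appears to do: everything goes out as one line on stdout, and the status is reported through the exit code using the usual plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). report_errors is a hypothetical name, and the fragment doesn't show whether the real script ever exits with $status, so treat this as an illustration of the convention, not a copy of the script.

```shell
# Hypothetical helper mirroring the quoted Perl fragment's output logic:
# report_errors CRITCOUNT WARNCOUNT [ERROR...]
report_errors() {
    crit=$1; warn=$2; shift 2
    if [ "$#" -eq 0 ]; then
        echo "OK: No Netbackup errors found."
        return 0                      # OK
    fi
    if [ "$crit" -gt 0 ]; then status=2      # CRITICAL
    elif [ "$warn" -gt 0 ]; then status=1    # WARNING
    else status=3                            # UNKNOWN
    fi
    # Build a single line: "ERRORS: (N total) 1.) ... 2.) ..."
    line="ERRORS: ($((crit + warn)) total)"
    n=0
    for e in "$@"; do
        n=$((n + 1))
        line="$line $n.) $e"
    done
    printf '%s\n' "$line"             # one line, on stdout only
    return "$status"
}
```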


> I got this debug output from strace:
> 
> munmap(0xb7fff000, 4096) = 0
> write(3, "\27\3\1\0 }\22\302\252\250\7!%\251Xs+\253\361dh \232\266"..., 1114) = 1114
> read(3, "sh: l", 5) = 5
> write(3, "\25h:\0 \304\374\20\240\370\23wK]s\241\232\300\347W\270"..., 37) = 37
> alarm(0) = 10
> write(3, "\25h:\0 #\347\366\313\257\32\2714\327D\17\16 \2l\4D/3)"..., 37) = 37
> close(3) = 0
> fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
> mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fff000
> write(1, "CHECK_NRPE: Error receiving data"..., 46CHECK_NRPE: Error receiving data from daemon.
> ) = 46
> munmap(0xb7fff000, 4096) = 0
> exit_group(3) = ?
> 

This is an strace of the check_nrpe program. You're (still) looking at 
entirely the wrong end of the problem, since your other checks seem 
to work just fine.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users