Extended info from nrpe

Rusch, Daniel Daniel.Rusch at GlobalCrossing.com
Fri Oct 18 15:52:21 CEST 2002


All,

We have developed a method of getting extended info back from nrpe without
hacking (although it was tempting) any of the nagios or nagios plugin code.
I'll warn you now, it ain't pretty but it works.

We use it to get the output of things like: ps, top, metastat and others.  

Synopsis:

On the host running Nagios (local host) we have a cgi script called
remote_info.cgi.
On the host running the nrpe daemon (remote host) we have a script called
get_info.

The remote_info.cgi (ok we added a link to the side bar) runs check_nrpe
normally, for example:

/usr/local/nagios/libexec/check_nrpe somehost.somewhere.com -p 5666 -c get_ps

The nrpe daemon on the remote host runs the get_ps command which is defined
as follows:

command[get_ps]=/usr/local/nagios/libexec/get_info -c 'bin/ps -few'

Therefore when the nrpe daemon is asked to run "get_ps" it actually runs
get_info which in turn will run ps.

Why run the output of ps (or top, metastat or whatever) through the
get_info script? Because the nrpe daemon returns at most 1k of data, and
only up to the first newline character (it reads with fgets and the code
limits it to 1k). That's why the output of the nrpe daemon is limited to
one line.
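
To make the limit concrete, here is a rough Python sketch of what a single
fgets-style read does to multi-line output. It is not nrpe's code: the
function name and the NRPE_LIMIT constant are made up for illustration, and
the 1024 figure is just the "1k" mentioned above.

```python
import io

NRPE_LIMIT = 1024  # assumed buffer size, from the "1k" figure in the text


def first_line_only(plugin_output: str, limit: int = NRPE_LIMIT) -> str:
    """Mimic one fgets(buf, limit, stream) call on a plugin's output.

    Like fgets, this stops after the first newline or after limit - 1
    characters, whichever comes first. Everything after that is lost.
    """
    stream = io.StringIO(plugin_output)
    return stream.readline(limit - 1)


if __name__ == "__main__":
    ps_like = "UID PID CMD\nroot 1 init\nroot 2 kthreadd\n"
    # Only the header line survives a single line-oriented read.
    print(repr(first_line_only(ps_like)))
```

Anything that has to survive the trip therefore needs to arrive as a single
line of under 1k, which is what get_info arranges.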

So, get_info runs ps (or top, metastat or whatever) and stores the output
in memory.  It then replaces the newline characters with ~'s and checks
the size of the result.  If the output is less than 1k, get_info appends an
EOF marker to it, prints it to STDOUT and exits.  The remote_info.cgi script
receives the output, finds the EOF, htmlizes it, prints it and exits.
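
The small-output case can be sketched in a few lines of Python. This is an
illustration only, not the actual get_info or remote_info.cgi code: the
function names, the literal "EOF" string used as the marker, and the <pre>
wrapping are all assumptions made for the example.

```python
import html
from typing import Optional

LIMIT = 1024      # assumed size of the nrpe reply, from the "1k" above
EOF_MARK = "EOF"  # hypothetical marker string; the real one is not given


def small_reply(raw: str) -> Optional[str]:
    """Remote side: flatten the output to one line and add the EOF marker.

    Returns None when the result would not fit in a single reply, in which
    case the multi-part path has to be used instead.
    """
    flat = raw.replace("\n", "~")
    if len(flat) + len(EOF_MARK) <= LIMIT:
        return flat + EOF_MARK
    return None


def htmlize(reply: str) -> str:
    """Nagios side: strip the marker, restore the line breaks, escape HTML."""
    body = reply[:-len(EOF_MARK)] if reply.endswith(EOF_MARK) else reply
    lines = body.split("~")
    return "<pre>\n" + "\n".join(html.escape(line) for line in lines) + "\n</pre>"


if __name__ == "__main__":
    reply = small_reply("UID PID CMD\nroot 1 <init>")
    print(reply)
    print(htmlize(reply))
```

One limitation of this scheme is visible in the sketch: a ~ that was already
in the command output comes back as a line break on the other side.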

Here's the tricky part: if the output from the process is greater than 1k,
get_info writes the output to a file (text1), then prints the first 1k
(first appending text1 to the end) to STDOUT, remembering where it is in the
file by writing its position to a file (text1_marker), and then exits.  The
remote_info.cgi script receives the output, does not find an EOF, so it
executes a get_text1 command:

/usr/local/nagios/libexec/check_nrpe somehost.somewhere.com -p 5666 -c get_text1

defined as follows in the nrpe.cfg on the remote:

command[get_text1]=/usr/local/nagios/libexec/get_info -t text1

get_info then opens the marker file (text1_marker), reads in the position it
left off at, opens the text file (text1), seeks to that position and reads
in the data.  If what remains is less than 1k it sends it with an EOF
appended; if not, it repeats the above process until all the data is sent.
Once all the data is received by remote_info.cgi it htmlizes it, prints it
and exits.
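
The multi-part exchange might look roughly like the Python sketch below. It
is a guess at the shape of the logic, not the real scripts: next_chunk
stands in for one run of get_info, collect stands in for the loop in
remote_info.cgi, and the "EOF" literal, the spool directory argument and the
run_command callback (which would really be a check_nrpe call) are all
hypothetical.

```python
import os

LIMIT = 1024       # assumed size of one nrpe reply, from the "1k" above
EOF_MARK = b"EOF"  # hypothetical marker; the real one is not given


def next_chunk(spool_dir, name, flat=None):
    """One get_info run. Pass flat (the ~-joined output) on the first call;
    leave it out on follow-up calls, which resume from the marker file."""
    data_path = os.path.join(spool_dir, name)
    marker_path = data_path + "_marker"
    if flat is not None:
        if len(flat) + len(EOF_MARK) <= LIMIT:
            return flat + EOF_MARK           # fits in one reply, no files
        with open(data_path, "wb") as f:     # too big: spool it to disk
            f.write(flat)
        pos = 0
    else:
        with open(marker_path) as f:         # where did we leave off?
            pos = int(f.read())
    with open(data_path, "rb") as f:
        f.seek(pos)
        rest = f.read()
    if len(rest) + len(EOF_MARK) <= LIMIT:
        return rest + EOF_MARK               # last piece
    body = rest[:LIMIT - len(name)]          # leave room for the file name
    with open(marker_path, "w") as f:
        f.write(str(pos + len(body)))        # remember the new position
    return body + name.encode()              # the name says "ask again"


def collect(first_reply, name, run_command):
    """The CGI side: keep asking for more until a reply ends with EOF."""
    parts = []
    reply = first_reply
    while not reply.endswith(EOF_MARK):
        parts.append(reply[:-len(name)])     # drop the trailing file name
        reply = run_command("get_" + name)   # e.g. check_nrpe -c get_text1
    parts.append(reply[:-len(EOF_MARK)])
    return b"".join(parts).replace(b"~", b"\n")
```

The sketch opens the files in binary mode so that the position stored in the
marker is a plain byte offset. It also leaves the spool and marker files
behind, and two people asking for the same output at the same time would
share one marker, so a real version needs some cleanup and locking.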

I did warn you that it wasn't pretty. But it does work very well, without
hacking nrpe and without compromising security.  Why do all this? Why not
simply ssh to the box, etc.?  The idea is that anyone can go to the nagios
site and have a suite of diagnostic tools available to them.


Dan

P.S. If you are interested in the code for remote_info.cgi and get_info let
me know and I'll email it to you.







 

Sincerely,

Daniel G. Rusch
 

