RFC embedded Perl Nagios changes: usability and performance.

Stanley Hopcroft Stanley.Hopcroft at IPAustralia.Gov.AU
Thu Jan 1 03:54:11 CET 2004


Dear Ladies and Gentlemen,

I am writing to request your comments on proposed changes to the
embedded Perl interpreter support in Nagios (ePN).

This letter has two sections: part A is general discussion and the
proposals; B concerns testing.

Part A Discussion and Proposals.

The ePN support contributed by Mr Stephen Davies embeds a Perl
interpreter into the Nagios binary and (with much cleverness) allows
Perl plugins to be compiled only once before they are run (as well as
letting the plugin call the Perl exit without trashing the
interpreter, saving the output the plugin writes to STDOUT, and other
good things).

ePN provides these benefits to Perl plugins and Nagios:

1 Perl plugins are not subject to the OS forking the Nagios process;
Perl plugins are called as Nagios functions

While Nagios forks to execute _each_ plugin, an ePN Nagios does not
request another fork (in the popen() system call) to run the plugin.

2 Perl plugins are compiled only once, saving both an exec of the Perl
interpreter and the Perl compilation phase (to the Perl op-code parse
tree) each time a Perl plugin is run. Since the compilation phase may
include loading Perl modules required by the plugin (some of which are
huge), compiling only once saves quite a lot of work, if not much
execution time (since Perl is pretty fast).

There are also ePN tradeoffs such as markedly increased memory
consumption (the Perl parse trees remain in memory).

The ePN implementation has performed well in both Netsaint 0.0.[4-7]
and Nagios 1.[0-1], and it seems to remain unchanged in the HEAD CVS
branch (the Perl calls in the HEAD checks.c seem to be the same as
those in 1.x, and p1.pl appears unchanged). It could be enhanced in
these areas:

1 Performance

Three observations concerned with performance are that

1.1 The plugin output is returned to Nagios through the file system
(rather than as an extra element of a list returned by
Embed::Persistent::run_plugin).

Instead of STDOUT being tied to the file system it could be tied to an
in memory data structure (probably scalar) and the value the plugin
'writes' as output returned as an extra element on the Perl stack.

This, I think, is a straightforward enhancement that saves Nagios the
system calls needed to generate a temporary file name, open the file,
read the line of plugin output and unlink the file.
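
A minimal sketch of the in-memory alternative, not taken from p1.pl: the
tie class OutputTrap and the wrapper run_with_captured_stdout are
hypothetical names, and the plugin is modelled as a code reference that
returns its status (the real ePN traps exit instead). A tie class is used
here rather than opening a handle on a scalar reference, because the
latter needs Perl 5.8 or later.

```perl
use strict;
use warnings;

# Hypothetical tie class: everything printed to the tied handle is
# appended to a scalar held inside the object.
package OutputTrap;

sub TIEHANDLE { my ($class) = @_; my $buf = ''; return bless \$buf, $class; }
sub PRINT     { my $self = shift; $$self .= join('', @_); return 1; }
sub PRINTF    { my $self = shift; my $fmt = shift; $$self .= sprintf($fmt, @_); return 1; }
sub contents  { my $self = shift; return $$self; }

package main;

# Hypothetical stand-in for the code that runs a compiled plugin.
# Returns the status and the captured output as a list, so the caller
# never has to touch the file system.
sub run_with_captured_stdout {
    my ($plugin, @args) = @_;
    my $trap   = tie *STDOUT, 'OutputTrap';
    my $status = eval { $plugin->(@args) };
    $status = 3 if $@;               # assumption: a plugin that dies is UNKNOWN
    my $output = $trap->contents;
    undef $trap;                     # drop our reference before untie
    untie *STDOUT;
    return ($status, $output);
}

my ($status, $output) = run_with_captured_stdout(
    sub { print "OK - all is well\n"; return 0; }
);
```

The output would then travel back to Nagios as one more element of the
returned list, which is the 'extra element on the Perl stack' described
above.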

Unfortunately, I don't know why STDOUT was tied to a file: it doesn't
seem to have any debugging advantages because the contents are either
logged by Nag (and the file unlinked) or there are _no_ contents. 

1.2 The ePN author comments specifically on the separation of the parse
and execution phases

'
# Only major changes are to separate the compiling and cacheing from 
# the execution so that the cache can be kept in "non-volatile" parent
# process while the execution is done from "volatile" child processes
'

Unfortunately, I don't understand this. Presumably the 'processes' are
the eval_file and run_package subroutines in the Embed::Persistent
package. However, while eval_file is only concerned with whether to
parse the plugin, the data structure that it uses to avoid reparsing
unchanged plugins that have already been compiled (%Cache) is an entry
in the package symbol table: it is visible to both subroutines.

It _may_ be better to replace the two Nagios calls to Perl (one for
eval_file and the other for run_package) by one call to a new Perl
subroutine that optionally compiles and runs the plugin.

This change is extensive; I have no plans to make it immediately, if at
all.
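
A rough sketch of what such a combined entry point might look like. It
is an illustration only: compile_and_run, the package naming scheme and
the $compiles counter are hypothetical, the cache is keyed on the file's
modification time as an assumption, and the exit trapping that p1.pl
performs is left out for brevity.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my %Cache;          # package name => mtime of the plugin when compiled
my $compiles = 0;   # only here to make the caching visible

# Hypothetical combined call: compile the plugin if it is new or has
# changed on disk, then run it with the given arguments.
sub compile_and_run {
    my ($file, @args) = @_;
    (my $package = $file) =~ s/\W/_/g;
    $package = "Embed::p$package";

    my $mtime = (stat $file)[9];
    die "cannot stat $file: $!" unless defined $mtime;

    if (!defined $Cache{$package} || $Cache{$package} != $mtime) {
        open(my $fh, '<', $file) or die "cannot open $file: $!";
        my $text = do { local $/; <$fh> };
        close $fh;
        # Wrap the plugin text as a subroutine in its own package.
        eval "package $package; sub hndlr { \@ARGV = \@_;\n$text\n}";
        die "**ePN plugin syntax error: $@" if $@;
        $Cache{$package} = $mtime;
        $compiles++;
    }

    no strict 'refs';
    return &{"${package}::hndlr"}(@args);
}

# Usage: a throwaway plugin whose last expression is its status.
my ($tmp, $plugin_file) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print $tmp "print \"OK - args: \@ARGV\\n\";\n0;\n";
close $tmp;

my $first  = compile_and_run($plugin_file, 'a', 'b');   # compiles, then runs
my $second = compile_and_run($plugin_file, 'c');        # runs from the cache
```

With one entry point Nagios would make a single call into Perl per
check, and the decision whether to recompile stays entirely on the Perl
side.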

1.3 p1.pl uses IO::File to open the file to which plugin output is
sent. IO::File is a big module (according to Lincoln Stein) and in this
case where it is only being used to return a file handle glob, it seems
overkill.

Unfortunately, replacing it by a normal two argument Perl open produces
a weird failure in _all_ the Perl plugins.

In view of 1.1 this doesn't seem worth worrying about.

2 Usability

Coding plugins to succeed under ePN requires more Perl nous and
experience than without ePN: the same plugin can run from the command
line but fail with ePN.

The ePN support can be enhanced to provide information useful to the
plugin developer and to avoid spurious CRITICAL states (caused by
plugin mistakes) by 

2.1 Logging Perl warnings and compile time errors

2.2 Logging (in the Nagios log file) a clear indication that the plugin
has failed and at the same time returning UNKNOWN instead of CRITICAL
when a plugin cannot be executed.
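
As an illustration of 2.1 and 2.2 (not the actual patch), the sketch
below collects warnings with a __WARN__ handler and turns a fatal
run-time error into UNKNOWN with a descriptive message. The wrapper
run_reporting_errors and the plugin name check_demo are hypothetical;
the status values are the usual Nagios plugin return codes.

```perl
use strict;
use warnings;

my %ERRORS = (OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3);

# Hypothetical wrapper around a compiled plugin (modelled here as a
# code reference). Returns status, message and collected warnings.
sub run_reporting_errors {
    my ($name, $plugin, @args) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };

    my $status = eval { $plugin->(@args) };
    if ($@) {
        (my $err = $@) =~ s/\.?\s*$//;   # drop the trailing ".\n"
        return ($ERRORS{UNKNOWN},
                "**ePN plugin runtime error: $err in plugin '$name'.",
                \@warnings);
    }
    return ($status, '', \@warnings);
}

# A plugin that calls a method on a package that was never loaded.
my ($status, $message, $warnings) = run_reporting_errors(
    'check_demo',
    sub { No::Such::Package->new },
);
```

The point of returning UNKNOWN is that a broken plugin says nothing
about the state of the service it was meant to check.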

Here is an extract from a test Nagios nagios.log running with ePN
support that provides such logging

[1072870185] SERVICE ALERT: oradev;AUB;UNKNOWN;SOFT;1;**ePN plugin
runtime error: Can't locate object method "new" via package
"Nagios::WebTransact" at (eval 1) line 79 in plugin 'check_aub'.

[1072870245] SERVICE ALERT: oradev;AUB;UNKNOWN;SOFT;2;**ePN plugin
runtime error: Can't locate object method "new" via package
"Nagios::WebTransact" at (eval 1) line 79 in plugin 'check_aub'.

[1072870275] SERVICE ALERT: oradev;bad_plugin;UNKNOWN;SOFT;1;**ePN
plugin 'ap5' has syntax errors. Check ePN log.

[1072870305] SERVICE ALERT: oradev;AUB;UNKNOWN;HARD;3;**ePN plugin
runtime error: Can't locate object method "new" via package
"Nagios::WebTransact" at (eval 1) line 79 in plugin 'check_aub'.

[1072870335] SERVICE ALERT: oradev;bad_plugin;UNKNOWN;SOFT;2;**ePN
plugin 'ap5' has syntax errors. Check ePN log.

[1072870395] SERVICE ALERT: oradev;bad_plugin;UNKNOWN;HARD;3;**ePN
plugin 'ap5' has syntax errors. Check ePN log.

These errors signify that

1 the plugin named check_aub would compile but had a fatal run-time
error 

(
Hardly surprising, since it was hacked for this purpose:

tsitc> diff -c ../libexec/check_aub
/usr/local/nagios/libexec/check_aub
*** ../libexec/check_aub        Wed Dec 31 12:20:33 2003
--- /usr/local/nagios/libexec/check_aub Sat Jun 21 11:15:51 2003
***************
*** 30,36 ****
  
  use Getopt::Long;
  
! # use Nagios::WebTransact ;
  use utils qw($TIMEOUT %ERRORS &print_revision &support);
  
  my $PROGNAME = 'check_aub' ;
--- 30,36 ----
  
  use Getopt::Long;
  
! use Nagios::WebTransact ;
  use utils qw($TIMEOUT %ERRORS &print_revision &support);
  
  my $PROGNAME = 'check_aub' ;
tsitc> 
)

2 The plugin named 'ap5' failed to compile under the ePN

Here is the corresponding entry in the new ePN log of plugin syntax
errors.

tsitc> tail -30 epn.log 

**ePN plugin syntax error: Global symbol "$i" requires explicit package
name at (eval 3) line 5.
 in package Embed::Persistent file
/home/anwsmh/nagios-1.0_test-debug/bin/p1.pl at line 144 in text
"
                package main;
                use subs 'CORE::GLOBAL::exit';
                sub CORE::GLOBAL::exit { die "ExitTrap: $_[0]
(Embed::ap5)"; }
                package Embed::ap5; sub hndlr { shift(@_);
@ARGV=@_;
#!/usr/bin/perl -w

use strict ;

$i = 0 ;

while ($_ = shift @ARGV) {
  print "\$ARGV\[$i\]: $_ " ;           # NB embedded Perl only reads
__1__ (one) line of output !
  $i++ ;
}
; }
                
;".
tsitc> 

This shows the complete plugin listing _as it is executed by ePN_ (the
original plugin text is wrapped as a subroutine with the exit function
overridden); the line number reported as containing the error (5) is
with respect to the _original_ plugin text.
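
The wrapping seen in the log entry above can be sketched as follows.
This is a simplified illustration modelled on the logged text, not a
copy of p1.pl: the helper wrap_and_run and the plugin names demo and bad
are hypothetical, and the mapping of errors to status 3 is an
assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap plugin text as a subroutine in its own
# package, with exit() replaced by a die() that carries the status.
sub wrap_and_run {
    my ($name, $text, @args) = @_;
    my $package = "Embed::$name";
    my $wrapped = qq{
        package main;
        no warnings 'redefine';
        use subs 'CORE::GLOBAL::exit';
        sub CORE::GLOBAL::exit { die "ExitTrap: \$_[0] ($package)"; }
        package $package; sub hndlr { \@ARGV = \@_;
$text
; }
    };

    eval $wrapped;                               # compile phase
    return (3, "syntax error: $@") if $@;

    no strict 'refs';
    eval { &{"${package}::hndlr"}(@args) };      # execution phase
    return ($1, '') if $@ =~ /^ExitTrap: (\d+) /;
    return (3, "runtime error: $@") if $@;
    return (0, '');
}

# A plugin that prints one line and exits with WARNING (1).
my ($status, $msg) = wrap_and_run('demo', 'print "WARNING - demo\n"; exit 1;');

# A plugin with the same mistake as 'ap5': $i is never declared.
my ($bad_status, $bad_msg) = wrap_and_run('bad', 'use strict; $i = 0;');
```

Because exit is turned into a die, the plugin's exit status can be
recovered from the exception text without the interpreter terminating.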

The usability changes

1 consist solely of patches to p1.pl

2 change the exit status of a plugin with a run time error from CRITICAL
to UNKNOWN (this was obviously a minor mistake in the original p1.pl)

3 change the message logged for a plugin with a run time error from 

(No output!)

to

**ePN plugin runtime error: Can't locate object method "new" via package
"Nagios::WebTransact" at (eval 1) line 79 in plugin 'check_aub'.

and 

4 change the message logged for a plugin with a syntax error from

(No output!)

to

**ePN plugin 'ap5' has syntax errors. Check ePN log.

5 add an ad hoc log to p1.pl to record plugin syntax errors. At the
moment this log file is specified by a hard-coded string in p1.pl that
re-opens STDERR in append mode to that file. The CPAN Sys::Syslog module
could be used to allow syslogd to rotate and archive, but I think this
is an unacceptable memory tradeoff.
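
A small sketch of the re-open technique, for illustration only. The log
path here is a temporary file standing in for the hard-coded string, and
saving and restoring the original STDERR is an addition made so the
example cleans up after itself.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical log location; p1.pl would use a hard-coded path instead.
my (undef, $epn_log) = tempfile(SUFFIX => '.log', UNLINK => 1);

# Keep a duplicate of the original STDERR so it can be restored.
open(my $saved_stderr, '>&', \*STDERR) or die "cannot dup STDERR: $!";

# Re-open STDERR in append mode on the log file.
open(STDERR, '>>', $epn_log) or die "cannot append to $epn_log: $!";

# Compile a deliberately broken snippet and record the error.
eval 'use strict; $i = 0;';
print STDERR "**ePN plugin syntax error: $@" if $@;

# Restore the original STDERR.
close(STDERR);
open(STDERR, '>&', $saved_stderr) or die "cannot restore STDERR: $!";

# Read the log back to see what was recorded.
open(my $in, '<', $epn_log) or die "cannot read $epn_log: $!";
my $logged = do { local $/; <$in> };
close $in;
```

Append mode matters here: every plugin with a syntax error adds to the
same file rather than overwriting what earlier plugins reported.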

B Testing

I have patched p1.pl (version 1.2, I think, from both the 1.x and 2.x
CVS branches) to implement the usability changes above (and will also
do so for the first of the performance changes [don't use the file
system to return plugin output]) and used it successfully on _one_
FreeBSD system (system Perl 5.005_03) for

1 My production Nag (200 hosts/400 services/"if it's not pinged it's
Perl'd"), only for about 24 hours at this stage.

2 A test Nag (same host/Perl - tiny config, hacked plugins, different
paths) for ad hoc testing.

3 ePN simulator (same host/Perl)

I will try on at least a Linux/threaded Perl system before asking for
testers.

Since these changes have the potential to 

. creep into the Nagios C code (they almost certainly will; NB I think
the changes are confined to checks.c, but any part of the code that
contains #ifdef EMBEDDEDPERL potentially needs changing)

. create havoc (see 1.3 under Performance)

I welcome any comments, particularly those about testing.

At this stage my plan is to

. allow the usability changes more time to misbehave (as well as trying
them on a Linux test system - mini config)

. try the more extensive performance change (1.1) as above - since this
is the only one of the changes that really is helpful to Nagios ePN
installations.

. invite testers, preferably experienced sysadmins/entropy
removalists

. submit patches.

Yours sincerely. 


-- 
------------------------------------------------------------------------
Stanley Hopcroft
------------------------------------------------------------------------

'...No man is an island, entire of itself; every man is a piece of the
continent, a part of the main. If a clod be washed away by the sea,
Europe is the less, as well as if a promontory were, as well as if a
manor of thy friend's or of thine own were. Any man's death diminishes
me, because I am involved in mankind; and therefore never send to know
for whom the bell tolls; it tolls for thee...'

from Meditation 17, J Donne.

