Drill Down Facility in APAN

Stanley Hopcroft Stanley.Hopcroft at IPAustralia.Gov.AU
Thu Apr 24 11:50:58 CEST 2003


Dear Sir,

I am writing to thank you for your letter and say,

On Thu, Apr 24, 2003 at 08:45:09PM +1200, Jamie Baddeley wrote:

  .. snip
   
> There's shitloads of rrd front-end's out there. Cricket, MRTG NRG etc etc.
> see here:
> http://people.ee.ethz.ch/~oetiker/webtools/rrdtool/rrdworld/index.html
> 
> Why are we creating another one?
> 

Yep. Moreover, the front-ends are in general aimed at graphing rather
than exception reporting, whereas Nag is aimed at exception detection.

Also the front-ends already do two significant things 

1 Collect data efficiently 

2 Set up the RRDs without intervention

that any Nag infrastructure would have to either crib or redevelop.

> Smokeping already does what Atul was asking about. RRD is backend, of course 
> you can do this. Store the additional RRD files and zoom. Simple. It's what 
> my system does.

Up to the upper limit of the RRD resolution - related to how many
observations/samples you have, but fundamentally yes.

> 
> I can't understand why we are screwing around with front-ends when the data 
> that nagios needs to make a decision on whether the threshold is being 
> breached is held in the RRD files that a multitude of packages already look 
> after.....
> 

  .. snip

> 
> All that needs to be done is a plugin that reads local RRD files.  
> 

I think so too. I have done this in two trivial examples (by way of
encouragement)

1 A plugin that reads the FAILURES RRA from a dev branch RRD (the one with
the time series prediction) and reports CRITICAL if the last sample is a
1 ... (ie the observation lies outside the band given by the Holt-Winters
prediction +- 2 * DEVPREDICT: the measurement is an aberration) 

2 A plugin that computes the differences in observations and reports
CRITICAL if all are zero (this is to detect that a producer process has
stopped).
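
By way of illustration, the second plugin's test could be sketched as
below. This is not the actual plugin: the sub names (deltas,
check_stalled) and the messages are made up for the example, and the
samples are passed in directly rather than fetched from an RRD.

```perl
use strict ;
use warnings ;

# Differences between consecutive samples, skipping unknown (undef) ones.
sub deltas {
  my @obs = grep { defined } @_ ;
  return map { $obs[$_] - $obs[$_ - 1] } 1 .. $#obs ;
}

# Returns (STATE, message). CRITICAL when no sample differs from its
# predecessor, ie the producer appears to have stopped.
sub check_stalled {
  my @delta = deltas(@_) ;
  return ('UNKNOWN', 'Fewer than two known samples.') unless @delta ;
  return ('CRITICAL', 'No change in last ' . scalar(@delta) . ' deltas.')
    unless grep { $_ != 0 } @delta ;
  return ('OK', 'Deltas: (' . join(' ', @delta) . ')') ;
}

my ($state, $msg) = check_stalled(120, 120, 120, 120) ;
print "$state: $msg\n" ;   # CRITICAL: No change in last 3 deltas.
```

A real plugin would exit with the matching Nagios return code instead of
just printing the state.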

Since there are Perl and Python (probably Ruby also) bindings to the RRD
libraries, this is pretty easy.

Here's the guts of the first one 

use RRDs ;
use Getopt::Long ;
use utils qw($TIMEOUT %ERRORS &print_revision &support &usage);

my $PROGNAME = 'check_coms' ;

Getopt::Long::Configure('bundling', 'no_ignore_case') ;
GetOptions
        ("V|version"    => \&version,
        "h|help"        => \&help,
        "r|rrd_file:s"  => \$rrd,
        "s|start:s"     => \$start,
        "d|debug"       => \$debug,
) ;

use constant RRD        => '/home/anwsmh/perl/rrd/hwpredict/coms.rrd' ;
use constant START      => 'now -1 hour' ;
use constant RRA_SUCC   => 'AVERAGE' ;
use constant RRA_FAIL   => 'FAILURES' ;
use constant GRAPH      =>
  '<a href=http://pc09011/cgi-bin/cg2?RRD_NAME=coms&INT=-1h>graph</a>' ;

my @rra_fail = () ;
my $fetch_ok = &from_rrd($rrd, $start, RRA_FAIL, \@rra_fail) ;
&outahere('UNKNOWN', 'COMS cannot be checked. ',
          [ 'RRDs::fetch failed with error "', @rra_fail, '"' ])
        unless $fetch_ok ;

print "HW predicted Failures\n" and &dump( @rra_fail ) if $debug ;

# &from_rrd returns 
# $rra_x[$i]->[0] $rra_x[$i]->[1]
# 1029576900        74.0
# 1029577200         0.0

foreach ( @rra_fail ) {
  push @fail, $_->[1] ;
}

# $delta_s, @delta_s, $last_succ and $observed_int are set in code not
# shown in this excerpt.
&outahere('OK', 'Ok.',
          [ ($delta_s == 0 ? "Nothing processed successfully in last $observed_int minutes." :
             $delta_s == 1 ? "$delta_s success $last_succ minutes ago." :
                             "$delta_s successes $last_succ minutes ago."),
            "Deltas: (" . join(' ', reverse @delta_s) . ') or Holt-Winters forecast',
            GRAPH ])
        if $fail[-1] == 0 ;

# HW predicts failure. Is it because the predictions have failed to
# converge after a restart ?
# In this case, @succ may look like (2000, 0, 0, 1, 1, 2)
#               @delta_s [reversed] (1, 0, 1, 0, -2000)


&outahere('CRITICAL', 'Failed. No restart but HW forecast violations.',
          [ $delta_s, ($delta_s == 1 ? 'success' : 'successes'),
            "$last_succ minutes ago.",
            "$observed_int minute deltas: (" . join(' ', reverse @delta_s) .
            ') or Holt-Winters forecast',
            GRAPH ]) ;


In this case, the plugin also presents differences (to convince the
contact) and a link to a (rrdcgi) graph that shows the output.
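
The &from_rrd helper is not shown above. A rough stand-in, assuming it
only needs the first data source and the calling convention used in the
excerpt, might look like the following. The sub rows_from_fetch is a
name invented for this sketch; RRDs::fetch and RRDs::error are the
calls provided by the RRDs Perl binding.

```perl
use strict ;
use warnings ;

# Turn the ($start, $step, $data) that RRDs::fetch returns into
# [ timestamp, value ] rows, using the first data source only.
sub rows_from_fetch {
  my ($start, $step, $data) = @_ ;
  my @rows = () ;
  foreach my $line ( @$data ) {
    push @rows, [ $start, $line->[0] ] ;
    $start += $step ;
  }
  return @rows ;
}

# Hypothetical stand-in for &from_rrd: fills @$rows_ref and returns 1,
# or puts the RRDs error text in @$rows_ref and returns 0.
sub from_rrd {
  my ($rrd, $start, $cf, $rows_ref) = @_ ;
  require RRDs ;
  my ($t0, $step, $names, $data) =
    RRDs::fetch($rrd, $cf, '--start', $start) ;
  if ( my $err = RRDs::error() ) {
    @$rows_ref = ($err) ;
    return 0 ;
  }
  @$rows_ref = rows_from_fetch($t0, $step, $data) ;
  return 1 ;
}

# No RRD file needed to see the row layout:
my @demo = rows_from_fetch(1029576900, 300, [ [74.0], [0.0] ]) ;
printf "%d %.1f\n", @$_ for @demo ;
```

With a 300 second step the demo prints the same two rows as the comment
in the excerpt.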

> It seems simple to me. But that may be because I'm crap at coding, and better 
> at hacking

> 
> All that needs to be done is a plugin that reads .rrd files.
>

Nag developers can add value by considering how this can be done
efficiently for large numbers of RRDs.

Perhaps a Nag add-on could process _all_ the RRDs periodically (perhaps
anything found in a path) and submit passive results for any anomalies
it detects.
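
A minimal sketch of such an add-on, under several assumptions: every
*.rrd in one directory maps to a service named after the file, all
services belong to one host, and the check itself is passed in as a
code ref. The sub names (passive_result, sweep) are invented here. The
line format is the PROCESS_SERVICE_CHECK_RESULT external command.

```perl
use strict ;
use warnings ;

my %ERRORS = ( OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3 ) ;

# One external command line in the form Nagios reads from its
# command file.
sub passive_result {
  my ($time, $host, $service, $state, $output) = @_ ;
  return sprintf "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n",
    $time, $host, $service, $ERRORS{$state}, $output ;
}

# Run $check (a code ref returning (STATE, message)) on every *.rrd
# in $dir and write a result to $fh for anything that is not OK.
sub sweep {
  my ($dir, $host, $check, $fh) = @_ ;
  foreach my $rrd ( sort glob("$dir/*.rrd") ) {
    my ($state, $msg) = $check->($rrd) ;
    next if $state eq 'OK' ;
    (my $service = $rrd) =~ s{^.*/|\.rrd$}{}g ;   # /x/coms.rrd -> coms
    print $fh passive_result(time(), $host, $service, $state, $msg) ;
  }
}

print passive_result(1051177858, 'somehost', 'coms', 'CRITICAL',
                     'No change in last 3 deltas.') ;
```

In use, $fh would be the Nagios external command file opened for
append, and sweep would be run from cron or a small loop. Since only
anomalies are submitted, something else has to return the service to
OK.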

Yours sincerely.

-- 
------------------------------------------------------------------------
Stanley Hopcroft
------------------------------------------------------------------------

'...No man is an island, entire of itself; every man is a piece of the
continent, a part of the main. If a clod be washed away by the sea,
Europe is the less, as well as if a promontory were, as well as if a
manor of thy friend's or of thine own were. Any man's death diminishes
me, because I am involved in mankind; and therefore never send to know
for whom the bell tolls; it tolls for thee...'

from Meditation 17, J Donne.

