Generic check result manipulations (percentages, max(), min(), etc)

Nathanael Hoyle nhoyle at hoyletech.com
Wed Jan 9 17:49:54 CET 2008


Hey all,

Hope this hasn't been asked and answered to death, but I've read through 
forums and quite a bit of the mail archives and can't find prior 
discussion.  I am trying to monitor several Dell PowerEdge servers for a 
variety of availability criteria, including things like average 
processor load and percentage of disk space used.  This is trivial using 
the check_nt plugin and the nsclient (or another API-compatible 
monitoring agent), which I am well aware of.  In fact, I've tested that 
just fine without difficulty on the box I'm prototyping on.  The issue 
is that the production servers I'll be monitoring are of government 
interest, and there is substantial overhead for accrediting any new 
software (particularly a persistent process accepting connections) to be 
installed on the machines.  I'd rather not try to fight to get nsclient 
accredited.

One of the nice things about the PowerEdge servers is that they have 
fairly advanced backplane status monitoring and provide a host of 
information via snmp.  I have configured and tested things like 
obtaining the processor load values via snmp with:

define service{
    name                                cpu1-load
    use                                 generic-service
    service_description                 CPU 1 Load
    hostgroup                           poweredge2850-servers
    check_command                       check_snmp!-C removed -o HOST-RESOURCES-MIB::hrProcessorLoad.1 -w 0:80 -c 0:95
    notification_options                c
    first_notification_delay            10
}

There are several hosts, so these are set up against a hostgroup, etc.  
There are four processors in each machine; the relevant availability 
metric is more the average processor load across all four processors 
than it is the load of any one processor.  What I want is a way to 
capture the average of these four values and test that result against 
various threshold criteria.  Something like an avg() macro that allowed 
me to pass multiple checks within it.
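
To make the avg() idea concrete, here is a minimal sketch in Python of the 
averaging and threshold logic such a wrapper would need.  It assumes the 
four hrProcessorLoad readings have already been fetched by some other 
means; the function names are hypothetical, and the range parser handles 
only the simple "low:high" form used in the service definition above, not 
the full plugin threshold syntax.

```python
from statistics import mean

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def outside(value, spec):
    """Return True if value lies outside a simple 'low:high' range."""
    low, high = (float(part) for part in spec.split(":"))
    return value < low or value > high


def check_average(values, warn="0:80", crit="0:95"):
    """Average the per-CPU loads and test the result against the ranges."""
    avg = mean(values)
    if outside(avg, crit):
        return CRITICAL, f"CRITICAL - average load {avg:.1f}%"
    if outside(avg, warn):
        return WARNING, f"WARNING - average load {avg:.1f}%"
    return OK, f"OK - average load {avg:.1f}%"
```

A real plugin would print the message and exit with the returned code, 
e.g. via sys.exit().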

Similarly, the disk drive configurations differ slightly among the 
various hosts, and there are too many hosts for me to want to calculate 
and hand-specify warning/critical thresholds for used space against 
each host's total space.  The ability to do something like 
percent(<snmp check for used space>, <snmp check for total space>) and 
check that against the thresholds would be an ideal solution which 
generically supported all configurations.
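
As a rough illustration of the percent() idea, the arithmetic could look 
like the Python sketch below.  It assumes the used and total values come 
from two separate SNMP queries and are in the same units; the function 
names and the 80/90 default thresholds are hypothetical placeholders.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def percent_used(used, total):
    """Return used space as a percentage of total space."""
    if total <= 0:
        raise ValueError("total space must be positive")
    return 100.0 * used / total


def check_disk(used, total, warn_pct=80.0, crit_pct=90.0):
    """Compare the used percentage against warning/critical levels."""
    pct = percent_used(used, total)
    if pct >= crit_pct:
        return CRITICAL, f"CRITICAL - {pct:.1f}% used"
    if pct >= warn_pct:
        return WARNING, f"WARNING - {pct:.1f}% used"
    return OK, f"OK - {pct:.1f}% used"
```

Because the thresholds are percentages, the same service definition 
would apply to hosts with different disk sizes.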

Again, I realize that checking for percentage of free disk space is 
available with nsclient, with local disk checking, and with remote ssh 
checks.  ssh is not an option either in this case (performance and 
security concerns).  It seems to me however, that the need/desire to 
calculate these type of values based on component values is more broadly 
applicable and could be useful in areas outside my somewhat unusual 
needs.  So my question is... is there some built-into-the-config-file 
syntax I'm missing to calculate this stuff?  Would I have to extend the 
snmp plugin?  Could a plugin generically wrap other plugin results to do 
this...  in other words, what is likely to be the least-pain method of 
being able to do this?  Ideally, I'd hope that the result would not be 
plugin-specific, i.e. need to be implemented for check_snmp and any 
other plugin needed.
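
One plugin-independent shape this could take is a wrapper that runs 
other plugins and combines the numbers they report.  The Python sketch 
below is only an assumption about how that might work: it presumes each 
wrapped plugin prints performance data in the usual "label=value" form 
after a "|", and all of the function names are hypothetical.

```python
import re
import subprocess


def first_perf_value(output):
    """Extract the first numeric value from the perfdata after '|'."""
    _, sep, perfdata = output.partition("|")
    if not sep:
        raise ValueError("no performance data in plugin output")
    match = re.search(r"=\s*(-?\d+(?:\.\d+)?)", perfdata)
    if match is None:
        raise ValueError("no numeric value in performance data")
    return float(match.group(1))


def run_plugin(argv):
    """Run one plugin command (a list of arguments) and return its value."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return first_perf_value(result.stdout)


def combine(commands, func):
    """Apply func (e.g. max, min, or a mean) to several plugins' values."""
    return func([run_plugin(argv) for argv in commands])
```

The combined value could then be tested against thresholds in one 
place, regardless of which plugins produced the inputs.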

I'd be happy to hear what ideas folks have (if it's already out there, 
great!).

Thanks,
Nathanael Hoyle
