Dynamic warning/critical thresholds

Justin T Pryzby justinp at norchemlab.com
Fri Jun 22 18:17:13 CEST 2012


I do something similar in some places, but I think they're all custom
checks.  For example, I watch the number of DNS queries per second in
the BIND query.log.  Rather than set some (static) threshold, I warn
if one host has more than 2x the queries of the next ranked host.
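
A minimal sketch of that "top host vs. next ranked host" comparison,
not the actual check I run.  The function name check_top_host is made
up for illustration, and it assumes you have already extracted one
client address per line from the log (which field holds the client
depends on your BIND logging configuration, so that part is left out).

```shell
#!/bin/sh
# Hypothetical helper (name and interface are assumptions, not a stock
# plugin).  Reads one client address per line on stdin, ranks clients
# by number of queries, and warns if the busiest client made more than
# 2x the queries of the next ranked client.
check_top_host() {
    sort | uniq -c | sort -rn | awk '
        NR == 1 { top = $1; host = $2 }     # busiest client
        NR == 2 { second = $1 }             # next ranked client
        END {
            if (NR == 0) {
                print "OK: no queries seen"
                exit 0
            }
            if (NR >= 2 && top > 2 * second) {
                printf "WARNING: %s made %d queries, next host made %d\n", host, top, second
                exit 1                      # Nagios WARNING
            }
            printf "OK: top host %s made %d queries\n", host, top
            exit 0                          # Nagios OK
        }'
}
```

The function returns the usual Nagios exit codes (0 for OK, 1 for
WARNING), so a wrapper script can end with a pipeline that feeds it
the client addresses and pass its status straight through.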

If it were me, I would write a shell wrapper around the existing
nagios check to determine the dynamic thresholds, then exec the stock
plugin (perhaps with a longer check_interval since it will be somewhat
more "expensive").

For DNS, it might look like (untested):
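f=/var/cache/bind/query.log
# Timestamps (date and time fields) of the first and last log entries;
# assumes GNU date can parse the log's timestamp format.
x=`awk '{print $1,$2; exit}' "$f"`
t0=`date -d "$x" +%s`
x=`tail -n 1 "$f" |awk '{print $1,$2}'`
t1=`date -d "$x" +%s`
d=$(($t1-$t0))   # seconds covered by the log
# Avoid dividing by zero when the log covers less than one second
[ "$d" -gt 0 ] || { echo "UNKNOWN: log covers less than 1 second"; exit 3; }
n=`wc -l <"$f"`  # total number of logged queries
r=$(($n/$d))     # average queries per second over the whole log
[ "$r" -eq 0 ] && echo "OK: very low $n/$d" && exit

exec /usr/lib/nagios/plugins/check_???? -w $((2*$r)) -c $((4*$r)) -f "$f" -t 60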

That is kind of contrived; I don't know whether there's actually a
stock check for DNS query rate.  The "exec"ed check above is doing
essentially nothing that the shell isn't already doing, so you could
instead write code to compute the query rate for just the most recent
N minutes, and test whether it is above 2*$r or 4*$r.
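
A rough sketch of that last comparison, under some assumptions: the
function name check_rate is made up, both rates are whole numbers of
queries per second, and working out the recent rate (queries in the
last N minutes divided by N*60) is left to the caller.

```shell
#!/bin/sh
# Hypothetical helper (not a stock plugin): compare a recent query rate
# against a baseline rate and report using Nagios exit codes.
#   $1 = recent rate (integer queries/second)
#   $2 = baseline rate r (integer queries/second)
check_rate() {
    recent=$1
    baseline=$2
    if [ "$baseline" -eq 0 ]; then
        # Nothing meaningful to compare against
        echo "OK: baseline too low to compare (recent ${recent}/s)"
        return 0
    fi
    if [ "$recent" -gt $((4 * baseline)) ]; then
        echo "CRITICAL: ${recent}/s is more than 4x baseline ${baseline}/s"
        return 2
    elif [ "$recent" -gt $((2 * baseline)) ]; then
        echo "WARNING: ${recent}/s is more than 2x baseline ${baseline}/s"
        return 1
    fi
    echo "OK: ${recent}/s (baseline ${baseline}/s)"
    return 0
}
```

For example, "check_rate 25 10" reports WARNING and returns 1, while
"check_rate 20 10" is still OK because 20 is not above 2*10.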

Justin

On Fri, Jun 22, 2012 at 03:11:05PM +0100, Jonathan Gazeley wrote:
> I've got a bunch of Nagios plugins that monitor things like 
> DNS/HTTP/RADIUS hits per second.
> 
> I've set what I believe to be sensible max/min warning thresholds but 
> what I really want is dynamic thresholds. If some quantity suddenly 
> doubles or halves, I'd like an alert.
> 
> For example, if I usually serve 10 DNS lookups per second, and suddenly 
> it is doing 20 per second, that isn't a "fault" but I would like to know 
> about it, because it might mean there is a problem with the network in 
> general.
