Alerting based on past-to-current trends?

Ian Ehrenwald iehrenwald at tripadvisor.com
Mon Dec 6 20:02:46 CET 2010


Hello
Is there a straightforward way to alert on an average of past data combined with the current perfdata entry?  That is probably not the clearest way to put it, so here is the real-world case I am working with:

I am polling a set of machines via SNMP for CPU load once a minute (looking at hrProcessorLoad).  If the returned value is at or above 95%, the check sends a WARNING; at 98% or above, it sends a CRITICAL.  The problem is that it is perfectly fine for a process to use 100% CPU for several seconds, and sometimes that burst coincides with the SNMP %CPU query, so I get a lot of false alerts.

Is there a way to combine past perfdata with the currently returned value to compute an average, and send a WARNING or CRITICAL based on that number instead?  I only care about being alerted by Nagios if, for example, the %CPU has been at 100% for 5 minutes.  Or am I over-thinking this, and should I be monitoring the 1m, 5m, and 15m UNIX load averages (which don't seem that accurate anyway)?  What are other people doing to monitor CPU usage and alert on abnormally long periods of high utilization?
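
To make the averaging idea concrete, here is a minimal sketch, not an existing Nagios feature or plugin.  It assumes a wrapper script that keeps the last few samples in a small state file and classifies their mean instead of the latest reading.  The names rolling_status and check_with_state and the JSON state file are hypothetical; the 95/98 thresholds and the 5-sample window (5 minutes at one poll per minute) come from the numbers above.

```python
import json

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def rolling_status(samples, new_value, window=5, warn=95.0, crit=98.0):
    """Append new_value, keep the last `window` samples, classify their mean.

    Returns (updated_samples, average, status). Until a full window of
    samples exists the status is OK, so one early spike cannot alert.
    """
    recent = (list(samples) + [float(new_value)])[-window:]
    avg = sum(recent) / len(recent)
    if len(recent) < window:
        status = OK
    elif avg >= crit:
        status = CRITICAL
    elif avg >= warn:
        status = WARNING
    else:
        status = OK
    return recent, avg, status


def check_with_state(state_path, new_value, **kwargs):
    """Load earlier samples from a JSON file (hypothetical location),
    fold in the new reading, and write the updated window back."""
    try:
        with open(state_path) as fh:
            samples = json.load(fh)
    except (FileNotFoundError, ValueError):
        samples = []  # first run, or unreadable state: start over
    samples, avg, status = rolling_status(samples, new_value, **kwargs)
    with open(state_path, "w") as fh:
        json.dump(samples, fh)
    return avg, status
```

A real wrapper would obtain new_value from the SNMP query, print a status line with the average as perfdata, and exit with the returned status code.  With this approach a single 100% reading among four 10% readings averages to 28% and stays OK, while five consecutive 100% readings go CRITICAL.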

Thanks for your help.

				Ian Ehrenwald


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users