Alerting based on past-to-current trends?

Jim Avery jim at jimavery.me.uk
Fri Dec 10 17:26:44 CET 2010


On 6 December 2010 19:02, Ian Ehrenwald <iehrenwald at tripadvisor.com> wrote:
> Hello
> I was wondering if there was a straight-forward way to alert based on an average of past data plus a current perfdata entry.  I understand I'm not explaining it very well that way, so here is the real-world example I am working with -
>
> I am polling a set of machines via SNMP for CPU load every 1 minute (looking at hrProcessorLoad).  If the return value is at or above 95%, send out a WARNING.  If the return value is 98% or above, send out a CRITICAL.  The problem here is that it's OK for a process to take up 100% CPU for multiple seconds, and sometimes that high CPU usage coincides with the SNMP %CPU query, so I get a lot of false alerts.
>
> Is there a way to use past perfdata in conjunction with the current returned data to generate an average and send a WARNING or CRITICAL based on that new number?  I only care to get alerted from Nagios if, for example, the %CPU has been at 100% for 5 minutes.  Or am I just way over-thinking this and should be monitoring 1m, 5m, 15m UNIX load averages (which doesn't seem that accurate anyway)?  What are other people doing to monitor CPU usage and alert on abnormal long periods of utilization?


Nagios will alert as soon as the plugin returns a non-OK status.  You
can of course configure max_check_attempts and/or
first_notification_delay so that Nagios won't send a notification
until after a given time, but this won't stop it from appearing on
the web page for problem services straight away.
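
As a rough sketch of that approach, a service definition along these
lines would hold off notifications until the check has failed several
times in a row.  The host name, template name and check command name
below are hypothetical placeholders, and the timing assumes the default
interval_length of 60 seconds, so adjust to suit your setup.

```
define service {
    use                       generic-service   ; assumed template name
    host_name                 example-host      ; hypothetical host
    service_description       CPU Load
    check_command             check_snmp_cpu    ; hypothetical command name
    check_interval            1                 ; poll every minute while OK
    retry_interval            1                 ; re-check every minute while in a soft state
    max_check_attempts        5                 ; hard state on the 5th consecutive non-OK result
    first_notification_delay  0                 ; or raise this to delay the first notification further
}
```

With these values a single 100% CPU sample only produces a soft state.
The service goes hard, and a notification goes out, only if five
checks in a row come back non-OK, which is roughly four minutes after
the first bad result.  This is not a true average of past perfdata,
just a way of requiring the condition to persist.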

It would be great if you could get Nagios to display only hard status
alerts - I don't think you can though, not with ordinary Nagios Core
anyway.  Some of the third-party Nagios front ends will do it; for
example, you can configure the icons in NagVis to display only hard
alerts.

Cheers,

Jim
