check_snmp_load.pl best linux practices

Robert Eden rmeden at gmail.com
Fri Mar 11 18:52:36 CET 2011


If I write the extension, I will certainly commit it to the nagios-snmp 
SF project for inclusion.

Frank, feel free to beat me to the punch (since yours is already running).

Robert
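
P.S. For anyone skimming the thread, here is a minimal sketch of the
per-core scaling idea, assuming the core count has already been fetched
(for example via SNMP). The 1.25 and 1.5 per-core factors are the ones
Frank describes in the quoted message below. The function names, the
status strings, and the choice of Python are illustrative assumptions
only. This is not the code in check_snmp_load.pl or in Frank's patch.

```python
# Hypothetical sketch: scale load-average thresholds by CPU core count.
# Factors 1.25 (WARN) and 1.5 (CRIT) per core come from the thread;
# everything else here is an illustrative assumption.

def scaled_thresholds(cores, warn_per_core=1.25, crit_per_core=1.5):
    """Return (warn, crit) load thresholds for a host with `cores` CPUs."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return cores * warn_per_core, cores * crit_per_core


def check_load(loads, cores):
    """Classify (1min, 5min, 15min) load averages against scaled thresholds.

    The same thresholds apply to all three values, so the highest of the
    three load averages decides the result.
    """
    warn, crit = scaled_thresholds(cores)
    worst = max(loads)
    if worst >= crit:
        return "CRITICAL"
    if worst >= warn:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # A 4-core host gets WARN at 5.0 and CRIT at 6.0.
    print(scaled_thresholds(4))
    print(check_load((3.2, 5.1, 4.0), cores=4))
```

To ignore the 1 minute value, as with '-c 99,4,10', one could pass only
the 5 and 15 minute values to check_load.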

On 3/11/2011 10:13 AM, Joe Beck wrote:
> Frank,
>
> This looks like a great addition to my core alerting.
> Any chance you can share the details of your setup--
> Did you make these updates to check_snmp_load.pl & do something similar to
> Robert?
>>> I'm getting good results by using the NETSL option to report load averages.
>>> I'm setting '-c 99,4,10' to basically ignore the 1 minute value and alarm
>>> on 5 and 15 minutes.
> Thx,
> Joe
>
>
> On 3/9/11 9:00 PM, "frank" <ratty at they.org> wrote:
>
>> On my installation I added code to the SNMP load check to count the CPU
>> cores via SNMP and set WARN to 1.25*cores and CRIT to 1.5*cores (for
>> any/all load values). Seems to be working ok. Haven't had any complaints
>> from the NOC for excessive alerting.
>>
>> -f
>>
>> On Wed, 9 Mar 2011, Robert Eden wrote:
>>
>>> Date: Wed, 09 Mar 2011 14:33:13 -0600
>>> From: Robert Eden <rmeden at gmail.com>
>>> Reply-To: Nagios Users List <nagios-users at lists.sourceforge.net>
>>> To: nagios-users at lists.sourceforge.net
>>> Subject: [Nagios-users] check_snmp_load.pl best linux practices
>>>
>>> I'm currently experimenting with using check_snmp_load.pl to alarm on system
>>> overload.
>>>
>>> Monitoring CPU usage is giving me a lot of false alarms because the
>>> readings are instantaneous.
>>>
>>> I'm getting good results by using the NETSL option to report load averages.
>>> I'm setting '-c 99,4,10' to basically ignore the 1 minute value and alarm
>>> on 5 and 15 minutes.
>>>
>>> Unfortunately, unlike the CPU percentages, the load numbers should be based
>>> on the number of processors.  The NETSL option doesn't do that.
>>>
>>> One option is to have a series of service commands based on the number of
>>> processors, but I'm considering writing a new mode that will use the
>>> "STAND" option to get the number of CPUs and then use that as a
>>> multiplication factor for alarms.
>>>
>>> Does that make sense? Surely others have run into this problem. How do you
>>> alarm on excessive load w/o causing lots of false alarms?
>>>
>>> Robert
>>>
>>> ------------------------------------------------------------------------------
>>> Colocation vs. Managed Hosting
>>> A question and answer guide to determining the best fit
>>> for your organization - today and in the future.
>>> http://p.sf.net/sfu/internap-sfd2d
>>> _______________________________________________
>>> Nagios-users mailing list
>>> Nagios-users at lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/nagios-users
>>> ::: Please include Nagios version, plugin version (-v) and OS when reporting
>>> any issue.
>>> ::: Messages without supporting info will risk being sent to /dev/null
>>>

