Nagios and Cacti

Max perldork at webwizarddesign.com
Thu Apr 9 17:31:54 CEST 2009


Hi Chris, Daniel,

On my blog I write about a number of the configuration decisions we
made in order to achieve our current level of performance:

http://www.semintelligent.com/blog/?q=Nagios

Please note that a number of the configuration choices we have made go
against what the Nagios documentation recommends, so if you wish to do
anything similar, make sure you understand the Nagios documentation
and the risks of violating its recommendations.

We have done a lot of custom development to help make implementing
SNMP-based checks across a large number of hosts easier for us:
1)  We develop agent-specific checks (we currently use Net-SNMP and
SysEdge, and are starting to do Cisco monitoring) in Perl that run clean
under ePN.  These groups of checks are associated with host groups
specific to each agent type (e.g. net-snmp-host).
2)  We create a custom base template for each agent type.  The
template has custom attributes that associate SNMP version, community
string, etc. with the host template.  We also use custom attributes in
each agent-specific check (e.g. CPU), so that all thresholds are
defined at the host level and we can provide default thresholds.  For
example:

define host {
    name net_snmp_host
    hostgroups +net_snmp_hosts
    __snmp_version 2c
    __snmp_community myreadonlycommunity
    __snmp_port    161
    __snmp_storage_partitions all
    __snmp_storage_warn 90
    __snmp_storage_crit 95
    __snmp_la_warn 15:10:5
    __snmp_la_crit 30:20:10
    __snmp_mem_warn free,lt,8
    __snmp_mem_crit free,lt,5
    __snmp_swap_warn 50
    __snmp_swap_crit 65
    __snmp_cpu_warn wait,gt,20
    __snmp_cpu_crit wait,gt,30
    ...
    register 0
}

For custom communities we create separate templates, e.g.

define host {
    name southwest-region-host
    hostgroups +southwest-hosts
    __snmp_community southWestRegionCommunity
    register 0
}

So now our end users can easily tell Nagios to poll their hosts with
SNMP, and they can override our thresholds at the host level if they
want, without having to know a thing about programming:

define host {
   use generic-host, net_snmp_host, southwest-region-host
   #  Override CPU default thresholds
   __snmp_cpu_warn wait,gt,40
   ...
}
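
To make the threshold format concrete, here is a minimal sketch of how a
check might evaluate specs like wait,gt,20.  This is not our actual check
code: the function names, the extra operators "le" and "ge", the sample
CPU values, and the override critical value of 50 are all assumptions made
for illustration.

```python
import operator

# Only "lt" and "gt" appear in the post; "le" and "ge" are assumed extras.
OPS = {
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
}


def parse_threshold(spec):
    """Split 'metric,op,value' into (metric, comparison function, limit)."""
    metric, op_name, limit = (part.strip() for part in spec.split(","))
    if op_name not in OPS:
        raise ValueError(f"unknown operator {op_name!r} in {spec!r}")
    return metric, OPS[op_name], float(limit)


def evaluate(sample, warn_spec, crit_spec):
    """Return a Nagios-style status: 0 OK, 1 WARNING, 2 CRITICAL."""
    # Check the critical spec first so it wins when both match.
    for status, spec in ((2, crit_spec), (1, warn_spec)):
        metric, compare, limit = parse_threshold(spec)
        if compare(sample[metric], limit):
            return status
    return 0


# Hypothetical CPU sample, in percent.
cpu = {"user": 12.0, "system": 3.0, "wait": 45.0}

# Template defaults from the post.
status_default = evaluate(cpu, "wait,gt,20", "wait,gt,30")
# Host-level override; the critical value 50 is made up for this example.
status_override = evaluate(cpu, "wait,gt,40", "wait,gt,50")
print(status_default, status_override)
```

With the template defaults the sample is critical; with the looser
host-level values it drops to a warning, which is the kind of override the
custom attributes are meant to allow.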

3)  We have developed, and hope to release sometime this year, a
Perl-based, ePN-friendly SNMP check script that handles counters and
gauges well and lets you check multiple SNMP OIDs at once.  This has
been extremely useful for custom SNMP application agents.  A service
definition ends up looking like this:

define service {
    use check_snmp_oids-base
    service_description    Custom App - 5 minute SNMP checks
    __snmp_oids_spec -O 'TimeMin:g:1.3.6.1.4.1.1900.5.5.2.2.1.0' \
                     -O 'labelFor1stOid:g:1.3.6.1.4.1.9999.1.3.0' \
                     -O 'labelFor2ndOid:g:1.3.6.1.4.1.9999.1.4.0' \
                     -O 'labelFor3rdOid:g:1.3.6.1.4.1.9999.1.5.0'
    __snmp_oids_crit_spec labelFor1stOid,lt,0
    hostgroup_name   custom-agent-group
    servicegroups    custom-service-group
}
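
As an illustration of the -O 'label:type:oid' format, here is a rough
sketch of how such an argument string could be split into its parts.  It
is not the unreleased script: the OidSpec type and parse_oid_args function
are hypothetical names, and reading the "g" field as "gauge" (with some
other letter for counters) is an assumption.

```python
import shlex
from typing import NamedTuple


class OidSpec(NamedTuple):
    label: str
    kind: str  # "g" appears in the post; its exact meaning is assumed
    oid: str


def parse_oid_args(arg_string):
    """Turn "-O 'label:type:oid' -O ..." into a list of OidSpec tuples."""
    tokens = shlex.split(arg_string)  # honours the single quotes
    specs = []
    i = 0
    while i < len(tokens):
        if tokens[i] != "-O" or i + 1 >= len(tokens):
            raise ValueError(f"expected -O <spec> at token {i}")
        # Numeric OIDs contain no colons, so two splits are enough.
        label, kind, oid = tokens[i + 1].split(":", 2)
        specs.append(OidSpec(label, kind, oid))
        i += 2
    return specs


args = (
    "-O 'TimeMin:g:1.3.6.1.4.1.1900.5.5.2.2.1.0' "
    "-O 'labelFor1stOid:g:1.3.6.1.4.1.9999.1.3.0'"
)
specs = parse_oid_args(args)
for spec in specs:
    print(spec.label, spec.kind, spec.oid)
```

The labels are what a threshold spec such as labelFor1stOid,lt,0 would
refer to, so a check can fetch every OID in one pass and then apply the
thresholds by label.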

In some cases we check 15-20 OIDs at once using this methodology.
Our script uses memcached to cache counter data so that it can compute
deltas correctly, and we have code that adjusts the data for
over-samples, under-samples, and large deltas.
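
The idea behind the cached counter deltas can be sketched as follows.
This is only an approximation under stated assumptions, not our script: a
plain dict stands in for memcached, counters are assumed to be 32-bit,
dividing by the real elapsed time is one guess at how over- and
under-sampling might be normalized, and the counter_rate name and max_rate
cutoff are hypothetical.

```python
COUNTER32_MAX = 2**32

# Stand-in for memcached: key -> (timestamp in seconds, raw counter value).
cache = {}


def counter_rate(key, now, value, max_rate=None):
    """Return the per-second rate since the last sample, or None if unusable."""
    previous = cache.get(key)
    cache[key] = (now, value)  # always remember the newest sample
    if previous is None:
        return None  # first sample, nothing to diff against
    prev_time, prev_value = previous
    elapsed = now - prev_time
    if elapsed <= 0:
        return None
    delta = value - prev_value
    if delta < 0:
        delta += COUNTER32_MAX  # assume a single 32-bit wrap
    # Using the real elapsed time keeps the rate comparable when a poll
    # runs earlier or later than the nominal interval.
    rate = delta / elapsed
    if max_rate is not None and rate > max_rate:
        return None  # implausibly large delta, e.g. after an agent restart
    return rate


key = "host1:ifInOctets.1"  # hypothetical cache key
first = counter_rate(key, 0, 4294967000)
second = counter_rate(key, 300, 704)  # counter wrapped between polls
third = counter_rate(key, 600, 100, max_rate=1e6)  # counter was reset
print(first, second, third)
```

The first call has nothing to compare against, the second recovers the
delta across the wrap, and the third discards a reset that would otherwise
look like a huge jump.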

Many of our checks are based on code I wrote that can be
downloaded here:

http://www.nagios3book.com/nagios-3-enm/checks/

though we have significantly enhanced things.

So, a lot of development time up front but the end result is we get
terrific performance and a lot of flexibility.  We are using Nagios to
replace $$$ COTS products, so our company is happy to have us spend
time doing custom development.  I realize many of you do not have that
luxury, so I understand that this won't be ideal for many of you.
Sorry.

Development time with two people to get to where we are now - about 3-4 months.

We have permission to release a lot of the code we have written; we just
need time to package it properly for a public release.  Hopefully
we can share some of our tools and help others do something similar
without the 3-4 months of development time :p.

Hope this helps more than it confuses.

- Max
