What kind of checks/minute numbers are you getting for single host / non-distributed setups?

Ryan Bowlby rbowlby83 at yahoo.com
Sat Aug 29 05:31:50 CEST 2009


--- On Fri, 8/28/09, Max <perldork at webwizarddesign.com> wrote:

> From: Max <perldork at webwizarddesign.com>
> Subject: [Nagios-users] What kind of checks/minute numbers are you getting for single host / non-distributed setups?
> To: "Nagios Users mailinglist" <nagios-users at lists.sourceforge.net>
> Date: Friday, August 28, 2009, 12:07 AM
> Just curious, we are starting to move
> to a distributed setup because
> we appear to be maxing out our current HW.
> 
> Nagios 3.0.3
> Dual quad-core Compaq server, 16 GB RAM, SCSI disks.
> 
> We have one server that does trap receiving and polling ..
> * Notification requests are sent off to a second machine
> * Trap MySQL records (SNMPTT) sent to a second host
> * PNP data sent to a second host using modpnpsender
> * All Nagios temp directories, config directories, the main
> nagios.log, retention.dat, objects.cache, and plugins reside on RAM
> disks.
> * 80% SNMP checks with ePN scripts, 15% NRPE checks, 5% other
> 
> We get about 2000 checks/minute avg (8500 active checks in
> 4 minutes).
> 
> Anyone who is willing to post their numbers I would really be
> interested in hearing your performance numbers for a non-distributed
> setup.
> 
> I am about to enter another week of attempts at tuning our
> configuration until we get our distributed setup set up :p as our
> latency is starting to rise to unacceptable limits every 12-16 hours
> or so after a restart.
> 
> Thanks,
> Max


Max,

Those are impressive numbers for a single Nagios instance. You may be able to squeeze out some additional performance, but you are leaving the Nagios daemon very little headroom. What I mean is that if two dozen hosts start reporting critical and Nagios starts performing checks at the more aggressive retry_check_interval instead of the normal check_interval, your check latency is going to go through the roof.
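
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The 8500 checks and the 4-minute cycle come from Max's post; the 20 services per host and the 1-minute retry interval are assumptions made up for illustration, and the model ignores latency, check execution time, and how Nagios actually schedules services on a down host.

```python
def checks_per_minute(num_checks, interval_minutes):
    """Average rate for checks that each run once per interval."""
    return num_checks / interval_minutes


def load_with_retries(total_checks, normal_interval,
                      retrying_checks, retry_interval):
    """Average rate when some checks have dropped to the retry interval."""
    steady = checks_per_minute(total_checks - retrying_checks, normal_interval)
    retrying = checks_per_minute(retrying_checks, retry_interval)
    return steady + retrying


# From Max's post: 8500 active checks spread over 4 minutes.
baseline = checks_per_minute(8500, 4)

# Assumptions for illustration: 24 hosts go critical, each with
# 20 services (hypothetical), all rechecked every 1 minute (hypothetical).
retrying_checks = 24 * 20
stressed = load_with_retries(8500, 4, retrying_checks, 1)

print(f"baseline: {baseline:.0f} checks/min")
print(f"stressed: {stressed:.0f} checks/min")
print(f"increase: {100 * (stressed - baseline) / baseline:.1f}%")
```

Under these assumed figures the server would have to run noticeably more checks per minute at exactly the moment it is already near its limit.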

That being said, here are some ideas that you may already be trying, but if not, they may buy you some time.

- Switch from check_ping to check_icmp, as it's 9x faster in some instances.

- If any of the client-side NRPE checks are written in Perl, Python, etc., you may see a decrease in check time by compiling them. The same goes for the checks on the Nagios server, if you aren't doing so already (embedded Perl, etc.).

- NRPE checks such as those monitoring hardware often don't need to be performed as frequently as, say, a check_tcp. But since people use templates, NRPE checks frequently get configured with the same aggressive check_interval as everything else. Scaling back on these will greatly increase the number of checks the server can do.
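
As a rough illustration of the last point, the sketch below estimates the average check rate before and after lengthening the interval on NRPE checks. The 8500 total and the 15% NRPE share come from Max's post; treating every NRPE check as a candidate for a longer interval, and the 15-minute value itself, are assumptions chosen for illustration only.

```python
def total_rate(groups):
    """Sum of average checks/minute over (count, interval_minutes) groups."""
    return sum(count / interval for count, interval in groups)


total_checks = 8500                    # from Max's post
nrpe_checks = total_checks * 15 // 100  # 15% NRPE share, from Max's post
other_checks = total_checks - nrpe_checks

# Everything on the same 4-minute cycle (8500 checks in 4 minutes).
before = total_rate([(other_checks, 4), (nrpe_checks, 4)])

# Assumption: all NRPE checks moved to a hypothetical 15-minute interval.
after = total_rate([(other_checks, 4), (nrpe_checks, 15)])

print(f"before: {before:.2f} checks/min")
print(f"after:  {after:.2f} checks/min")
print(f"freed:  {before - after:.2f} checks/min")
```

The freed capacity depends entirely on how many checks can tolerate the longer interval, so it is worth plugging in real counts per template.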

At my work we have 4 remote Nagios instances performing 9400+ checks and submitting the results to our central Nagios server via NSCA. This leaves room for a 400% increase in checks as more departments begin utilizing the monitoring system. Our configs are built by a custom script from our custom database and pushed out to the servers via another custom script that keeps everything in CVS. It all works great but took forever to configure. If I had to do it again I would take a serious look at two other options:

http://dnx.sourceforge.net/ - A crap ton of checks on ONE Nagios instance!

http://www.opsview.org/ - Multiple Nagios instances without writing a slew of custom scripts to do it!

Hope something in there is helpful.

-Ryan


      





