Large scale network monitoring limits with nagios

Marc Powell marc at ena.com
Thu Mar 11 15:11:30 CET 2004


Noah Leaman <mailto:noah at mac.com> wrote:
> Hope it's o.k. to cross-post to both groups on this matter...
> 
> Using the concept of one service per up/down trap for each network
> interface, I tested a little by creating a very simple set of Nagios
> configs with about 8000 PASSIVE service checks and no active service
> checks. Of course there were no scheduling problems, but the CGIs all
> slowed to a snail's pace. In my setup (Nagios 1.2, dual G4 first-gen
> Xserve) it takes about 30 secs to display the Status Summary page.
> 
> Of course that config setup isn't the actual production plan...
> 
> I then enabled configs closer to real-world use:
> 
> 552 check_traffic (2 snmpgets running every 10 minutes per service
>     check, storing to an RRD)
> 295 check_ping (number of locally monitored hosts)
> 8389 check_dummy (mostly the up/down traps; about 100 are passive
>     services coming from 2 other distributed Nagios servers doing
>     pings and check_traffics)
> 
> ... So 9236 services altogether, but this is really just a small
> subset of what I would like to be able to do. The plan is to throw
> hardware at it to spread out the real work being done (i.e. the
> active checks).
> 
> But with just this setup, a single CGI takes up an entire CPU, often
> for a few minutes at a time... and the plan was to have a good
> handful of GUI users (5 or so at a time)... it's just about unusable
> with even one GUI user.
> 

I can't speak to your issue about how to handle 8000+ traps, but from
personal experience, the Nagios 1.x CGIs do indeed have difficulty with
large numbers of hosts. I have over 150 hostgroups defined, some of
them with over 1000 hosts, and the hostgroup summary takes almost 5
minutes to generate on a quad P3 800. I've got a cron job that simply
dumps that page to a flat file every 5 minutes in the background. The
great news is that this has been fixed in 2.0: now that I have gotten
it running, that same page takes less than 5 seconds to generate.
That's a significant improvement, and I applaud Ethan and the
contributor of the chained-hash patch for it.
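
The message does not say how the cron job is implemented. As a rough
sketch of one way to do it (not Marc's actual script), the Python below
fetches a slow page and writes it to a flat file atomically, so that
anyone reading the cached copy never sees a half-written page. The
script name, the URL, and the destination path are all hypothetical,
and the sketch assumes the CGI is reachable without HTTP
authentication, which many Nagios installations do require.

```python
#!/usr/bin/env python3
"""Cache a slow status page to a flat file; intended to be run from cron."""
import os
import sys
import tempfile
import urllib.request


def fetch_page(url, timeout=600):
    """Fetch the page body; the long timeout allows for a slow CGI."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def write_atomically(data, dest):
    """Write data to a temp file next to dest, then rename it over dest.

    The temp file lives in the same directory as dest so that the final
    os.replace() is a same-filesystem rename, which is atomic on POSIX.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".summary-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.chmod(tmp_path, 0o644)  # mkstemp creates the file as 0600
        os.replace(tmp_path, dest)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def main(argv):
    if len(argv) != 3:
        print("usage: cache_summary.py URL DEST", file=sys.stderr)
        return 2
    write_atomically(fetch_page(argv[1]), argv[2])
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

A crontab entry along these lines would refresh the cached copy every
5 minutes (paths and URL again hypothetical):

*/5 * * * * /usr/local/bin/cache_summary.py 'http://localhost/nagios/cgi-bin/status.cgi?hostgroup=all&style=summary' /var/www/html/hostgroup-summary.html

Users then browse the static file instead of invoking the CGI, trading
up to 5 minutes of staleness for an instant page load.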

--
Marc

