Scaling nagios: The right way to go?

Rob Moss robmossrm at aol.com
Tue Jul 12 19:19:05 CEST 2005


Hi,
    AOL UK is using Nagios 2.0b3 for some monitoring (under 1,000 hosts).

I'm actively developing plugins and tuning/compiling code for Sun 
Solaris 8 and 9, no problems so far.

There are some real bottlenecks performing monitoring from one host, 
aside from the fact that some of the built-in plugins are simply 
wrappers to the real program (check_icmp for a start).
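
To make the wrapper overhead concrete, here is a rough sketch of a check that does its own work in-process instead of spawning an external tool. It is written in Python only for brevity; for real gains the same structure would be written in compiled C. The names check_tcp and classify and the default thresholds are made up for illustration. Only the exit codes (0 OK, 1 WARNING, 2 CRITICAL) are the usual Nagios plugin convention.

```python
import socket
import time

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify(elapsed_s, warn_s, crit_s):
    """Map a response time in seconds to a plugin exit code."""
    if elapsed_s >= crit_s:
        return CRITICAL
    if elapsed_s >= warn_s:
        return WARNING
    return OK


def check_tcp(host, port, warn_s=1.0, crit_s=5.0):
    """Time a TCP connect to host:port and return (exit_code, output_line).

    Hypothetical helper: the socket is opened directly in this process,
    so no external program is forked for the check itself.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=crit_s):
            elapsed = time.monotonic() - start
    except OSError as exc:
        # Refused connections and timeouts both end up here.
        return CRITICAL, f"TCP CRITICAL - {host}:{port}: {exc}"
    code = classify(elapsed, warn_s, crit_s)
    label = ("OK", "WARNING", "CRITICAL")[code]
    return code, f"TCP {label} - {elapsed:.3f}s response time on port {port}"
```

A plugin built this way would print the output line and exit with the returned code, which is all Nagios needs from a check.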

If you are after performance, the best approach is probably to run 
multiple check hosts feeding into a collector, and to replace some of 
the built-in nagios checks with compiled C checks that do all the work 
themselves instead of wrapping other system tools.
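
As a sketch of the "feeding into a collector" part: each check host can hand its results to the collector as passive service check results. The snippet below only formats a result as a PROCESS_SERVICE_CHECK_RESULT external command line. How that line reaches the collector (NSCA or some other transport) is left out. The function name and the host and service names in the usage line are made up for illustration.

```python
import time


def passive_service_result(host, service, return_code, output, now=None):
    """Format one check result as a Nagios external command line.

    Layout: [timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;code;output
    Assumes host and service names contain no semicolons.
    """
    timestamp = int(time.time() if now is None else now)
    # The command is newline-terminated, so keep the output on one line.
    text = output.replace("\n", " ")
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{text}\n"
    )


# Hypothetical usage on a check host:
line = passive_service_result("web01", "HTTP", 0, "HTTP OK", now=1121188745)
```

On the collector side, the idea is that Nagios accepts these passive results and does not run the active checks itself, so the expensive work stays on the check hosts.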

Also, the embedded Perl interpreter should be quite fast for executing 
Perl scripts, such as HTTPS checks written with Perl/LWP.

Hope this helps
Rob Moss
AOL UK

Jochen Tuchbreiter wrote:

> Hello,
> 
> currently I am running a setup of about 700 hosts with about 6.500 services,
> all actively checked. This is Nagios v1 with historic "check_nrpep"
> (compiled via perlcc). About 3.000 of the services get graphed via apan.
> Nagios runs on one machine, all checks that get graphed are run on a second
> one (graphs are also generated there). Both machines are high end x86
> equipment.
> 
> Loading status.cgi currently takes about 6 CPU-seconds on a dual Xeon. The
> service check scheduler falls behind on a regular basis, even though
> concurrency is tuned to the max.
> 
> Obviously I need to do something about this. In fact I am looking for a
> solution that will scale up to 2.000 hosts with 15.000 services.
> 
> What way of scaling do you recommend? I can see three different scenarios:
> 
> 
> 1.
> "Divide and conquer" by using many smaller Nagios installations monitoring
> disjoint sets of hosts, somehow keeping the configuration loosely in sync
> (-> contact-data etc.)
> 
> pros: 
> -> KISS at its best
> -> will scale "indefinitely"
> 
> cons: 
> -> syncing of configuration files will be a major PITA
> -> No common status overview. (Will have to write my own frontend to merge
> the status information)
> 
> 
> 2.
> "Optimize as hell" by using passive service checks only (-> uh, will
> graphing still work?), switching from apan to something faster, ditching
> nrpep for nrpe, ...?
> 
> pros:
> 
> -> Still KISS
> 
> cons:
> 
> -> major changes compared to my current setup would be required
> 
> -> not sure if this will scale up to the numbers above no matter how well I
> optimize
> 
> 
> 3.
> "Scale like the documentation says" by switching to Nagios 2 beta(!) and
> setting up the complex "cluster" configuration required to do this.
> 
> pros:
> 
> -> doing it the way it is supposed to be done -> people won't kill me for
> being "weird" when I show them my setup
> 
> cons:
> 
> -> major changes compared to my current setup would be required
> 
> -> using beta software for this
> 
> -> complexity! I don't really like to maintain n sets of configuration files
> for n cluster members. I'd have to find a way to generate cluster-member
> configuration files by parsing the file on the master
> 
> 
> Has anybody gathered experience with one of the three ways and can recommend
> or speak against one of them? Are there any large installations that are well
> documented? Does anybody trust their business to Nagios v2 beta yet? Any
> help / comments would be highly appreciated.
> 
> regards,
> Jochen
> 
> 
> 
> -------------------------------------------------------
> This SF.Net email is sponsored by the 'Do More With Dual!' webinar happening
> July 14 at 8am PDT/11am EDT. We invite you to explore the latest in dual
> core and dual graphics technology at this free one hour event hosted by HP, 
> AMD, and NVIDIA.  To register visit http://www.hp.com/go/dualwebinar
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
> ::: Messages without supporting info will risk being sent to /dev/null
> 

-- 
Rob Moss
Snr. Unix Administrator
Hosting & DB Operations

