Scaling nagios: The right way to go?

Jochen Tuchbreiter jt at domainfactory.de
Tue Jul 12 18:17:44 CEST 2005


Hello,

Currently I am running a setup of about 700 hosts with about 6,500 services,
all actively checked. This is Nagios v1 with the historic "check_nrpep"
(compiled via perlcc). About 3,000 of the services are graphed via apan.
Nagios runs on one machine; all checks that get graphed run on a second
one, where the graphs are also generated. Both machines are high-end x86
hardware.

Loading status.cgi currently takes about 6 CPU-seconds on a dual-Xeon. The
service check scheduler falls behind on a regular basis, and concurrency
is already tuned to the maximum.

Obviously I need to do something about this. In fact, I am looking for a
solution that will scale up to 2,000 hosts with 15,000 services.
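A rough way to see why the scheduler falls behind is to compare the check rate the configuration demands with the rate the concurrency limit can deliver. The sketch below is only back-of-the-envelope arithmetic under assumptions that are not in the post: a uniform 5-minute check interval, a 2-second average check duration and a concurrency limit of 100. Substitute measured values.

```python
# Back-of-the-envelope scheduler sizing.
# ASSUMPTIONS (not from the original post): every service uses the same
# check interval, and checks take a constant average time to complete.

def required_rate(services: int, interval_s: float) -> float:
    """Checks per second needed to run every service once per interval."""
    return services / interval_s

def max_rate(concurrency: int, avg_check_s: float) -> float:
    """Upper bound on checks per second when `concurrency` checks run in
    parallel and each takes `avg_check_s` seconds (Little's law)."""
    return concurrency / avg_check_s

INTERVAL_S = 300      # hypothetical 5-minute check interval
AVG_CHECK_S = 2.0     # hypothetical average check duration
CONCURRENCY = 100     # hypothetical concurrency limit

for label, services in (("current", 6500), ("target", 15000)):
    need = required_rate(services, INTERVAL_S)
    have = max_rate(CONCURRENCY, AVG_CHECK_S)
    print(f"{label}: need {need:.1f} checks/s, capacity {have:.1f} checks/s")
```

With these placeholder numbers the target load would consume all of the available capacity, leaving no headroom for slow or timed-out checks. The model ignores the CPU cost of the scheduler itself, which may be the real limit.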

Which way of scaling do you recommend? I can see three different scenarios:


1.
"Divide and conquer" by using many smaller Nagios installations monitoring
disjoint sets of hosts, somehow keeping the configuration (contact data
etc.) loosely in sync

pros: 
-> KISS at its best
-> will scale "indefinitely"

cons: 
-> syncing the configuration files will be a major PITA
-> no common status overview (I will have to write my own frontend to merge
the status information)
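A merged overview could start as small as the sketch below, which reads each instance's status file and combines the service states while checking that the host sets really are disjoint. It assumes that Nagios 1.x service lines in status.log look like `[timestamp] SERVICE;<host>;<service>;<status>;...`; verify that layout against your own file. The instance names in the usage are hypothetical.

```python
# Merge service states from several independent Nagios 1.x instances.
# ASSUMPTION: service lines in each status.log have the form
#   [timestamp] SERVICE;<host>;<service description>;<status>;...
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]  # (host, service description)

def parse_status(lines: Iterable[str]) -> Dict[Key, str]:
    """Return {(host, service): status} for every SERVICE line."""
    states: Dict[Key, str] = {}
    for line in lines:
        # Drop the "[timestamp] " prefix; lines without it yield an empty body.
        _, _, body = line.strip().partition("] ")
        fields = body.split(";")
        if len(fields) >= 4 and fields[0] == "SERVICE":
            states[(fields[1], fields[2])] = fields[3]
    return states

def merge(instances: Dict[str, Dict[Key, str]]) -> Dict[Key, Tuple[str, str]]:
    """Combine per-instance states into {(host, service): (instance, status)}.

    Raises ValueError if two instances report the same host/service pair,
    i.e. the monitored host sets are not disjoint."""
    merged: Dict[Key, Tuple[str, str]] = {}
    for name, states in instances.items():
        for key, status in states.items():
            if key in merged:
                raise ValueError(f"{key} reported by {merged[key][0]} and {name}")
            merged[key] = (name, status)
    return merged
```

Usage would be along the lines of `merge({"nagios-a": parse_status(open(path_a)), "nagios-b": parse_status(open(path_b))})`, followed by whatever rendering the frontend needs.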


2.
"Optimize as hell" by using passive service checks only (-> uh, will
graphing still work?), switching from apan to something faster, ditching
nrpep for nrpe, ...?

pros:

-> Still KISS

cons:

-> major changes compared to my current setup would be required

-> not sure this will scale to the numbers above, no matter how well I
optimize
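With passive checks, something outside the scheduler produces the results and hands them to Nagios through its external command file, using the `PROCESS_SERVICE_CHECK_RESULT` command. The sketch below only formats and writes that command; the command-file path is an assumption (use the `command_file` setting from your nagios.cfg, with `check_external_commands` enabled).

```python
# Submit a passive service check result through the Nagios external command file.
import time
from typing import Optional

COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"  # hypothetical path

def format_result(host: str, service: str, return_code: int, output: str,
                  timestamp: Optional[int] = None) -> str:
    """Build one PROCESS_SERVICE_CHECK_RESULT line
    (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN)."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0-3")
    ts = int(time.time()) if timestamp is None else timestamp
    # A newline would terminate the command early, so flatten the output.
    clean = output.replace("\n", " ")
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{return_code};{clean}\n"

def submit(line: str, command_file: str = COMMAND_FILE) -> None:
    """Write the command to Nagios' named pipe (FIFO).
    Blocks until Nagios has the pipe open for reading."""
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

The services on the Nagios side still need passive checks accepted, and whatever runs the checks (cron, a daemon, or NSCA on a remote box) takes over the work the scheduler used to do.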


3.
"Scale like the documentation says" by switching to Nagios 2 beta(!) and
setting up the complex "cluster" configuration required to do this.

pros:

-> doing it the way it is supposed to be done -> people won't kill me for
being "weird" when I show them my setup

cons:

-> major changes compared to my current setup would be required

-> using beta software for this

-> complexity! I don't really want to maintain n sets of configuration files
for n cluster members. I'd have to find a way to generate the cluster-member
configuration files by parsing the configuration on the master.
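Generating member configurations from the master could begin with something like the sketch below: parse the `define service{...}` blocks and assign each one to a member by a stable hash of its `host_name`. The hash-based assignment is a hypothetical scheme, not anything Nagios prescribes, and the parser deliberately ignores templates, hostgroups, multi-host `host_name` lists and values containing `}`.

```python
# Split a master service configuration into per-member config texts.
# HYPOTHETICAL scheme: each host goes to one of n members by crc32(host_name).
# Limitations: no template/hostgroup expansion, one host per service,
# and a '}' inside a value would break the block regex.
import re
import zlib
from collections import defaultdict
from typing import Dict, List

BLOCK_RE = re.compile(r"define\s+(\w+)\s*\{(.*?)\}", re.S)

def parse_objects(text: str) -> List[dict]:
    """Return one {'_type': ..., directive: value} dict per define block."""
    objects = []
    for obj_type, body in BLOCK_RE.findall(text):
        obj = {"_type": obj_type}
        for raw in body.splitlines():
            line = raw.strip()
            if not line or line.startswith(("#", ";")):
                continue  # skip blank lines and comments
            parts = line.split(None, 1)
            obj[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        objects.append(obj)
    return objects

def member_for(host: str, members: int) -> int:
    """Stable host -> member index (crc32 does not change between runs)."""
    return zlib.crc32(host.encode()) % members

def render(obj: dict) -> str:
    lines = [f"define {obj['_type']}{{"]
    lines += [f"\t{k}\t{v}" for k, v in obj.items() if k != "_type"]
    lines.append("\t}")
    return "\n".join(lines)

def partition(text: str, members: int) -> Dict[int, str]:
    """Return {member index: config text with that member's services}."""
    buckets = defaultdict(list)
    for obj in parse_objects(text):
        if obj["_type"] == "service" and "host_name" in obj:
            buckets[member_for(obj["host_name"], members)].append(render(obj))
    return {i: "\n\n".join(blocks) + "\n" for i, blocks in buckets.items()}
```

The master would still need its own definition of every service, typically with active checks disabled and results arriving from the members; the Nagios distributed-monitoring documentation describes doing this with the obsessive-compulsive service processor and NSCA.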


Has anybody gained experience with any of these three approaches and can
recommend or argue against one of them? Are there any well-documented large
installations? Does anybody trust their business to Nagios v2 beta yet? Any
help or comments would be highly appreciated.

regards,
Jochen



_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users