Large Installation

Matt Simmons standalone.sysadmin at gmail.com
Thu Jun 10 22:12:07 CEST 2010


I can't say that I've solved the scalability problem, but I don't
have it, because I've implemented a policy: I never check any server
over a WAN link, with the exception of another Nagios server (plus
both ends of all of the WAN links themselves).

This does require one Nagios server per site, but to me, that's an
appealing idea anyway, because I don't have a single point of failure.
Any of my Nagios installations could die completely, and I'd be
alerted by the others, just like any one internet connection could
die, and I'd still get alerts about it. In the event of a "weird"
failure, I can pretty much construct the network diagram based on
which links are reporting up, and from where.
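
To make the "reconstruct the diagram" idea concrete, here is a rough
sketch in Python. It is not part of my actual setup; the site names,
target names, and the summarize_reachability function are all
hypothetical. It only assumes that each Nagios server can report
whether it sees a given target as up or down.

```python
from collections import defaultdict


def summarize_reachability(reports):
    """Classify each target from (observer_site, target, is_up) tuples.

    "up"      -> every observer that checks it sees it up
    "down"    -> every observer that checks it sees it down
    "partial" -> observers disagree, which hints at a path problem
                 between some sites rather than a dead target
    """
    up_from = defaultdict(set)
    down_from = defaultdict(set)
    for observer, target, is_up in reports:
        (up_from if is_up else down_from)[target].add(observer)

    summary = {}
    for target in set(up_from) | set(down_from):
        seen_up = up_from.get(target, set())
        seen_down = down_from.get(target, set())
        if not seen_down:
            summary[target] = "up"
        elif not seen_up:
            summary[target] = "down"
        else:
            summary[target] = "partial"
    return summary


# Hypothetical observations from three sites (names are made up).
reports = [
    ("site-a", "link-a-b", True),
    ("site-b", "link-a-b", True),
    ("site-a", "link-a-c", False),
    ("site-c", "link-a-c", False),
    ("site-b", "nagios-c", True),
    ("site-a", "nagios-c", False),
]
summary = summarize_reachability(reports)
```

In this made-up example, link-a-c is down from both ends, while
nagios-c is still visible from site-b, so the server itself is
probably fine and the a-to-c path is the likely culprit.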

It does require a certain amount of configuration overhead, but most
of that is done with templating anyway. I don't have my system laid
out exactly like I want, but I'm implementing version control
(subversion, in my case) and I have a different Nagios repository for
each site. If I had more templates (or more shared configuration
files), I would probably have a 'nagios-shared' repository, so I
wouldn't have to replicate everything manually.
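
As an illustration of how templating keeps the per-site overhead
down, here is a minimal Python sketch that renders Nagios host
definitions from one shared template name. This is an assumption
about how one might do it, not how my repositories actually work:
render_host, render_site, the host names, and the addresses are made
up, and "generic-host" is assumed to be a host template defined
elsewhere in the configuration.

```python
def render_host(host_name, address, template="generic-host"):
    """Return one Nagios host object definition that inherits from a template."""
    return (
        "define host {\n"
        f"    use        {template}\n"
        f"    host_name  {host_name}\n"
        f"    address    {address}\n"
        "}\n"
    )


def render_site(hosts, template="generic-host"):
    """Render every host of one site; 'hosts' maps host_name -> address."""
    return "\n".join(
        render_host(name, address, template)
        for name, address in sorted(hosts.items())
    )


# Hypothetical hosts for one site (addresses from the documentation range).
site_a_hosts = {
    "web01": "192.0.2.10",
    "db01": "192.0.2.20",
}
site_a_config = render_site(site_a_hosts)
print(site_a_config)
```

The generated text could be committed to that site's repository, with
the template itself kept in a shared one.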

As for the arrangement of my configs, it mostly follows this howto
that I did a year ago:
http://www.standalone-sysadmin.com/blog/2009/07/nagios-config/

Hope this helps someone.

--Matt


On Thu, Jun 10, 2010 at 3:55 PM, Kevin Keane <subscription at kkeane.com> wrote:
> Nagios does have some scalability issues, but for the most part you won’t
> run into them until you get to truly huge installations.
>
>
>
> I can see three main scalability issues: config file maintenance, the
> need for one central server, and firewall issues.
>
>
>
> Config file maintenance can be improved to some extent with careful design
> of the config files, as well as tools. It is an issue that I am running into
> with a relatively small installation with 80+ hosts and 400+ services. My
> installation is highly heterogeneous and very dynamic, which makes config
> file maintenance a nightmare. Having to restart Nagios after a configuration
> change doesn’t help either. On the other hand, a network with 2000 identical
> machines is probably going to be much easier to manage than my type of
> network.
>
>
>
> The central server is an obvious bottleneck. No matter how powerful the
> machine and the network connection, there are only so many check results it
> can handle. Fortunately, Nagios doesn’t require much horsepower. Distributed
> monitoring helps with this issue because the most expensive part of Nagios
> is running active checks. With distributed monitoring, the active checks can
> run on multiple smaller boxes, and then send the check results back as
> passive checks.
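
For anyone who hasn't seen what a passive result looks like, here is
a small Python sketch that formats one as a
PROCESS_SERVICE_CHECK_RESULT external command line. The helper name
passive_service_result and the host and service names are
hypothetical; how the line reaches the central server (command file,
NSCA, or something else) is outside this sketch.

```python
import time

# Standard plugin return codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
VALID_RETURN_CODES = (0, 1, 2, 3)


def passive_service_result(host, service, return_code, output, timestamp=None):
    """Format a passive service check result as a Nagios external command line."""
    if return_code not in VALID_RETURN_CODES:
        raise ValueError(f"unexpected return code: {return_code}")
    if timestamp is None:
        timestamp = int(time.time())
    # Keep the command on one line; newlines in plugin output would split it.
    clean_output = output.replace("\n", " ")
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{clean_output}"
    )


# Hypothetical result produced by a remote poller.
line = passive_service_result("web01", "HTTP", 0, "HTTP OK", timestamp=1276200000)
print(line)
```

The host name and service description have to match what the central
server has configured, which is the "configure each check multiple
times" cost mentioned below.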
>
>
>
> Of course distributed monitoring compounds the config file maintenance
> issue, because you have to configure each check multiple times.
>
>
>
> The third issue is not directly a scalability issue. Nagios is built with
> the assumption of a local and mostly trusted network. It’s non-trivial to
> securely get checks to work on remote machines without poking pretty gaping
> holes in firewalls, and/or frequently establishing and tearing down
> encrypted connections with the attendant processing load. There are some
> third-party solutions for this issue, though.
>
>
>
> From: Scott Ward [mailto:13.sward.13 at gmail.com]
> Sent: Thursday, June 10, 2010 12:34 PM
> To: Nagios Users List
> Subject: Re: [Nagios-users] Large Installation
>
>
>
>>Make sure to read these pages:
>>
>>http://nagios.sourceforge.net/docs/3_0/tuning.html
>>http://nagios.sourceforge.net/docs/3_0/largeinstalltweaks.html
>>
>>Also, if you're monitoring 800 machines across WANs, you might look
>>into distributed monitoring:
>>http://nagios.sourceforge.net/docs/3_0/distributed.html
>>
>>Let us know how it goes!
>
> Thanks for the links.  So the distributed monitoring described in the Nagios
> docs can handle what we're trying to do?  I have read in a few places that
> Nagios has scalability issues.
>
>>
>>--Matt
>>
>>BTW, what are you using for your config maintenance?
>
> We haven't decided yet. Do you have any recommendations?
>
>
> ~S
>
> On Thu, Jun 10, 2010 at 2:23 PM, Matt Simmons
> <standalone.sysadmin at gmail.com> wrote:
>
> Make sure to read these pages:
>
> http://nagios.sourceforge.net/docs/3_0/tuning.html
> http://nagios.sourceforge.net/docs/3_0/largeinstalltweaks.html
>
> Also, if you're monitoring 800 machines across WANs, you might look
> into distributed monitoring:
> http://nagios.sourceforge.net/docs/3_0/distributed.html
>
> Let us know how it goes!
>
> --Matt
>
> BTW, what are you using for your config maintenance?
>
> On Thu, Jun 10, 2010 at 1:51 PM, Scott Ward <13.sward.13 at gmail.com> wrote:
>
>> We are looking to do a large installation of Nagios. Is it possible to
>> monitor over 800 machines and over 14000 services?
>>
>> Has anyone tried doing anything like this? If you have, how successful was
>> it, and how did you configure it?
>>
>> ~Rultax
>>
>
>>
>> ------------------------------------------------------------------------------
>> ThinkGeek and WIRED's GeekDad team up for the Ultimate
>> GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the
>> lucky parental unit.  See the prize list and enter to win:
>> http://p.sf.net/sfu/thinkgeek-promo
>> _______________________________________________
>> Nagios-users mailing list
>> Nagios-users at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/nagios-users
>> ::: Please include Nagios version, plugin version (-v) and OS when
>> reporting
>> any issue.
>> ::: Messages without supporting info will risk being sent to /dev/null
>>
>
>
> --
> LITTLE GIRL: But which cookie will you eat FIRST?
> COOKIE MONSTER: Me think you have misconception of cookie-eating process.
>





