Opinions on load balancing and failover mechanisms

Andreas Ericsson ae at op5.se
Thu Jan 26 16:58:49 CET 2012


On 01/25/2012 11:44 PM, Mike Lindsey wrote:
> There are a lot of options..  DNX, Merlin, mod_gearman to name a few...
> I could read the docs (and have read a good portion of some of them) and
> could implement test environments (and will eventually need to) but
> first I want opinions from people who've done this at large scale.
> 
> I need to improve on our load distribution and failover mechanisms.
> Right now worker node outages are handled through freshness checking,
> and master node outages are handled through a load balanced vip and some
> fancy cron jobs that kick up a cold spare.
> 
> What are the better options for local load distribution and geographic
> master failover?  Which options will better handle thousands of servers
> across a dozen colos, in half a dozen countries, when the goal is that
> no single host (or colo!) going offline can be allowed to have an effect
> on any other subset of the infrastructure?  Which options should I avoid?
> 
> Currently running Nagios Core 3.2.1 with NSCA 2.9 on mostly FreeBSD
> systems.  Soon that should be Core 3.3, with XI on top, plus whatever
> load distribution mechanism wins the dog fight.
> 

For failover, Merlin is the only solution. If a poller at some colo or
in some country goes down, the master will try to take the checks over,
unless you tell it not to.

mod_gearman is probably more efficient at running checks with minimal
CPU usage on multiple nodes until the new check engine in vanilla
Nagios is completed. After that, the in-core one will be superior to
all other options for simply distributing load, although it still won't
do failover.

To forestall your question "When can I expect to see that new check
stuff", the answer is "in 9 weeks' time, tops". That's when my deadline
for it expires. Mid-April, if you hate week counts as much as I do.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users