Reducing Load on a Distributed Nagios Installation

Jan-Piet Mens jpm at retail-sc.com
Mon Oct 24 17:10:01 CEST 2005


Hello Marcel,

Thank you for your comments. The people in charge of Nagios need or want
the host-alive status, so we have to go that way.

Regards,
	-JP

On Mon Oct 24 2005 at 16:34:42 CEST, Marcel Mitsuto Fucatu Sugano wrote:

> Hi JP,
> 
> On Sat, 2005-10-15 at 11:53 +0200, Jan-Piet Mens wrote: 
> > We've experienced quite a bit of load on a distributed Nagios
> > installation with several thousand passive service checks which
> > are supplied to a central Nagios server via NSCA. Our central
> > Nagios 1.2 server started swapping and subsequently thrashed
> > itself to death. After a bit of debugging, we've come up with a
> > solution which may be interesting to those in a similar position.
> 
> I'm dealing with distributed monitoring with a central server as you do,
> but in my case we have 11 monitoring agents that send their check
> results to nsca on the central server. I'm using nagios2.0b4 for the
> central server and nagios1.X on the agents. Counting all checks that are
> sent passively to the central server, it sums to over 10000 passive
> checks being received by one piece of commodity hardware, highly
> available, a Pentium4-HT running SuSE9.3, very simple. But it works. No
> thrashing experienced so far. But we do not run check_icmp on stale
> check results. We simply show this as an Unknown alert with an output of
> stale, and try to find reasonable freshness thresholds.
> 
> In your situation, I would think about upgrading the central nagios
> server to 2.0b4.
> 
> > 
> > We've documented the proceedings as well as the solution we 
> > implemented at http://wiki.fupps.com/nagios/icmp
> > 
> > Regards,
> > 	-JP
> 
> Nice solution there. It may show that an installation with a big passive
> nagios configuration will thrash the central server if
> freshness_threshold and freshness checking report stale results from
> the distributed monitoring agents so frequently that the commands
> associated with the stale passive service reports fork too many
> children, all waiting to write to the pipe.
> 
> But have you thought about _not_ checking host-alive whenever a stale
> result comes in? Anyway, the write-up of your workaround was very nice
> and clear reading. Thanks.
> 
> -- 
> Marcel Mitsuto Fucatu Sugano <msugano at uolinc.com>
> Universo Online S.A. -- http://www.uol.com.br
> 
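
A minimal sketch of the alternative Marcel describes above: when a passive
result goes stale, the command that runs reports UNKNOWN with a "stale"
message instead of probing the host with check_icmp. The names
`stale_result` and `main` are hypothetical, and using this as the
service's active check command is an assumption about the setup, not
something taken from either installation. Return code 3 is the standard
Nagios plugin code for UNKNOWN.

```python
"""Hypothetical stale-result handler: report UNKNOWN rather than probe the host."""
import sys

UNKNOWN = 3  # standard Nagios plugin return code for UNKNOWN


def stale_result(service):
    """Return (exit code, plugin output) for a passive service that went stale."""
    return UNKNOWN, "UNKNOWN: stale passive result for %s" % service


def main(argv):
    """Print the plugin output line and return the exit code to hand to sys.exit."""
    # Assumed convention: the service description is the first argument.
    service = argv[1] if len(argv) > 1 else "service"
    code, message = stale_result(service)
    print(message)
    return code
```

A wrapper script would end with `sys.exit(main(sys.argv))`, so that Nagios
sees the UNKNOWN state without any further process being forked to ping
the host.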


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users