Reducing Load on a Distributed Nagios Installation

Marcel Mitsuto Fucatu Sugano msugano at uolinc.com
Mon Oct 24 16:34:42 CEST 2005


Hi JP,

On Sat, 2005-10-15 at 11:53 +0200, Jan-Piet Mens wrote: 
> We've experienced quite a bit of load on a distributed Nagios
> installation with several thousand passive service checks which
> are supplied to a central Nagios server via NSCA. Our central
> Nagios 1.2 server started swapping and subsequently thrashed
> itself to death. After a bit of debugging, we've come up with a
> solution which may be interesting to those in a similar position.

I'm also running distributed monitoring with a central server, but in
my case we have 11 monitoring agents that send their check results to
nsca on the central server. I'm using nagios2.0b4 on the central server
and nagios1.X on the agents. Counting every check that is passively sent
to the central server, that adds up to over 10000 passive checks being
received by one piece of commodity hardware, highly available: a
Pentium4-HT running SuSE9.3, very simple. But it works. No thrashing
experienced so far. But we do not run check_icmp on stale check results.
We simply show a stale result as an Unknown alert with an output of
stale, and try to find reasonable freshness thresholds.
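
To illustrate the "Unknown alert with an output of stale" idea, here is
a minimal sketch of a command that Nagios could run when a passive
result goes stale. It is not our actual script: the function names and
the message text are made up for illustration, and I am assuming it is
wired in as the check_command of a passive service that has
check_freshness enabled and a freshness_threshold set. The only fixed
part is the plugin convention that exit code 3 means UNKNOWN.

```python
#!/usr/bin/env python3
"""Hypothetical stale-result handler: report UNKNOWN instead of re-checking."""

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def stale_result(service="service"):
    """Return the (exit code, output line) to record for a stale passive result."""
    output = (
        f"UNKNOWN: stale result for {service}, "
        "no passive update within the freshness threshold"
    )
    return UNKNOWN, output


def main(argv):
    """Print the plugin output line and return the exit code.

    argv[1], if present, is a service description used only in the text.
    """
    name = argv[1] if len(argv) > 1 else "service"
    code, output = stale_result(name)
    print(output)
    return code


# A real plugin file would finish with:
#     import sys; sys.exit(main(sys.argv))
```

The point of the design is that the command does no network I/O at all,
so even if many results go stale at once, each forked child exits almost
immediately instead of waiting on a ping.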

In your situation, I would think about upgrading the central nagios
server to 2.0b4.

> 
> We've documented the proceedings as well as the solution we 
> implemented at http://wiki.fupps.com/nagios/icmp
> 
> Regards,
> 	-JP

Nice solution there. It may show that an installation with a large
passive nagios configuration can thrash the central server: if freshness
checking is enabled and the freshness_threshold of many results from the
distributed monitoring agents expires within a short interval, the
command associated with each stale passive service is forked, and too
many children end up waiting to write to the pipe.

But have you thought about _not_ checking host-alive whenever a stale
result checks in? Anyway, your write-up of the workaround was a very
nice and clear read. Thanks.

-- 
Marcel Mitsuto Fucatu Sugano <msugano at uolinc.com>
Universo Online S.A. -- http://www.uol.com.br


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users