Nagios scalability issues

Cristian M. Streng CristiS at Prominic.NET
Mon Jul 19 21:30:30 CEST 2004


Hi Guys,

I've been using Nagios for about two years, and it now seems to be
approaching its limits.

We use our Nagios server to remotely monitor 400+ hosts with a total of
~2500 services. Some plugins run on the main monitoring machine, but most
run on the remote machines via NRPE. Things have been getting slower and
slower, and although we upgraded both the hardware and the software (we're
using Nagios 1.2), it's still too slow.

I'd like to get some suggestions on improving Nagios performance. In my
opinion, the biggest problem with Nagios is that all checks are scheduled
on a single machine, so it does not scale well.

After a bit of thought, I decided to start implementing an alternative
monitoring engine: a lightweight client-server system that moves the
scheduler from the main server to the individual machines. This way each
machine schedules its own ~10-20 service checks and reports only changes
in service status back to the server. That fixes all of Nagios's problems,
or at least all that matter to me: the main server is less loaded, and the
network load is also much reduced. The server part of my application just
collects the results from the client machines and writes them to a
database, so it adds practically no load to the server machine. I'm
planning to write another component that would take these results and
send them to Nagios, but I have a few questions. What's the best way to
send check results to Nagios? Will the external command interface work?
I'm also interested in how well that interface scales.
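
To make the question concrete, here is a rough sketch of what such a
forwarding component might look like if it used passive check results
through the external command file. It assumes external commands are enabled
(check_external_commands=1) and that the caller passes in whatever path
command_file is set to in nagios.cfg. The names format_passive_result,
ChangeFilter and submit are made up for illustration and are not part of
Nagios, NRPE, or NSCA.

```python
import time


def format_passive_result(host, service, return_code, output, timestamp=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT line for the command file."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return code must be 0 (OK), 1 (WARNING), "
                         "2 (CRITICAL) or 3 (UNKNOWN)")
    if timestamp is None:
        timestamp = int(time.time())
    # A newline ends the command, so the plugin output must stay on one line.
    output = output.replace("\n", " ")
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, return_code, output)


class ChangeFilter:
    """Remember the last state per (host, service); report only changes."""

    def __init__(self):
        self._last = {}

    def changed(self, host, service, return_code):
        key = (host, service)
        if self._last.get(key) == return_code:
            return False
        self._last[key] = return_code
        return True


def submit(lines, command_file):
    """Append formatted command lines to the command file.

    command_file is an assumption supplied by the caller; it should be
    the path configured as command_file in nagios.cfg.
    """
    with open(command_file, "a") as fh:
        fh.writelines(lines)
```

A result would then be forwarded with something like
submit([format_passive_result("web01", "HTTP", 2, "connection refused")], path),
where the host and service names must match what is defined in the Nagios
configuration and must not contain semicolons, since those separate the
fields. Whether this holds up at ~2500 services is exactly what I'd like
to find out.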

I've also noticed that we are getting pretty close to Nagios 2.0 (maybe
we'll have it in another year or two :-). So I'd like to ask the
developers whether they plan to implement something similar in Nagios or
in the NRPE or NSCA addons by the time 2.0 is released, so that I know my
work is not in vain :)

Thanks,
Cristian Streng.