Nagios and PNP Performance Issue

Andreas Ericsson ae at op5.se
Wed Feb 3 09:06:09 CET 2010


On 02/02/2010 05:47 PM, Rodney Ramos wrote:
> Hi everybody,
> 
> I'm using Nagios (3.2.0) to monitor and collect performance data for 25,000
> hosts, with 50,000 services.
> 

That's quite a large environment. I think the Brazilian government is
monitoring a huge network as well.

> I have two central machines (one for backup) and 10 distributed servers that
> collect status and send it to the central servers.
> 
> It's working, but I'm having serious performance problems.
> 
> First, the Tactical Overview on the central machines takes almost 1
> minute to refresh. I think that's because the status.dat file is too big
> (almost 100 MB).
> 

I'm not surprised. You'd probably want to get that data into a database to
get some quick filtering on it.
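
As a rough illustration of what "getting that data into a database" could
look like, here is a minimal Python sketch that loads the service states from
status.dat-style text into SQLite. It assumes the usual block layout
("servicestatus {", one key=value per line, a closing "}"); the table name
service_state and both function names are made up for this example, and a
real setup would more likely use a purpose-built database backend.

```python
import sqlite3


def parse_status_blocks(text):
    """Yield (block_type, fields) for each 'type { key=value ... }' block."""
    block_type, fields = None, None
    for raw in text.splitlines():
        line = raw.strip()
        if block_type is None:
            if line.endswith("{"):
                block_type, fields = line[:-1].strip(), {}
        elif line == "}":
            yield block_type, fields
            block_type, fields = None, None
        elif "=" in line:
            # Split on the first '=' only; plugin output may contain more.
            key, value = line.split("=", 1)
            fields[key] = value


def load_service_states(text, conn):
    """Store host, service and state of every servicestatus block in SQLite."""
    # Hypothetical schema, chosen only for this illustration.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS service_state ("
        "host TEXT, service TEXT, state INTEGER, PRIMARY KEY (host, service))"
    )
    rows = [
        (f.get("host_name"), f.get("service_description"),
         int(f.get("current_state", 0)))
        for kind, f in parse_status_blocks(text)
        if kind == "servicestatus"
    ]
    conn.executemany(
        "INSERT OR REPLACE INTO service_state VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)
```

Once the rows are in a table, a filter such as "all services in state 2"
becomes a single indexed query instead of a scan over a 100 MB text file.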

> Second, the PNP 0.4.14 addon takes a long time to process the
> performance data files. These files grow faster than the
> process_perfdata.pl script can process them.
> 

I wouldn't use PNP on the same system as such a huge Nagios installation,
to be honest. A separate system with a flushing/caching daemon piping
output directly to a single instance of process_perfdata.pl would be far
better. I don't know if process_perfdata.pl has to be hacked to accept
input on stdin, but I can't imagine that would be very difficult. Then
it's just a matter of flushing the performance data files to that running
instance. A really small daemon program could easily handle that if the
performance data processor script just renames the perfdata file to
something unique.
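
A minimal Python sketch of that rename-and-flush idea follows. It is only an
illustration under assumptions: the function names and the spool directory
are invented here, the processor is represented by any writable stream (for
example the stdin pipe of one long-running process), and it assumes the
writer reopens the perfdata file by name after it has been moved aside.

```python
import os
import time
import uuid


def rotate_perfdata(perfdata_path, spool_dir):
    """Move the perfdata file aside under a unique name.

    Returns the new path, or None if there is nothing to rotate.
    os.rename is atomic only when both paths are on the same filesystem.
    """
    if not os.path.exists(perfdata_path):
        return None
    if os.path.getsize(perfdata_path) == 0:
        return None
    unique = "perfdata.%d.%s" % (int(time.time()), uuid.uuid4().hex)
    target = os.path.join(spool_dir, unique)
    os.rename(perfdata_path, target)
    return target


def flush_to_processor(spooled_path, processor_stdin):
    """Stream one spooled file line by line into the processor, then delete it.

    Returns the number of lines forwarded.
    """
    count = 0
    with open(spooled_path, "r") as handle:
        for line in handle:
            processor_stdin.write(line)
            count += 1
    processor_stdin.flush()
    os.remove(spooled_path)
    return count
```

A daemon would call rotate_perfdata() on a timer and hand each returned path
to flush_to_processor(), so only one processor instance ever runs and no new
interpreter is started per batch.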

> 
> Can anyone help me improve the performance of Nagios and PNP in this
> environment?
> 

Yes, but it sounds like an awful lot of work that I'm not very interested
in doing for free. You have some pointers now, so try that. If that doesn't
work, come back here and we'll try something different.

> P.S.: All my Nagios servers are virtual machines running Red Hat. The central
> servers have 2 CPUs and 2 GB of memory. The collectors have 1 CPU and 1 GB of
> RAM. Do you think moving the central servers to physical machines would give
> a big performance improvement? How much?
> 

Virtual machines have notoriously poor disk performance. Moving the central
servers to physical machines will almost certainly remove your current
bottleneck, or at least widen it by quite a lot.

> I think this is a good test for Nagios. I have a requirement to put 100,000
> hosts with 200,000 services in this environment! Is it possible? Does
> anyone have a Nagios configuration this big?
> 

What matters is how much data per second you intend to process, and how many
checks per minute you intend to run. With a check interval of 6 months, I
expect Nagios will run just fine with several million service checks configured.
With a check interval of 10 seconds, you'd probably run into problems around
10000 services.
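
To put rough numbers on that: the sustained load is approximately the number
of services divided by the check interval. A small Python sketch, where the
5-minute interval for the 200,000-service case is an assumption and not
something stated in this thread:

```python
def checks_per_second(services, interval_seconds):
    """Average number of checks the scheduler must start each second."""
    return services / interval_seconds


# 10000 services checked every 10 seconds.
print(checks_per_second(10_000, 10))              # 1000.0

# 200,000 services at an assumed 5-minute (300 s) interval.
print(round(checks_per_second(200_000, 300), 1))  # 666.7
```

Each of those checks also produces a result to process and, with PNP, a
perfdata line to write, so the same rate applies to the disk-bound parts.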

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

_______________________________________________
Nagios-devel mailing list
Nagios-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-devel