Nagios and Gearman - huge environment performance problem

Jim Avery jim at jimavery.me.uk
Tue Aug 23 22:51:54 CEST 2011


On 23 August 2011 21:21, Rodney Ramos <rodneyra at gmail.com> wrote:
> Hi, everybody. Sorry for taking so long to reply, but I was testing what was
> suggested.
>
> Well, I put all files (status.dat, checkresults, nagios.tmp, nagios.log etc)
> on a ram disk (/dev/shm). I also disabled all broker modules, leaving only
> the mod_gearman broker, of course. I disabled flapping detection,
> performance processing, everything.
>
> The result: absolutely nothing. No improvement. Nagios still sits at 100%
> CPU. Latency is still high, between 250 and 500 sec.

I've been squeezing performance out of my Nagios system lately.  One
thing I've just started experimenting with is status_update_interval.
If Nagios is updating the status file, then anything else wanting to
read it is presumably going to be waiting on the lock (I know
embarrassingly little about Linux file locking, by the way).  I found
that even a modest increase in status_update_interval helps a fair
bit, certainly with the response times for status.cgi.  I must learn
more about how Linux locking works, and in particular whether and how
processes spin waiting on a lock and whether that behaviour is tunable
in the kernel.
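
Before changing status_update_interval it helps to confirm what value is
actually in effect.  A minimal sketch, assuming the main config file uses
the usual key=value lines with # comments; the helper name
read_nagios_settings, the sample path and the value 30 are all made up
for illustration, not recommendations:

```python
def read_nagios_settings(cfg_text, wanted=("status_file", "status_update_interval")):
    """Return the requested key=value directives found in nagios.cfg text."""
    found = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key in wanted:
            found[key] = value.strip()
    return found


# Hypothetical excerpt; the path and the interval are illustrative only.
sample = """
# status data
status_file=/usr/local/nagios/var/status.dat
status_update_interval=30
"""
settings = read_nagios_settings(sample)
print(settings)
```

On a real system you would pass in the contents of your own nagios.cfg
instead of the sample string, then raise the interval in small steps and
watch latency and status.cgi response times after each change.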

I found that on Ubuntu Linux, moving status.dat to a ramdisk would
break the system in that services would mysteriously disappear only to
reappear a few seconds later.  I'm pretty sure that's because file
locking wasn't working while the file was on ramdisk.  I've not heard
of anyone else reporting that problem so my guess is it's peculiar to
Ubuntu and quite possibly to the (quite old) LTS version of Ubuntu I'm
running at the moment.  An easy way to see evidence of this is that it
also broke freshness checking so the log file was peppered with
freshness checks failing when they shouldn't have.

Another thing I noticed is that if you enable the MRTG checks of
Nagios stats (http://nagios.sourceforge.net/docs/3_0/mrtggraphs.html),
that has quite a large impact on performance (it does on my system
anyway).  As a result I have configured them to run less frequently
than the default.




