Large scale installation

Giorgio Zarrelli zarrelli at linux.it
Mon Jun 11 21:40:18 CEST 2012


Hi,

I suggest reviewing your installation. Start with the large installation
tweaks: http://nagios.sourceforge.net/docs/3_0/largeinstalltweaks.html.
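Here is a minimal Python sketch that audits two settings often recommended for big setups. It assumes a plain key=value nagios.cfg at the default source-install path /usr/local/nagios/etc/nagios.cfg. The "recommended" values (use_large_installation_tweaks=1, enable_environment_macros=0) follow the Nagios 3 docs, but double-check them against your version.

```python
from pathlib import Path
from typing import Dict, Optional

# Options checked by this sketch and the values suggested for large installs.
RECOMMENDED = {
    "use_large_installation_tweaks": "1",
    "enable_environment_macros": "0",
}


def parse_cfg(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_tweaks(text: str) -> Dict[str, Optional[str]]:
    """Return {option: current value or None} for options not at the suggested value."""
    settings = parse_cfg(text)
    return {
        key: settings.get(key)
        for key, wanted in RECOMMENDED.items()
        if settings.get(key) != wanted
    }


if __name__ == "__main__":
    cfg = Path("/usr/local/nagios/etc/nagios.cfg")  # assumed default path
    if cfg.exists():
        print(missing_tweaks(cfg.read_text()))
```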

Then, check whether you really need every check to run every 5 minutes, or
whether you can move some of them to a 10-minute interval.
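To see what a longer interval buys you, the average scheduling load is just the number of checks divided by their interval in seconds. A small Python calculation follows; the 6000-check figure is a made-up example, not a measurement.

```python
import math


def checks_per_second(groups):
    """groups: iterable of (number_of_checks, interval_in_minutes) pairs."""
    return sum(count / (interval * 60) for count, interval in groups)


if __name__ == "__main__":
    all_at_5 = checks_per_second([(6000, 5)])
    half_at_10 = checks_per_second([(3000, 5), (3000, 10)])
    print(f"{all_at_5:.1f} checks/s vs {half_at_10:.1f} checks/s")
```

With 6000 checks every 5 minutes Nagios has to start about 20 checks per second on average; moving half of them to 10 minutes brings that down to about 15.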

Then, review your check plugins: Perl plugins eat more memory and CPU
cycles than compiled C checks. If they support EPN
(http://nagios.sourceforge.net/docs/3_0/embeddedperl.html), use it: it makes
your plugins faster and lighter.
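Nagios 3 lets a Perl plugin opt in or out of the embedded interpreter with a "# nagios: +epn" or "# nagios: -epn" line in its first 10 lines. Below is a rough Python sketch that reports which plugins declare what. The libexec path is the default for a source install and the shebang test is a simple heuristic; both are assumptions.

```python
import re
from pathlib import Path
from typing import Dict, Optional

# Matches "# nagios: +epn" or "# nagios: -epn" (whitespace-tolerant).
EPN_RE = re.compile(r"^#\s*nagios:\s*([+-])epn\b")


def epn_directive(text: str, max_lines: int = 10) -> Optional[bool]:
    """True for +epn, False for -epn, None if no directive in the first max_lines lines."""
    for line in text.splitlines()[:max_lines]:
        match = EPN_RE.match(line.strip())
        if match:
            return match.group(1) == "+"
    return None


def scan_plugins(libexec: Path) -> Dict[str, Optional[bool]]:
    """Map each Perl plugin (detected by a perl shebang) to its EPN directive."""
    report: Dict[str, Optional[bool]] = {}
    for path in sorted(libexec.iterdir()):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue
        first_line = text.split("\n", 1)[0]
        if first_line.startswith("#!") and "perl" in first_line:
            report[path.name] = epn_directive(text)
    return report


if __name__ == "__main__":
    libexec = Path("/usr/local/nagios/libexec")  # assumed plugin directory
    if libexec.is_dir():
        for name, epn in scan_plugins(libexec).items():
            print(f"{name}: {epn}")
```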

Then, check your checks. Some checks return data more slowly than others;
SNMP checks, for instance, are not lightning fast.
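One way to find the slow ones is to time each plugin by hand. The following minimal Python sketch runs a command with a timeout and reports its exit code (Nagios reads 0/1/2/3 as OK/WARNING/CRITICAL/UNKNOWN), the elapsed time and the first output line. On timeout it returns None as the code, which is this sketch's own convention, not how Nagios reports timeouts. The demo command is a stand-in for a real plugin such as check_snmp.

```python
import subprocess
import sys
import time


def time_check(command, timeout=30.0):
    """Run a plugin (list of args); return (exit_code or None on timeout, seconds, first line)."""
    start = time.perf_counter()
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
        code, output = proc.returncode, proc.stdout
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        code, output = None, f"timed out after {timeout}s"
    elapsed = time.perf_counter() - start
    first_line = output.splitlines()[0] if output else ""
    return code, elapsed, first_line


if __name__ == "__main__":
    # Stand-in command; replace with a real plugin path and its arguments.
    print(time_check([sys.executable, "-c", "print('OK - demo')"]))
```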

Then, check your graphs. Graphing perfdata takes CPU cycles and uses
memory. Do you need all your graphs?

Then, get rid of NDOUtils. It chokes all the way: inefficient, clumsy, old
and heavy. If you want to store your data in MySQL, use Merlin instead.

Anyway, did you tune your MySQL? Is it causing too much I/O? Is it
munching too much RAM or CPU cycles?

Did you tune your Apache or other HTTP server? Does it cope with your needs?
Is it munching too much RAM or CPU cycles?

If you want live information about your hosts and services, for instance
to feed NagVis, grab MK Livestatus: it's blazing fast and gives you direct
access to the core Nagios process.
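MK Livestatus answers plain-text queries over a Unix socket. Here is a minimal Python client sketch. The socket path is whatever you set on the livestatus broker_module line (/usr/local/nagios/var/rw/live below is only an assumption), and the query asks for JSON output to keep parsing trivial.

```python
import json
import os
import socket
from typing import Iterable, List


def build_query(table: str, columns: List[str], filters: Iterable[str] = ()) -> str:
    """Build a Livestatus GET query; an empty line terminates it."""
    lines = [f"GET {table}", "Columns: " + " ".join(columns)]
    lines += [f"Filter: {f}" for f in filters]
    lines.append("OutputFormat: json")
    return "\n".join(lines) + "\n\n"


def query_livestatus(socket_path: str, query: str):
    """Send a query over the Unix socket and decode the JSON reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(query.encode())
        sock.shutdown(socket.SHUT_WR)  # tell Livestatus the query is complete
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return json.loads(b"".join(chunks))


if __name__ == "__main__":
    path = "/usr/local/nagios/var/rw/live"  # assumed socket path
    if os.path.exists(path):
        q = build_query("services", ["host_name", "description", "state"], ["state > 0"])
        for host, description, state in query_livestatus(path, q):
            print(host, description, state)
```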

Are you using a virtualized environment? If so, remember that the I/O layer
in virtualized environments performs poorly; use fast, real disks and your
I/O wait will drop dramatically.

Try moving status.dat to /dev/shm. The latter is a RAM-backed filesystem
ready to use, and writing to RAM is always faster than writing to disk.
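In nagios.cfg the location is set by the status_file option. The Python sketch below rewrites that line, or appends it if it is missing. The function name is hypothetical. Because /dev/shm is emptied on reboot, it only suits files Nagios regenerates, such as status.dat.

```python
import re

# Matches an active (uncommented) status_file line; keeps "status_file=" in group 1.
STATUS_RE = re.compile(r"^(\s*status_file\s*=).*$", re.MULTILINE)


def move_status_file(cfg_text: str, new_path: str = "/dev/shm/status.dat") -> str:
    """Point status_file at new_path, appending the option if it is absent."""
    new_text, count = STATUS_RE.subn(lambda m: m.group(1) + new_path, cfg_text)
    if count == 0:
        new_text = cfg_text.rstrip("\n") + f"\nstatus_file={new_path}\n"
    return new_text


if __name__ == "__main__":
    sample = "log_file=/usr/local/nagios/var/nagios.log\nstatus_file=/usr/local/nagios/var/status.dat\n"
    print(move_status_file(sample))
```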

Avoid logging too much: it increases I/O and costs CPU and RAM.

What are iotop and iostat telling you?

What do you see in top or htop?

If you can, or want to, compile everything from source; it will run faster
on your system.

You can use passive checks with NSCA or NRDP to reduce load, even though I
am not very fond of them.
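A passive result usually ends up on the Nagios host as a line in the external command file. NSCA, for example, writes it there after receiving it over the network. The following Python sketch only formats that PROCESS_SERVICE_CHECK_RESULT line. The command-file path in the comment is the source-install default and is an assumption for your system.

```python
import time
from typing import Optional


def passive_service_result(host: str, service: str, return_code: int,
                           output: str, timestamp: Optional[int] = None) -> str:
    """Format a PROCESS_SERVICE_CHECK_RESULT external command line."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return code must be 0, 1, 2 or 3")
    if "\n" in output:
        raise ValueError("plugin output must be a single line in this sketch")
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{return_code};{output}\n"


if __name__ == "__main__":
    line = passive_service_result("web01", "HTTP", 0, "OK - page loaded")
    # On the Nagios host this line would be written to the command file,
    # e.g. /usr/local/nagios/var/rw/nagios.cmd (assumed path).
    print(line, end="")
```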

These are just a few ideas that came to mind.


Let's talk about sharing load.

You can use different methods:

Merlin
(http://www.op5.org/community/plugin-inventory/op5-projects/merlin): gives
you load balancing and redundancy. I use it for Ninja, but have never used
it for load balancing and redundancy.

DNX (http://dnx.sourceforge.net/): something newer that is gaining
momentum; good for offloading checks. Worth a try.

Mod_gearman (http://labs.consol.de/lang/de/nagios/mod-gearman/): love at
first sight :-) Easy, powerful, load balancing and fault tolerant. Compile
gearmand with memcached support and all the check results will go directly
to RAM, avoiding disk I/O. It's really simple to set up, and if one of the
workers goes down, the others will share its work. Be careful: security is
a problem, as there is no good authentication system, but using a VPN
solves that. It's efficient: I use a virtual machine with 2 cores and 2 GB
of RAM to run about 5K checks, and the load is not a concern. Need more
horsepower? Add a worker. Do some checks time out due to poor connections
to the targets? Put a worker close to the targets, but be careful: timings,
say the RTA of a ping, will be measured from the worker's perspective.
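Mod_gearman's fault tolerance rests on a familiar pattern: checks go into a shared queue in gearmand, and any worker that is alive pulls the next one. The Python sketch below only simulates that pattern with threads and the standard-library queue. It is not mod_gearman code, and the worker names and the simulated outage are made up.

```python
import queue
import threading
from collections import Counter


def run_workers(jobs, worker_names, crash_after=None):
    """Simulate workers draining a shared job queue.

    crash_after: optional {worker_name: n}; that worker stops after n jobs
    (a simulated outage). Returns (jobs done per worker, list of results).
    """
    jobs_queue = queue.Queue()
    for job in jobs:
        jobs_queue.put(job)
    limits = crash_after or {}
    done = Counter()
    results = []
    lock = threading.Lock()

    def worker(name):
        while True:
            with lock:
                if done[name] >= limits.get(name, float("inf")):
                    return  # this worker "goes down"; the others keep pulling
            try:
                job = jobs_queue.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            result = f"{job} checked by {name}"
            with lock:
                done[name] += 1
                results.append(result)

    threads = [threading.Thread(target=worker, args=(n,)) for n in worker_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done, results


if __name__ == "__main__":
    done, results = run_workers([f"check{i}" for i in range(100)],
                                ["w1", "w2", "w3"], crash_after={"w1": 5})
    print(dict(done), len(results))
```

Adding capacity in this model is just adding a name to the worker list, which mirrors the "need more horsepower? add a worker" advice above.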

Well, hope it helps.

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users