Alleviating Nagios i/o contention problem

Ciro Iriarte cyruspy at gmail.com
Mon Sep 27 17:34:32 CEST 2010


2010/9/25 Frost, Mark {PBC} <mark.frost1 at pepsico.com>:
> Greetings, listers,
>
> We've got an on-going issue with i/o contention.  There's the obvious
> problem that we've got a whole lot of things all writing to the same
> partition.  In this case, there's just one big chunk of RAID 5 disk on a
> single controller so I don't believe that making more partitions is going to
> help.
>
> On this same partition we have:
>
> 1) Nagios 3.2.1 running as the central/reporting server for a couple of
> other Nagios nodes that are sending check results via NSCA.  Approximately
> 6-7K checks.
>
> 2) pnp4nagios 0.6.2 (with rrd 1.4.2) writing graph data.
>
> There's a 2nd server configured identically to the first that's acting as a
> "hot spare" so it also receives check data from the 2 distributed nodes and
> writes its own copy of the graph data locally as well.
>
> At the moment I'm concerned about the graphdata, but because I can only see
> i/o utilization as an aggregate, I can't tell what is the worst component on
> that filesystem -- status.dat updates?  graph data?  writes to the var/spool
> directory?  We also look at continued growth so this is only going to get
> worse.
>
> These systems are quite lightly loaded from a CPU (2 dual-core CPUs) and
> memory (4GB) perspective, but the i/o to the nagios filesystem is queuing
> now.
>
> We're about to order new hardware for these servers and I want to make a
> reasonable choice.  I'd like to make some reasonable changes without
> requiring too exotic of a setup.  I believe these servers are currently Dell
> 2950s and they're all running Suse Linux 10.3 SP2.
>
> My first thought was to potentially move the graphs to a NAS share which
> would shift that i/o to the network.  I don't know how that would work
> though and it would ultimately be an experiment.
>
> What experiences do people out there have handling this kind of i/o and what
> have you done to ease it?
>
>
> Thanks very much!
>
> Mark
>

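Depending on the kernel version, you could use iotop to see which
processes are the top I/O consumers. It relies on per-task I/O
accounting, which needs Linux 2.6.20 or later, so check what kernel
your SUSE install ships before counting on it.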
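
If iotop isn't an option, a rough per-process view can be pulled from
/proc/<pid>/io directly. Below is a minimal Python sketch, assuming a
kernel built with task I/O accounting (so the same kind of kernel
dependency applies) and root privileges to read other users' processes.
The function names parse_proc_io and top_writers are my own, not part
of any existing tool.

```python
import os


def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def top_writers(proc_root="/proc", limit=10):
    """Return (write_bytes, read_bytes, pid, cmdline) tuples, biggest writers first."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "io")) as f:
                io = parse_proc_io(f.read())
            with open(os.path.join(proc_root, entry, "cmdline"), errors="replace") as f:
                # cmdline arguments are NUL-separated
                name = f.read().replace("\0", " ").strip()
        except OSError:
            continue  # process exited, or we lack permission to read it
        rows.append((io.get("write_bytes", 0), io.get("read_bytes", 0), int(entry), name))
    rows.sort(reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    for written, read, pid, name in top_writers():
        print(f"{pid:>7} written={written:>14} read={read:>14} {name[:60]}")
```

Keep in mind these counters are cumulative since each process started,
so to get a rate you'd sample twice and take the difference.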

Regards,

-- 
Ciro Iriarte
http://cyruspy.wordpress.com

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users