server profiling options

Jim Avery jim at jimavery.me.uk
Fri Nov 19 20:45:05 CET 2010


On 18 November 2010 20:18, Daniel Wittenberg
<daniel.wittenberg.r0ko at statefarm.com> wrote:
> I’m looking at minimizing the CPU impact that nagios has on our server, and
> done some of the basic performance tuning stuff, but what I see right now is
> a lot of the nagios worker procs generating a lot of CPU and curious if
> there was a way people have used to watch what those processes and threads
> were doing that might be taking the most cycles to try and reduce it?


I've just been looking at this myself.

I'm a bit suspicious about the external_command_check_interval
directive (see http://nagios.sourceforge.net/docs/3_0/configmain.html).

If it's set to "-1" (as mine was until recently) then Nagios will
check external commands as often as possible.  I suspect it helps if
you set it to a definite interval, for example 15s, but check
nagiostats to make sure your command buffers don't fill up.
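
As a rough sketch, the change in nagios.cfg would look something like
the fragment below.  The 15s value is only the example figure from
above, not a tested recommendation, and I'm assuming external commands
are enabled in the first place.

```cfg
# nagios.cfg (fragment)
# External command checking must be on for the interval to matter.
check_external_commands=1

# -1 means "check as often as possible".
# A trailing "s" means seconds; without it the number is counted in
# time units (interval_length, 60 seconds by default).
external_command_check_interval=15s
```

After a restart, running nagiostats and watching the command buffer
figures for a while should show whether the buffers are getting close
to full.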


IME Nagios itself is usually quite light on CPU.  It's the plugins and
how frequently they run which affect performance the most.  I always
set check_interval and retry_interval as long as I can get away with in
service definitions, so that each check runs less often and the load is
spread out.
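
For illustration, a service definition with longer intervals might look
roughly like this.  The host name, service description, check command
and the actual numbers are all made up for the example, and I'm
assuming the "generic-service" template from the sample config exists
on your system.

```cfg
# Hypothetical service definition; names and values are examples only.
define service{
        use                     generic-service
        host_name               webserver01
        service_description     Disk Usage
        check_command           check_nrpe!check_disk
        check_interval          15      ; time units between normal checks
        retry_interval          5       ; time units between rechecks after a non-OK result
        max_check_attempts      3
        }
```

With the default interval_length of 60 the intervals above are in
minutes, so this check would normally run every 15 minutes.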

Some plugins can be real performance hogs too, especially
check_esx3.pl if you use that (I don't mean to dis' it, as it's a
super plugin - it just eats CPU).  Run 'top' and you will probably see
which plugins are the biggest hogs on your system.

ndo (the interface with MySQL if you have that installed) can be a
real performance hog.  That's a whole other topic!

If you're using pnp4nagios for graphing performance data, consider
setting it up in bulk mode, ideally on a separate server.  It won't
make a huge difference but might help a bit.

If it's more important to you to stop Nagios hammering your server
than it is for Nagios to work right, you can use max_concurrent_checks
to limit the number of service checks Nagios can run at any one time.
Keep an eye on your service check latency if you do that though - if
latency gets too high (more than a minute or so) you will find that
Nagios' usefulness diminishes quite rapidly!  Personally I think you
should give Nagios a dedicated server and let it use as much CPU as it
needs.
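
If you do want to try the limit, a minimal sketch of the nagios.cfg
setting is below.  The value 20 is an arbitrary number picked for the
example; the right figure depends entirely on how many checks you have
and how long they take.

```cfg
# nagios.cfg (fragment)
# Cap the number of service checks that may run in parallel.
# 0 means no limit; 1 effectively disables parallel checks.
# 20 is an arbitrary illustrative value.
max_concurrent_checks=20
```

nagiostats reports active service check latency, so compare the
figures before and after the change.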

Oh, and v3.1.3 includes a fix which improves performance of the status
CGIs.  I'm looking forward to trying that myself next week.

Ah, yes, if you have quite a few users, consider setting
"refresh_rate" in cgi.cfg to a longer time, otherwise everyone who
leaves a status screen open in their browser will hit your Nagios
server every 90 seconds (or whatever value it's set to on your
system).  If I recall correctly, I set mine to 180.
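
In cgi.cfg that would be something like the following, with 180 being
the value mentioned above rather than any kind of recommendation:

```cfg
# cgi.cfg (fragment)
# Seconds between automatic page refreshes in the status CGIs.
refresh_rate=180
```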

I'm not sure if any of this will help you, but hopefully it will give
you an idea or two.

Cheers,

Jim

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null




