Nagios 2.6 still not draining command pipe fast enough

John P. Rouillard rouilj+nagiosdev at cs.umb.edu
Sun Feb 11 23:07:18 CET 2007


Hi all:

I am trying to get my external correlation engine working with nagios
2.x <http://www.cs.umb.edu/~rouilj/#secnagios>, and I just can't get
nagios to drain the command pipe fast enough. I see approx. 5% failure
rate on writing to the command pipe with an EAGAIN error.
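
A minimal sketch of the writer side, assuming the correlation engine opens the command pipe non-blocking and submits passive results with PROCESS_SERVICE_CHECK_RESULT. The helper names, the retry count and the delay are illustrative and not taken from the real engine. Retrying does not make nagios drain faster; it only turns a short backlog into a delay instead of a lost result.

```python
import errno
import os
import time


def format_check_result(host, service, return_code, output, now=None):
    """Build one passive service check line in external command format."""
    stamp = int(time.time() if now is None else now)
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        stamp, host, service, return_code, output)


def write_command(fd, line, retries=5, delay=0.05):
    """Write one command line to a non-blocking pipe file descriptor.

    Returns True on success, False if the pipe stayed full (EAGAIN) for
    every attempt. Assumes the line is no longer than PIPE_BUF, so the
    write is all-or-nothing and never partial.
    """
    data = line.encode("ascii")
    for attempt in range(retries):
        try:
            os.write(fd, data)
            return True
        except OSError as exc:
            if exc.errno not in (errno.EAGAIN, errno.EWOULDBLOCK):
                raise  # a real error, not a full pipe
            # Pipe is full: back off (exponentially) and try again.
            time.sleep(delay * (2 ** attempt))
    return False


# Usage, assuming the default command file location:
#   fd = os.open("/usr/local/nagios/var/rw/nagios.cmd",
#                os.O_WRONLY | os.O_NONBLOCK)
#   ok = write_command(fd, format_check_result("web1", "HTTP", 2, "timeout"))
```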

I have increased:

  nagios.h:#define COMMAND_BUFFER_SLOTS              20480
  nagios.h:#define SERVICE_BUFFER_SLOTS             20480

from the original 1024. Going from 10240 to 20480 slots, I may see a
slight decrease in the failure rate (maybe .5%), but I think I just
want to see it. I don't think it's statistically significant.
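
One way to check whether a drop from about 5% to about 4.5% is more than noise is a two-proportion z-test. This is a rough sketch: the counts are hypothetical (no write totals are given here), and it treats every write as independent, which bursty traffic probably violates.

```python
import math


def two_proportion_z(fail_a, total_a, fail_b, total_b):
    """Return (z, two-sided p-value) for the difference of two failure rates."""
    p_a = fail_a / total_a
    p_b = fail_b / total_b
    # Pooled failure rate under the hypothesis that both runs are the same.
    pooled = (fail_a + fail_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 10000 writes per run, 5% vs 4.5% EAGAIN failures.
z, p = two_proportion_z(500, 10000, 450, 10000)
print("z = %.2f, p = %.3f" % (z, p))
```

With real counts from each test run in place of the made-up ones, a large p-value would support the suspicion that the difference is not significant.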

Are the patches that allow setting these variables at run time rather
than at compile time available in the 2.x series of nagios? Same
question for the patches that allow monitoring how many slots are
full.

Does anybody have any idea on how to detect why nagios is not draining
the command queue fast enough?

Since the external correlation engine is driven from the service check
results, it won't write anything if nagios isn't doing service checks,
so I assume this rules out nagios spending 5% of its time running
host/topology checks as the cause.

Does anybody have patches that would allow me to log when nagios has
stopped processing the command queue because of host checks? Then I
could compare the timestamps of the failure with the time nagios
starts/stops topology checks.
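
The timestamp comparison could be sketched like this, assuming the EAGAIN failure times are logged by the writer and the host check start/stop times come from some nagios-side logging that does not exist yet. Both inputs and the function name are hypothetical.

```python
import bisect


def failures_inside(failures, intervals):
    """Split failure timestamps by whether they fall in a check interval.

    intervals is a list of (start, stop) pairs, non-overlapping and
    sorted by start time. Returns (inside, outside).
    """
    starts = [start for start, _ in intervals]
    inside, outside = [], []
    for ts in failures:
        # Index of the last interval that starts at or before ts.
        i = bisect.bisect_right(starts, ts) - 1
        if i >= 0 and ts <= intervals[i][1]:
            inside.append(ts)
        else:
            outside.append(ts)
    return inside, outside
```

If most failures land inside the intervals, host checks are a likely culprit; if they are spread evenly, something else is stalling the reader.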

Also I do have a couple of event handlers which I believe also stop
nagios from reading the command queue, but they aren't triggering that
often. I have just disabled the event handlers in the web interface
and am rerunning my tests with the slots set to 20480.

Is there anything I am missing that could cause the command pipe to
back up? Or is using the command pipe heavily just doomed to failure
and I will have to use Nagios 3.x when it becomes available?

Running under the debugger does slow things down enough that I can't
really tell what may be causing a problem. Don't you just love
Heisenberg problems?

Thanks for any ideas, as I have pretty much exhausted my own
here.

				-- rouilj
John Rouillard
===========================================================================
My employers don't acknowledge my existence much less my opinions.
