Nagios 2.6 still not draining command pipe fast enough

Ethan Galstad nagios at nagios.org
Mon Feb 19 16:22:46 CET 2007


John P. Rouillard wrote:
> Hi all:
> 
> I am trying to get my external correlation engine working with nagios
> 2.x <http://www.cs.umb.edu/~rouilj/#secnagios>, and I just can't get
> nagios to drain the command pipe fast enough. I see an approximately 5%
> failure rate when writing to the command pipe, with an EAGAIN error.
> 
> I have increased:
> 
>   nagios.h:#define COMMAND_BUFFER_SLOTS              20480
>   nagios.h:#define SERVICE_BUFFER_SLOTS             20480
> 
> from the original 1024. When I increased the settings from 10240 to
> 20480, I may have seen a slight decrease (maybe 0.5%), but I suspect I
> am just seeing what I want to see. I don't think it's statistically
> significant.
> 
> Are the patches that allow setting these variables at run time rather
> than at compile time available in the 2.x series of nagios? Same
> question for the patches that allow monitoring how many slots are
> full.
> 
> Does anybody have any idea on how to detect why nagios is not draining
> the command queue fast enough?
> 
> Since the external correlation engine is driven from the service check
> results, I won't write anything if nagios isn't doing service checks,
> so I assume this rules out spending 5% of nagios's time running
> host/topology checks.
> 
> Does anybody have patches that would allow me to log when nagios has
> stopped processing the command queue because of host checks? Then I
> could compare the timestamps of the failure with the time nagios
> starts/stops topology checks.
> 
> Also I do have a couple of event handlers which I believe also stop
> nagios from reading the command queue, but they aren't triggering that
> often. I have just disabled the event handlers in the GUI
> and am rerunning my tests with the slots set to 20480.
> 
> Is there anything I am missing that could cause the command pipe to
> back up? Or is heavy use of the command pipe just doomed to failure,
> and will I have to use Nagios 3.x when it becomes available?
> 
> Running under the debugger does slow things down enough that I can't
> really tell what may be causing a problem. Don't you just love
> Heisenberg problems?
> 
> Thanks for any ideas; I have pretty much exhausted my own ideas
> here.
> 
> 				-- rouilj
> John Rouillard
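
Because write() only returns EAGAIN on a non-blocking descriptor, the failures above mean the command file (a FIFO) was full at the moment of the write, and unless the writer tries again the command line is lost. A writer can instead retry for a bounded time. The following is a minimal sketch, not code from Nagios or from the correlation engine: `open_command_pipe()`, `write_command_with_retry()` and their retry parameters are hypothetical. It relies on the POSIX rule that a write of at most PIPE_BUF bytes to a pipe is atomic, so each attempt either queues the whole command line or fails without writing anything.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>      /* PIPE_BUF */
#include <string.h>
#include <sys/select.h>  /* select(), struct timeval */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open the command FIFO for writing without blocking.
 * With O_NONBLOCK, open() fails with ENXIO if no process (normally Nagios)
 * currently has the FIFO open for reading. */
int open_command_pipe(const char *path)
{
    return open(path, O_WRONLY | O_NONBLOCK);
}

/* Hypothetical helper: write one newline-terminated command to a
 * non-blocking pipe, retrying up to max_retries times while the pipe is
 * full (EAGAIN) and sleeping retry_usec microseconds between attempts.
 * Returns 0 on success, -1 on failure with errno set. */
int write_command_with_retry(int fd, const char *line, int max_retries,
                             long retry_usec)
{
    size_t len;
    int retries = 0;

    assert(line != NULL && retry_usec >= 0);
    len = strlen(line);

    /* Writes of at most PIPE_BUF bytes to a pipe are atomic, so each
     * attempt either queues the whole line or fails with EAGAIN. */
    if (len > PIPE_BUF) {
        errno = EINVAL;
        return -1;
    }

    for (;;) {
        ssize_t n = write(fd, line, len);
        if (n == (ssize_t)len)
            return 0;
        if (n < 0 && errno == EINTR)
            continue;                      /* interrupted by a signal */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)
            && retries < max_retries) {
            struct timeval tv;
            tv.tv_sec = retry_usec / 1000000;
            tv.tv_usec = retry_usec % 1000000;
            select(0, NULL, NULL, NULL, &tv);  /* sub-second sleep */
            retries++;
            continue;
        }
        if (n >= 0)
            errno = EIO;  /* short write; not expected for len <= PIPE_BUF */
        return -1;        /* retries exhausted or a hard error */
    }
}
```

For example, `write_command_with_retry(fd, line, 5, 100000)` would hold a line for up to about half a second before giving up. Retrying only moves the wait into the writer; if Nagios stays behind, the writer's own loop slows down accordingly.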

John - Does this problem still occur with Nagios 2.7 or the latest 2.x
CVS code? A separate command file worker thread should be reading
entries from the external command file as fast as it can read them (as
long as there are free buffer slots).
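
The worker thread described here behaves like a bounded producer/consumer buffer: it moves lines from the FIFO into a fixed number of slots, and the main loop takes them out. Once every slot is occupied, the thread has nowhere to put the next line, so it stops reading, the pipe fills up, and writers start getting EAGAIN. The sketch below illustrates that under assumptions: `cmd_buffer`, its functions, and `DEMO_BUFFER_SLOTS` (a tiny stand-in for COMMAND_BUFFER_SLOTS) are hypothetical names, not the actual Nagios data structures.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-in for COMMAND_BUFFER_SLOTS, kept tiny for the demo. */
#define DEMO_BUFFER_SLOTS 4

/* Hypothetical fixed-size ring of command lines shared by two threads. */
typedef struct {
    char *items[DEMO_BUFFER_SLOTS];
    int head;              /* next slot the worker thread fills */
    int tail;              /* next slot the main loop drains */
    int count;             /* slots currently occupied */
    pthread_mutex_t lock;  /* guards all fields above */
} cmd_buffer;

void cmd_buffer_init(cmd_buffer *b)
{
    assert(b != NULL);
    memset(b->items, 0, sizeof b->items);
    b->head = b->tail = b->count = 0;
    pthread_mutex_init(&b->lock, NULL);
}

/* Producer: called by the worker thread for each line read from the FIFO.
 * Returns 1 if stored, 0 if all slots are full (or strdup failed). While
 * this keeps returning 0 the worker cannot drain the FIFO. */
int cmd_buffer_push(cmd_buffer *b, const char *line)
{
    int stored = 0;
    pthread_mutex_lock(&b->lock);
    if (b->count < DEMO_BUFFER_SLOTS) {
        char *copy = strdup(line);
        if (copy != NULL) {
            b->items[b->head] = copy;
            b->head = (b->head + 1) % DEMO_BUFFER_SLOTS;
            b->count++;
            stored = 1;
        }
    }
    pthread_mutex_unlock(&b->lock);
    return stored;
}

/* Consumer: called by the main loop. Returns the oldest line (caller
 * frees it) or NULL if the buffer is empty. */
char *cmd_buffer_pop(cmd_buffer *b)
{
    char *line = NULL;
    pthread_mutex_lock(&b->lock);
    if (b->count > 0) {
        line = b->items[b->tail];
        b->items[b->tail] = NULL;
        b->tail = (b->tail + 1) % DEMO_BUFFER_SLOTS;
        b->count--;
    }
    pthread_mutex_unlock(&b->lock);
    return line;
}

/* Number of occupied slots, i.e. what a slot-usage monitor would report. */
int cmd_buffer_used(cmd_buffer *b)
{
    int used;
    pthread_mutex_lock(&b->lock);
    used = b->count;
    pthread_mutex_unlock(&b->lock);
    return used;
}
```

`cmd_buffer_used()` is the kind of counter the slot-monitoring patches asked about in the original message would expose. A production version would also make the worker wait (for example on a condition variable) instead of spinning when a push returns 0.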

If there aren't any external commands, the thread waits 0.5 seconds 
before checking for new commands in the file.  If you have occasional 
bursts of check results, this could be too long to wait.  You could try 
experimenting with decreasing the 0.5 second delay.  Around line 4948 of 
base/utils.c, you'll find...

/* wait a bit */
tv.tv_sec=0;
tv.tv_usec=500000;
select(0,NULL,NULL,NULL,&tv);

You could try decreasing the value of tv.tv_usec to 100000 (0.1 seconds) 
and see if that helps at all.
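
The snippet above sleeps for a fixed interval whether or not data is waiting. One alternative, not what the quoted Nagios code does, is to pass the FIFO's descriptor to select() so the thread wakes as soon as a command arrives and the timeout only bounds idle waits. A minimal sketch follows; `wait_for_commands()` and `usec_to_timeval()` are hypothetical names. `usec_to_timeval()` also splits delays of one second or more into tv_sec, which the hard-coded `tv.tv_sec=0` form cannot express.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/select.h>  /* select(), fd_set, struct timeval */
#include <unistd.h>      /* read(), for the caller draining fd afterwards */

/* Convert microseconds to a struct timeval, keeping tv_usec below 1000000
 * so that values of a second or more go into tv_sec. */
struct timeval usec_to_timeval(long usec)
{
    struct timeval tv;
    assert(usec >= 0);
    tv.tv_sec = usec / 1000000;
    tv.tv_usec = usec % 1000000;
    return tv;
}

/* Hypothetical alternative to the fixed sleep: wait until fd is readable
 * or timeout_usec elapses. Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_for_commands(int fd, long timeout_usec)
{
    for (;;) {
        fd_set readfds;
        /* Rebuilt on every pass because select() may modify it. */
        struct timeval tv = usec_to_timeval(timeout_usec);
        int rc;

        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        rc = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (rc < 0 && errno == EINTR)
            continue;  /* signal: wait again (restarts the full timeout) */
        if (rc < 0)
            return -1;
        return (rc > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
    }
}
```

A reader built this way would call something like `wait_for_commands(fd, 500000)` and then read() on a non-blocking fd until EAGAIN. One caveat: after the last writer closes the FIFO, select() keeps reporting the read end as readable because read() returns 0 (end of file); a common workaround is for the reader to hold its own write descriptor open on the FIFO.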


Ethan Galstad,
Nagios Developer
---
Email: nagios at nagios.org
Website: http://www.nagios.org


More information about the Developers mailing list