RFC/PATCH: Handle external service check results in separate thread

Ethan Galstad nagios at nagios.org
Fri Apr 13 13:02:41 CEST 2007


Stefan Rompf wrote:
> Hi,
> 
> like other people on this list, we've been bitten by the problem that nagios 
> fork()s subprocesses when service check results arrive via the external 
> command pipe. When nagios lags for example due to hostchecks, in most cases 
> enough forked processes pile up to bring nagios over its resource limits. 
> Even if this doesn't happen, results will be fed in the wrong order.
> 
> I've developed the following solution that is quite different to the spool 
> directory approach:
> 
> -passive service check results are added to passive_check_result_list as 
> before. However, for our use case it does not make sense to keep multiple 
> results for one service as soon as nagios starts lagging. So we have a 
> duplicate detection that keeps only the newest check result per service.
> -Instead of forking subprocesses, a permanently running thread feeds the 
> results on passive_check_result_list back via write_svc_message(). So two 
> threads of the process talk to each other via a pipe, but I didn't want to 
> make my changes too invasive ;-)
> -Instead of polling the command pipe every 0.5 seconds, select() on the file 
> descriptor is used now if there are enough external_command_buffer_slots. 
> Problem here was that with no writer on the pipe, select() endlessly signaled 
> an EOF. Fixed by opening the command pipe R/W.
> 
> The patch has been developed on nagios 2.6 and linux, afterwards forward 
> ported to current CVS. It seems to work, but needs further testing. Even 
> compilation tests on different architectures would be interesting, I'm not 
> sure how widespread the tsearch()-API is.
> 
> Thoughts?
> 
> Stefan
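
For readers following along, the select()/EOF point above can be sketched 
roughly as follows. This is not code from the patch: open_command_fifo() 
and wait_for_command() are hypothetical names used only for illustration. 
It also assumes Linux behaviour, since POSIX leaves opening a FIFO with 
O_RDWR undefined.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/select.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open the FIFO read/write so this process also
 * counts as a writer. With at least one writer always present, select()
 * does not report EOF when external writers come and go.
 * Assumption: Linux semantics (O_RDWR on a FIFO is undefined in POSIX). */
int open_command_fifo(const char *path)
{
    return open(path, O_RDWR | O_NONBLOCK);
}

/* Hypothetical helper: wait up to timeout_ms for data on fd.
 * Returns 1 if fd is readable, 0 on timeout, -1 on error. */
int wait_for_command(int fd, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    int rc;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    return (rc > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}
```

With a read-only descriptor and no writer attached, select() would report 
the descriptor as readable immediately (EOF), which turns the loop into a 
busy wait. With the read/write descriptor the call simply times out until 
somebody writes a command.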

Sounds interesting.  I'm still leaning towards the spool directory idea, 
as it provides some resistance to problems when Nagios isn't running 
and/or the external command file pipe fills up.

One thing to watch out for is the idea of discarding old/duplicate check 
results.  This isn't always a good thing.  Consider security alerts that 
come in as passive checks.  If you discard all but the newest alert you 
could potentially miss some critical information...
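
To make the trade-off concrete, here is a minimal sketch of what "keep 
only the newest result per service" might look like with the tsearch() 
API. It is not taken from the patch: struct check_result, add_result() 
and find_result() are hypothetical names, and the "host;service" key is 
an assumption made for the example.

```c
#include <search.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical record, not the real Nagios passive check result struct. */
struct check_result {
    char key[128];      /* assumed key format: "host;service" */
    int return_code;
    time_t check_time;
};

static void *result_tree = NULL;   /* root of the tsearch() tree */
static int result_count = 0;       /* number of distinct services stored */

static int compare_results(const void *a, const void *b)
{
    return strcmp(((const struct check_result *)a)->key,
                  ((const struct check_result *)b)->key);
}

/* Returns 1 if an older result for the same key was overwritten,
 * 0 if the entry is new or the incoming result was older and ignored,
 * -1 on allocation failure. */
int add_result(const char *key, int return_code, time_t check_time)
{
    struct check_result *r = malloc(sizeof *r);
    struct check_result *stored;
    void *node;
    int replaced = 0;

    if (r == NULL)
        return -1;
    strncpy(r->key, key, sizeof r->key - 1);
    r->key[sizeof r->key - 1] = '\0';
    r->return_code = return_code;
    r->check_time = check_time;

    /* tsearch() inserts r if the key is absent, else returns the match. */
    node = tsearch(r, &result_tree, compare_results);
    if (node == NULL) {
        free(r);
        return -1;
    }
    stored = *(struct check_result **)node;
    if (stored == r) {
        result_count++;
        return 0;
    }

    /* Duplicate key: the previously stored result is lost here. */
    if (check_time >= stored->check_time) {
        stored->return_code = return_code;
        stored->check_time = check_time;
        replaced = 1;
    }
    free(r);
    return replaced;
}

/* Look up the stored result for a key, or NULL if there is none. */
const struct check_result *find_result(const char *key)
{
    struct check_result probe;
    void *node;

    strncpy(probe.key, key, sizeof probe.key - 1);
    probe.key[sizeof probe.key - 1] = '\0';
    node = tfind(&probe, &result_tree, compare_results);
    return node ? *(struct check_result **)node : NULL;
}
```

If a CRITICAL result arrives and is followed by an OK result for the same 
service before the list is drained, only the OK survives in this scheme. 
That is fine for periodic status checks, but it is exactly the case where 
a passively submitted security alert would be dropped.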



Ethan Galstad,
Nagios Developer
---
Email: nagios at nagios.org
Website: http://www.nagios.org
