FYI, nagios command file pipe may cause passive service corruption

Mooney, Ryan ryan.mooney at pnl.gov
Sat Jun 14 00:01:00 CEST 2003


Just an FYI, since I've already worked around it: because var/rw/nagios.cmd
is a pipe, multiple processes can write to it at the same time, and their
messages can interleave and give erroneous results (lost messages, bad data,
??).  I looked through NSCA and some other scripts/programs that use that
facility and didn't see any of them doing locking either (release or CVS).
Under "normal" traffic volumes the odds of this happening are quite low.  I
hit it quite a few times when I was pumping in ~2000-4000 checks/minute
(burst, not sustained).

I worked around it by front-ending all of my (high volume passive) checks with
another program (syslog-ng in this case, although any number of other apps
would likely work) that opens a filesystem socket and the nagios pipe and then
funnels the messages through for me.  The FS socket doesn't have the same
problem since every client is a separate fd.

From what I've heard the 2.0 interface will be completely different, so hopefully
this problem goes away completely then.  In the interest of not wasting developers'
time on old problems so they can get the next gen out :), if anyone else is seeing
this issue, here is the hack workaround I'm using:

source s_stream { unix-stream("/path/.syslog.nagios" max-connections(100) owner(nagios) group(www) perm(0640)); };
destination nagios { file("/nagios/var/rw/nagios.cmd" template("$MSG\n") sync(0) owner(nagios) group(www) perm(0660)); };
log { source(s_stream); destination(nagios); };

You can then use it in perl like:
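use IO::Socket::UNIX;
# Connect to the unix-stream socket that syslog-ng listens on (see source above).
$TS = IO::Socket::UNIX->new(Type => SOCK_STREAM, Peer => "/path/.syslog.nagios") || die "$!";
# $node, $result, and $output hold the host name, return code, and plugin output.
printf($TS "PROCESS_SERVICE_CHECK_RESULT;%s;service;%d; %s\n", $node, $result, $output);
close($TS);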

The only gotcha I've seen so far is that the application that's doing the
pass-through (syslog-ng in this case) needs to re-open the nagios pipe if nagios
is restarted.  I put a kill -HUP for syslog-ng into the nagios startup.

Another solution would be to hack all the applications that access the nagios.cmd
file to perform locking on it.  The locking is only advisory though, so if anyone's
program(s) didn't use it... well, we'd be right back where we started.
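
For anyone who wants to try the locking route anyway, here is a rough sketch of
what each writer might do.  It is not what I'm running, and the details are
assumptions: send_locked() is a made-up helper name, and it locks a separate
lock file (a path every writer would have to agree on) rather than nagios.cmd
itself, since I have not checked how flock behaves on a pipe on every platform.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper (not part of Nagios or NSCA): write one command line
# to the command file while holding an exclusive advisory lock.
# $lock_file is an assumed, agreed-upon path shared by all writers.
sub send_locked {
    my ($cmd_file, $lock_file, $line) = @_;

    open(my $lock, '>>', $lock_file) or die "open $lock_file: $!";
    flock($lock, LOCK_EX)            or die "flock $lock_file: $!";

    # On a FIFO this open blocks until a reader (nagios) has the pipe open.
    open(my $cmd, '>>', $cmd_file)   or die "open $cmd_file: $!";

    # A single syswrite avoids stdio buffering splitting the line in two.
    my $written = syswrite($cmd, $line);
    die "short write to $cmd_file"
        unless defined($written) && $written == length($line);
    close($cmd);

    close($lock);    # closing the handle releases the lock
    return $written;
}
```

Each writer would then call something like
send_locked("/nagios/var/rw/nagios.cmd", "/nagios/var/rw/nagios.cmd.lock", $line),
where the .lock path is again just an example.  This only helps if every writer
goes through the same lock.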

