Weird error with Nagios 2.0b4 on RHEL 4

Fred f1216 at yahoo.com
Fri Oct 28 02:10:41 CEST 2005



--- Andreas Ericsson <ae at op5.se> wrote:

> Fred wrote:
> > I may have been getting lucky with the service_message struct warning;
> > however, it has not seemed to be a problem even on a system of over
> > 1000 nodes with 6 distributed monitors.
> > 
> > From looking at the code, the service_message struct appears to be the
> > data structure that is created when a worker thread pulls a line off of
> > the nagios FIFO, creates a structured work item, and adds it to a queue.
> > The message appears to be a warning that writing the data and accessing
> > it between threads might be at risk; however, there seem to be locks
> > around the access.
> > 
> 
> Inaccurate. The service_message struct is what's being written to the 
> pipe for later processing. If only 512 bytes are written and the struct 
> is larger than that, you're in for trouble.
> 
OK, it was a quick read of the code ;-)  However, I suspect the entire
528 bytes (sizeof the struct) are written; it is just that POSIX only
promises that writes of up to 512 bytes (the minimum PIPE_BUF) are atomic.
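
To make the atomicity point concrete, here is a minimal sketch in C of how one might check whether a single write() of a given size falls within the atomic limit for a particular pipe or FIFO. The names demo_message and write_is_atomic are hypothetical and are not taken from the Nagios source; the 528-byte payload merely stands in for the struct size mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>   /* _POSIX_PIPE_BUF, the POSIX minimum of 512 */
#include <stddef.h>
#include <unistd.h>   /* fpathconf, _PC_PIPE_BUF */

/* Hypothetical stand-in for a 528-byte message; NOT the real Nagios struct. */
struct demo_message {
    char payload[528];
};

/*
 * Returns 1 if a single write() of len bytes to the pipe or FIFO behind fd
 * is guaranteed by POSIX to be atomic, 0 otherwise.
 */
int write_is_atomic(int fd, size_t len)
{
    long limit;

    assert(fd >= 0);

    /* Ask the system for the real PIPE_BUF of this descriptor. */
    limit = fpathconf(fd, _PC_PIPE_BUF);
    if (limit < 0)
        limit = _POSIX_PIPE_BUF;   /* fall back to the guaranteed minimum */

    return len <= (size_t)limit;
}
```

Calling write_is_atomic(fd, sizeof(struct demo_message)) on the write end of the FIFO would show whether a 528-byte message is covered on a given system. Where PIPE_BUF is only the 512-byte minimum it would not be; where PIPE_BUF is larger it would be.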



> > I had actually built a test image where I changed the max hostname
> > length from 64 to 40 just to push the structure under the 512 but there
> > were no apparent changes (note I was debugging what I believe to be a
> > Linux FIFO problem that causes some fgets() calls to complete even if
> > they don't have a \n in the buffer, essentially,
> 
> 
> fgets() is supposed to return whatever it can read if there's no newline 
> within the limits of the second arg.

I can promise that if I do large writes to a FIFO:

@array = ... lots of fifo lines
print FIFO @array;

then Perl does large block writes and the fgets() *sometimes* returns
a short line and the rest of the line gets picked up in the next fgets().
Doing sysopen(), $|=1; foreach (@array) { print FIFO $_ } cleans things
up.  This causes smaller, line-sized writes to go to the FIFO and seems
to work around the problem.
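
For comparison, the same one-write-per-line idea can be sketched in C. This is only an illustration of the workaround, not code from Nagios or from the Perl script in question; write_lines is a hypothetical helper, and it assumes every line is short enough to fit within PIPE_BUF so that each write() stays atomic.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Write each line to fd with its own write() call instead of one large
 * block write. Returns the number of lines written, or -1 on error or
 * on a short write.
 */
int write_lines(int fd, const char *const *lines, size_t count)
{
    size_t i;

    assert(lines != NULL);

    for (i = 0; i < count; i++) {
        size_t len = strlen(lines[i]);
        ssize_t n;

        /* Retry only if the call was interrupted by a signal. */
        do {
            n = write(fd, lines[i], len);
        } while (n < 0 && errno == EINTR);

        if (n < 0 || (size_t)n != len)
            return -1;
    }
    return (int)count;
}
```

Each element of lines is expected to carry its own trailing newline, so the reader sees one complete line per write.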

Thanks for the correction.
-FredC

> 
> That being said, Nagios read()'s the fifo.
> 
> 
> > writes that fill the entire FIFO buffer at 8k cause
> > a premature completion and therefore a fifo corruption)   Turns out when I
> > shrunk the service_message struct I was able to reproduce the FIFO failures
> > much more quickly ... 
> > 
> 
> This is weird. Most systems use sysconf(_SC_PAGE_SIZE) as the limit for
> atomic writes, since that's what's natural for the system. This would mean
> 4096 for Linux on i386 and shouldn't ever create fifo inconsistencies.
> 
> > I believe on EM64T the time and other substructures push the size over the
> > edge.
> > 
> 
> An int on 64-bit archs is sometimes 64 bits wide. If there's no penalty in 
> doing 32-bit processing it'll be 32 bits.
> 
> -- 
> Andreas Ericsson                   andreas.ericsson at op5.se
> OP5 AB                             www.op5.se
> Tel: +46 8-230225                  Fax: +46 8-230231
> 
> 
> -------------------------------------------------------
> This SF.Net email is sponsored by the JBoss Inc.
> Get Certified Today * Register for a JBoss Training Course
> Free Certification Exam for All Training Attendees Through End of 2005
> Visit http://www.jboss.com/services/certification for more information
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting
> any issue. 
> ::: Messages without supporting info will risk being sent to /dev/null
> 






