Weird error with Nagios 2.0b4 on RHEL 4

Andreas Ericsson ae at op5.se
Fri Oct 28 01:57:21 CEST 2005


Fred wrote:
> I may have been getting lucky with the service_message struct warning; however,
> it has not seemed to be a problem even on a system of 1000+ nodes
> with 6 distributed monitors.
> 
> From looking at the code, the service_message struct appears to be the data
> structure that is created when a worker thread pulls a line off of the 
> nagios FIFO and creates a structured work item and adds it to a queue.  The
> message appears to be a warning that writing the data and accessing it between
> threads might be at risk, however, there seem to be locks around the access.
> 

Inaccurate. The service_message struct is what's being written to the 
pipe for later processing. If only 512 bytes are written and the struct 
is larger than that, you're in for trouble.
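
A minimal sketch of that size constraint, assuming C11. The struct below
is a hypothetical stand-in (demo_message, its fields and their lengths
are made up for illustration) and is not the real Nagios service_message
layout. The 512-byte limit is the figure from this thread, which also
happens to be the minimum PIPE_BUF that POSIX guarantees.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */
#include <time.h>     /* time_t */

/* The 512-byte figure from the thread; it matches POSIX's minimum PIPE_BUF. */
#define ATOMIC_WRITE_LIMIT 512

/* Hypothetical stand-in, NOT the real Nagios service_message layout. */
struct demo_message {
    char   host_name[64];
    char   output[332];
    int    return_code;
    time_t finish_time;
};

/* Fail the build, rather than the pipe, if the struct outgrows the limit. */
static_assert(sizeof(struct demo_message) <= ATOMIC_WRITE_LIMIT,
              "demo_message no longer fits in one atomic pipe write");

/* Returns 1 if a message of `size` bytes fits within the limit, else 0. */
int fits_in_atomic_write(size_t size)
{
    return size <= ATOMIC_WRITE_LIMIT;
}
```

Checking at compile time means a wider time_t or extra padding on a
64-bit build shows up as a build error instead of a truncated message.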

> I had actually built a test image where I changed the max hostname length from
> 64 to 40 just to push the structure under the 512 but there were no apparent 
> changes (note I was debugging what I believe to be a Linux FIFO problem that
> causes some fgets() calls to complete even if they don't have a \n in the
> buffer, essentially,


fgets() is supposed to return whatever it has read once it reaches the 
limit given by the second arg (or EOF), even if there's no newline.

That being said, Nagios read()s the fifo.
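
A small sketch of the fgets() behaviour described above, using a
temporary file instead of a FIFO so it is easy to run. The helper names
(is_complete_line, read_first_chunk) and the sample line are made up for
illustration; nothing here is taken from the Nagios source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if buf ends in '\n', 0 if fgets() stopped before a newline. */
int is_complete_line(const char *buf)
{
    size_t len = strlen(buf);
    return len > 0 && buf[len - 1] == '\n';
}

/*
 * Writes "abcdefghij\n" to a temporary file, then reads it back with a
 * single fgets() call into `out`. Returns 0 on success, -1 on failure.
 */
int read_first_chunk(char *out, int out_size)
{
    assert(out != NULL && out_size > 1);

    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;

    fputs("abcdefghij\n", fp);
    rewind(fp);

    /* fgets() stores at most out_size - 1 characters plus the '\0'. */
    char *got = fgets(out, out_size, fp);
    fclose(fp);
    return got != NULL ? 0 : -1;
}
```

With a 5-byte buffer the call returns "abcd" with no newline, so code
that needs whole lines has to check for the trailing '\n' itself.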


> writes that fill the entire FIFO buffer at 8k cause
> a premature completion and therefore a fifo corruption)   Turns out when I
> shrunk the service_message struct I was able to reproduce the FIFO failures 
> much more quickly ... 
> 

This is weird. Most systems use sysconf(_SC_PAGE_SIZE) as the limit for 
atomic writes, since that's what's natural for the system. This would 
mean 4096 for Linux on i386 and shouldn't ever create fifo inconsistencies.
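
One way to check these numbers on a given box, as a rough sketch: ask
the system for its page size and for the atomic write limit of a pipe.
The function names are made up for illustration. Note that POSIX
expresses the atomic write guarantee as PIPE_BUF, queried here with
fpathconf(), and only promises that it is at least 512 bytes; whether it
equals the page size is system-dependent.

```c
#include <assert.h>   /* assert(), for callers that want to check results */
#include <unistd.h>   /* pipe, close, fpathconf, sysconf */

/* Returns the atomic write limit of a fresh pipe, or -1 on error. */
long pipe_atomic_limit(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    long limit = fpathconf(fds[0], _PC_PIPE_BUF);

    close(fds[0]);
    close(fds[1]);
    return limit;
}

/* Returns the system page size in bytes, or -1 on error. */
long page_size(void)
{
    return sysconf(_SC_PAGE_SIZE);
}
```

A caller could, for example, assert(pipe_atomic_limit() >= 512) and
compare the result with page_size() before relying on writes of a
particular size being atomic.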

> I believe on EM64T the time and other substructures push the size over the
> edge.
> 

An int on 64-bit archs is sometimes 64 bits wide. If there's no penalty 
in doing 32-bit processing, it'll be 32 bits.
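
The widths are easy to check directly. A minimal sketch, assuming C11:
on the LP64 model that Linux uses on x86-64, int stays at 32 bits while
long, pointers and time_t are 64 bits, so a struct holding timestamps
grows even though its ints do not. The struct demo_times is hypothetical
and is not a Nagios data structure.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdio.h>
#include <time.h>

/* Hypothetical struct with the kinds of members that widen on LP64. */
struct demo_times {
    time_t start_time;
    time_t finish_time;
    long   latency;
    int    return_code;
};

/* Padding can only add bytes on top of the members themselves. */
static_assert(sizeof(struct demo_times) >=
                  2 * sizeof(time_t) + sizeof(long) + sizeof(int),
              "struct is smaller than the sum of its members");

/* Returns the size of the demo struct on the current platform. */
size_t demo_times_size(void)
{
    return sizeof(struct demo_times);
}

/* Prints the sizes, in bytes, of the types discussed above. */
void print_type_sizes(FILE *out)
{
    fprintf(out, "int=%zu long=%zu void*=%zu time_t=%zu struct=%zu\n",
            sizeof(int), sizeof(long), sizeof(void *), sizeof(time_t),
            demo_times_size());
}
```

Calling print_type_sizes(stdout) on both the 32-bit and the 64-bit
build shows which members account for the difference.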

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users