Weird error with Nagios 2.0b4 on RHEL 4

Fred f1216 at yahoo.com
Thu Oct 27 18:59:27 CEST 2005


I may just have been lucky with the service_message struct warning, but it
has not caused any problems for me, even on a system of more than 1000 nodes
with 6 distributed monitors.

From looking at the code, the service_message struct appears to be the data
structure created when a worker thread pulls a line off the Nagios FIFO,
builds a structured work item from it, and adds that item to a queue.  The
message appears to warn that writing the data and accessing it between
threads might be at risk; however, there seem to be locks around the access.
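
To make the size check behind that warning concrete, here is a minimal sketch
of the comparison.  The struct, its field names and the two EXAMPLE_ sizes are
made up for illustration and are not the real Nagios definitions; the only
number taken from the warning is the 512-byte limit.

```c
#include <stddef.h>
#include <time.h>

/* 512 is the POSIX-guaranteed atomic write size quoted in the warning. */
#define ATOMIC_WRITE_MIN 512

/* Hypothetical sizes and fields, NOT the real Nagios definition. */
#define EXAMPLE_HOSTNAME_LEN 64
#define EXAMPLE_OUTPUT_LEN   384

struct example_message {
    char   host_name[EXAMPLE_HOSTNAME_LEN];
    char   output[EXAMPLE_OUTPUT_LEN];
    int    return_code;
    time_t start_time;
    time_t finish_time;
};

/* Returns 1 if a single write of `size` bytes stays within the limit. */
static int fits_atomic_write(size_t size)
{
    return size <= ATOMIC_WRITE_MIN;
}

/* Applies the same check to the hypothetical struct on this platform. */
static int example_message_is_safe(void)
{
    return fits_atomic_write(sizeof(struct example_message));
}
```

Calling fits_atomic_write(528) returns 0, which is the case the warning
describes; whether example_message_is_safe() passes depends on the platform's
type sizes and padding.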

I actually built a test image in which I changed the max hostname length from
64 to 40, just to push the structure under 512 bytes, but there were no
apparent changes.  (Note that I was debugging what I believe to be a Linux
FIFO problem that causes some fgets() calls to complete even though there is
no \n in the buffer: writes that fill the entire 8k FIFO buffer cause a
premature completion and therefore FIFO corruption.)  It turns out that when I
shrank the service_message struct, I was able to reproduce the FIFO failures
much more quickly ...
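
One way to spot the premature fgets() completions described above is to check
whether each returned buffer really ends in a newline.  This is only a sketch
of that check, not code from Nagios; the function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if buf holds a complete line (ends in '\n'), otherwise 0. */
static int line_is_complete(const char *buf)
{
    size_t len = strlen(buf);
    return len > 0 && buf[len - 1] == '\n';
}

/*
 * Reads fp with fgets() until it returns NULL and counts the reads that
 * came back without a trailing newline.  Note that fgets() also returns
 * without a newline when the buffer fills up or when the input ends on
 * an unterminated line, so a non-zero count is a hint, not proof.
 */
static int count_incomplete_reads(FILE *fp, char *buf, int buf_size)
{
    int incomplete = 0;

    while (fgets(buf, buf_size, fp) != NULL) {
        if (!line_is_complete(buf))
            incomplete++;
    }
    return incomplete;
}
```

A reader loop could log or discard the incomplete fragments instead of
parsing them as if they were whole commands.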

I believe that on EM64T the time and other substructures push the size over
the 512-byte limit.
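
A quick way to test that guess would be to measure how much room the timing
fields take on each platform.  The struct below is a made-up stand-in, not the
real Nagios layout; with a traditional 32-bit Linux ABI struct timeval is
typically 8 bytes, while on a 64-bit one it is typically 16.

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical group of timing fields, NOT the real Nagios layout. */
struct example_timing {
    struct timeval start_time;
    struct timeval finish_time;
    int            early_timeout;
};

/* Bytes this group occupies on the current ABI, padding included. */
static size_t example_timing_bytes(void)
{
    return sizeof(struct example_timing);
}
```

Printing example_timing_bytes() from a 32-bit and a 64-bit build shows how
much of the growth comes from these fields alone.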

I haven't seen the check_ping rc=127 problem on RHEL4 with b3/b4.

-FredC

--- "Ritter, Nicholas" <nicholas.ritter at americantv.com> wrote:

> I am running Nagios 2.0b4 on Redhat Enterprise v4 Update 2 for Intel
> EM64T. I noticed that the ping checks fail with status code 127. The odd
> thing is that when I manually reschedule the command check through the
> web interface, the host is changed state to up, but upon next check the
> return code of 127 comes back. Additionally I notice that during a
> preflight check of the config (nagios -v in the shell) I see only one
> warning concerning the "service_message struct", specifically: 
>  
> Warning: Size of service_message struct (528 bytes) is >
> POSIX-guaranteed atomic write size (512 bytes). Service checks results
> may get lost or mangled!
>  
> I used the DAG rpms for this Nagios install.
>  
> Any ideas on what I should look at fixing to resolve these two problems?
>  
> Nicholas
> 






