Problem with some NSCA packets getting corrupted on 64-bit SLES 10

Brian A. Seklecki lavalamp at spiritual-machines.org
Sat Jan 19 18:31:42 CET 2008


MF:

Show us your ocsp_command and ochp_command mappings.  Are you calling a
piped command from checkcommands.cfg or calling an external shell
script?

I guarantee you the comma (",") in results is being mapped into a field
delimiter, which confuses nscad(8).
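
To illustrate the kind of field shift I mean, here is a minimal sketch in
Python. It assumes your submit script feeds send_nsca the usual four fields
(host, service description, return code, plugin output) separated by
send_nsca's default tab delimiter. The function names are hypothetical and
not part of NSCA; adapt them to whatever your script actually does.

```python
DELIM = "\t"  # send_nsca's default field delimiter (assumed unchanged via -d)


def format_service_result(host, service, return_code, output, delim=DELIM):
    """Build one send_nsca input line: host, service, return code, output."""

    def clean(value):
        # Replace any delimiter or newline inside a field so that it
        # cannot split the record into extra fields.
        return str(value).replace(delim, " ").replace("\n", " ").strip()

    fields = [clean(host), clean(service), str(int(return_code)), clean(output)]
    return delim.join(fields) + "\n"


def split_service_result(line, delim=DELIM):
    """Split a line into exactly four fields, or raise if the count is off."""
    fields = line.rstrip("\n").split(delim)
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {fields!r}")
    return fields


if __name__ == "__main__":
    output = "PING OK - Packet loss = 0%, RTA = 0.14 ms"

    # Tab-delimited: the comma in the plugin output is harmless.
    print(split_service_result(format_service_result("HOSTXXX", "PING", 0, output)))

    # Naive comma-delimited join: the comma in the output adds a fifth field.
    naive = ",".join(["HOSTXXX", "PING", "0", output]) + "\n"
    try:
        split_service_result(naive, delim=",")
    except ValueError as err:
        print("field shift detected:", err)
```

Logging the field count on the sending side this way should show whether the
bad results line up with plugin output that contains the delimiter.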

~~BAS 

On Thu, 2008-01-17 at 10:37 -0500, Frost, Mark {PBG} wrote:
> I've recently begun an effort to move our Nagios installation to a
> distributed architecture from a centralized one.  I had previously used
> NSCA only for a very few passive checks and it works fine on a 32-bit
> Red Hat AS 3 platform (the centralized server).
> 
> In testing on a distributed architecture (which is 64-bit SUSE Linux
> Enterprise Server (SLES) 10), I seem to have a problem with NSCA.  (Note
> that all Nagios and NSCA binaries and libraries were recompiled on the
> 64-bit platform).
> 
> After I broke out all the checks to have 2 separate distributed nodes
> send to a central server, I saw a few messages like this one in the
> nagios.log file:
> 
> [1200583727] Warning:  Passive check result was received for service '0'
> on host 'HOSTXXX', but the service could not be found!
> 
> but only about 1 out of every 10 or maybe 20 results was doing this.
> That is, the rest of the results were being correctly shown as "EXTERNAL
> COMMAND" and all expected NSCA fields came up correctly (hostname,
> service desc, check result, text output).
> 
> I started having the "send_nsca" script from the distributed nodes log
> what they were sending to a file.  When I correlate what they're sending
> with what the NSCA daemon thinks it's receiving, the client is still
> sending the correct 4 fields, but it's as if the NSCA daemon is dropping
> the 2nd field (service desc) and replacing it with the check result
> field.  So ultimately, it thinks the service name is '0'.
> 
> I can't see that this matches a pattern (i.e. always on the same hosts
> or same service checks).  All I've seen so far is that it happens
> whether I run NSCA as --single or --daemon.  It also happens even if I
> turn off one of the distributed nodes (that is, I can't see it being
> volume related).
> 
> I have turned on debugging in the NSCA daemon to see what it thinks it's
> getting and it echoes what the nagios.log shows:
> 
> SERVICE CHECK -> Host Name: 'HOSTXXX', Service Description: '0', Return
> Code: '0', Output: ' rta=0.140000 ms)'
> 
> Again, maybe only 1 out of 10.  Ultimately, this causes the server to
> run an active check as it thinks it never got a result from the
> distributed node.
> 
> I'm still trying to dig deeper, but it seems to me that this is
> increasingly pointing to some issue with 64-bit SLES.  Or perhaps some
> variable type in NSCA daemon that's not quite right for 64-bit.  It's
> hard to tell with its intermittent nature and the fact that I have yet
> to discover a pattern.
> 
> Has anyone seen anything like this before?
> 
> Thanks
> 
> Mark
> 
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2008.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
> ::: Messages without supporting info will risk being sent to /dev/null

