Bug: NSCA 2.4 on OS X 10.3 (Was: NSCA daemon never writes to nagios.cmd or nsca.dump)

Noah Leaman noah at mac.com
Tue Jan 27 11:06:35 CET 2004


So far everything indicates that I am dealing with a bug in the NSCA 
code running on OS X Panther. Attached is a ktrace of nsca (--single) 
receiving a send_nsca.

The bug prevents nsca from functioning at all...

On Jan 23, 2004, at 9:52 AM, Marc Powell wrote:

>>   22499 nsca     RET   sendto 59/0x3b
>>   22499 nsca     CALL  select(0x7,0xbffff280,0xbffff300,0xbffff380,0)
>>   22499 nsca     RET   select 1
>>   22499 nsca     CALL  sendto(0x5,0xbfffe680,0x3b,0,0,0)
>>   22499 nsca     GIO   fd 5 wrote 59 bytes
>>         "<30>Jan 22 16:21:37 nsca[22499]: Handling the connection..."
>
> Just FYI, this indicates that nsca _is_ sending more data to a syslog
> log file but there isn't enough information to determine which.

I don't know where it's writing "Host address checks out ok", though, 
which is also in the ktrace output. Also, the "Connection from 127.0.0.1 
port 50882" output is not written to the syslog even though it's in the 
ktrace.
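
The numeric prefixes on the syslog messages in the ktrace excerpts ("<30>" 
and "<27>") can be decoded, which may help when working out why some 
messages reach the log and others do not. Below is a minimal sketch, 
assuming the usual BSD syslog convention that the value is 
facility * 8 + severity. The function names are hypothetical and are not 
part of NSCA.

```c
#include <assert.h>
#include <stdio.h>

/* Split a syslog PRI value (the number inside <...>) into facility and
 * severity, assuming PRI = facility * 8 + severity. */
int pri_facility(int pri)
{
    assert(pri >= 0);
    return pri / 8;
}

int pri_severity(int pri)
{
    assert(pri >= 0);
    return pri % 8;
}

/* Severity names in numeric order, 0 (emerg) through 7 (debug). */
const char *severity_name(int severity)
{
    static const char *names[] = {
        "emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"
    };
    if (severity < 0 || severity > 7)
        return "unknown";
    return names[severity];
}
```

With this convention, 30 is facility 3 at severity info and 27 is facility 3 
at severity err. Whether a given message is actually written then depends 
on how the local syslog configuration filters by facility and severity, 
which this thread has not checked.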

I do know where it's writing the other entries: 
/var/log/system.log, which is the system's main syslog. This is the same 
syslog that nagios writes to (in addition to nagios.log). With debug=1, 
all I get in the syslog is:

Jan 27 01:33:08 MyTiBook nsca[14231]: Handling the connection...
Jan 27 01:33:08 MyTiBook nsca[14231]: End of connection...

and in nagios.log I get nothing. It seems pretty clear that no 
external commands are ever sent from nsca. Even if I send an external 
command that contains totally bogus info, I would still see it in 
both syslog and nagios.log:

Yet nothing is ever seen when send_nsca sends data. Also, I have 
alternate_dump_file=/Users/nagios/var/rw/nsca.dump and that file is 
never even created.

>>   22499 nsca     RET   sendto 59/0x3b
>>   22499 nsca     CALL  fcntl(0x6,0x3,0xbffff410)
>>   22499 nsca     RET   fcntl 2
>>   22499 nsca     CALL  fcntl(0x6,0x4,0x5)
>>   22499 nsca     RET   fcntl 0
>>   22499 nsca     CALL  open(0x6d28,0,0x1b6)
>>   22499 nsca     NAMI  "/dev/urandom"
>>   22499 nsca     RET   open 7
>>   22499 nsca     CALL  fstat(0x7,0xbffff0a0)
>>   22499 nsca     RET   fstat 0
>>   22499 nsca     CALL  ioctl(0x7,FIODTYPE,0xbffff0f0)
>>   22499 nsca     RET   ioctl 0
>>   22499 nsca     CALL  read(0x7,0x14000,0x20000)
>>   22499 nsca     GIO   fd 7 read 131072 bytes
>>
>> "\^E\M-%\M-Z|\M^S\M-O\M-a\M-D\M-j\M-h\M-h;A\M-9\M-e\M^Zc\^D\M-2\M^E\M-
>> m})\^F\M-4\0\^C\M-%\^_1\M-f\M^\\M-j\M-_\M-\4e\M-o\M^Bl\M-Mb\M-_'{M\
>> < *** 2878 lines of this kind of data are cut out... I assume this is
>>   the encrypted data...
>
> Yup. 131K seems to be an awful lot but it could certainly be attributed
> to the encryption method you are using. Just as a test, I would probably
> try using just simple XOR.

It was and still is set to XOR... decryption_method=1
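
One possible reading of the 131072-byte read, offered as an assumption and 
not confirmed anywhere in this thread: if /dev/urandom is read through a 
buffered stdio stream, the C library may fetch a whole buffer in one 
read(2) call even when only a few bytes are wanted, so the size in the 
trace would say nothing about the encryption method. The sketch below 
(hypothetical helper read_random_bytes, not NSCA code) reads an exact byte 
count with unbuffered calls, which a trace would show as a read of just 
that size.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill buf with exactly len bytes from /dev/urandom using unbuffered
 * read(2) calls. Returns 0 on success, -1 on error. */
int read_random_bytes(unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    int fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0)
        return -1;

    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, buf + got, len - got);
        if (n < 0 && errno == EINTR)
            continue;            /* interrupted by a signal, retry */
        if (n <= 0) {            /* real error or unexpected end of file */
            close(fd);
            return -1;
        }
        got += (size_t)n;
    }

    close(fd);
    return 0;
}
```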

>>   22499 nsca     RET   sendto 132/0x84
>>   22499 nsca     CALL  select(0x7,0xbffff280,0xbffff300,0xbffff380,0)
>>   22499 nsca     RET   select 1
>>   22499 nsca     CALL  recvfrom(0x6,0xbfffee70,0x2d0,0,0,0)
>>   22499 nsca     GIO   fd 6 wrote 0 bytes
>>         ""
>>   22499 nsca     RET   recvfrom 0
>>   22499 nsca     CALL  sendto(0x5,0xbfffe170,0x35,0,0,0)
>>   22499 nsca     GIO   fd 5 wrote 53 bytes
>>         "<27>Jan 22 16:21:37 nsca[22499]: End of connection..."
>
> This is actually saying -- "End of connection or could not read request
> from client..." If you can increase the number of characters that ktrace
> displays you should see that (it's -s 512 for strace to display 512
> characters for example).

Not sure what you mean here. As far as I know, ktrace is displaying all 
characters. I only sent you a portion of what I captured, but made sure 
that it was a complete transaction. I edited out the 
"M-h;A\M-9\M-e\M^Zc" lines for readability.

> This is really a symptom of whatever the problem is, but it appears that
> it might be strictly a communication problem. Looking at the code, the
> pertinent section is:
>
>         /* process all data we get from the client... */
>
>         /* read the packet from the client */
>         bytes_to_recv=sizeof(receive_packet);
>         rc=recvall(sock,(char *)&receive_packet,&bytes_to_recv,socket_timeout);
>
>         /* recv() error or client disconnect */
>         if(rc<=0){
>                 if(debug==TRUE)
>                         syslog(LOG_ERR,"End of connection or could not read request from client...");
>                 encrypt_cleanup(decryption_method, CI);
>                 close(sock);
>                 if (mode==SINGLE_PROCESS_DAEMON)
>                         return;
>                 else
>                         do_exit();
>                 }
>
> This is before it even attempts to decrypt the packet. The recvall
> function, which is really where the failure is, looks like it has a
> couple of failure modes. The first is if it doesn't get all the data it's
> expecting (client disconnect?) and the second is if it takes longer than a
> specific timeout value, which looks to be 10 seconds. If you run
> send_nsca by hand, how long does it take to run? Have you tried enabling
> debug on your remote server to see what send_nsca has to say? I'd also
> verify hard-coding of speed and duplex on both servers, make sure that
> you're not seeing any packet loss between the two machines, etc... Don't
> forget about trying a different encryption method. XOR on my machines
> seems to send less than 4K as opposed to your 131K per submission.
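
The two failure modes described above (client disconnect before all data 
arrives, and timeout) can be told apart in a receive loop. The following is 
a minimal sketch under stated assumptions: recv_exact is a hypothetical 
helper, not the actual recvall() from NSCA, and its return values are 
chosen only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly len bytes from sock, waiting at most timeout_sec seconds
 * for each chunk of data.
 * Returns  1 when all len bytes arrived,
 *          0 when the peer closed the connection first,
 *         -1 on timeout or socket error.
 * *received always holds the number of bytes actually read. */
int recv_exact(int sock, char *buf, size_t len, int timeout_sec,
               size_t *received)
{
    assert(buf != NULL && received != NULL);
    *received = 0;

    while (*received < len) {
        fd_set readfds;
        struct timeval tv;

        FD_ZERO(&readfds);
        FD_SET(sock, &readfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        int ready = select(sock + 1, &readfds, NULL, NULL, &tv);
        if (ready < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal, retry */
            return -1;
        }
        if (ready == 0)
            return -1;           /* nothing arrived within the timeout */

        ssize_t n = recv(sock, buf + *received, len - *received, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return 0;            /* peer closed before sending everything */

        *received += (size_t)n;
    }
    return 1;
}
```

In the trace earlier in this message, recvfrom returns 0 immediately, which 
in this sketch would be the peer-closed case with zero bytes received 
rather than the timeout case.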

The send_nsca command takes less than a second to run and finish, so I 
don't think it's that. How do I enable debug on the remote server? I added 
debug=1 to the send_nsca.cfg file, but send_nsca complains with "Unknown 
option specified" when it runs. The ktrace I sent you is with 
XOR.

When I run the send_nsca command manually it replies with:
0 data packet(s) sent to host successfully.

Is that normal? Should it be 0 data packets? There is no packet loss 
between the systems.

I have set up nsca on another OS X 10.3 system and tried sending from 
the same system that the nsca daemon is running on, and I get the same 
problem.

To manually send it I used:

/bin/echo -e "host.domain.com\tPing\t0\tPING OK - Packet loss = 0%, RTA = 443.71 ms\n" | \
    /Users/noah/nsca/send_nsca -H 127.0.0.1 -c /Users/noah/nsca/send_nsca.cfg
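
The command above relies on echo -e turning \t and \n into real tab and 
newline characters. Not every echo implementation interprets -e, so it may 
be worth checking that the data reaching send_nsca really contains tabs. 
This is an assumption and has not been verified on these systems. Below is 
a minimal C sketch that builds the same field layout as the command above 
(host, service, return code, plugin output) with literal tabs. The helper 
name format_check_result is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Build one tab-delimited, newline-terminated check result line in buf.
 * Returns the line length in bytes, or -1 if it does not fit in size. */
int format_check_result(char *buf, size_t size, const char *host,
                        const char *service, int return_code,
                        const char *output)
{
    assert(buf != NULL && host != NULL && service != NULL && output != NULL);

    int n = snprintf(buf, size, "%s\t%s\t%d\t%s\n",
                     host, service, return_code, output);
    if (n < 0 || (size_t)n >= size)
        return -1;               /* encoding error or line truncated */
    return n;
}
```

Writing the resulting buffer to standard output with fputs() and piping 
that program into send_nsca would take echo out of the picture entirely.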

I attached an unedited ktrace/kdump output of exactly one send_nsca 
command in case it helps at all.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: nsca.ktrace
Type: application/text
Size: 399430 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20040127/3f87600c/attachment.bin>

