NSCA daemon never writes to nagios.cmd or nsca.dump

Marc Powell marc at ena.com
Fri Jan 23 18:52:53 CET 2004


On Thursday, January 22, 2004 6:40 PM, Noah Leaman shared with us:

> On Jan 22, 2004, at 3:12 PM, Marc Powell wrote:
> > If you're really desperate, you might try to find an OSX equivalent
> > of strace/truss to actually attach to the process and see what it's
> > doing.
> > 
> > --
> > Marc
> 
> Here is a ktrace (translated into text by kdump) of the nsca daemon
> while receiving a few submissions. decryption_method=1. It does seem
> to read in the submission, and it seems to do a write as well, which
> contradicts the problem I am having. I am not knowledgeable enough to
> read this ktrace much beyond this basic stuff. One thing I don't
> understand is the port numbers it's listing... that's not the problem
> here, I'm sure:

No, the port numbers listed below are the source ports on your remote
host. They'll be more or less random from one connection to the next.
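
To illustrate where a "port NNNNN" value like the one in the log line
comes from, here is a minimal sketch in C. It assumes an IPv4 TCP socket
that has already been accepted; describe_peer() is a hypothetical helper
written for this example and is not taken from the nsca source.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper (not from nsca): write "ADDR port N" for the
 * remote end of an accepted IPv4 TCP socket into out.
 * Returns 0 on success, -1 on error. */
int describe_peer(int conn_fd, char *out, size_t out_len)
{
    struct sockaddr_in peer;
    socklen_t peer_len = sizeof(peer);
    char addr[INET_ADDRSTRLEN];

    memset(&peer, 0, sizeof(peer));
    if (getpeername(conn_fd, (struct sockaddr *)&peer, &peer_len) != 0)
        return -1;
    if (peer.sin_family != AF_INET)
        return -1;
    if (inet_ntop(AF_INET, &peer.sin_addr, addr, sizeof(addr)) == NULL)
        return -1;

    /* sin_port is the client's source port. Unless the client called
     * bind() itself, its kernel picked this from the ephemeral range. */
    snprintf(out, out_len, "%s port %u", addr,
             (unsigned)ntohs(peer.sin_port));
    return 0;
}
```

The port reported this way belongs to the sending side of the
connection, so it says nothing about the port the daemon listens on.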

> 
>   22499 nsca     RET   select 1
>   22499 nsca     CALL  accept(0x3,0,0)
>   22499 nsca     RET   accept 6
>   22499 nsca     CALL  getpeername(0x6,0xbffff3e0,0xbffff400)
>   22499 nsca     RET   getpeername 0
>   22499 nsca     CALL  sendto(0x5,0xbfffe6e0,0x48,0,0,0)
>   22499 nsca     GIO   fd 5 wrote 72 bytes
>         "<31>Jan 22 16:21:37 nsca[22499]: Connection from 127.0.0.0
> port 50497"   ***NOTE: I changed this to not publish the IP

New connection... Good.

>   22499 nsca     RET   sendto 72/0x48
>   22499 nsca     CALL  sendto(0x5,0xbfffe6e0,0x3b,0,0,0)
>   22499 nsca     GIO   fd 5 wrote 59 bytes
>         "<31>Jan 22 16:21:37 nsca[22499]: Host address checks out ok"

Address check passes... Good.

>   22499 nsca     RET   sendto 59/0x3b
>   22499 nsca     CALL  select(0x7,0xbffff280,0xbffff300,0xbffff380,0)
>   22499 nsca     RET   select 1
>   22499 nsca     CALL  sendto(0x5,0xbfffe680,0x3b,0,0,0)
>   22499 nsca     GIO   fd 5 wrote 59 bytes
>         "<30>Jan 22 16:21:37 nsca[22499]: Handling the connection..."

Just FYI, this indicates that nsca _is_ sending data to syslog, but
the trace doesn't have enough information to determine which log file
it ends up in.
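
The number in angle brackets at the start of each of those messages is
the syslog priority value, which encodes facility * 8 + severity. A
small sketch in C for decoding it follows; the function names are made
up for this example. With it, <31> decodes to facility 3 (daemon) with
severity 7 (debug), <30> to daemon/info and <27> to daemon/err. Which
file those land in still depends on your syslogd configuration.

```c
#include <stddef.h>

/* Split a syslog priority value (the number inside <...>) into its
 * parts: PRI = facility * 8 + severity. */
int pri_facility(int pri) { return pri >> 3; }
int pri_severity(int pri) { return pri & 7; }

/* Parse the leading "<NN>" of a raw syslog message.
 * Returns the priority value, or -1 if there is no well-formed prefix. */
int parse_pri(const char *msg)
{
    int pri = 0;
    int digits = 0;

    if (msg == NULL || *msg != '<')
        return -1;
    for (msg++; *msg >= '0' && *msg <= '9'; msg++) {
        pri = pri * 10 + (*msg - '0');
        if (++digits > 3)
            return -1;          /* PRI is at most three digits */
    }
    if (digits == 0 || *msg != '>')
        return -1;
    return pri;
}

/* Map a severity code (0-7) to its conventional syslog name. */
const char *severity_name(int severity)
{
    static const char *names[] = {
        "emerg", "alert", "crit", "err",
        "warning", "notice", "info", "debug"
    };
    return (severity >= 0 && severity <= 7) ? names[severity] : "unknown";
}
```

For example, parse_pri("<27>Jan 22 16:21:37 nsca[22499]: ...") returns
27, and severity_name(pri_severity(27)) gives "err".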

>   22499 nsca     RET   sendto 59/0x3b
>   22499 nsca     CALL  fcntl(0x6,0x3,0xbffff410)
>   22499 nsca     RET   fcntl 2
>   22499 nsca     CALL  fcntl(0x6,0x4,0x5)
>   22499 nsca     RET   fcntl 0
>   22499 nsca     CALL  open(0x6d28,0,0x1b6)
>   22499 nsca     NAMI  "/dev/urandom"
>   22499 nsca     RET   open 7
>   22499 nsca     CALL  fstat(0x7,0xbffff0a0)
>   22499 nsca     RET   fstat 0
>   22499 nsca     CALL  ioctl(0x7,FIODTYPE,0xbffff0f0)
>   22499 nsca     RET   ioctl 0
>   22499 nsca     CALL  read(0x7,0x14000,0x20000)
>   22499 nsca     GIO   fd 7 read 131072 bytes
> 
> "\^E\M-%\M-Z|\M^S\M-O\M-a\M-D\M-j\M-h\M-h;A\M-9\M-e\M^Zc\^D\M-2\M^E\M-
> m})\^F\M-4\0\^C\M-%\^_1\M-f\M^\\M-j\M-_\M-\4e\M-o\M^Bl\M-Mb\M-_'{M\
> < *** 2878 lines of this kind of data are cut out... I assume this is
>   the encrypted data...

Yup. 131K seems like an awful lot, but it could certainly be attributed
to the encryption method you are using. Just as a test, I would try
using simple XOR.

> 22499 nsca     RET   read 131072/0x20000
>   22499 nsca     CALL  close(0x7)
>   22499 nsca     RET   close 0
>   22499 nsca     CALL  sendto(0x6,0xbffff380,0x84,0,0,0)
>   22499 nsca     GIO   fd 6 wrote 132 bytes
>         "\0\M-(\M-GK\M-)\^X<de\M-,\M-j\M^X',D\M-&[
> \M-jU\^Vo\M-q\M-&;\M-"uE\^T\M^A\M^M\M-OO<\M-.\M-$)\M-H\M-t\M-S<\M-
> i\M^]\M-DA](l].\M-Uc0_\
> < *** 3 lines of this.>
> 
> \M-[\^C*\M-,~\M-F\M-=\M-Et\^_5\M-L\^P\M^R\M-&t_\^C\^R"\M^L\M
> -03@\^Pi\^Q"

Communication back to the client.

>   22499 nsca     RET   sendto 132/0x84
>   22499 nsca     CALL  select(0x7,0xbffff280,0xbffff300,0xbffff380,0)
>   22499 nsca     RET   select 1
>   22499 nsca     CALL  recvfrom(0x6,0xbfffee70,0x2d0,0,0,0)
>   22499 nsca     GIO   fd 6 wrote 0 bytes
>         ""
>   22499 nsca     RET   recvfrom 0
>   22499 nsca     CALL  sendto(0x5,0xbfffe170,0x35,0,0,0)
>   22499 nsca     GIO   fd 5 wrote 53 bytes
>         "<27>Jan 22 16:21:37 nsca[22499]: End of connection..."

The full message here is actually "End of connection or could not read
request from client..."; the trace is cutting it short. If you can
increase the number of characters that ktrace displays you should see
the whole thing (with strace, for example, -s 512 displays up to 512
characters).

This is really a symptom of whatever the problem is, but it appears that
it might be strictly a communication problem. Looking at the code, the
pertinent section is:

        /* process all data we get from the client... */

        /* read the packet from the client */
        bytes_to_recv=sizeof(receive_packet);
        rc=recvall(sock,(char *)&receive_packet,&bytes_to_recv,
                   socket_timeout);

        /* recv() error or client disconnect */
        if(rc<=0){
                if(debug==TRUE)
                        syslog(LOG_ERR,"End of connection or could not read request from client...");
                encrypt_cleanup(decryption_method, CI);
                close(sock);
                if (mode==SINGLE_PROCESS_DAEMON)
                        return;
                else
                        do_exit();
                }

This is before it even attempts to decrypt the packet. The recvall
function, which is really where the failure is, looks like it has a
couple of failure modes. The first is if it doesn't get all the data
it's expecting (client disconnect?) and the second is if it takes longer
than a specific timeout value, which looks to be 10 seconds. If you run
send_nsca by hand, how long does it take to run? Have you tried enabling
debug on your remote server to see what send_nsca has to say? I'd also
verify that speed and duplex are hard-coded on both servers, make sure
that you're not seeing any packet loss between the two machines, etc...
Don't forget about trying a different encryption method. XOR on my
machines seems to send less than 4K, as opposed to your 131K per
submission.
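
To make those two failure modes concrete, here is a rough sketch in C
of a "receive exactly N bytes or give up" helper. This is NOT the nsca
recvall() code; recv_all_sketch() and its return values are assumptions
made for this example. It only shows how a short read (the peer closing
early, as in the recvfrom that returned 0 in the trace) and a timeout
can each produce a result that a caller would treat as rc<=0.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper, not the nsca recvall(): read exactly *len bytes
 * from sock, giving up after timeout_secs seconds overall.
 * Returns 1 if the whole buffer was filled, 0 if the peer closed the
 * connection early, -1 on error or timeout. On return, *len holds the
 * number of bytes actually received. */
int recv_all_sketch(int sock, char *buf, size_t *len, int timeout_secs)
{
    size_t want = *len;
    size_t got = 0;
    time_t deadline = time(NULL) + timeout_secs;

    *len = 0;
    while (got < want) {
        time_t now = time(NULL);
        if (now >= deadline)
            return -1;                  /* failure mode 2: timeout */

        fd_set rfds;
        struct timeval tv;
        FD_ZERO(&rfds);
        FD_SET(sock, &rfds);
        tv.tv_sec = deadline - now;     /* wait only for the time left */
        tv.tv_usec = 0;

        int ready = select(sock + 1, &rfds, NULL, NULL, &tv);
        if (ready < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, try again */
            return -1;
        }
        if (ready == 0)
            return -1;                  /* failure mode 2: timeout */

        ssize_t n = recv(sock, buf + got, want - got, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return 0;                   /* failure mode 1: peer closed */

        got += (size_t)n;
        *len = got;
    }
    return 1;
}
```

A caller would set the length to the size of the packet it expects, as
the code above does with bytes_to_recv, and treat any result of 0 or
less as a failed submission.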

Hopefully someone more familiar with the internal workings of nsca will
have more to contribute. IANAP, so my interpretation of the code might
be off.

--
Marc


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users