ndo2db problems on solaris 10 (ndoutils 1.4b7)

Ton Voon ton.voon at altinity.com
Wed Feb 27 15:10:40 CET 2008


On 27 Feb 2008, at 13:18, Michael Prochaska wrote:

> truss of ndo2db (the -f option follows all children created by
> fork() or vfork()):

> root at nagios_1 # truss -f -p 6405
> 6405:   accept(5, 0xFFBFF554, 0xFFBFF564, SOV_DEFAULT) (sleeping...)
> 6405:   accept(5, 0xFFBFF554, 0xFFBFF564, SOV_DEFAULT)  = 6
> 6405:   schedctl()                                      = 0xFECA8000
> 6405:   fork1()                                         = 6419
[snipped]


>
> 6405:   lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF
> [0x0000FFFF]
> 6419:   fork1()         (returning as child ...)        = 6405
> 6419:   getpid()                                        = 6419 [6405]
> 6405:   close(6)                                        = 0
> 6419:   lwp_self()                                      = 1
> 6419:   lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF
> [0x0000FFFF]
> 6419:   llseek(3, 0, SEEK_CUR)                          = 0
> 6419:   close(3)                                        = 0
> 6419:   open("/usr/local/nagios/var/ndo2db.debug",
> O_RDWR|O_APPEND|O_CREAT, 0666) = 3
> 6419:   sigaction(SIGQUIT, 0xFFBFED80, 0xFFBFEE20)      = 0
> 6419:   sigaction(SIGTERM, 0xFFBFED80, 0xFFBFEE20)      = 0
> 6419:   sigaction(SIGINT, 0xFFBFED80, 0xFFBFEE20)       = 0
> 6419:   sigaction(SIGSEGV, 0xFFBFED80, 0xFFBFEE20)      = 0
> 6419:   sigaction(SIGFPE, 0xFFBFED80, 0xFFBFEE20)       = 0
> 6419:   open("/etc/netconfig", O_RDONLY|O_LARGEFILE)    = 7
> 6419:   fcntl(7, F_DUPFD, 0x00000100)                   Err#22 EINVAL
> 6419:   read(7, " # p r a g m a   i d e n".., 1024)     = 1024
> 6419:   read(7, " t s           t p i _ c".., 1024)     = 215
> 6419:   read(7, 0x000400E0, 1024)                       = 0
> 6419:   lseek(7, 0, SEEK_SET)                           = 0
> 6419:   read(7, " # p r a g m a   i d e n".., 1024)     = 1024
> 6419:   read(7, " t s           t p i _ c".., 1024)     = 215
> 6419:   read(7, 0x000400E0, 1024)                       = 0
> 6419:   close(7)                                        = 0
> 6419:   open("/dev/udp", O_RDONLY)                      = 7
> 6419:   ioctl(7, SIOCGLIFNUM, 0xFFBFEBD4)               = 0
> 6419:   close(7)                                        = 0
> 6419:   getuid()                                        = 100 [100]
> 6419:   getuid()                                        = 100 [100]
> 6419:   door_info(4, 0xFFBFE8E0)                        = 0
> 6419:   door_call(4, 0xFFBFE988)                        = 0
> 6419:   sigaction(SIGPIPE, 0xFFBFEC40, 0xFFBFECE0)      = 0
> 6419:   so_socket(PF_INET, SOCK_STREAM, IPPROTO_IP, "", SOV_DEFAULT) = 7
> 6419:   brk(0x00041AF8)                                 = 0
> 6419:   brk(0x00045AF8)                                 = 0
> 6419:   fcntl(7, F_SETFL, (no flags))                   = 0
> 6419:   fcntl(7, F_GETFL)                               = 2
> 6419:   connect(7, 0xFFBFED20, 16, SOV_DEFAULT)         = 0
> 6419:   setsockopt(7, SOL_SOCKET, SO_RCVTIMEO, 0xFFBFE1B8, 8, SOV_DEFAULT) Err#99 ENOPROTOOPT
> 6419:   setsockopt(7, SOL_SOCKET, SO_SNDTIMEO, 0xFFBFE1B8, 8, SOV_DEFAULT) Err#99 ENOPROTOOPT
> 6419:   brk(0x00045AF8)                                 = 0
> 6419:   brk(0x00047AF8)                                 = 0
> 6419:   setsockopt(7, ip, 3, 0xFFBFE29C, 4, SOV_DEFAULT) = 0
> 6419:   setsockopt(7, tcp, TCP_NODELAY, 0xFFBFE298, 4, SOV_DEFAULT)  
> = 0
> 6419:   setsockopt(7, SOL_SOCKET, SO_KEEPALIVE, 0xFFBFE30C, 4,
> SOV_DEFAULT) = 0
> 6419:   read(7, " 4\0\0\0\n 5 . 0 . 5 1\0".., 16384)    = 56
> 6419:   brk(0x00047AF8)                                 = 0
> 6419:   brk(0x00049AF8)                                 = 0
> 6419:   brk(0x00049AF8)                                 = 0
> 6419:   brk(0x0004BAF8)                                 = 0
> 6419:   stat64("/usr/local/mysql/share/mysql/charsets/Index.xml",
> 0xFFBFDB08) = 0
> 6419:   brk(0x0004BAF8)                                 = 0
> 6419:   brk(0x0004FAF8)                                 = 0
> 6419:   open64("/usr/local/mysql/share/mysql/charsets/Index.xml",
> O_RDONLY) = 8
> 6419:   read(8, " < ? x m l   v e r s i o".., 18173)    = 18173
> 6419:   close(8)                                        = 0
> 6419:   brk(0x0004FAF8)                                 = 0
> 6419:   brk(0x00051AF8)                                 = 0
> 6419:   brk(0x00051AF8)                                 = 0
> 6419:   brk(0x00053AF8)                                 = 0
> 6419:   write(7, " C\0\001\rA2\0\0\0\0\0 @".., 71)      = 71
> 6419:   read(7, " W\0\002FF1504 # 2 8 0 0".., 16384)    = 91
> 6419:   shutdown(7, SHUT_RDWR, SOV_DEFAULT)             = 0
> 6419:   close(7)                                        = 0
> 6419:   getpid()                                        = 6419 [6405]
> 6419:   open("/proc/6419/psinfo", O_RDONLY)             = 7
> 6419:   read(7, "02\0\0\0\0\0\001\0\01913".., 336)      = 336
> 6419:   close(7)                                        = 0
> 6419:   fstat(-1, 0xFFBFE140)                           Err#9 EBADF
> 6419:   open("/dev/conslog", O_WRONLY)                  = 7
> 6419:   fcntl(7, F_SETFD, 0x00000001)                   = 0
> 6419:   fstat(7, 0xFFBFE140)                            = 0
> 6419:   fstat(7, 0xFFBFEBA0)                            = 0
> 6419:   time()                                          = 1204118219
> 6419:   open("/usr/share/lib/zoneinfo/Europe/Vienna", O_RDONLY) = 8
> 6419:   fstat64(8, 0xFFBFDFD0)                          = 0
> 6419:   read(8, " T Z i f\0\0\0\0\0\0\0\0".., 801)      = 801
> 6419:   close(8)                                        = 0
> 6419:   getpid()                                        = 6419 [6405]
> 6419:   putmsg(7, 0xFFBFE258, 0xFFBFE24C, 0)            = 0
> 6419:   open("/var/run/syslog_door", O_RDONLY)          = 8
> 6419:   door_info(8, 0xFFBFE190)                        = 0
> 6419:   getpid()                                        = 6419 [6405]
> 6419:   door_call(8, 0xFFBFE178)                        = 0
> 6419:   close(8)                                        = 0
> 6419:   read(6, "\n\n H E L L O\n P R O T".., 511)      = 511
> 6419:       Incurred fault #6, FLTBOUNDS  %pc = 0xFF20738C
> 6419:         siginfo: SIGSEGV SEGV_MAPERR addr=0x44415441
> 6419:       Received signal #11, SIGSEGV [caught]
> 6419:         siginfo: SIGSEGV SEGV_MAPERR addr=0x44415441
> 6419:   schedctl()                                      = 0xFEC9E000
> 6419:   lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF
> [0x0000FFFF]
> 6419:   _exit(0)
> 6405:       Received signal #18, SIGCLD, in accept() [caught]
> 6405:         siginfo: SIGCLD CLD_EXITED pid=6419 status=0x0000
> 6405:   accept(5, 0xFFBFF554, 0xFFBFF564, SOV_DEFAULT)  Err#4 EINTR
> 6405:   lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF
> [0x0000FFFF]
> 6405:   waitid(P_ALL, 0, 0xFFBFE968, WEXITED|WTRAPPED|WNOHANG) = 0
> 6405:   setcontext(0xFFBFE8E8)
> 6405:   write(2, " A c c e p t   e r r o r", 12)        = 12
> 6405:   write(2, " :  ", 2)                             = 2
> 6405:   write(2, " I n t e r r u p t e d  ".., 23)      = 23
> 6405:   write(2, "\n", 1)                               = 1
> 6405:   shutdown(5, SHUT_RDWR, SOV_DEFAULT)             Err#134 ENOTCONN
> 6405:   close(5)                                        = 0
> 6405:   _exit(1)
>
> Is this a general bug, or does anybody have ndoutils running on Solaris?

Funny you should mention this, as we just found a fix on Solaris for
ndoutils 1.4b3. Note that the accept call 11 lines up from the bottom
of your trace returns an EINTR error, after which the parent prints
"Accept error" and exits. We've patched the code around the accept
call so that an EINTR causes a retry, and this appears to work around
the problem. See the patch attached. My guess is that this occurs
because the child's signal arrives at the same time that the parent
gets a result on accept, so accept returns with this error rather than
the child signal being handled first.
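
For readers without the attachment, the retry idea can be sketched
roughly as follows. This is not the attached patch, only a minimal
illustration of the technique; the helper name accept_retry_eintr is
hypothetical and does not appear in ndoutils.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not from the ndoutils source or the attached patch):
 * call accept() and retry whenever it fails with EINTR, e.g. because a
 * SIGCLD/SIGCHLD handler ran while the process was blocked in accept().
 * Any other error is returned to the caller with errno left intact.
 */
int accept_retry_eintr(int listen_fd, struct sockaddr *addr, socklen_t *addr_len)
{
    /* addr_len is a value-result argument, so restore it before each try. */
    socklen_t initial_len = (addr_len != NULL) ? *addr_len : 0;
    int fd;

    do {
        if (addr_len != NULL)
            *addr_len = initial_len;
        fd = accept(listen_fd, addr, addr_len);
    } while (fd < 0 && errno == EINTR);

    return fd;
}
```

With a wrapper like this, the parent would only treat accept failures
other than EINTR as fatal, instead of exiting as in the trace above.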

However, I notice that you also have a SIGSEGV in the child process
6419. Our ndoutils (at the older 1.4b3) doesn't give this error, so
there may be other problems with 1.4b7 on Solaris that also need
fixing.

Ton

http://www.altinity.com
UK: +44 (0)870 787 9243
US: +1 866 879 9184
Fax: +44 (0)845 280 1725
Skype: tonvoon

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ndoutils_solaris_eintr_in_accept.patch.txt
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20080227/41bbcb67/attachment.txt>
_______________________________________________
Nagios-devel mailing list
Nagios-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-devel

