NRPE oddities

Mike Emigh maemigh at gmail.com
Fri Oct 13 16:34:56 CEST 2006


On 10/13/06, rob.moss at uk.bnpparibas.com <rob.moss at uk.bnpparibas.com> wrote:
>
> nagios-users-bounces at lists.sourceforge.net wrote on
> 12/10/2006 21:27:59:
>
>
>  > I'm running NRPE 2.5.1 on a number of machines and occasionally I'll
>  > see the processes get stuck just spinning their wheels.  It ends up
>  > utilizing the CPU heavily and needs to be killed.  I haven't been able
>  > to track down what is causing this, so I wanted to ask if anyone on
>  > this list has seen this behavior before.  Also, I've seen that NRPE
>  > 2.5.2 fixes "a number of bugs" but do not know if it addresses this
>  > specifically.  I've included output from top and truss below.
>  >
>  > top:
>  >   PID USERNAME THR PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
>  >  20950 nagios     1  20    0 3792K 1688K run     78.6H 49.58% nrpe
>  >  20675 nagios     1  20    0 3792K 1720K run     79.0H 48.62% nrpe
>  >
>  > truss:
>  > getpid()                                        = 20675 [1]
>  > time()                                          = 1160684094
>  > getpid()                                        = 20675 [1]
>  > read(8, 0x00050FF8, 5)                          Err#11 EAGAIN
>  > getpid()                                        = 20675 [1]
>  > time()                                          = 1160684094
>  > getpid()                                        = 20675 [1]
>  > read(8, 0x00050FF8, 5)                          Err#11 EAGAIN
>  > getpid()                                        = 20675 [1]
>  > time()                                          = 1160684094
>  > getpid()                                        = 20675 [1]
>  > read(8, 0x00050FF8, 5)                          Err#11 EAGAIN
>  > getpid()                                        = 20675 [1]
>  > time()                                          = 1160684094
>  > getpid()                                        = 20675 [1]
>  > read(8, 0x00050FF8, 5)                          Err#11 EAGAIN
>  > getpid()                                        = 20675 [1]
>  > time()                                          = 1160684094
>  > getpid()                                        = 20675 [1]
>  > read(8, 0x00050FF8, 5)                          Err#11 EAGAIN
>  > getpid()                                        = 20675 [1]
>
> Hi,
>    I don't have any firm answers for you, but I have seen similar weirdness
> in NRPE on Solaris 8 and 10 when it's compiled with lots of optimisation
> flags:  -O2 -funroll-loops  etc..
>
> My only recommendation would be to recompile the same source without running
> configure again (make clean; vi Makefile; make), but remove all optimisation
> flags and install just the new nrpe binary on one of the affected servers..
>
> If you're still having problems, please go back through the truss output
> and tell me which open() (or socket) call returned file descriptor 8,
> and paste in some more info.
>
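
If a fuller trace ever does get captured, something along these lines could pick out where descriptor 8 came from. This is only a rough sketch in Python, not anything that ships with NRPE or truss: the function name find_fd_origin, the regular expression, and the set of syscall names to look for are my own assumptions about what the truss output looks like, based on the lines quoted above, and the open() line in the sample is made up.

```python
import re

# Matches truss lines of the form:  name(args)      = return_value
# An optional "pid:" prefix (as printed by truss -f) is tolerated.
CALL_RE = re.compile(r"^\s*(?:\d+:\s+)?(\w+)\((.*)\)\s+=\s+(-?\d+)")

# Assumed set of calls that can hand back a new descriptor; adjust as needed.
CREATORS = {"open", "open64", "socket", "so_socket", "accept", "dup"}


def find_fd_origin(lines, fd):
    """Return (syscall, args) for the last call that returned `fd`, or None."""
    origin = None
    for line in lines:
        m = CALL_RE.match(line)
        if not m:
            continue  # e.g. lines ending in "Err#11 EAGAIN" carry no "= N"
        name, args, ret = m.group(1), m.group(2), int(m.group(3))
        if name in CREATORS and ret == fd:
            origin = (name, args)  # keep the most recent creator of this fd
    return origin


if __name__ == "__main__":
    # Hypothetical trace: only the getpid() and read() lines are from the post.
    sample = [
        'getpid()                                        = 20675 [1]',
        'open("/tmp/example", O_RDONLY)                  = 8',
        'read(8, 0x00050FF8, 5)                          Err#11 EAGAIN',
    ]
    print(find_fd_origin(sample, 8))
```

It keeps the last matching call rather than the first, since a descriptor number can be closed and reused over the life of the process.
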
> Other things to try: If you're compiling 64bit, try 32bit.. If you have an
> older version of OpenSSL than 0.9.8b, try updating.. If you have an older
> GCC 3.x than 3.3 or 3.4 then try updating..
>
> Cheers
> rob/mossko
>
> -------------------------------------------------------------------------
> Using Tomcat but need to do more? Need to support web services, security?
> Get stuff done quickly with pre-integrated technology to make your job
> easier
> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
>
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting
> any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
>
>
Unfortunately, I won't be able to collect all of the truss data
because I am not able to replicate this situation at will.  It just
seems to happen after NRPE has been running for a long time.  The
output which I was able to include in the previous post was just the
truss for the running process after I noticed it happening.  Thanks
for the suggestions, I'll give them a try.
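
For what it's worth, the truss output quoted above (read() on descriptor 8 returning EAGAIN over and over, with nothing but getpid() and time() in between) looks like a busy-wait on a non-blocking descriptor. I have not checked the NRPE source, so the following is not NRPE's code, just a minimal Python sketch of that general pattern next to a version that sleeps in select() instead. The function names read_busy and read_waiting are made up for the illustration, and it assumes a Unix-like system where select() works on pipes.

```python
import os
import select
import time


def read_busy(fd, nbytes, timeout):
    """Retry a non-blocking read immediately on EAGAIN (the spinning pattern)."""
    attempts = 0
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        attempts += 1
        try:
            return os.read(fd, nbytes), attempts
        except BlockingIOError:  # EAGAIN / EWOULDBLOCK: nothing to read yet
            continue             # loop straight back without waiting
    return None, attempts


def read_waiting(fd, nbytes, timeout):
    """Sleep in select() until fd is readable or the timeout expires."""
    attempts = 0
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None, attempts
        readable, _, _ = select.select([fd], [], [], remaining)
        if not readable:
            return None, attempts  # timed out without the fd becoming ready
        attempts += 1
        try:
            return os.read(fd, nbytes), attempts
        except BlockingIOError:
            continue  # spurious wakeup: wait again


if __name__ == "__main__":
    r, w = os.pipe()
    os.set_blocking(r, False)
    print("busy:   ", read_busy(r, 5, 0.2))     # many attempts, no data
    print("waiting:", read_waiting(r, 5, 0.2))  # zero read attempts, no data
    os.write(w, b"hello")
    print("waiting:", read_waiting(r, 5, 0.2))  # one attempt, returns the data
```

The first function burns CPU for the whole timeout while the second stays idle, which would match the difference between a healthy nrpe process and the two shown in top.
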

Mike
