check_nrpe socket time

Carroll, Jim P [Contractor] jcarro10 at sprintspectrum.com
Thu Dec 5 22:51:08 CET 2002


If the NRPE problems were consistent on any given host, no matter what other
NRPE checks (to other hosts) I was adding in, then I would certainly be
suspicious of said host.  But that's not the case; the timeouts were seemingly
random across all hosts.  And the Nagios host itself was starting to bog
down horribly.  I could ping it from elsewhere, but couldn't log in to it via
ssh.  (I admit that I didn't try to log in at the console.)  Previously open
ssh sessions closed as well (timeouts).  It seemed capable of kicking out
occasional e-mails (notifications), but that's it.  Even Apache had stopped
responding.
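
One way to test whether the timeouts follow particular hosts or are spread
evenly (which would point back at the Nagios server) is to time plain TCP
connects to the NRPE port from the Nagios host and count failures per host.
What follows is only a minimal sketch, not part of the original setup: it
assumes NRPE listens on its default port 5666, the host names are hypothetical
placeholders, and the helper names (time_connect, probe, summarize) are made
up for illustration.

```python
import socket
import time

NRPE_PORT = 5666  # NRPE's default port; an assumption, adjust if yours differs


def time_connect(host, port=NRPE_PORT, timeout=10.0):
    """Return the seconds needed to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers timeouts, refused connections and name-resolution errors.
        return None


def probe(hosts, rounds=5, pause=1.0, port=NRPE_PORT, timeout=10.0):
    """Try each host once per round; return {host: [timing or None, ...]}."""
    results = {host: [] for host in hosts}
    for _ in range(rounds):
        for host in hosts:
            results[host].append(time_connect(host, port, timeout))
        time.sleep(pause)
    return results


def summarize(results):
    """Reduce probe() output to {host: (failed_attempts, total_attempts)}."""
    return {
        host: (sum(1 for t in timings if t is None), len(timings))
        for host, timings in results.items()
    }


if __name__ == "__main__":
    # Hypothetical host names; replace with the hosts running NRPE.
    hosts = ["solaris1.example.com", "linux1.example.com"]
    for host, (failed, total) in summarize(probe(hosts)).items():
        print(f"{host}: {failed}/{total} connects failed")
```

Note that a bare connect only shows whether the listener (nrpe itself, or
inetd/xinetd on its behalf) accepted the socket; it does not run a check, so
it can separate connection backlog from slow plugins but cannot replace
check_nrpe.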

Some history:

- initial install of Nagios on old PC of questionable power
- created initial checks
- began implementation of NRPE as standalone daemon on several Solaris hosts
- added NRPE checks
- started to notice NRPE timeouts
- switched from standalone to inetd
- NRPE timeouts seemed to persist
- stopped rollout of NRPE
- upgraded Nagios server
- added hosts, added service checks, added NRPE checks
- at 800+ total service checks (including NRPE), everything is well
- added NRPE to several Linux hosts (via xinetd) one afternoon
- total service checks are now 1100+
- started to experience NRPE timeouts

... and the rest I've already mentioned.

jc

> -----Original Message-----
> From: Ethan Galstad [mailto:nagios at nagios.org]
> Sent: Wednesday, December 04, 2002 8:26 PM
> To: nagios-users
> Subject: RE: [Nagios-users] check_nrpe socket time
> 
> 
> Whoops - sent the original reply to the devel list by accident.
> 
> ------- Forwarded message follows -------
> 
> How is nrpe being run on the remote host?  Via inetd, xinetd, or as a 
> standalone daemon?  If under xinetd, it could be that you've hit some 
> kind of limit based on your xinetd config (the per_source and max_load 
> directives come to mind) - check the man page for xinetd.conf(5) for 
> more info.
> 
> The Nagios host may be causing excessive load (CPU/MEM/SWAP) because 
> several child processes are waiting for the check_nrpe plugin to 
> finish before they can exit.  Sounds like the nrpe daemon might be 
> backlogged on connections, which might point to some tweaking needed 
> on the remote host side.
>   
> 
> On 4 Dec 2002 at 13:24, Carroll, Jim P [Contractor] wrote:
> 
> > Yes!  I made this observation/complaint on the list a while back, back
> > when I had nagios installed on an underpowered old PC.  Nobody had a
> > comment to make.
> > 
> > Since then (much more recently), I've had it happen again.  This was when I
> > added quite a few NRPE checks across several Linux boxen, bumping my total
> > service checks from 800+ to 1100+.
> > 
> > Here are things I've done since that time:
> > 
> > - posted a question to the list:  scalability of NRPE vs. NSCA
> > - set max_concurrent_checks to 200
> > - split the software disk mirror (I/O was getting hammered)
> > - increased swap (from 50% of RAM to 200% of RAM)
> > - set max_concurrent_checks to 0
> > - noticed that while NRPE checks didn't fail, system would occasionally be
> >   very slow
> > - set max_concurrent_checks to 400
> > 
> > I still haven't had any response to my scalability/NRPE/NSCA query on this
> > list.  I haven't ruled out NSCA as possibly a better way to go.  It just
> > means cobbling some scripts together.  If I knew for certain that the NSCA
> > approach is an order of magnitude more scalable than using NRPE, I'd jump on
> > it in a heartbeat.
> > 
> > BTW, the docs have suggestions for improving overall performance.  The one
> > suggestion which stuck in my mind was to get /usr/local/nagios/var onto a
> > ramdisk.  I don't think that would help in my situation, but I do have the
> > option of putting it over on another spindle (the former mirror).
> > 
> > Let me know if any of my observations/suggestions help you out.
> > 
> > jc
> > 
> > > -----Original Message-----
> > > From: Kaplan, Andrew H. [mailto:AHKAPLAN at PARTNERS.ORG]
> > > Sent: Wednesday, December 04, 2002 11:16 AM
> > > To: nagios-users at lists.sourceforge.net
> > > Subject: [Nagios-users] check_nrpe socket timeout
> > > 
> > > 
> > > I've been periodically encountering check_nrpe socket timeout errors for
> > > some time. Further checks into the status of the systems where this is
> > > occurring do not show any apparent problems. Has anyone had similar
> > > occurrences?
> > > 
> 
> ------- End of forwarded message -------
> 
> Ethan Galstad,
> Nagios Developer
> ---
> Email: nagios at nagios.org
> Website: http://www.nagios.org
> 
> 
> 
> 

