AW: Strange NRPE / Nagios problem

Thomas.Zimmer at oppenheim.de Thomas.Zimmer at oppenheim.de
Thu Apr 13 20:11:24 CEST 2006


I had this problem too, like others. I am pasting some archived mails
concerning this issue below. None of them helped with my problem, or I didn't
try the solutions; the problem cleared up by itself, don't ask me why. You'll
find more by searching the archives for "timeout". Good luck :)
 
Andreas Ericsson wrote:
PEYRE Julien wrote:

> Hello everybody,
>
> I'm trying to use Nagios in order to survey our databases with custom
> plug-in. On Nagios browser, if I choose a host and I launch "Schedule
> an immediate check of all services on this host", I have all status
> for all services that take value " CHECK_NRPE: Socket Timeout after 10
> seconds".
>
> If I launch an immediate check service by service (one by one), it's
> OK, it functions.
>
> Any idea would be welcome !

You're most likely flooding the socket receive buffers in the kernel.
What systems are you seeing this on and how many checks are there to
run? Most systems have an accept(2) queue size of five, so above that
and you're in uncharted territory unless you fiddle with the
receive-buffers directly through fcntl(2) in which case it should be
possible to set it to some quite large value (see check_icmp.c on how to
do this).
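
As an illustration of the two knobs mentioned above (the accept queue and the
socket receive buffer), here is a minimal Python sketch. It assumes the buffer
is requested through the SO_RCVBUF socket option with setsockopt, which is the
usual sockets-API route and may differ from what the mail has in mind with
fcntl(2). The function name open_listener and the default sizes are made up
for the example, and the kernel is free to clamp or round whatever is
requested.

```python
import socket


def open_listener(host="127.0.0.1", port=0, backlog=64, rcvbuf=256 * 1024):
    """Create a listening TCP socket with a larger backlog and receive buffer.

    All defaults are illustrative assumptions, not values taken from NRPE.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Ask the kernel for a bigger receive buffer. The kernel may clamp the
    # request to a system-wide maximum or round it, so read it back later.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    sock.bind((host, port))
    # The backlog is the accept(2) queue length hint; the kernel may cap it.
    sock.listen(backlog)
    return sock


def granted_rcvbuf(sock):
    """Return the receive buffer size the kernel actually granted, in bytes."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

Comparing granted_rcvbuf(sock) with the requested size shows how much the
kernel really handed out on a given system.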

 
 
Another one:
Thomas.Zimmer at oppenheim.de wrote:

> Hi Andreas,
>
> Many thanks for the solution to the timeout problem. Do you think
> modifying the socket receive buffers could cause undesirable
> consequences for the system Nagios is running on?

The program enlarging the buffers will of course consume more memory. On
some systems this comes from the kernel's pre-allocated chunks, which are
either expensive or impossible to grow. Since it's only one program
and one socket, though, it shouldn't make much difference.

> Any security-related issues?

Not with a sane implementation, which most systems have these days.
Ancient Tru64 had some problems, as did HP-UX and UNICOS. To my
knowledge this has been fixed, though (except possibly UNICOS, which I
doubt you're running).

 

Thomas Zimmer 
Produktservice & Betrieb 
Betrieb & Support 
Sal. Oppenheim jr. & Cie., Frankfurt a. Main 
Phone: +49 69 7134 5192
Internet: http://www.oppenheim.de
E-Mail: thomas.zimmer at oppenheim.de 

-----Original Message-----
From: nagios-users-admin at lists.sourceforge.net
[mailto:nagios-users-admin at lists.sourceforge.net] On Behalf Of Larry
Ludlow
Sent: Thursday, April 13, 2006 19:54
To: nagios-users at lists.sourceforge.net
Subject: Re: [Nagios-users] Strange NRPE / Nagios problem


Here are some of my configs...

commands.cfg
define command{
command_name    check_sun_disk1
command_line    $USER1$/check_nrpe -n -H $HOSTADDRESS$ -t 30 -c check_disk1
}

nrpe.cfg
command[check_disk1]=/export/nagios/libexec/check_disk -w 20 -c 10 -p /dev/vx/dsk/rootvol

service for this particular host
define service {
        use check_sun_disk1
        service_description /
        check_command check_sun_disk1
        host_name ########
        servicegroups Sun 
        contact_groups LAdmins
}

I can run this command manually. When Nagios performs the check, I get a
socket timeout...

I am getting very frustrated... I have used Nagios for a few years now, and
this is the first time I have run into this problem...

There are no firewalls, iptables, filtering or anything else running on these
boxes yet...
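
One way to narrow down a case like this, where the check works by hand but
times out under Nagios, is to time several simultaneous TCP connections to the
NRPE port and compare them with a single one. The sketch below is only a rough
diagnostic under stated assumptions: it uses 5666, the default NRPE port, the
helper names probe and probe_many are invented for the example, and it tests
the plain TCP connect only, not the NRPE handshake or the plugin itself.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor

NRPE_PORT = 5666  # default NRPE port; change it if nrpe.cfg uses another one


def probe(host, port=NRPE_PORT, timeout=10.0):
    """Try one TCP connect and return (ok, seconds_elapsed)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        ok = False
    return ok, time.monotonic() - start


def probe_many(host, count, port=NRPE_PORT, timeout=10.0):
    """Start `count` connect attempts at the same time and collect results."""
    with ThreadPoolExecutor(max_workers=count) as pool:
        futures = [pool.submit(probe, host, port, timeout) for _ in range(count)]
        return [f.result() for f in futures]
```

If probe_many(host, 1) succeeds quickly while probe_many(host, 20) shows
failures or long delays, that points towards a queue or buffer limit on the
monitored host rather than a firewall.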




