timeouts from one machine and not another

Andreas Ericsson ae at op5.se
Wed Jul 7 11:16:48 CEST 2004


David Bishop wrote:
> I have two nagios servers checking basically the same machines (we don't
> need no stinking failover).  However, from one machine (A), I get a lot of
> time-out errors on certain machines (it times out when checking smtp and 
> ftp) and on the other, I don't.  If I try it from the command-line (just 
> telnetting to the ports), it hangs for a long time (long being greater 
> than 10 seconds) but finally connects.  However, connecting to the same 
> machine from B, it's instantaneous.  Normally I'd suspect the network 
> connection between the two machines (client and A), but a reverse 
> connection works very quickly (connecting to A's smtp port), and they 
> are both on underutilized 1.5Mb lines.

A client of ours had this problem because of an overloaded router that 
was supposed to send a NEXT_HOP but couldn't always manage the load. 
When the traffic went the other way, the overloaded router was never in 
the picture and connections worked beautifully every time (that had us 
scratching our heads for quite some time).

Another thing that can cause this is autonegotiation of NIC settings. 
Some not-so-nice switches and OSes keep trying to renegotiate Ethernet 
settings (duplex and speed). This normally causes the interface on both 
switch and server to go dormant until the negotiation is complete.
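
One rough way to look for this on the Linux side is to read what the 
kernel reports for the interface. The sketch below is an illustration 
only and rests on some assumptions: it relies on the Linux sysfs files 
speed, duplex and carrier_changes under /sys/class/net/<iface>, the 
function name link_settings is made up here, and it does not apply to 
the FreeBSD box (there the media line in ifconfig output shows the same 
kind of information).

```python
from pathlib import Path


def link_settings(iface, sysfs_root="/sys/class/net"):
    """Return speed, duplex and carrier_changes as reported by Linux sysfs.

    Values are returned as strings; a value is None if the file is missing
    or cannot be read (for example, speed while the link is down).
    """
    base = Path(sysfs_root) / iface
    settings = {}
    for name in ("speed", "duplex", "carrier_changes"):
        try:
            settings[name] = (base / name).read_text().strip()
        except OSError:
            settings[name] = None
    return settings
```

Calling link_settings("eth0") twice some minutes apart and comparing the 
results is the idea: a carrier_changes counter that keeps climbing, or a 
duplex value that differs from what the switch port is set to, would 
point towards the negotiation problem described above.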

> Ping time between them (either 
> way) averages slightly over 100ms. The only real difference that I can
> think of between A and B is that A is running FreeBSD 5.2.1 and B is
> Debian/Sid.  The clients are all also running Debian (if that matters).
> 

It shouldn't really. FreeBSD has a long record of a solidly working IP 
stack, so the problem is most likely in the network.

> Help, please :-(
> 

Things to check for:
Are the timeouts happening only during certain hours? If so, when, and 
what else is happening during those hours (backups, people at work, lots 
of web-/ftp-server hits)?
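
To see whether the slow connects follow the clock, it can help to log 
connect times with a timestamp from both A and B. What follows is a 
minimal sketch, not a replacement for the Nagios plugins: the function 
names, the 30 second timeout and the host name client.example.com are 
all assumptions made for illustration. It only times the TCP connect, 
which is the step that hangs in the telnet test.

```python
import socket
import time
from datetime import datetime


def time_connect(host, port, timeout=30.0):
    """Return the seconds needed to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # covers timeouts, refused connections and DNS errors
        return None


def log_line(host, port, elapsed, now=None):
    """Format one timestamped result line."""
    now = now or datetime.now()
    result = "FAILED" if elapsed is None else f"{elapsed:.3f}s"
    return f"{now:%Y-%m-%d %H:%M:%S} {host}:{port} {result}"


def probe_once(host, ports=(21, 25)):
    """Probe ftp (21) and smtp (25) once and return the log lines."""
    return [log_line(host, port, time_connect(host, port)) for port in ports]


# Example use (hypothetical host name), e.g. run once a minute from cron:
#     for line in probe_once("client.example.com"):
#         print(line)
```

Running this on both A and B against the same clients and appending the 
output to a file gives two logs that can be compared hour by hour.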

Things to try:
Swap the places and the Nagios configurations of the two machines. If A 
still fumbles, you know the problem resides on the server itself. If B 
starts to fumble instead, it's in the network.


> D.A.Bishop
> 

-- 
Sourcerrer / Andreas Ericsson
OP5 AB
+46 (0)733 709032
andreas.ericsson at op5.se

