timeouts when using secondary dns

Steve Shipway s.shipway at auckland.ac.nz
Thu Nov 9 23:25:42 CET 2006


We dealt with this by installing a local caching-only nameserver on the
Nagios host itself. This also took a lot of the load off the main
nameservers. So resolv.conf was set to use 127.0.0.1 by default, with
our normal nameservers as secondaries. A nice side effect was that it
vastly sped up name resolution.
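
As a rough sketch of that setup (not a copy of our actual file), the
resolv.conf could look something like the fragment below. The two
192.0.2.x addresses are placeholders from the documentation range,
standing in for whatever your normal nameservers are.

```
# /etc/resolv.conf (sketch; only 127.0.0.1 comes from the description above)
# Local caching-only nameserver on the Nagios host is tried first
nameserver 127.0.0.1
# Normal site nameservers as secondaries (hypothetical addresses)
nameserver 192.0.2.10
nameserver 192.0.2.11
```

With this ordering the local cache answers most lookups, and the
resolver only falls through to the other entries if the local
nameserver does not respond.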
 
Steve
 

--
Steve Shipway
ITSS, University of Auckland
(09) 3737 599 x 86487
s.shipway at auckland.ac.nz



 


________________________________

	From: nagios-users-bounces at lists.sourceforge.net [mailto:nagios-users-bounces at lists.sourceforge.net] On Behalf Of stucky
	Sent: Friday, 10 November 2006 6:57 a.m.
	To: Az
	Cc: nagios
	Subject: Re: [Nagios-users] timeouts when using secondary dns
	
	
	Yay!! That totally did it. Thx Az, I hadn't even considered
	messing with the resolver because I was sure it was a Nagios issue,
	so I thought I had to fix Nagios.
	If that wasn't a textbook example of how well mailing lists can
	work, then I don't know what is...
	
	thx
	
	
	On 11/7/06, Az <az at whoever.org> wrote: 

		stucky wrote:
		> I use the check_by_ssh plugin for most of my stuff and I noticed
		> that if the primary nameserver is unavailable, Nagios starts
		> freaking out. All of a sudden all plugins time out. I tested it
		> using the 'host' command and it only takes about 1 second longer
		> to look up hosts using the secondary nameserver.
		> The default timeout for check_by_ssh is 10 seconds. I cranked it
		> up to 30 and I still get timeouts. I'm not sure I understand
		> that one.
		> Has anyone else seen this?
		We had a similar issue in that our primary DNS was doing strange
		things, and it quite often took 5 or even 10 seconds to perform a
		DNS lookup. What we were seeing was 70% of service checks (and
		subsequently host checks) failing by timing out. The key was the
		multiple of 5 seconds. The resolver timeout on, say, RHEL3 is
		based on RES_TIMEOUT in resolv.h... which was 5 seconds.
		
		We added the following to our resolv.conf, and found the problems
		went away:
		
		    options timeout:2 rotate
		
		This sets the timeout for waiting for a reply to 2 seconds, and
		tells the resolver to rotate through your 'nameserver' entries
		rather than always hitting #1, then #2, etc.
		
		Cheers.
		
		
		
		
		




	-- 
	stucky 
