Nagios-users digest, Vol 1 #838 - 3 msgs

ABostick at mydoconline.com
Mon Sep 9 18:25:07 CEST 2002


jc,

I wrote the check_log2 plugin precisely to overcome the timeout issues you
are experiencing.  I figured that seeking straight to the last-read position
would be faster than copying the whole log on every run.

However, even after using it, I still get timeouts on occasion.  I have not
been able to resolve this issue.  I believe it is a Solaris 8 issue.

I am running Solaris 8 on my clients (where check_log2 runs) and Mandrake
8.2 on my server (where nagios runs).  

My workaround, which is really ugly, was to recompile the check_nrpe plugin
and change the timeout status to OK from CRITICAL.  Ultimately, this isn't
hurting me because I do enough straight network checks to know if something
is really timing out or not.
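
For anyone who would rather not patch and recompile, the same idea (report
a timeout as OK instead of CRITICAL) can be sketched as a small wrapper
around any plugin command.  This is only an illustration under stated
assumptions, not what was done above: run_check and its parameters are
hypothetical names, and the wrapper only covers a plugin that is slow to
return, not a timeout inside check_nrpe's own network connection.

```python
import subprocess

# Standard Nagios plugin exit statuses.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_check(argv, timeout_s=30, timeout_status=OK):
    """Run a plugin command and return (exit_status, first_output_line).

    Hypothetical helper: if the command does not finish within timeout_s
    seconds, report timeout_status instead of treating it as CRITICAL.
    """
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before this is raised.
        return timeout_status, "check timed out after %ss" % timeout_s

    lines = proc.stdout.strip().splitlines()
    first_line = lines[0] if lines else "(no output)"
    # Anything outside 0..3 (e.g. killed by a signal) is reported as UNKNOWN.
    status = proc.returncode if proc.returncode in (OK, WARNING, CRITICAL) else UNKNOWN
    return status, first_line
```

The trade-off is the same as with the recompiled plugin: a genuine hang is
hidden, so this is only reasonable when other checks would catch a host
that is really down.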

I also raised the timeout values to 30 seconds on both sides.  The
check_nrpe plugin allows you to do this on the command line.  The nrpe
daemon has no command line feature for this.  It is hardcoded in source and
you have to change the .h file and recompile.  Moving to 30 seconds
significantly reduced the number of timeouts I received but did not
completely eliminate them.  I still get maybe 1 or 2 a day per host.  They
seem to come at completely random times as well.

I had the same problems with the regular check_log, as I implied above.  I
am using the Sun freeware perl package.  Maybe the problem lies in the perl
interpreter or maybe in Solaris 8 file access?  I actually tried running
truss on the problem, but since it is sporadic it is hard to troubleshoot.
I needed a test case that reproduced the problem each and every time, and I
never found one.

I have been running in this fashion for a few months now.  If you can figure
out why the timeouts occur, please let me know!

BTW, in addition to seeking in the log file, check_log2 also allows for
negated regular expressions to help filter out a lot of the junk you get
in the logs.  For instance, I will check for '\.notice' in the logs but
may want to filter out all the notices from sshd with -n 'sshd'.
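
To make the seek-plus-negation idea concrete, here is a minimal sketch in
Python.  It is not check_log2's actual code (the plugin is a Perl script);
the function name scan_new_lines, the one-number offset file, and the
handling of a shrunken log are all assumptions made for illustration.

```python
import os
import re


def scan_new_lines(log_path, state_path, pattern, negate=None):
    """Return lines added since the last run that match `pattern`
    and do not match `negate` (hypothetical helper, for illustration)."""
    # Load the byte offset saved by the previous run; start at 0 if absent.
    try:
        with open(state_path) as f:
            offset = int(f.read().strip() or 0)
    except (FileNotFoundError, ValueError):
        offset = 0

    # Assumption: a log smaller than the saved offset was rotated or
    # truncated, so scanning restarts from the beginning.
    if offset > os.path.getsize(log_path):
        offset = 0

    want = re.compile(pattern)
    skip = re.compile(negate) if negate else None
    hits = []

    with open(log_path, "rb") as log:
        log.seek(offset)  # jump past everything already examined
        for raw in log:
            line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
            if want.search(line) and not (skip and skip.search(line)):
                hits.append(line)
        new_offset = log.tell()  # remember where this run stopped

    with open(state_path, "w") as f:
        f.write(str(new_offset))
    return hits
```

Called as scan_new_lines(log, state, r"\.notice", "sshd"), it returns only
the new '.notice' lines that do not mention sshd, and a second call with no
new log data returns an empty list.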

Good luck!

Hope this helps,
Aaron


> Message: 3
> From: "Carroll, Jim P" <jcarro10 at sprintspectrum.com>
> To: nagios-users at lists.sourceforge.net
> Date: Fri, 6 Sep 2002 11:04:16 -0500 
> Subject: [Nagios-users] NRPE and check_log2
> 
> I'm currently testing the check_log2 plugin (Perl script, in the contrib
> directory) on 2 test hosts, using NRPE.  (My reason for using check_log2
> instead of check_log is because of the fact that check_log2 doesn't need
> to copy the entire logfile, that it uses seek and tell.)
> 
> The problem I'm having is that occasionally I'm getting connection
> timeouts.  I suppose I could raise the timeout in the service definition,
> but I don't see why I should need to.
> 
> Why?  Because I'm also running a handful of other NRPE tests on those 2
> test hosts, and have never once experienced a connection timeout.
> Therein lies the puzzle.
> 
> If it matters, Nagios is running under RH Linux 7.3, and the test hosts
> being monitored via NRPE are both running Solaris8.
> 
> Thoughts?  Suggestions for troubleshooting this?
> 
> jc

