Nagios centralized server BUG???

Russell Scibetti russell at quadrix.com
Wed Jan 15 21:05:23 CET 2003


Something that could be worth testing.

What is your value of service_reaper_frequency in nagios.cfg?  It 
defaults to 10, which means that Nagios reads its internal plugin 
communications pipe every 10 seconds.  Changing it to 5 helped me a lot, 
but I was running almost exclusively active checks.  I think that 
setting, and the setting for how often Nagios reads the command pipe, 
are the two things you're going to want to play with the most.
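
For reference, here is a rough sketch of how those two settings might 
look in nagios.cfg.  The directive names are the ones from the stock 
sample config.  Apart from the reaper value of 5, the values are only 
assumptions to experiment with, not tested recommendations, so check 
them against the documentation for your version.

```
# How often (in seconds) Nagios processes the results of service
# checks.  The default is 10; 5 is the value mentioned above.
service_reaper_frequency=5

# How often Nagios reads the external command file, which is where
# passive check results submitted through NSCA arrive.
# -1 means "check as often as possible".  Assumption: worth trying
# on a central server that receives mostly passive checks.
command_check_interval=-1
```

Restart Nagios after changing either value, then watch whether the 
"Last Check" times on the service detail page start keeping up.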

Russell Scibetti

Steven L. Kohrs wrote:

>What's the status on this problem?  I'm experiencing the same thing on
>RH 6.2.  I've read the tuning docs, but I can't find anything about
>handling this.  Is it a parallelization issue?  I set
>max_concurrent_checks=100.  After restarting Nagios 50 minutes ago, I've
>got 208 processes.  I had a similar problem on a remote host, but
>setting max_concurrent_checks=5 took care of it.
>
>I've got one central server performing 30 active checks and accepting
>about 900 passive checks from 30 remote servers via NSCA.  
>
>I believe the problem, which Gerald described below, is a result of too
>many processes running.  I can hardly perform a 'ps' command when
>Nagios runs away.  How can it update a service status?
>
>Thanks,
>
>Steve Kohrs
>
>From: Burnson, Richard <rburnson at cp...>
>RE: Nagios centralized server BUG???  
>2003-01-03 15:10
> 
>I tried running Nagios on RH 8.0 as well.  (Part of my plan to set up a
>distributed system, see previous e-mail)   I left the existing server
>running RH 7.2, and on a duplicate machine with the exact same hardware
>I installed RH 8.0.  (Dual 1 GHz processors and 1 GB RAM, in the same
>model server)  I installed Nagios and moved the configs over from the
>7.2 box. While the 7.2 box has run w/o a hitch for 1.5 years, the RH 8.0
>box would run out of memory and the kernel would kill the nagios
>process(es).  So I blew away RH 8.0 and installed 7.2 on the box, and
>was able to run the Nagios setup the same as the original.  Not sure
>what gives, but it seems like 8.0 has some bugs in it that Red Hat still
>needs to work out.  So my recommendation is to run it on 7.2 or 7.3
>until 8.x is stable.
>  
>Richard
>  
>-----Original Message-----
>From: Gerald Wichmann [mailto:gwichman at za...] 
>Sent: Friday, January 03, 2003 4:46 PM
>To: Nagios (E-mail)
>Subject: [Nagios-users] Nagios centralized server BUG???
>  
>Well I'm about to give up and install this central server on another
>box. Running it on RH8 and it's driving me nuts. I have 1 central server
>accepting only passive service checks. Also 2 distributed servers which
>submit passive checks to the centralized server's nsca daemon. Watching
>/var/log/messages I can clearly see all the EXTERNAL COMMANDS being
>submitted exactly as I'd expect them to. All services are reporting and
>showing up OK including Ping. Yet when I look at "host detail" or
>"service detail" something doesn't mesh.. Either there's a bug in nagios
>or I seriously have something wacky going on here..
>  
>Despite the fact that all services report ok, under "host details" I
>have a variety of servers showing up as RED/DOWN.. Last Check is recent.
>Status Information is always "CRITICAL - Plugin timed out after 10
>seconds". Status is either UNREACHABLE (most of them), or DOWN (1 of
>them).
>  
>Ok so I click on "service details".. over there all services report "OK"
>and green. For some odd reason the Ping services are old in the last
>checked column. Like 7 hours.. Even though I can watch /var/log/messages
>and see that I'm receiving PING updates as OK regularly.. The other
>services mostly have recent updates but there are a lot of them that are
>1,2, and even 3 hours out of date. Why is my services detail page so out
>of date?
>  
>Someone points out that I may have multiple nagios servers running on
>the machine and well yes that's partially true. Initially when I start
>nagios it spawns one nagios -d process but soon they start to multiply.
>Long term I have seen them climb up to 4000 which seems excessive to me.
>Far as I can tell they don't reduce in numbers nor do they seem to go
>much higher than 4000. We're running netsaint in a much larger
>distributed environment here checking hundreds and hundreds of services
>and it also spawns multiple netsaint processes.. but not as many.. seems
>to top out usually around 500.. so as far as I can tell this behavior of
>multiple processes is normal.
>  
>So what the hell is going on here? Does anyone out there run a
>distributed environment with a centralized server?
>  
>Gerald Wichmann
>Senior Systems Development Engineer
>Zantaz, Inc.
>925.598.3099 (w)
>
>_______________________________________________
>Nagios-users mailing list
>Nagios-users at lists.sourceforge.net
>https://lists.sourceforge.net/lists/listinfo/nagios-users
>
>

-- 
Russell Scibetti
Quadrix Solutions, Inc.
http://www.quadrix.com
(732) 235-2335, ext. 7038








