Nagios centralized server BUG???

Burnson, Richard rburnson at cps.k12.il.us
Sat Jan 4 00:10:22 CET 2003


I tried running Nagios on RH 8.0 as well (part of my plan to set up a
distributed system; see my previous e-mail). I left the existing server
running RH 7.2, and on a duplicate machine with exactly the same hardware
(dual 1 GHz processors and 1 GB of RAM, in the same server model) I
installed RH 8.0. I installed Nagios and moved the configs over from the
7.2 box. While the 7.2 box has run without a hitch for 1.5 years, the
RH 8.0 box would run out of memory and the kernel would kill the Nagios
process(es). So I blew away RH 8.0, installed 7.2 on the box, and was able
to run the same Nagios setup as on the original. I'm not sure what gives,
but it seems like 8.0 has some bugs that Red Hat still needs to work out.
So my recommendation is to run Nagios on 7.2 or 7.3 until 8.x is stable.
 
Richard
 
-----Original Message-----
From: Gerald Wichmann [mailto:gwichman at zantaz.com] 
Sent: Friday, January 03, 2003 4:46 PM
To: Nagios (E-mail)
Subject: [Nagios-users] Nagios centralized server BUG???
 
Well, I'm about to give up and install this central server on another box.
I'm running it on RH8 and it's driving me nuts. I have one central server
that accepts only passive service checks, plus two distributed servers that
submit passive checks to the central server's nsca daemon. Watching
/var/log/messages, I can clearly see all the EXTERNAL COMMANDS being
submitted exactly as I'd expect. All services, including Ping, are reporting
and showing up OK. Yet when I look at "host detail" or "service detail",
something doesn't mesh. Either there's a bug in Nagios or I seriously have
something wacky going on here.
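For readers unfamiliar with the passive-check path described above: each result the nsca daemon accepts is handed to Nagios as an external command line written to its command file, which is roughly what appears as an EXTERNAL COMMAND entry in the log. The Python sketch below only illustrates the shape of that line, assuming the documented `PROCESS_SERVICE_CHECK_RESULT` format; the helper name `format_passive_service_check`, the host `web01`, the plugin output, and the timestamp are hypothetical, and the code does not talk to nsca or write to the command file.

```python
import time


def format_passive_service_check(host, service, return_code, output, timestamp=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external-command line (hypothetical helper)."""
    # Nagios return codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0, 1, 2 or 3")
    # Default to the current Unix time, as a submitter normally would
    if timestamp is None:
        timestamp = int(time.time())
    # Fields are separated by semicolons after the command name
    return f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{return_code};{output}"


# Hypothetical host/service and an arbitrary example timestamp
print(format_passive_service_check("web01", "PING", 0, "PING OK - Packet loss = 0%", timestamp=1041548760))
# prints: [1041548760] PROCESS_SERVICE_CHECK_RESULT;web01;PING;0;PING OK - Packet loss = 0%
```

Note that this format covers service results only; it says nothing about how host status is determined.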
 
Even though all services report OK, under "host detail" a variety of
servers show up as RED/DOWN. Last Check is recent. Status Information is
always "CRITICAL - Plugin timed out after 10 seconds". Status is either
UNREACHABLE (most of them) or DOWN (one of them).
 
OK, so I click on "service detail". There, all services report "OK" and
show green. For some odd reason, though, the Ping services are stale in the
Last Check column, around 7 hours old, even though I can watch
/var/log/messages and see that I'm regularly receiving OK PING updates. Most
of the other services have recent updates, but a lot of them are 1, 2, and
even 3 hours out of date. Why is my service detail page so out of date?
 
Someone pointed out that I may have multiple Nagios servers running on the
machine, and, well, that's partially true. When I first start Nagios it
spawns one nagios -d process, but soon they start to multiply. Over the long
term I have seen them climb up to 4000, which seems excessive to me. As far
as I can tell they don't decrease in number, nor do they go much higher
than 4000. We're running NetSaint in a much larger distributed environment
here, checking hundreds and hundreds of services, and it also spawns
multiple netsaint processes, but not as many; it usually tops out around
500. So as far as I can tell, this behavior of multiple processes is normal.
 
So what the hell is going on here? Does anyone out there run a distributed
environment with a centralized server?
 
Gerald Wichmann
Senior Systems Development Engineer
Zantaz, Inc.
925.598.3099 (w)
 

