Database stores and distributed monitoring

Ben bench at silentmedia.com
Tue Jul 13 17:35:36 CEST 2004


On Jul 12, 2004, at 5:10 PM, Andreas Ericsson wrote:
>
> You could try applying the chained hash patch. It makes a great 
> speedup. It's in by default in 2.0, which is stable enough for us to 
> run in production.

Can I get just this patch and apply it to the stable 1.2 code base, or 
should I just run 2.0? I'd prefer the former, but I don't know where to 
get the patch.
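
For context, here is a minimal sketch of what "chained hash" generally 
means, and why it tends to beat scanning a list when there are 
thousands of hosts. This is a generic illustration, not the Nagios 
patch itself: the class, the function, and the bucket count are all 
made up for the example, and the real patch may be organised quite 
differently.

```python
class ChainedHash:
    """Toy chained hash table: each bucket holds a short list (chain)
    of (key, value) pairs. Hypothetical illustration, not Nagios code."""

    def __init__(self, buckets=1024):
        self.buckets = [[] for _ in range(buckets)]

    def _chain(self, key):
        # Pick the bucket for this key; colliding keys share a chain.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # overwrite an existing entry
                return
        chain.append((key, value))

    def get(self, key, default=None):
        # Only one chain is walked, not the whole table.
        for k, v in self._chain(key):
            if k == key:
                return v
        return default


def linear_find(hosts, name):
    """Baseline for comparison: walk a flat list of (name, data) pairs."""
    for host_name, data in hosts:
        if host_name == name:
            return data
    return None
```

With a few thousand hosts, `linear_find` may compare against every 
entry for each lookup, while `get` only walks the handful of entries 
that landed in one bucket.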

>> And it did, but not enough. So now, even though I'm using a quad Xeon 
>> with 4GB of RAM for the Nagios machine (the database lives elsewhere), 
>> I'm thinking that maybe I should set up a distributed monitoring 
>> scheme too.
>
> Move the database to reside on the same server as Nagios runs on. 
> Unless you're using embedded perl (which is really buggy) it shouldn't 
> be a problem at all.

Well, that server also does MRTG monitoring, so every 5 minutes it gets 
pretty slammed.

>> But I'm a bit confused. If the distributed monitoring software can 
>> write data into a database, and the CGIs read data out of the 
>> database.... do I need to use the nsca tool to do passive service 
>> checks?
>
> Not necessarily, but it's the only way to make configuration sit on 
> only one host. That won't do any good with regard to the CGIs, 
> though, since they still have to parse the same amount of information.

Yeah, my hope was that the CGIs were spending most of their time on 
file I/O, and that putting that data in a database would open up room 
for Mo' Bettah(tm)(r)(c)($) queries that would reduce the time spent 
data groveling. But I guess that's less the case than I had hoped.
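
To make that hope concrete, here is a rough sketch of the kind of query 
I had in mind: ask the database for only the services that are not OK, 
instead of reading every status record and filtering afterwards. The 
table and column names are invented for the example (they are not the 
schema Nagios uses), and SQLite stands in for whatever database is 
really in play. The 0/1/2 values follow the usual plugin convention of 
OK/WARNING/CRITICAL.

```python
import sqlite3

# Hypothetical status table; names are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE servicestatus ("
    " host_name TEXT, service_description TEXT, current_state INTEGER)"
)

# 2500 hosts, almost all OK (0), one WARNING (1), one CRITICAL (2).
rows = [(f"host{i}", "PING", 0) for i in range(2500)]
rows[17] = ("host17", "PING", 2)
rows[1203] = ("host1203", "PING", 1)
conn.executemany("INSERT INTO servicestatus VALUES (?, ?, ?)", rows)


def problem_services(conn):
    """Return only the services whose state is not OK."""
    cur = conn.execute(
        "SELECT host_name, service_description, current_state"
        " FROM servicestatus WHERE current_state != 0"
        " ORDER BY host_name"
    )
    return cur.fetchall()


for host, service, state in problem_services(conn):
    print(host, service, state)
```

Whether the CGIs could actually be made to work this way, rather than 
loading everything up front, is exactly the part I don't know.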

> On the other hand, I can't imagine any one person being responsible 
> for all those 2500 servers, so it might be prudent to have several 
> separate installations of Nagios. Large networks will always be a bit 
> sluggish to display in the web interface because of the sheer amount 
> of data it needs to read every time. The price large companies pay 
> for success, I guess.

We're not personally responsible for rebooting them all, but we are the 
one team responsible for making sure they're still working, and telling 
our outsourced server monkeys to go reboot them if they have issues.

I'll try the patch and see how it goes, but it seems to me, knowing 
squat about the internals of the CGIs :) that things should still be 
pretty snappy with this many hosts, at least until you start to load a 
page containing all the hosts. What are the known resource-intensive 
operations, and if I want to start dinking around with code and/or database 
optimizations, should I work from the 1.2 or the 2.0 branch? How stable 
is the head branch, anyway?


