Database stores and distributed monitoring

Andreas Ericsson ae at op5.se
Thu Jul 15 10:48:48 CEST 2004


Ben wrote:
> 
> On Jul 12, 2004, at 5:10 PM, Andreas Ericsson wrote:
> 
>>
>> You could try applying the chained hash patch. It makes a great 
>> speedup. It's in by default in 2.0, which is stable enough for us to 
>> run in production.
> 
> 
> Can I get just this patch and apply it to the stable 1.2 code base, or 
> should I just run 2.0? I'd prefer the former, but I don't know where to 
> get the patch?
> 

Sorry, I don't have it.

>>> And
>>> it did, but not enough. So now, even though I'm using a quad xeon 
>>> with 4GB
>>> of ram for the nagios machine (the database lives elsewhere), now I'm
>>> thinking that maybe I should set up a distributed monitoring scheme too.
>>
>>
>> Move the database to reside on the same server as Nagios runs on. 
>> Unless you're using embedded perl (which is really buggy) it shouldn't 
>> be a problem at all.
> 
> 
> Well, that server also does MRTG monitoring, so every 5 minutes it gets 
> pretty slammed.
> 

Skip MRTG in favour of cacti (http://raxnet.net), set up Ben Clewett's 
perfparse and the cactid poller daemon, and create data queries that 
fetch data from the perfparse database.

>>> But I'm a bit confused. If the distributed monitoring software can 
>>> write data into a database, and the CGIs read data out of the 
>>> database.... do I need to use the nsca tool to do passive service 
>>> checks?
>>
>>
>> Not necessarily, but it's the only way to make configuration sit on 
>> only one host. That won't do any good with regard to the CGIs 
>> though, since they still have to parse the same amount of information.
> 
> 
> Yeah, my hope was that most of the time the CGIs were spending was with 
> file IO, and that putting that data in a database would open up room for 
> Mo' Bettah(tm)(r)(c)($) queries that would reduce the time spent data 
> groveling. But I guess this isn't as much the case as I had hoped.
> 
>> On the other hand; I can't imagine any one person being responsible 
>> for all those 2500 servers, so it might be prudent to have several 
>> separate installations of Nagios. Large networks will always be a bit 
>> sluggish to display in the webinterface because of the sheer amount of 
>> data it needs to read every time. The price large companies pay for 
>> success, I guess.
> 
> 
> We're not personally responsible for rebooting them all, but we are the 
> one team responsible for making sure they're still working, and telling 
> our outsourced server monkeys to go reboot them if they have issues.
> 
> I'll try the patch and see how it goes, but it seems to me, knowing 
> squat about the internals of the CGIs :) that things should still be 
> pretty snappy with this many hosts, at least until you start to load a 
> page containing all the hosts. What are the known resource-intensive 
> times, and if I want to start dinking around with code and/or database 
> optimizations, should I work from the 1.2 or the 2.0 branch? How stable 
> is the head branch, anyway?
> 
> 

The main thing about 1.2 without the chained hash patch is that for 
every new object it finds, it does a strcmp on every other object of the 
same type. It also sorts everything alphabetically this way, so it adds 
up to a lot of pointer-shifting and expensive byte-comparison operations 
(gcc's optimizations also fail to work properly due to common naming 
standards). The chained hash patch makes most of those strcmps 
unnecessary, since a lookup only has to compare against the few objects 
that land in the same hash bucket. With 2500 hosts (depending on how 
they're sorted) 1.2 would have to make anywhere from zero to 
1+2+3+...+2499 (about 3.1 million) strcmps just for the hosts, each 
followed by some pointer-shifting.
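
To put rough numbers on that, here is a minimal sketch in Python (not the 
C that Nagios is written in, and not Nagios code). The function names, the 
bucket count and the host names are all made up for illustration. It 
counts string comparisons when names are inserted into an alphabetically 
sorted list by linear scan, and when they are inserted into a chained 
hash table.

```python
def sorted_insert_comparisons(names):
    """Insert names one at a time into an alphabetically sorted list using
    a linear scan, counting string comparisons (the strcmp stand-in)."""
    ordered = []
    comparisons = 0
    for name in names:
        pos = 0
        while pos < len(ordered):
            comparisons += 1
            if name < ordered[pos]:
                break  # found the slot; everything after gets shifted
            pos += 1
        ordered.insert(pos, name)
    return comparisons


def chained_hash_comparisons(names, buckets=4096):
    """Insert names into a chained hash table, counting comparisons made
    against entries that already sit in the same bucket (chain)."""
    table = [[] for _ in range(buckets)]
    comparisons = 0
    for name in names:
        chain = table[hash(name) % buckets]
        for existing in chain:
            comparisons += 1
            if existing == name:
                break  # duplicate, nothing to add
        else:
            chain.append(name)
    return comparisons


# Hypothetical host names, zero-padded so they sort in numeric order.
hosts = ["host%04d" % i for i in range(2500)]

# Already-sorted input is the worst case for this scan: 0+1+...+2499.
worst = sorted_insert_comparisons(hosts)
# Reverse-sorted input is the best case: one comparison per insert.
best = sorted_insert_comparisons(list(reversed(hosts)))
# Only names that collide in a bucket are ever compared.
hashed = chained_hash_comparisons(hosts)

print("sorted list, worst case:", worst)
print("sorted list, best case: ", best)
print("chained hash table:     ", hashed)
```

In this sketch the worst case comes out at 3,123,750 comparisons and the 
best case at 2,499 (one per insert rather than zero, because of how the 
scan is written). The hash table figure varies from run to run, but it 
only counts collisions within a bucket, so it stays far below the worst 
case.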

Head branch is very stable. We use it in production and plan on shipping 
it to our customers as of October (it's been in beta testing at various 
networks for 3 months already with no problems so far).

2.0 also has another CGI speedup option. At startup, all the objects are 
cached in one file, where they are presorted and pre-expanded, so the 
CGIs don't have to do any of that. If you keep that file on a RAM-disk 
and write an event broker module to log things to a database (the CGIs 
need to be modified as well), you'd have upped GUI performance by a 
factor of somewhere around 60 or 70.

-- 
Sourcerer / Andreas Ericsson
OP5 AB
+46 (0)733 709032
andreas.ericsson at op5.se


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null




