Nagios 3 Performance Monitoring

Andreas Ericsson ae at op5.se
Fri Oct 26 17:49:07 CEST 2007


Hendrik Bäcker wrote:
> Hi List,
> 
> ### Now the complete Mail ###
> 
> For the past few days I have been looking into some performance issues
> with Nagios 3 (current CVS version).
> 
> For nicer graphing I've written a quick and dirty Perl script that parses
> some relevant data from the output of the nagiostats binary.
> 
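
The script itself isn't shown here. As a rough sketch of the parsing step, assuming nagiostats is called in MRTG mode (--mrtg --data=VAR1,VAR2,...) and prints one value per line in the order requested, something like the following would do. It is written in Python rather than Perl; the function name and the sample numbers are made up for illustration.

```python
def parse_mrtg_output(text, variables):
    """Map nagiostats MRTG-style output (one value per line) to variable names."""
    values = [line.strip() for line in text.splitlines() if line.strip()]
    if len(values) != len(variables):
        raise ValueError(f"expected {len(variables)} values, got {len(values)}")
    return {name: float(value) for name, value in zip(variables, values)}


# Variable names as accepted by nagiostats --data; the numbers are invented.
VARIABLES = ["NUMSERVICES", "AVGACTSVCLAT", "AVGACTSVCEXT"]
sample = "3233\n412\n1875\n"
stats = parse_mrtg_output(sample, VARIABLES)
```

In a real script the text would come from running the nagiostats binary (for example via subprocess.run) instead of a hard-coded sample; the path to the binary and to the config file depends on the installation.
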
> Output of the plugin is:
> 
> 1. STDOUT: OK - output | perfdata
> 2. (optional) Output + performance data written directly to the external
> command pipe of Nagios.
> 
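
For reference, the two output forms listed above could be produced roughly like this. The first is the usual plugin line (status text, a pipe, then label=value performance data); the second is a PROCESS_SERVICE_CHECK_RESULT line as accepted by the external command file. This is a Python sketch, not the Perl script in question, and the host name, service description and values are placeholders.

```python
import time

STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def plugin_line(code, message, perfdata):
    """Build 'STATE - message | label=value ...' as printed on STDOUT."""
    perf = " ".join(f"{label}={value}" for label, value in perfdata.items())
    return f"{STATES[code]} - {message} | {perf}"


def passive_result(host, service, code, output, now=None):
    """Build a PROCESS_SERVICE_CHECK_RESULT line for the command pipe."""
    timestamp = int(time.time() if now is None else now)
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{code};{output}\n")


# Placeholder names and values, for illustration only.
line = plugin_line(0, "latency fine", {"svc_latency": "0.41s", "svc_exec": "1.88s"})
command = passive_result("nagios5", "Nagios Stats", 0, line, now=1193413747)
```

To submit the result, the command string would be written to the file named by command_file in nagios.cfg; each instance has its own pipe.
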
> I am running a relatively large installation with up to 5 instances (for
> load balancing) on one hardware server (yes - that works).
> 
> Some background data:
> 
> Instance 1: 371 / 2156 (Hosts/Services)
> Instance 2: 206 / 1405 (Hosts/Services)
> Instance 3: 381 / 3147 (Hosts/Services)
> Instance 4:   3 /   54 (Hosts/Services)
> Instance 5: 299 / 3233 (Hosts/Services)
> 
> I have enabled the "use_large_installation_tweaks" option for all
> instances and was really happy to see that I had _no_ latency at all.
> 
> But after 7-9 hours of running time I see that the host/service check
> throughput goes down, the host/service check execution time goes up (x2.5)
> and latency rises too.
> 

Are you using embedded perl? If so, turn that off.


> Once the latency starts, the graph seems to rise without end. It
> went up to 700 seconds for my fifth instance, and I guess it would have
> kept increasing if I hadn't restarted the Nagios process.
> 

Run it for just one instance. If you're debugging something, it doesn't
make sense to run it on a resource-starved system.


> 
> I guess the 'performance trouble' is a 'during runtime'
> problem. So I am looking for tasks in the code whose cost blows up over
> time; my current guess is update_check_stats() in base/utils.c, which is
> executed on every service check and more than once for every host
> check, I think.
> 
> My idea is that after a while the data structure for the stats reaches a
> size that takes too much time to update, and therefore the
> execution time increases.

In C, a struct has a fixed size, so unless the stats live in something that
is allocated dynamically and keeps growing, it's a bit unclear what you
mean by this.
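
To illustrate the point: counters of this kind are often kept in a fixed number of time buckets, so the cost of an update does not depend on how long the process has been running. The sketch below is not the Nagios code, just a minimal example of the idea (in Python for brevity; the class name and bucket count are made up).

```python
class BucketStats:
    """Count events over the last `buckets` minutes in constant memory."""

    def __init__(self, buckets=15):
        self.counts = [0] * buckets      # fixed size, never grows
        self.current_minute = 0

    def record(self, timestamp):
        """Register one check at `timestamp` (seconds since some epoch)."""
        minute = int(timestamp) // 60
        gap = minute - self.current_minute
        # Clear the buckets for minutes that passed since the last update;
        # at most len(self.counts) buckets are touched, however large the gap.
        for step in range(1, min(gap, len(self.counts)) + 1):
            self.counts[(self.current_minute + step) % len(self.counts)] = 0
        if gap > 0:
            self.current_minute = minute
        self.counts[minute % len(self.counts)] += 1

    def total(self):
        """Number of checks recorded in the current window."""
        return sum(self.counts)
```

If update_check_stats() works along these lines, the work per call is bounded by the number of buckets, and a slowdown that builds up over hours would have to come from somewhere else.
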

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
