Nagios 3.0.4 performance issue

Andreas Ericsson ae at op5.se
Thu Nov 20 10:43:33 CET 2008


Alloo, Vincent wrote:
> Andreas,
> Here is an extract of my setup:
> 
> define servicegroup{
> 	servicegroup_name	nrpe_services
> 	alias			NRPE Services
> }
> 
> define servicedependency{
> 	host_name			svxnagios02
> 	service_description		check_uname
> 	dependent_servicegroup_name	nrpe_services
> 	notification_failure_criteria	w,u,c
> }
> 
> define service {
> use                            unix_24_7
> host_name                      svxnagios02
> service_description            check_uname
> check_command                  check_nrpe_ssl!uname!0
> notification_options           c,r
> process_perf_data	       0
> }
> 
> And a bunch of:
> define service {
> use                         	unix_24_7
> hostgroup_name              	sol-servers,linux-servers,sol-zone-servers,sol-servers-with_hotspare
> service_description          	CPU load
> check_command                	check_nrpe_ssl!check_load!5,4,3!6,5,4
> servicegroups			nrpe_services
> }
> .....(3600 services within the nrpe_services service group)
> 

Oh. Are you proxying all your NRPE checks through some other system? I
can't imagine why this would be a good idea, but to each his own, I suppose.

With this configuration, each of the 3600 services should depend on
exactly one other service, so the problem I initially foresaw does not apply.
However, as Sascha mentioned, Nagios instead seems to run that extra check
before any of the other 3600 service checks.
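
To make the expansion concrete, here is a rough Python sketch of how I'd
expect the servicegroup-based dependency to fan out. It is a toy model, not
Nagios code: the host names are made up and expand_dependency() is a
hypothetical helper, but the master service and criteria are the ones from
your config.

```python
# Toy model of servicegroup-based dependency expansion; not Nagios source code.
MASTER = ("svxnagios02", "check_uname")


def expand_dependency(master, group_members, criteria):
    """Return one (master, dependent, criteria) tuple per servicegroup member."""
    return [(master, dependent, criteria) for dependent in group_members]


# Hypothetical stand-ins for the 3600 members of nrpe_services.
members = [(f"host{i:04d}", "CPU load") for i in range(3600)]
deps = expand_dependency(MASTER, members, "w,u,c")

print(len(deps))                                 # one dependency per member
print(len({master for master, _, _ in deps}))    # distinct master services
```

So there are 3600 dependency entries, but every one of them points at the
same single master service.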

I'll need to run some manual testing on this. Since you've only specified
"notification_failure_criteria", Nagios should be able to avoid checking
the service being depended on until it's trying to send a notification. In
fact, it should probably switch the checking order around so that the service
being depended upon is checked *after* the dependent service. That would
solve your problem until NRPE starts failing. After that, there's no help
for it, but then you should definitely see some service check cache hits
which will at least make the load on the system bearable. I'll try to find
some time to look into this next week at the latest.
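
Roughly, the behaviour I have in mind looks like the sketch below. This is
only a model of what I think Nagios ought to do, not what the current code
does; the Dependency class and both function names are invented for
illustration, and the state letters follow the o/w/u/c convention used in
the failure criteria options.

```python
# Toy model of when the master service's state matters; not Nagios source code.
from dataclasses import dataclass


@dataclass
class Dependency:
    execution_failure_criteria: frozenset = frozenset()
    notification_failure_criteria: frozenset = frozenset()


def master_needed_before_check(dep):
    # Only execution criteria can block the dependent check itself.
    return bool(dep.execution_failure_criteria)


def notification_suppressed(dep, master_state):
    # master_state is one of "o", "w", "u", "c" (ok, warning, unknown, critical).
    return master_state in dep.notification_failure_criteria


# Your setup: notification criteria only, no execution criteria.
dep = Dependency(notification_failure_criteria=frozenset("wuc"))

print(master_needed_before_check(dep))      # no need to check the master first
print(notification_suppressed(dep, "c"))    # master critical: hold notification
print(notification_suppressed(dep, "o"))    # master ok: notification goes out
```

With only notification criteria set, the master's state would be consulted
only at the point where a notification is about to be sent.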

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users