Ways and tweaks to make nagios more efficient. load average on monitoring host edging up.

Kyle O'Donnell kyleodonnell at gmail.com
Wed Jan 28 08:32:03 CET 2009


I use service dependencies.  Most of my services are NRPE checks, and I
create a dependency on NRPE.  If a check comes back critical (or whichever
state you choose to trigger the dependency), Nagios does an NRPE check; if
NRPE returns critical (or whichever state you choose), it stops executing
the services dependent on NRPE.
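
A minimal sketch of what such a dependency could look like, written as a
small Python helper that prints a servicedependency object. The host name
"node001", the master service name "NRPE", and the helper itself are
hypothetical and only for illustration; the failure criteria ("c" for
critical) should be set to whichever states you choose.

```python
def render_service_dependency(host, master, dependent, fail_states="c"):
    """Return the text of a Nagios servicedependency object definition.

    host, master and dependent are illustrative names, not taken from any
    real configuration. fail_states is a comma-separated list of the master
    service states (o, w, u, c, p, n) that suppress the dependent service.
    """
    lines = [
        "define servicedependency{",
        f"    host_name                      {host}",
        f"    service_description            {master}",
        f"    dependent_host_name            {host}",
        f"    dependent_service_description  {dependent}",
        # Do not execute the dependent check while the master is in these states.
        f"    execution_failure_criteria     {fail_states}",
        # Do not notify for the dependent service in these states either.
        f"    notification_failure_criteria  {fail_states}",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # One dependency per NRPE-based service on the (hypothetical) host.
    for service in ("load_average", "total-processes"):
        print(render_service_dependency("node001", "NRPE", service))
        print()
```

The printed definitions would go into an object config file; one is needed
for each service that should stop being checked when NRPE itself is down.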

My load is less than 2 on a machine with 800 hosts and 6000 services.

Active host checks are disabled.

As for ping, I don't check it as a service, only as a host check, which
gets executed if any service turns critical.

You can use check_ssh as the host check command instead of ping if you
prefer as well.


On 1/27/09, Mathieu Gagné <mgagne at iweb.com> wrote:
> Hi,
>
>
> Rahul Nabar wrote:
>> I set up my nagios system to monitor 256 odd nodes each with about 6
>> services (direct and NRPE). It is working fine but my load averages have
>> started edging upwards. Not critical yet but I wanted some tips to make
>> things more efficient and see if there are things I might have done
>> inefficiently.
>
> We have +2000 hosts and +4700 services configured on one of our Nagios
> instances. Load average is between 1.3 and 2.0, which I find acceptable.
>
> Our hardware is the following: Core2 Duo 4300 @ 1.80GHz with 2GB of RAM.
>
>> One of the points I identified is this: I am doing a ping and ssh check
>> on each server. This seems redundant. Is there a way to set it up so that:
>> Do a ssh check; if this succeeds obviously ping is ok. If it fails do a
>> ping check and report on that.
>
> "check-host-alive" is only triggered when a service associated with the
> host changes state.
>
> However, I personally consider PING to be a service in itself,
> monitoring the network performance/quality.
>
> PING can still answer but with degraded performance (packet loss, poor
> response time). You probably want to be informed about such problems
> (e.g. in case of a (D)DoS where your network port is maxed out).
>
>> How about the other way around too? I have a bunch of NRPE checks:
>> load_average, total-processes, scratch and home dir usage, pbs_mom,
>> ntp_time. If ssh fails then there is obviously no reason to try these
>> other checks right? But I think the monitoring_host wastes its cycles
>> still trying them (based on the "Last Check" time)
>
> The SSH service state can be CRITICAL while all the other services are
> still OK (e.g. ssh server misconfiguration). You probably want to be
> informed about it too.
>
>> Any tips how I can achieve these efficiency tweaks? Or is there a
>> problem in my strategy? Any other performance tweaks so that I can
>> squeeze every ounce of Nagios performance?
>>
>> Already I am using NRPE rather than check_by_ssh since I was told the
>> latter might be inefficient for the monitoring host load usage.
>
> What kind of server are you using?
>
> Also, what's the check_interval? A 1 minute interval might bring the
> server to its knees since it would be scheduling and executing 1536
> checks per minute (256 nodes x 6 services, as per your information).
>
> There are a lot of factors that could impact Nagios performance and you
> should be aware of all of them. Reading the documentation and
> understanding the impact of each configuration option would be a good start.
>
> --
> Mathieu
>
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by:
> SourcForge Community
> SourceForge wants to tell your story.
> http://p.sf.net/sfu/sf-spreadtheword
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting
> any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
>
