Ways and tweaks to make nagios more efficient. load average on monitoring host edging up.

Jake jakepaulus at gmail.com
Wed Jan 28 00:58:28 CET 2009


On Tue, Jan 27, 2009 at 6:20 PM, Rahul Nabar <rpnabar at gmail.com> wrote:

> I set up my nagios system to monitor 256 odd nodes each with about 6
> services (direct and NRPE). It is working fine but my load averages have
> started edging upwards. Not critical yet but I wanted some tips to make
> things more efficient and see if there are things I might have done
> inefficiently.
>
> One of the points I identified is this: I am doing a ping and an ssh check
> on each server. This seems redundant. Is there a way to set it up so that
> it does an ssh check first and, if this succeeds, treats ping as ok? If
> ssh fails, it would then do a ping check and report on that.
>
>
> How about the other way around too? I have a bunch of NRPE checks:
> load_average, total-processes, scratch and home dir usage, pbs_mom,
> ntp_time. If ssh fails then there is obviously no reason to try these other
> checks right? But I think the monitoring_host wastes its cycles still trying
> them (based on the "Last Check" time)
>

I use ping as both a service check and a host check because I want to ping
all of the time to measure latency, etc. I wouldn't worry so much about
eliminating service checks that aren't directly redundant; I'd focus on
making sure the checks you do run are as fast as possible.

Specifically, look for any service check that takes longer than a second.
Also make sure your timeouts are set low, as long timeouts can easily be a
source of high load averages. For example, if you consider 500ms latency on
the ping service to be critical, why not set your timeout value to one or
two seconds instead of 10 (which is the default for check_ping)? That single
change for check_ping made a huge difference for me, and that was before I
even started looking at other services like my check_dell-hardware and
check_hp-hardware, which were awfully slow prior to rewriting them (now
available on nagiosexchange).
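
One way to find the checks that take longer than a second is to read the
execution times Nagios records in its status file. The following is a
minimal sketch, assuming the status file uses the usual
"servicestatus { key=value ... }" blocks with a check_execution_time field.
The function name slow_services and the path in the usage comment are
illustrative assumptions, not part of Nagios itself.

```python
import re


def slow_services(status_text, threshold=1.0):
    """Return (host, service, seconds) tuples for checks slower than threshold.

    Assumes status_text holds the contents of a Nagios status file in which
    each service is a "servicestatus { ... }" block of key=value lines.
    """
    slow = []
    for block in re.findall(r"servicestatus\s*\{(.*?)\n\s*\}", status_text, re.S):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.strip().partition("=")
            if sep:
                fields[key] = value
        try:
            seconds = float(fields.get("check_execution_time", "0"))
        except ValueError:
            continue  # skip blocks with an unparsable execution time
        if seconds > threshold:
            slow.append((
                fields.get("host_name", "?"),
                fields.get("service_description", "?"),
                seconds,
            ))
    # Slowest checks first, since those are the best candidates for tuning.
    return sorted(slow, key=lambda item: item[2], reverse=True)


# Usage (the path is an assumption; it depends on how Nagios was installed):
#   with open("/usr/local/nagios/var/status.dat") as handle:
#       for host, service, seconds in slow_services(handle.read()):
#           print(f"{seconds:8.3f}s  {host}  {service}")
```

Anything this reports is a candidate for either a lower timeout or a faster
plugin.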


-- 
Jake Paulus
JakePaulus at gmail.com