Average Check latency and execution time growth - 3.2.3

Stuart Browne stuart.browne at ausregistry.com.au
Mon Oct 24 01:49:46 CEST 2011


> -----Original Message-----
> From: Max Schubert [mailto:maxs at webwizarddesign.com]
> Sent: Sunday, 9 October 2011 2:19 AM
> Subject: Re: [Nagios-users] Average Check latency and execution time growth - 3.2.3

Sorry for the delay in response, went on break for a few weeks.

> What minor RHEL rev are you running?  We had one poller that was
> running RHEL 5.3 that had constantly increasing latency - a Compaq /
> AMD based host.  None of the optimizations / configuration changes we
> made to the other pollers we ran at the time seemed to help this one -
> we upgraded the poller in place from 5.3 to 5.4 and voila - issue gone.

Fully up-to-date EL5.7.

> As Joerge mentioned, it was probably a memory leak / bug in a library the
> parent Nagios poller process was using. We never did determine which
> one, and we haven't hit that same issue since then with any 5.4 or 5.5
> pollers.

Embedded Perl is still in use on this box (too many Perl-written plugins to change that without serious thought).

> Even with stable software we end up bouncing our pollers every 2-3
> days, 1) because we have an active customer base who make config
> changes often and 2) because we take the metrics from the checks and
> put them in a time series data warehouse that is sensitive to interval
> skew; any poller that hits 10 seconds of latency has to be bounced.
> 
> We are at 12 pollers or so right now and we will be up to almost 20 by
> next year at this time.

Sounds fun ;)

> Max
> 
> On 10/2/11, Stuart Browne <stuart.browne at ausregistry.com.au> wrote:
> > Hi,
> >
> > I know this topic has been covered many times, but I've tried those tweaks
> > and one issue remains.
> >
> > After a few days, the latency on checks explodes.  It runs along quite
> > happily with small values, then after about 3 days the values rise quite
> > sharply.  I've recently been graphing performance statistics (nagiostats,
> > MRTG), and as you can see from the two attachments (day, week), it's rather
> > surprising.
> >
> > We restart Nagios every few days (for other reasons), so thankfully the issue
> > never gets completely out of control, but as you can see, it gets a bit
> > crazy.
> >
> > I can't think of any combination of settings that would cause such growth
> > after such a long period of time.  Does anybody know why it would suddenly
> > increase after running for days without issue?
> >
> > Basic Nagios system stats:
> > 	2 x dual-core Xeon 5160 (3GHz)
> > 	6GB Memory
> > 	4 x SAS, RAID1 (hardware, BBU, LVM over RAID1)
> > 	RHEL5, fully patched
> > 	Load average between 0.5 and 3.2
> >
> > 'nagios -s /etc/nagios/nagios.cfg' output (trimmed):
> >
> > HOST SCHEDULING INFORMATION
> > ---------------------------
> > Total hosts:                     252
> > Total scheduled hosts:           252
> > Host inter-check delay method:   SMART
> > Average host check interval:     300.00 sec
> > Host inter-check delay:          1.19 sec
> > Max host check spread:           30 min
> > First scheduled check:           Mon Oct  3 14:31:17 2011
> > Last scheduled check:            Mon Oct  3 14:36:15 2011
> >
> >
> > SERVICE SCHEDULING INFORMATION
> > -------------------------------
> > Total services:                     1575
> > Total scheduled services:           1386
> > Service inter-check delay method:   SMART
> > Average service check interval:     878.40 sec
> > Inter-check delay:                  0.63 sec
> > Interleave factor method:           SMART
> > Average services per host:          6.25
> > Service interleave factor:          6
> > Max service check spread:           30 min
> > First scheduled check:              Mon Oct  3 14:33:43 2011
> > Last scheduled check:               Mon Oct  3 14:48:21 2011
> >
> > CHECK PROCESSING INFORMATION
> > ----------------------------
> > Check result reaper interval:       5 sec
> > Max concurrent service checks:      Unlimited
> >
> >
> > PERFORMANCE SUGGESTIONS
> > -----------------------
> > I have no suggestions - things look okay.
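The SMART figures in the 'nagios -s' output can be cross-checked by hand. A minimal sketch, assuming the SMART inter-check delay is the average check interval divided by the number of scheduled checks, and the SMART interleave factor is the number of scheduled services per host rounded up (both are assumptions, though they reproduce the reported 1.19 sec, 0.63 sec and 6). The function names are hypothetical and not part of Nagios:

```python
import math


def smart_inter_check_delay(avg_interval_sec: float, scheduled_checks: int) -> float:
    """Assumed SMART formula: spread checks evenly over the average interval."""
    return avg_interval_sec / scheduled_checks


def smart_interleave_factor(scheduled_services: int, total_hosts: int) -> int:
    """Assumed SMART formula: scheduled services per host, rounded up."""
    return math.ceil(scheduled_services / total_hosts)


# Figures copied from the 'nagios -s' output in the original message.
host_delay = smart_inter_check_delay(300.00, 252)   # average host interval, scheduled hosts
svc_delay = smart_inter_check_delay(878.40, 1386)   # average service interval, scheduled services
interleave = smart_interleave_factor(1386, 252)     # scheduled services, total hosts

print(f"Host inter-check delay:    {host_delay:.2f} sec")
print(f"Service inter-check delay: {svc_delay:.2f} sec")
print(f"Service interleave factor: {interleave}")
```

Note that the interleave factor only matches when it is computed from the 1386 scheduled services rather than all 1575 services (1575 / 252 = 6.25 would round up to 7). Since these values are computed once at startup, they do not by themselves explain latency that grows after several days of uptime.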

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null