Instrumenting Nagios

eponymous alias eponymousalias at yahoo.com
Thu May 21 18:30:14 CEST 2009


Sounds like what you want is a way to wait
until you know the program is in a bad way,
and only then turn on the profiling.  Perhaps
there is some mechanism such as described here:
http://www.cs.utah.edu/flux/oskit/html/oskit-wwwch32.html
that might do the trick.  (I'm not saying that
particular implementation is appropriate, or that
it is available in the standard gprof package; I have
not looked in detail.  All I'm saying is that
you might look for such a facility.)

If you can find such a mechanism, you might
need to send Nagios a custom command to get it
to enable or disable the profiling, so you
have control from the outside.
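
As a rough illustration of that idea, here is a minimal sketch of a
profiler that stays off until an external command switches it on.
Python and its cProfile module are used only to keep the example short;
Nagios itself is written in C, and the class name SwitchableProfiler and
the command strings ENABLE_PROFILING and DISABLE_PROFILING are
hypothetical, not existing Nagios external commands.

```python
import cProfile
import io
import pstats


class SwitchableProfiler:
    """Profiler that collects nothing until it is explicitly switched on."""

    def __init__(self):
        self._profiler = cProfile.Profile()
        self.enabled = False

    def handle_command(self, command):
        """React to a (hypothetical) external command string."""
        if command == "ENABLE_PROFILING" and not self.enabled:
            self._profiler.enable()
            self.enabled = True
        elif command == "DISABLE_PROFILING" and self.enabled:
            self._profiler.disable()
            self.enabled = False
        # Any other command is ignored by this sketch.

    def report(self, limit=10):
        """Return the collected statistics as text.

        Intended to be called after profiling has been disabled and
        after at least one call has been recorded.
        """
        out = io.StringIO()
        stats = pstats.Stats(self._profiler, stream=out)
        stats.sort_stats("cumulative").print_stats(limit)
        return out.getvalue()
```

The point of the design is that the long-running process pays no
profiling cost during the days it behaves well; only the window between
the two commands is recorded, which keeps the profile data small.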

--- On Thu, 5/21/09, Steven D. Morrey <smorrey at ldschurch.org> wrote:

> From: Steven D. Morrey <smorrey at ldschurch.org>
> Subject: Re: [Nagios-devel] Instrumenting Nagios
> To: "Nagios Developers List" <nagios-devel at lists.sourceforge.net>
> Date: Thursday, May 21, 2009, 6:48 AM
> gprof doesn't like Nagios.
> It generates new profile data for each fork.
> I have 30,000 service checks on 3,000 hosts that run each hour.
> Even then it's OK for 30 minutes or an hour, but when you
> are trying to debug something that takes 2 or 3 days to
> show up, it becomes nearly impossible to manage.
> oprofile buggered the entire system on my development boxes
> (SLES 9 on VMware).
> Hence the need to instrument just the important parts,
> unless you folks know of some switch or another I can pass
> in at compile time to get the profile data to be
> manageable.
> 
> Thanks!
> 
> Sincerely,
> Steve
> 
> ________________________________________
> From: eponymous alias [eponymousalias at yahoo.com]
> Sent: Wednesday, May 20, 2009 7:50 PM
> To: Nagios Developers List
> Subject: Re: [Nagios-devel] Instrumenting Nagios
> 
> To the extent that such delays may be partly
> due to general cost of computing, profiling the
> entire nagios binary would not be a bad idea.
> gprof is your friend.
> 
> --- On Tue, 5/19/09, Steven D. Morrey <smorrey at ldschurch.org> wrote:
> 
> > From: Steven D. Morrey <smorrey at ldschurch.org>
> > Subject: [Nagios-devel] Instrumenting Nagios
> > To: "nagios-devel at lists.sourceforge.net" <nagios-devel at lists.sourceforge.net>
> > Date: Tuesday, May 19, 2009, 11:11 AM
> > Hi Everyone,
> >
> > We're trying to track down a high latency issue we're
> > having with our Nagios system and I'm hoping to get some
> > advice from folks.
> > Here's what’s going on.
> >
> > We have a system running Nagios 2.12 and DNX 0.19 (latest).
> > This setup consists of 1 main Nagios server and 3 DNX
> > "worker nodes".
> >
> > We have 29000+ service checks across about 2500 hosts. Over
> > the last year we average about 250 or more services alarming
> > at any given time. We also have on average about 10 hosts
> > down at any given time.
> >
> > My original thought was that perhaps DNX was slowing down,
> > maybe a leak or something, so I instrumented DNX by timing
> > from the moment it's handed a job until it posts the results
> > into the circular results buffer.
> > This figure holds steady at 3.5s.
> >
> > I am pretty sure all checks are getting executed (at least,
> > all the ones that are enabled) eventually, just more and
> > more slowly over time.
> > Clearly, some checks are being delayed by something or even
> > many things.  What I'd like to do is perhaps extend
> > nagiostats to gather information about why latency is at the
> > level it is, to see if we can determine why Nagios is
> > waiting to run these checks.
> >
> > What should we be looking at, either in the event loop or
> > outside of it, to get a good overview of what Nagios is
> > doing, how, and why?
> >
> > We are thinking of adding counters to the different events
> > (both high and low) in an attempt to determine the source of
> > the latency in detail. For example, if the average check
> > latency is 100 seconds, being able to show that 30 seconds of
> > that was spent doing notifications, 20 seconds was spent doing
> > service reaping, etc. That way we can know where we need to
> > make optimizations.
> >
> > I'm thinking that if we can instrument the following events
> > we should have most of our bases covered (note some of these
> > may already be instrumented)...
> >
> > log file rotations,
> > external command checks,
> > service reaper events,
> > program shutdown,
> > program restart,
> > orphan check,
> > retention save,
> > status save,
> > service result freshness,
> > host result freshness,
> > expired downtime check,
> > check rescheduling,
> > expired comment check,
> > host check,
> > service check
> >
> > Is there anything else that could or should be instrumented
> > that could give us a good view into what Nagios is doing that's
> > causing service checks to be executed further and further
> > away from when they were scheduled?
> >
> > Is this list complete? Do these make sense to instrument, and
> > would they be useful in determining what is contributing to
> > check latency?
> >
> >
> > Thanks in advance!
> >
> > Sincerely,
> > Steve
> >
> >
> >  NOTICE: This email message is for the sole use of the
> > intended recipient(s) and may contain confidential and
> > privileged information. Any unauthorized review, use,
> > disclosure or distribution is prohibited. If you are not the
> > intended recipient, please contact the sender by reply email
> > and destroy all copies of the original message.
> >
> >
> >
> >
> > _______________________________________________
> > Nagios-devel mailing list
> > Nagios-devel at lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/nagios-devel
> >
> 
> 
> 
> 


      

_______________________________________________
Nagios-devel mailing list
Nagios-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-devel

