Passive monitoring is running slow?

Jonathan Call jcall at verio.net
Thu May 3 00:31:26 CEST 2007



> -----Original Message-----
> From: nagios-users-bounces at lists.sourceforge.net
> [mailto:nagios-users-bounces at lists.sourceforge.net]
> On Behalf Of Marc Powell
> Sent: Wednesday, May 02, 2007 3:39 PM
> To: nagios-users at lists.sourceforge.net
> Subject: Re: [Nagios-users] Passive monitoring is running slow?
> 
> 
> 
> > -----Original Message-----
> > From: nagios-users-bounces at lists.sourceforge.net
> > [mailto:nagios-users-bounces at lists.sourceforge.net]
> > On Behalf Of Jonathan Call
> > Sent: Wednesday, May 02, 2007 10:07 AM
> > To: nagios-users at lists.sourceforge.net
> > Subject: Re: [Nagios-users] Passive monitoring is running slow?
> >
> >
> >
> > > -----Original Message-----
> > > From: Thomas Guyot-Sionnest [mailto:dermoth at aei.ca]
> > > Sent: Tuesday, May 01, 2007 4:29 PM
> > > To: Jonathan Call
> > > Cc: nagios-users at lists.sourceforge.net
> > > Subject: Re: [Nagios-users] Passive monitoring is running slow?
> > >
> > > On 01/05/07 05:15 PM, Jonathan Call wrote:
> > > > > I have set up a distributed monitoring system per the Nagios
> > > > > documentation.
> > > > >
> > > > > I initially tested it out by having the distributed server
> > > > > monitor only 24 or so services on about 8 hosts. There didn't
> > > > > seem to be any problems.
> > > > >
> > > > > I then cranked it up to 427 services on 81 hosts. I'm watching
> > > > > the distributed server right now and there is hardly any system
> > > > > load but the Service Check Latency seems extremely high:
> > > > >
> > > > > Metric                 Min.       Max.        Average
> > > > > Check Execution Time:  0.05 sec   1.67 sec    0.701 sec
> > > > > Check Latency:         60.40 sec  287.36 sec  184.514 sec
> > > > > Percent State Change:  0.00%      0.00%       0.00%
> > > > >
> > > > > This is resulting in 50% or less of the service checks
> > > > > completing in the 5 minutes or less timeframe.
> > > >
> 
> 
> > So this is a known design failure in Nagios then? I'm fairly new to
> 
> Absolutely not.
> 
> > Nagios and I am completely dumbfounded at this. If you can't service
> > even a quarter (and probably even a tenth) of the number of hosts and
> > services on a distributed server that you can on a regular active
> > server, then what is the point of having a distributed model at all?
> 
> I have 5 data collector machines running nagios -and- cricket for
> thousands of services each with nagios reporting all results back to
> two central hosts as documented. Average latency is 0.689 seconds and
> Max of 3.65 seconds right now. The distributed server should be
> performing exactly like a regular active server as far as latency
> stats are concerned. You're either starving nagios for resources
> needed to run its active checks (run ~nagios/bin/nagios -s
> ~nagios/etc/nagios.cfg to see recommended settings) or, less likely,
> something is wrong with your submit-check-result. If you submit a
> result from the command line, does it complete in a timely manner? If
> you disable OCSP does the latency go away? Basic troubleshooting
> dictates you should try methodically enabling features on your
> distributed machine to turn it from an active-only server to active
> submitting check results via OCSP.
> 
> Disable OCSP program-wide (nagios.cfg)
> Test

With OCSP disabled service check latency is under half a second.

> Enable OCSP but have your OCSP script do everything except call
> send_nsca
> Test

With the send_nsca line commented out (basically calling an empty shell
script) service check latency is under half a second as well.
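
For reference, the step that was commented out is the one that formats a result and pipes it to send_nsca. Below is a minimal Python sketch of that formatting only, assuming the tab-separated host / service / return code / plugin output line that send_nsca reads on stdin. The function name and sample values are hypothetical, and the state map uses the standard plugin return codes, which may differ from what a given OCSP script sends for UNKNOWN.

```python
# Standard Nagios plugin return codes for the service state strings.
STATE_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def format_nsca_line(host, service, state, output):
    """Build one tab-separated service result line for send_nsca's stdin."""
    code = STATE_CODES.get(state, 3)  # treat unrecognized states as UNKNOWN
    return f"{host}\t{service}\t{code}\t{output}\n"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(format_nsca_line("router1", "PING", "OK", "PING ok"), end="")
```

Building the line is cheap, so any delay in this step would more plausibly come from starting send_nsca and waiting on the network than from the formatting itself.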

> Enable send_nsca in your OCSP script.
> Test

Service Latency times spike again.
Watching top for a few minutes reveals a LOT of send_nsca processes
being spawned but few checks actually running. Of course the SNMP checks
themselves run very quickly but there always seems to be a send_nsca
client running.  Not the same one either, always a different PID.

I timed the script itself (copied right off the Nagios documentation
website) and it executes in a timely manner as well:
0.000u 0.009s 0:00.71 0.0%      0+0k 0+0io 0pf+0w
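
In the timing line above, the third field (0:00.71) is elapsed wall-clock time, while the first two fields are user and system CPU time, so the script spends almost all of its 0.71 seconds waiting rather than computing. A rough Python sketch for repeating that measurement over several runs is below. The command it times is a placeholder (an assumption, not the real OCSP script), to be replaced with the actual script and arguments.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=False,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder command; substitute the real OCSP script and its arguments.
    argv = [sys.executable, "-c", "pass"]
    samples = [time_command(argv) for _ in range(5)]
    print(f"min {min(samples):.3f}s  max {max(samples):.3f}s")
```

Several samples give a better picture than a single run, since one slow connection can hide behind an otherwise normal average.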

> 
> 
> Do you have regular host checks enabled? Post the output of nagios -v
> and nagios -s.

Scheduled host checks are not enabled.

Nagios -v output:
Nagios 2.9
Copyright (c) 1999-2007 Ethan Galstad (http://www.nagios.org)
Last Modified: 04-10-2007
License: GPL

Reading configuration data...

Running pre-flight check on configuration data...

Checking services...
        Checked 427 services.
Checking hosts...
        Checked 81 hosts.
Checking host groups...
        Checked 8 host groups.
Checking service groups...
        Checked 0 service groups.
Checking contacts...
        Checked 1 contacts.
Checking contact groups...
        Checked 1 contact groups.
Checking service escalations...
        Checked 0 service escalations.
Checking service dependencies...
        Checked 0 service dependencies.
Checking host escalations...
        Checked 0 host escalations.
Checking host dependencies...
        Checked 0 host dependencies.
Checking commands...
        Checked 16 commands.
Checking time periods...
        Checked 1 time periods.
Checking extended host info definitions...
        Checked 0 extended host info definitions.
Checking extended service info definitions...
        Checked 0 extended service info definitions.
Checking for circular paths between hosts...
Checking for circular host and service dependencies...
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the
pre-flight check

Nagios -s output:

Nagios 2.9
Copyright (c) 1999-2007 Ethan Galstad (http://www.nagios.org)
Last Modified: 04-10-2007
License: GPL

Projected scheduling information for host and service
checks is listed below.  This information assumes that
you are going to start running Nagios with your current
config files.

HOST SCHEDULING INFORMATION
---------------------------
Total hosts:                     81
Total scheduled hosts:           0
Host inter-check delay method:   SMART
Average host check interval:     0.00 sec
Host inter-check delay:          0.00 sec
Max host check spread:           30 min
First scheduled check:           N/A
Last scheduled check:            N/A


SERVICE SCHEDULING INFORMATION
-------------------------------
Total services:                     427
Total scheduled services:           427
Service inter-check delay method:   SMART
Average service check interval:     300.00 sec
Inter-check delay:                  0.70 sec
Interleave factor method:           SMART
Average services per host:          5.27
Service interleave factor:          6
Max service check spread:           30 min
First scheduled check:              Wed May  2 22:16:00 2007
Last scheduled check:               Wed May  2 22:21:02 2007


CHECK PROCESSING INFORMATION
----------------------------
Service check reaper interval:      10 sec
Max concurrent service checks:      Unlimited


PERFORMANCE SUGGESTIONS
-----------------------
I have no suggestions - things look okay.
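
The figures above allow a rough back-of-the-envelope check. The sketch below rests on an assumption that the nagios -s output does not state: that each OCSP command runs serially and blocks the scheduler for its full wall-clock time. It uses the 0.71 second elapsed time measured for the script earlier in this message. The function name is hypothetical.

```python
def serial_budget(services, interval_s, per_result_s):
    """Compare the per-check time budget with a blocking per-result cost."""
    budget = interval_s / services    # time available per scheduled check
    total = services * per_result_s   # time consumed if submissions serialize
    return budget, total


if __name__ == "__main__":
    # 427 services, 300 s check interval, 0.71 s per submission.
    budget, total = serial_budget(427, 300.0, 0.71)
    print(f"budget per check: {budget:.2f} s")
    print(f"serialized submit time per cycle: {total:.1f} s of 300 s")
```

Under that assumption the per-check budget works out to the same 0.70 seconds reported as the inter-check delay, and 427 submissions at 0.71 seconds each need slightly more than the 300 second interval, which would be consistent with latency that keeps growing. If submissions do not block the scheduler, this estimate does not apply.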


System Information:

FreeBSD 6.2 (SMP) with Nagios 2.9 and NSCA both installed from source
(ports).
Dual 2.66 GHz Xeon 2U system with 4GB of RAM.

I have also tried out Thomas' suggestion of using the results check
abilities of Nagios with a named pipe. That script/implementation works
just fine as well.
