huge performance problems, nagios perfparse

Mieden, Rick van der rick.vandermieden at orangemail.nl
Tue Jun 28 08:50:20 CEST 2005


All,

I've solved my performance problems. They were caused by the
performance data processing in combination with perfparse. When I
turned off performance data processing, latency dropped to 0.4 sec.
Of course, the question is why...

Below is my configuration related to perfparse and performance data.
Any remarks on what I did wrong that could cause the heavy performance
load would be appreciated. I have also added the perfparse-users list;
perhaps somebody from that list can have a look at it?

Regards

Rick

cfg_file=/usr/local/nagios/etc/nagios_perfparse.cfg
perfdata_timeout=5
process_performance_data=1
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata 
host_perfdata_file=/usr/local/nagios/var/hostperf.log
service_perfdata_file=/usr/local/nagios/var/serviceperf.log
host_perfdata_file_mode=w
service_perfdata_file_mode=w

My /usr/local/nagios/etc/nagios_perfparse.cfg looks like this:

# Run by Nagios after every service check (service_perfdata_command in nagios.cfg)
define command {
    command_name    process-service-perfdata
    command_line    /usr/local/nagios/bin/perfparse_nagios_pipe_command.pl /usr/local/nagios/var/perfdata-service.log "$TIMET$" "$HOSTNAME$" "$SERVICEDESC$" "$SERVICEOUTPUT$" "$SERVICESTATE$" "$SERVICEPERFDATA$"
}

# Run by Nagios after every host check (host_perfdata_command in nagios.cfg)
define command {
    command_name    process-host-perfdata
    command_line    /usr/local/nagios/bin/perfparse_nagios_pipe_command.pl /usr/local/nagios/var/perfdata-host.log "$TIMET$" "$HOSTNAME$" "$HOSTOUTPUT$" "$HOSTPERFDATA$"
}
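For scale: Nagios runs service_perfdata_command after each service check, so the wrapper script above is launched once per check result. A minimal Python sketch of the resulting launch rate and of the time each result may take before checks start to queue up. It assumes that the 222.47 s average check interval from the "nagios -s" output quoted further down also holds for the 1700- and 2360-service setups mentioned in this thread, and that the per-result work holds up further check processing (an assumption, not something this thread shows):

```python
def perfdata_load(total_services: int, avg_check_interval_s: float) -> tuple[float, float]:
    """Return (perfdata commands launched per second, seconds available per result).

    Assumes one perfdata command per service check result and that checks
    are spread evenly over the average check interval.
    """
    launches_per_s = total_services / avg_check_interval_s
    budget_per_result_s = 1.0 / launches_per_s
    return launches_per_s, budget_per_result_s


if __name__ == "__main__":
    # Service counts mentioned in this thread; 222.47 s is the average interval
    # reported for the 2255-service setup and is assumed for the others.
    for services in (1700, 2255, 2360):
        rate, budget = perfdata_load(services, 222.47)
        print(f"{services} services: {rate:.1f} launches/s, "
              f"{budget * 1000:.0f} ms per result")
```

At 2255 services that is about ten script launches per second, leaving roughly 99 ms per result, the same 0.10 s as the inter-check delay Nagios reports.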



-----Original Message-----
From: Hendrik Baecker [mailto:b00mer at gmx.net] 
Sent: Monday, June 27, 2005 15:31
To: Mieden, Rick van der
Cc: nagios-users at lists.sourceforge.net; marcus.hildenbrand at sap.com
Subject: Re: [Nagios-users] huge performance problems

Mieden, Rick van der wrote:

> Thanks for the responses. I tweaked it a bit, but I still have bad
> latency with 174 hosts and 2360 services (I tuned it down from 540
> sec to 224 sec). My plugins are fine; they are really fast on the
> command line. I have also noticed that the latency drops to 4 sec if
> I have around 1700 services running. So it looks like Nagios has
> problems when the number of services goes over 2000 or so.
>
> I've read about USE_MEMORY_PERFORMANCE_TWEAKS, but even that option
> does not improve the latency. I have also read that many people run
> far more host and service checks than I do without any performance
> problems, so I'd love to see their nagios.cfg, or to know what the
> trick is.
>
> Regards,
>
> Rick
>
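One way to see why latency can look fine at around 1700 services and then jump once the count passes 2000 is a toy queueing model. This is a hypothetical sketch, not Nagios's scheduler: a single worker handles one check result at a time, each result costs a fixed 0.11 s (an invented figure), and every service is rescheduled one interval after its result was handled. While total_services x cost stays below the interval, latency stays at zero; once it exceeds the interval, latency settles at a large steady value of roughly total_services x cost - interval instead of growing gradually.

```python
import heapq


def simulate_latency(num_services: int, interval_s: float,
                     cost_per_result_s: float, horizon_s: float) -> float:
    """Toy model (hypothetical): one worker, fixed cost per result, and each
    service rescheduled interval_s after its result was handled.

    Returns the latency of the last check started at or before horizon_s.
    """
    # Spread the first round of checks evenly across one interval.
    queue = [(i * interval_s / num_services, i) for i in range(num_services)]
    heapq.heapify(queue)
    clock = 0.0
    latency = 0.0
    while queue:
        due, service = heapq.heappop(queue)
        if due > horizon_s:
            break
        clock = max(clock, due)        # the worker cannot start before the check is due
        latency = clock - due          # how late this check started
        clock += cost_per_result_s     # time spent handling this result
        heapq.heappush(queue, (clock + interval_s, service))
    return latency


if __name__ == "__main__":
    for services in (1700, 2255):
        print(services, "services:",
              round(simulate_latency(services, 222.47, 0.11, 1000.0), 2), "s latency")
```

In this model, 1700 x 0.11 s = 187 s fits into a 222.47 s interval, while 2255 x 0.11 s = 248 s does not.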
Hi,

Nearly the same on our side: Nagios with 1900 services runs with at
most 2-4 seconds of latency. But beware if you want more...

I have also heard from people who have more than 2000 services, but I
think most of them use some kind of distributed monitoring.

Regards,
Hendrik

> -----Original Message-----
> From: Hendrik Baecker [mailto:b00mer at gmx.net]
> Sent: Thursday, June 23, 2005 15:50
> To: Mieden, Rick van der
> Cc: nagios-users at lists.sourceforge.net
> Subject: Re: [Nagios-users] huge performance problems
>
> Hi,
>
> One year ago we had nearly the same performance problems.
>
> It seems that the Nagios scheduler trips over itself when the number
> of services gets too big. Therefore we decided to install another
> Nagios process with different configs in a different directory,
> splitting our Nagios the same way as our networks: one Nagios
> (nagios-1) for Network A and another one (nagios-2) for Network B.
>
> That reduced the number of services per Nagios instance, and so far
> it runs well.
>
> All this was under version 1.2.
>
> In the past I posted some questions about our problem but got no good
> answers, so today I only know that it works for us.
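Splitting one configuration into per-network instances comes down to assigning each host to the instance that owns its network. A minimal Python sketch: the instance names nagios-1 and nagios-2 come from the message, but the subnets and host addresses are hypothetical, and turning the resulting mapping into separate object config files is left out.

```python
import ipaddress

# Hypothetical subnets standing in for "Network A" and "Network B".
INSTANCE_NETWORKS = {
    "nagios-1": ipaddress.ip_network("10.1.0.0/16"),
    "nagios-2": ipaddress.ip_network("10.2.0.0/16"),
}


def split_hosts(hosts: dict[str, str]) -> dict[str, list[str]]:
    """Assign each host (name -> IP address) to the instance owning its network."""
    assignment: dict[str, list[str]] = {name: [] for name in INSTANCE_NETWORKS}
    for host, address in hosts.items():
        ip = ipaddress.ip_address(address)
        for instance, network in INSTANCE_NETWORKS.items():
            if ip in network:
                assignment[instance].append(host)
                break
        else:
            # No instance owns this address: fail loudly rather than silently drop the host.
            raise ValueError(f"{host} ({address}) is not in any instance's network")
    return {instance: sorted(names) for instance, names in assignment.items()}


if __name__ == "__main__":
    example_hosts = {"web01": "10.1.0.10", "db01": "10.2.3.4", "web02": "10.1.5.6"}
    print(split_hosts(example_hosts))
```

Raising on unmatched hosts is a design choice: a host that silently ends up in no instance would simply stop being monitored.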
>
> So much for that.
> I hope nobody minds if I use your post to describe some problems we
> now have while testing the approach above (different instances on the
> same host) with Nagios 2.02b.
>
> When I start my instance "nagios-1" with around 1600 service checks,
> it runs fine with almost no latency.
> But when I start "nagios-2" with around 1850 services, that instance
> quickly climbs to latencies of around 100 seconds.
> When I then stop the first instance, the latency on the second one
> drops below 5 seconds.
>
> Perhaps one of the developers can tell me whether my theory is right
> that (one of) the worker thread(s) handling one scheduling queue can
> see the other instance's scheduling queue? Are they possibly the same?
>
> I am not a programmer, but I imagine the following: starting nagios-1
> creates the scheduling queue in RAM. So far so good. There it is, and
> the worker runs through it and executes the checks.
> I am now afraid that when I start my second Nagios process, it also
> creates its scheduling queue in system RAM, but that the two
> processes don't each have their own queue... I hope somebody
> understands what I mean.
>
> Best regards
> Hendrik
>
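On Hendrik's question: an operating system gives every process its own private memory unless the program explicitly sets up sharing, so two separately started Nagios processes should each hold their own scheduling queue, although they still compete for the same CPUs and disks. A minimal POSIX-only Python sketch of that isolation, using a hypothetical list of check names as the "queue": even a child created with os.fork(), which starts out as a copy of its parent, only changes its own copy.

```python
import os


def fork_and_modify_queue() -> tuple[list[str], list[str]]:
    """Fork a child that appends to its copy of a queue.

    Returns (parent's queue after the child finished, child's queue as
    reported through a pipe). POSIX only, since it relies on os.fork().
    """
    queue = ["check_ping", "check_disk"]  # hypothetical stand-in for a scheduling queue
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: change its own copy and report it to the parent.
        os.close(read_fd)
        queue.append("check_load")
        os.write(write_fd, ",".join(queue).encode())
        os.close(write_fd)
        os._exit(0)  # exit at once, skipping the parent's remaining code
    # Parent process: collect the child's report, then reap the child.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return queue, b"".join(chunks).decode().split(",")


if __name__ == "__main__":
    parent_view, child_view = fork_and_modify_queue()
    print("parent:", parent_view)
    print("child: ", child_view)
```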
> Mieden, Rick van der wrote:
>
> We have serious performance problems with Nagios. We monitor 174
> hosts with 2255 services, and the average latency is 400 seconds!
> Of course that's not acceptable.
>
> I use Perl plugins together with SSH and SNMP plugins. I've compiled
> Nagios with perlcache and embedded Perl enabled. The server is a SPARC
> server with 2 x 1.1 GHz CPUs and 1024 MB RAM (Solaris 8, latest patch
> level).
>
> I have played around with all kinds of parameters and read the Nagios
> tuning docs.
>
> Below is the output of "nagios -s nagios.cfg":
>
> Nagios 2.0b3
> Copyright (c) 1999-2005 Ethan Galstad (www.nagios.org)
> Last Modified: 04-03-2005
> License: GPL
>
> Projected scheduling information for host and service
> checks is listed below. This information assumes that
> you are going to start running Nagios with your current
> config files.
>
> HOST SCHEDULING INFORMATION
> ---------------------------
> Total hosts: 174
> Total scheduled hosts: 0
> Host inter-check delay method: SMART
> Average host check interval: 0.00 sec
> Host inter-check delay: 0.00 sec
> Max host check spread: 30 min
> First scheduled check: N/A
> Last scheduled check: N/A
>
> SERVICE SCHEDULING INFORMATION
> -------------------------------
> Total services: 2255
> Total scheduled services: 2255
> Service inter-check delay method: SMART
> Average service check interval: 222.47 sec
> Inter-check delay: 0.10 sec
> Interleave factor method: SMART
> Average services per host: 12.96
> Service interleave factor: 13
> Max service check spread: 30 min
> First scheduled check: Wed Jun 22 15:05:08 2005
> Last scheduled check: Wed Jun 22 15:08:50 2005
>
> CHECK PROCESSING INFORMATION
> ----------------------------
> Service check reaper interval: 5 sec
> Max concurrent service checks: 200
>
> PERFORMANCE SUGGESTIONS
> -----------------------
> I have no suggestions - things look okay.
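The SMART values in this output can be reproduced by hand, assuming the formulas described in the Nagios check-scheduling documentation: inter-check delay = average check interval / total services, and interleave factor = total services / total hosts, rounded up. A minimal Python sketch that recomputes them and compares the resulting first-round window with the first and last scheduled check times printed above:

```python
import math
from datetime import datetime


def smart_schedule(total_services: int, total_hosts: int, avg_interval_s: float) -> dict[str, float]:
    """Recompute SMART scheduling figures (formulas assumed from the Nagios docs)."""
    delay_s = avg_interval_s / total_services
    return {
        "inter_check_delay_s": delay_s,
        "services_per_host": total_services / total_hosts,
        "interleave_factor": math.ceil(total_services / total_hosts),
        # The last first-round check starts (N - 1) delays after the first one.
        "initial_window_s": (total_services - 1) * delay_s,
    }


if __name__ == "__main__":
    figures = smart_schedule(2255, 174, 222.47)
    fmt = "%a %b %d %H:%M:%S %Y"
    first = datetime.strptime("Wed Jun 22 15:05:08 2005", fmt)
    last = datetime.strptime("Wed Jun 22 15:08:50 2005", fmt)
    print(figures)
    print("window reported by nagios -s:", (last - first).total_seconds(), "s")
```

Both windows come out at about 222 s, i.e. one check roughly every 0.1 s, which is the pace the server has to sustain.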
>
> And the nagiostats output:
>
> CURRENT STATUS DATA
> ----------------------------------------------------
> Status File: /usr/local/nagios/var/status.dat
> Status File Age: 0d 0h 0m 13s
> Status File Version: 2.0b3
> Program Running Time: 0d 32h 0m 13s
>
> Total Services: 2255
> Services Checked: 2255
> Services Scheduled: 2255
> Active Service Checks: 2255
> Passive Service Checks: 0
> Total Service State Change: 0.000 / 5.860 / 0.003 %
> *Active Service Latency: 386.526 / 414.446 / 394.100 %*
> Active Service Execution Time: 0.062 / 60.349 / 1.428 sec
> Active Service State Change: 0.000 / 5.860 / 0.003 %
> *Active Services Last 1/5/15/60 min: 155 / 1044 / 2255 / 2255*
> Passive Service State Change: 0.000 / 0.000 / 0.000 %
> Passive Services Last 1/5/15/60 min: 0 / 0 / 0 / 0
> Services Ok/Warn/Unk/Crit: 2242 / 0 / 0 / 13
> Services Flapping: 0
> Services In Downtime: 0
>
> Total Hosts: 174
> Hosts Checked: 174
> Hosts Scheduled: 0
> Active Host Checks: 174
> Passive Host Checks: 0
> Total Host State Change: 0.000 / 0.000 / 0.000 %
> Active Host Latency: 0.000 / 0.000 / 0.000 %
> Active Host Execution Time: 0.137 / 1.109 / 0.582 sec
> Active Host State Change: 0.000 / 0.000 / 0.000 %
> Active Hosts Last 1/5/15/60 min: 1 / 2 / 2 / 9
> Passive Host State Change: 0.000 / 0.000 / 0.000 %
> Passive Hosts Last 1/5/15/60 min: 0 / 0 / 0 / 0
> Hosts Up/Down/Unreach: 174 / 0 / 0
> Hosts Flapping: 0
> Hosts In Downtime: 0
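These nagiostats figures can be cross-checked with a little arithmetic. A minimal Python sketch, assuming that "Active Services Last 1/5 min" counts the active checks that ran in that window and that the three numbers on the latency and execution-time lines are min / max / average. The schedule asks for about 10.1 checks per second, but only about 2.6 to 3.5 per second actually ran. By Little's law (average number in flight = rate x average time each one takes), even on schedule only about 14.5 checks would be running at once on average, far below "Max concurrent service checks: 200", which hints that the concurrency limit was not the bottleneck.

```python
def throughput(total_services: int, avg_interval_s: float,
               ran_last_1min: int, ran_last_5min: int,
               avg_exec_time_s: float) -> dict[str, float]:
    """Compare the check rate the schedule needs with what nagiostats reports."""
    needed_per_s = total_services / avg_interval_s
    return {
        "needed_per_s": needed_per_s,
        "observed_per_s_1min": ran_last_1min / 60,
        "observed_per_s_5min": ran_last_5min / 300,
        # Little's law: average checks in flight = arrival rate x average execution time.
        "avg_in_flight_if_on_schedule": needed_per_s * avg_exec_time_s,
    }


if __name__ == "__main__":
    for name, value in throughput(2255, 222.47, 155, 1044, 1.428).items():
        print(f"{name}: {value:.2f}")
```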
>
> Does anybody have an idea what is going wrong here? There must be something...
>
> Regards,
>
> Rick
>
> ===========================================================
>
> The information contained in this message may be confidential and is
> intended to be only for the addressee. Should you receive this message
> unintentionally, please do not use the contents herein and notify the
> sender immediately by return e-mail. Although Orange has taken steps
> to ensure that this email and attachments are free from any virus, you
> do need to verify the possibility of their existence as Orange can
> take no responsibility for any computer virus which might be
> transferred by way of this email.
>
> ===========================================================
>





_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null




