AW: huge performance problems, nagios perparse

Sand Philipp Philipp.Sand at sycor.de
Tue Jun 28 09:07:33 CEST 2005


I'm not using the perfparse pipe method, but I think this may be the reason for your latency. As far as I understand, the pipe method is not suitable when you have a lot of performance data, because it starts a new process for every piece of data unless you have configured Nagios with embedded Perl and perlcache.
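
A rough back-of-the-envelope check, using only figures from the "nagios -s" and nagiostats output quoted further down (2255 services, 222.47 sec average check interval, 155 active checks in the last minute). The assumption, which I have not verified in the Nagios source, is that the perfdata command runs once per check result and that the scheduler waits for it. Under that assumption, this sketch shows how little time each result may take before checks start to queue up. The function names are made up for illustration.

```python
# Figures copied from the "nagios -s" and nagiostats output quoted below.
TOTAL_SERVICES = 2255
AVG_CHECK_INTERVAL_SEC = 222.47
CHECKS_LAST_MINUTE = 155


def required_rate(services: int, interval_sec: float) -> float:
    """Checks per second needed to run every service once per interval."""
    return services / interval_sec


def observed_rate(checks_last_minute: int) -> float:
    """Checks per second actually completed, from the 1-minute counter."""
    return checks_last_minute / 60.0


need = required_rate(TOTAL_SERVICES, AVG_CHECK_INTERVAL_SEC)
got = observed_rate(CHECKS_LAST_MINUTE)

# Time budget per check result if results are handled one after another
# (an assumption, see the text above).
print(f"required: {need:.2f} checks/sec ({1 / need:.3f} s per check)")
print(f"observed: {got:.2f} checks/sec ({1 / got:.3f} s per check)")
print(f"shortfall factor: {need / got:.1f}x")
```

The required rate comes out at roughly 10 checks per second, which matches the 0.10 sec inter-check delay in the scheduling output, while the observed rate is only about 2.6 per second. If the per-result work really is serialized, a perfdata command that takes a few tenths of a second would be enough to explain a gap of that size.
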
Maybe you should try another method. I'm running perfparse with the "Periodic Nagios Log Parse" method with about 2000 service checks...
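
To illustrate the idea behind a periodic log parse (one process reads many records in a batch instead of one process per record), here is a minimal sketch. It is not perfparse's actual code or file format. The tab-separated line layout is an assumption that simply follows the macro order in Rick's service command below (time, host, service, output, state, perfdata), and the names Metric, parse_perfdata, parse_log_line and parse_batch are hypothetical. The perfdata items follow the usual plugin convention label=value[UOM];warn;crit;min;max.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class Metric(NamedTuple):
    label: str
    value: float
    unit: str


# One perfdata item: label=value[UOM], optionally followed by ;warn;crit;min;max.
# Labels that contain spaces are wrapped in single quotes.
_ITEM = re.compile(
    r"(?:'([^']+)'|([^\s=']+))=(-?\d+(?:\.\d+)?)([^\s;]*)(?:;\S*)?"
)


def parse_perfdata(perfdata: str) -> List[Metric]:
    """Extract label, numeric value and unit from a perfdata string."""
    metrics = []
    for m in _ITEM.finditer(perfdata):
        label = m.group(1) or m.group(2)
        metrics.append(Metric(label, float(m.group(3)), m.group(4)))
    return metrics


def parse_log_line(line: str) -> Optional[dict]:
    """Parse one record; the six-field tab layout is an assumed format."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        return None  # skip malformed records instead of failing the batch
    timet, host, service, _output, state, perfdata = fields
    try:
        timestamp = int(timet)
    except ValueError:
        return None
    return {
        "time": timestamp,
        "host": host,
        "service": service,
        "state": state,
        "metrics": parse_perfdata(perfdata),
    }


def parse_batch(lines: Iterable[str]) -> List[dict]:
    """Process a whole batch of log lines in a single pass."""
    return [rec for rec in map(parse_log_line, lines) if rec is not None]
```

Run from cron or a similar scheduler against the accumulated log, a script like this starts one interpreter per batch rather than one per check result, which is the point of switching methods.
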

_____________________________
 
Philipp Sand
OC-CC-TEC-SYS
 
SYCOR GmbH
Heinrich-von-Stephan-Straße 1-5
D - 37073 Göttingen
 
Telefon    +49 (0) 551 - 490 - 0
Telefax    +49 (0) 551 - 490 - 232468
 
philipp.sand at sycor.de
www.sycor.de
------------------------------------------------
 
> -----Original Message-----
> From: nagios-users-admin at lists.sourceforge.net
> [mailto:nagios-users-admin at lists.sourceforge.net] On Behalf Of
> Mieden, Rick van der
> Sent: Tuesday, June 28, 2005 08:50
> To: nagios-users at lists.sourceforge.net;
> perfparse-users at lists.sourceforge.net
> Subject: RE: [Nagios-users] huge performance problems, nagios perparse
> 
> All,
> 
> I've solved my performance problems. They were caused by the performance
> data in combination with perfparse. When I stopped processing performance
> data, latency dropped to 0.4 sec.
> Of course the question is why......
> 
> Below is my configuration related to perfparse and performance data. Any
> remarks on what I did wrong that could cause the heavy
> performance load would be welcome. I also added the perfparse-users list;
> perhaps somebody from that list can have a look at it?
> 
> Regards
> 
> Rick
> 
> cfg_file=/usr/local/nagios/etc/nagios_perfparse.cfg
> perfdata_timeout=5
> process_performance_data=1
> host_perfdata_command=process-host-perfdata
> service_perfdata_command=process-service-perfdata
> host_perfdata_file=/usr/local/nagios/var/hostperf.log
> service_perfdata_file=/usr/local/nagios/var/serviceperf.log
> host_perfdata_file_mode=w
> service_perfdata_file_mode=w
> 
> My /usr/local/nagios/etc/nagios_perfparse.cfg looks like this:
> 
> define command {
>     command_name    process-service-perfdata
>     command_line    /usr/local/nagios/bin/perfparse_nagios_pipe_command.pl /usr/local/nagios/var/perfdata-service.log "$TIMET$" "$HOSTNAME$" "$SERVICEDESC$" "$SERVICEOUTPUT$" "$SERVICESTATE$" "$SERVICEPERFDATA$"
> }
> 
> define command {
>     command_name    process-host-perfdata
>     command_line    /usr/local/nagios/bin/perfparse_nagios_pipe_command.pl /usr/local/nagios/var/perfdata-host.log "$TIMET$" "$HOSTNAME$" "$HOSTOUTPUT$" "$HOSTPERFDATA$"
> }
> 
> 
> 
> -----Original Message-----
> From: Hendrik Baecker [mailto:b00mer at gmx.net]
> Sent: Monday, June 27, 2005 15:31
> To: Mieden, Rick van der
> Cc: nagios-users at lists.sourceforge.net; marcus.hildenbrand at sap.com
> Subject: Re: [Nagios-users] huge performance problems
> 
> Mieden, Rick van der wrote:
> 
> > Thanks for the responses. I tweaked it a bit, but still have bad
> > latency with 174 hosts and 2360 services. (I tuned it down from 540
> > sec to 224 seconds.) My plugins are fine; they are really fast on the
> > command line. I have also noticed that the latency drops to 4 secs if I
> > have around 1700 services running. So it looks like Nagios has some
> > problems when the number of services goes over 2000 or something like
> > that.
> >
> > I've read something about USE_MEMORY_PERFORMANCE_TWEAKS, but even
> > that option does not improve the latency. I have also
> > read that there are many people who have far more hosts and service
> > checks than I have without any performance problems. So I'd love to
> > see their nagios.cfg, or would like to know what the trick is.
> >
> > Regards,
> >
> > Rick
> >
> Hi,
> 
> It's nearly the same on our side. Nagios with 1900 services runs with max.
> 2-4 seconds latency. But beware if you want more...
> 
> I have also heard of people who have more than 2000 services, but I think
> most of them are doing some kind of distributed monitoring.
> 
> Regards,
> Hendrik
> 
> > -----Original Message-----
> > *From:* Hendrik Baecker [mailto:b00mer at gmx.net]
> > *Sent:* Thursday, June 23, 2005 15:50
> > *To:* Mieden, Rick van der
> > *Cc:* nagios-users at lists.sourceforge.net
> > *Subject:* Re: [Nagios-users] huge performance problems
> >
> > Hi,
> >
> > One year ago we had nearly the same performance problems too.
> >
> > It seems that the Nagios scheduler trips over itself if the number
> > of services is too big. Therefore we decided to install another Nagios
> > process with different configs in a different directory. So we
> > split our Nagios along the lines of our networks: one Nagios (nagios-1)
> > for Network A and another one (nagios-2) for Network B.
> >
> > That reduced the number of services per Nagios instance, and so far it
> > runs well.
> >
> > All this was under version 1.2.
> >
> > In the past I posted some questions about our problem, but there was
> > no good answer, so today I only know that it works for us.
> >
> > So much for that.
> > I hope nobody will mind if I use your post to describe some
> > problems we now have while testing the same setup with different
> > instances on the same host with nagios 2.02b.
> >
> > When I fire up my instance "nagios-1" with around 1600 service checks,
> > it runs fine with nearly no latency.
> > But when I fire up "nagios-2" with around 1850 services, this
> > instance quickly reaches latencies of around 100 seconds.
> > When I then stop the first instance, the latencies on the second one
> > drop to < 5 seconds.
> >
> > Perhaps one of the developers can tell me if I am right in theory that
> > (one of) the worker thread(s) with the scheduling queue can see the
> > other scheduling queue? Are they possibly the same?
> >
> > I am not a programmer, but I can imagine the following: starting
> > nagios-1 creates the scheduling queue and puts it in RAM. So far
> > so good. There it is, and the worker runs through it and executes the
> > checks.
> > I am now afraid that when I start my second Nagios process, it will
> > also create its scheduling queue in system RAM, but that the two
> > processes don't have their own queues... I hope somebody understands
> > what I mean.
> >
> > Best regards
> > Hendrik
> >
> > Mieden, Rick van der wrote:
> >
> > We have heavy performance problems with Nagios. We monitor 174 hosts
> > with 2255 services and an average latency of 400 seconds!!!! Of
> > course that's not acceptable.
> >
> > I use Perl plugins with ssh and snmp plugins. I've compiled Nagios with
> > perlcache and embedded-perl enabled. The server is a SPARC server with
> > 2 x 1.1 GHz CPU and 1024 RAM. (Solaris 8, latest patch level)
> >
> > I played around with all kinds of parameters and read the tuning docs
> > for Nagios.
> >
> > Below is the output of "nagios -s nagios.cfg":
> >
> > Nagios 2.0b3
> > Copyright (c) 1999-2005 Ethan Galstad (www.nagios.org)
> > Last Modified: 04-03-2005
> > License: GPL
> >
> > Projected scheduling information for host and service
> > checks is listed below. This information assumes that
> > you are going to start running Nagios with your current
> > config files.
> >
> > HOST SCHEDULING INFORMATION
> > ---------------------------
> > Total hosts: 174
> > Total scheduled hosts: 0
> > Host inter-check delay method: SMART
> > Average host check interval: 0.00 sec
> > Host inter-check delay: 0.00 sec
> > Max host check spread: 30 min
> > First scheduled check: N/A
> > Last scheduled check: N/A
> >
> > SERVICE SCHEDULING INFORMATION
> > -------------------------------
> > Total services: 2255
> > Total scheduled services: 2255
> > Service inter-check delay method: SMART
> > Average service check interval: 222.47 sec
> > Inter-check delay: 0.10 sec
> > Interleave factor method: SMART
> > Average services per host: 12.96
> > Service interleave factor: 13
> > Max service check spread: 30 min
> > First scheduled check: Wed Jun 22 15:05:08 2005
> > Last scheduled check: Wed Jun 22 15:08:50 2005
> >
> > CHECK PROCESSING INFORMATION
> > ----------------------------
> > Service check reaper interval: 5 sec
> > Max concurrent service checks: 200
> >
> > PERFORMANCE SUGGESTIONS
> > -----------------------
> > I have no suggestions - things look okay.
> >
> > And the nagiostats output:
> >
> > CURRENT STATUS DATA
> > ----------------------------------------------------
> > Status File: /usr/local/nagios/var/status.dat
> > Status File Age: 0d 0h 0m 13s
> > Status File Version: 2.0b3
> >
> > Program Running Time: 0d 32h 0m 13s
> >
> > Total Services: 2255
> > Services Checked: 2255
> > Services Scheduled: 2255
> > Active Service Checks: 2255
> > Passive Service Checks: 0
> > Total Service State Change: 0.000 / 5.860 / 0.003 %
> > *Active Service Latency: 386.526 / 414.446 / 394.100 %*
> > Active Service Execution Time: 0.062 / 60.349 / 1.428 sec
> > Active Service State Change: 0.000 / 5.860 / 0.003 %
> > *Active Services Last 1/5/15/60 min: 155 / 1044 / 2255 / 2255*
> > Passive Service State Change: 0.000 / 0.000 / 0.000 %
> > Passive Services Last 1/5/15/60 min: 0 / 0 / 0 / 0
> > Services Ok/Warn/Unk/Crit: 2242 / 0 / 0 / 13
> > Services Flapping: 0
> > Services In Downtime: 0
> >
> > Total Hosts: 174
> > Hosts Checked: 174
> > Hosts Scheduled: 0
> > Active Host Checks: 174
> > Passive Host Checks: 0
> > Total Host State Change: 0.000 / 0.000 / 0.000 %
> > Active Host Latency: 0.000 / 0.000 / 0.000 %
> > Active Host Execution Time: 0.137 / 1.109 / 0.582 sec
> > Active Host State Change: 0.000 / 0.000 / 0.000 %
> > Active Hosts Last 1/5/15/60 min: 1 / 2 / 2 / 9
> > Passive Host State Change: 0.000 / 0.000 / 0.000 %
> > Passive Hosts Last 1/5/15/60 min: 0 / 0 / 0 / 0
> > Hosts Up/Down/Unreach: 174 / 0 / 0
> > Hosts Flapping: 0
> > Hosts In Downtime: 0
> >
> > Does anybody have an idea what is going wrong here? There must be something......
> >
> > Regards,
> >
> > Rick
> >
> > ===========================================================
> >
> > De informatie opgenomen in dit bericht kan vertrouwelijk zijn en is
> > alleen bestemd voor de geadresseerde. Indien u dit bericht onterecht
> > ontvangt, wordt u verzocht de inhoud niet te gebruiken en de afzender
> > direct te informeren door het bericht te retourneren. Hoewel Orange
> > maatregelen heeft genomen om virussen in deze email of attachments te
> > voorkomen, dient u ook zelf na te gaan of virussen aanwezig zijn
> > aangezien Orange niet aansprakelijk is voor computervirussen die
> > veroorzaakt zijn door deze email.
> >
> > The information contained in this message may be confidential and is
> > intended to be only for the addressee. Should you receive this message
> > unintentionally, please do not use the contents herein and notify the
> > sender immediately by return e-mail. Although Orange has taken steps
> > to ensure that this email and attachments are free from any virus, you
> > do need to verify the possibility of their existence as Orange can
> > take no responsibility for any computer virus which might be
> > transferred by way of this email.
> >
> > ===========================================================
> >
> 
> 
> 
> 
> 
> -------------------------------------------------------
> SF.Net email is sponsored by: Discover Easy Linux Migration Strategies
> from IBM. Find simple to follow Roadmaps, straightforward articles,
> informative Webcasts and more! Get everything you need to get up to
> speed, fast. http://ads.osdn.com/?ad_idt77&alloc_id492&op=ick
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
> reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null






