Rép. : RE: [Nagios-users] [Solved] Nagios Performance Data shows checks aren't being completed

Serveur-Faucon Surveillance SrvFaucon at cslaval.qc.ca
Wed Mar 8 17:53:14 CET 2006


You are welcome. I do think it is irresponsible, or contrary to the spirit of openness, when someone does not post the solution ;)

We use DNS names everywhere, with a few exceptions such as core routers. With 300 hosts to check, and even more services, managing 300 IPs that change from time to time would be too time-consuming.

In this particular situation, the server hosting the DNS service was at 100% from time to time. I will now be notified when it stays at 90% or more for 10 minutes or longer :)
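
The alert rule described above (90% or more, sustained for 10 minutes) can be sketched roughly as follows. This is only an illustration: the function name, the sample format and the default values are assumptions of mine, not something taken from a Nagios plugin.

```python
def sustained_over(samples, threshold=90.0, window_s=600):
    """Return True if the most recent unbroken run of samples at or above
    `threshold` spans at least `window_s` seconds.

    `samples` is an iterable of (timestamp_seconds, percent) pairs sorted
    by time. Names and defaults are hypothetical.
    """
    run_start = None  # timestamp of the first sample in the current run
    last_ts = None    # timestamp of the latest sample in the current run
    for ts, pct in samples:
        if pct >= threshold:
            if run_start is None:
                run_start = ts
            last_ts = ts
        else:
            # Any sample under the threshold breaks the run.
            run_start = None
            last_ts = None
    return run_start is not None and (last_ts - run_start) >= window_s
```

With one sample per minute, eleven consecutive readings of 95% (covering 0 to 600 seconds) would trigger the alert, while a single dip in the middle would reset the run.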

One partial solution was also to pass --with-ipv6=no when compiling the plugins. Somehow, when that option is not given, each query goes out as 1 IPv4 packet and 3 IPv6 packets (retries).
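
To see the same effect outside the plugins, a small sketch like the one below can time a lookup with and without IPv6. It is an analogy, not what --with-ipv6=no does internally: it relies on the standard resolver usually querying only for IPv4 addresses when the address family is restricted to AF_INET. The helper name `resolve` is made up for this example.

```python
import socket
import time


def resolve(host, family=socket.AF_UNSPEC):
    """Resolve `host` and return (sorted unique addresses, elapsed seconds).

    AF_UNSPEC lets the resolver look for both IPv4 and IPv6 addresses;
    AF_INET restricts the lookup to IPv4.
    """
    start = time.monotonic()
    infos = socket.getaddrinfo(host, None, family, socket.SOCK_STREAM)
    elapsed = time.monotonic() - start
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    addresses = sorted({info[4][0] for info in infos})
    return addresses, elapsed
```

Calling `resolve(name)` and `resolve(name, socket.AF_INET)` for one of the monitored host names, and comparing the elapsed times, should give a rough idea of how much the IPv6 queries cost on a slow DNS server.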

On the other hand, Steve Shipway's suggestion of having a local DNS sounds good. Or maybe there is a way to keep the ARP cache longer? But that is a GNU/Linux matter, not a Nagios one :)
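
The idea behind a local DNS or a longer-lived cache is simply to answer repeated lookups without asking the busy server again. A minimal sketch of that idea, assuming a fixed time-to-live and a class name (`CachingResolver`) invented for this example:

```python
import socket
import time


class CachingResolver:
    """Tiny in-process name cache with a fixed time-to-live (hypothetical)."""

    def __init__(self, ttl_s=300.0, lookup=socket.gethostbyname,
                 clock=time.monotonic):
        self.ttl_s = ttl_s
        self._lookup = lookup  # function doing the real lookup
        self._clock = clock    # injectable clock, handy for testing
        self._cache = {}       # name -> (address, expiry time)

    def resolve(self, name):
        now = self._clock()
        hit = self._cache.get(name)
        if hit is not None and hit[1] > now:
            return hit[0]  # still fresh, no query sent
        address = self._lookup(name)
        self._cache[name] = (address, now + self.ttl_s)
        return address
```

Since each plugin runs as its own process, a cache like this would not be shared between checks. In practice the caching would have to live in a separate service on the monitoring host, such as a caching name server or nscd.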



---------------------------------------------------
Alexandre Racine - Gardien Virtuel - Sécurité Informatique www.gardienvirtuel.com
Montréal, Québec, Canada

>>> "Marc Powell" <marc at ena.com> 2006-03-07 17:17:20 >>>
Thanks for posting back to the list. Are you using names or IP's for the host address in your definitions? Was that the lookup that was affecting your performance? We've always used IP's to try to be as independent of other systems as possible and I highly recommend it.

--
Marc

> -----Original Message-----
> From: nagios-users-admin at lists.sourceforge.net
> [mailto:nagios-users-admin at lists.sourceforge.net] On Behalf Of
> Serveur-Faucon Surveillance
> Sent: Tuesday, March 07, 2006 4:02 PM
> To: kate.harris at gmail.com; nagios-users at lists.sourceforge.net;
> kat at totkat.org
> Subject: [Nagios-users] [Solved] Nagios Performance Data shows checks
> aren't being completed
> 
> I found my bug.
> 
> The DNS server was experiencing lag, which slowed Nagios down.
> 
> There you go.
> 
> 
> 
> 
> 
> ---------------------------------------------------
> Alexandre Racine - Gardien Virtuel - Sécurité Informatique
> www.gardienvirtuel.com 
> Montréal, Québec, Canada
> 
> >>> kate.harris at gmail.com 2006-03-07 06:32:48 >>>
> I had a similar problem and thought I had fixed it.
> 
> My situation is that I have 922 services to check (at the moment, I need to
> ramp up to over 2,500 but the latency problem is a show-stopper at the
> moment).  I'm using a very low-spec Dell running Solaris 10 with Nagios 2.0
> to do it.  Using default settings, I was initially getting average check
> latencies of the order of 5-6 seconds which was fine, but after a day or so
> of no Nagios restarts, that figure would rocket to 100 seconds and stay
> there, not ever re-checking the majority of the services, with re-scheduled
> check times staying in the past, until I did a nagios reload.
> 
> There was one directive which solved the stale re-check times:-
> check_for_orphaned_services=1
> 
> Also, I reduced a couple of timeout values so that Nagios stopped wasting
> time on checks which were bound to fail:-
> service_check_timeout=30
> host_check_timeout=30
> event_handler_timeout=30
> notification_timeout=30
> 
> Given that the load on the machine doesn't appear to go over 0.50, I've
> allowed infinite concurrent services checks now, increased from 400, but
> that appears to be making no difference at all. And I left the reaper
> frequency at 10 seconds.  So now the checks were being re-scheduled for
> times in the future, and the latencies stopped running away quite so
> dramatically.
> 
> This is the state of things at the moment:-
> 
> Active Service Checks:
>
> Time Frame             Checks Completed
> <= 1 minute:           107 (11.6%)
> <= 5 minutes:          593 (64.3%)
> <= 15 minutes:         922 (100.0%)
> <= 1 hour:             922 (100.0%)
> Since program start:   922 (100.0%)
>
> Metric                 Min.       Max.        Average
> Check Execution Time:  0.06 sec   19.70 sec   0.139 sec
> Check Latency:         0.00 sec   17.19 sec   2.164 sec
> Percent State Change:  0.00%      0.00%       0.00%
>
>
> Passive Service Checks:
>
> Time Frame             Checks Completed
> <= 1 minute:           0 (0.0%)
> <= 5 minutes:          0 (0.0%)
> <= 15 minutes:         0 (0.0%)
> <= 1 hour:             0 (0.0%)
> Since program start:   0 (0.0%)
>
> Metric                 Min.       Max.        Average
> Percent State Change:  0.00%      0.00%       0.00%
>
>
> Active Host Checks:
>
> Time Frame             Checks Completed
> <= 1 minute:           1 (0.9%)
> <= 5 minutes:          4 (3.6%)
> <= 15 minutes:         5 (4.5%)
> <= 1 hour:             5 (4.5%)
> Since program start:   11 (9.8%)
>
> Metric                 Min.       Max.        Average
> Check Execution Time:  0.02 sec   13.52 sec   0.170 sec
> Check Latency:         0.00 sec   8.16 sec    0.073 sec
> Percent State Change:  0.00%      0.00%       0.00%
>
>
> Passive Host Checks:
>
> Time Frame             Checks Completed
> <= 1 minute:           0 (0.0%)
> <= 5 minutes:          0 (0.0%)
> <= 15 minutes:         0 (0.0%)
> <= 1 hour:             0 (0.0%)
> Since program start:   0 (0.0%)
>
> Metric                 Min.       Max.        Average
> Percent State Change:  0.00%      0.00%       0.00%
> 
> However, the latencies are creeping upwards again, albeit very very slowly,
> and at some point I think I'll have to do a reload just to get the checking
> back on track again.
> 
> Has anyone got any ideas on where I should be looking to make this better?
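
For anyone comparing their own settings with the directives quoted above, a short sketch like this can pull the key=value pairs out of a main configuration file and list the timeouts. It assumes the usual one-directive-per-line layout with # comments. The function names and the embedded fragment are illustrative only, and directives that may appear more than once (such as cfg_file) would need a list instead of a dict.

```python
FRAGMENT = """\
# values quoted in the message above
check_for_orphaned_services=1
service_check_timeout=30
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
"""


def parse_directives(text):
    """Return a dict of key=value directives, skipping blanks and # comments."""
    directives = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' sign
            directives[key.strip()] = value.strip()
    return directives


def timeouts(directives):
    """Return only the *_timeout directives, converted to integer seconds."""
    return {k: int(v) for k, v in directives.items() if k.endswith("_timeout")}
```

Running `timeouts(parse_directives(FRAGMENT))` on the fragment gives the four timeout values in one place, which makes it easier to spot one that was left at a different value.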



-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid0944&bid$1720&dat1642 
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net 
https://lists.sourceforge.net/lists/listinfo/nagios-users 
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null






