Severe performance issue during major network outage

Aidan Anderson mail at aidananderson.co.uk
Fri May 11 21:25:17 CEST 2007


Ton Voon wrote:
> On 11 May 2007, at 19:03, Jim Avery wrote:
>
>   
>> On 11/05/07, Aidan Anderson <mail at aidananderson.co.uk> wrote:
>>
>>     
>>> A lot of people have mentioned using fping to speed things up but if my
>>> average service latency is only 0.479 seconds in normal circumstances, I
>>> can't see how tweaking this will help in a major outage situation.
>>>       
>> check_ping won't finish until it's done all the pings, and the pings
>> are (if I recall) always at one second intervals.  This means that if
>> you've configured check_ping to do (let's say) 5 pings, the check_ping
>> plugin will always take at least 5 seconds to complete.
>>
>> If the check_ping is being run as a host check rather than a service
>> check, my understanding is that this is the only thing Nagios will be
>> doing; it doesn't do anything else concurrently (correct me if I'm
>> wrong people).
>>     
>
> Correct. We noticed this some time ago too:
> http://altinity.blogs.com/dotorg/2006/05/immediate_perfo.html
>
> If you do stick to using check_ping, use -p 1, which gives a sub-second
> response time.
>
>   
First of all, thank you for the replies!

The majority of devices that I monitor are routers/VPN devices. On the
documentation's advice I have not set active checks on the hosts;
instead I've added check_ping as a service on each of these hosts to do
5 pings, as follows:

check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5

For the host check I already use, as you suggested, a check_ping that
only does one ping, as follows:

check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1

My understanding was that if the service check failed, Nagios would
abandon the service check altogether and move on to the host check,
which is only 1 ping.  Since the service checks are parallelised, it
shouldn't matter that they use 5 pings, and keeping the host check to
1 ping should resolve the bottleneck of serialised host checks.  I'm at
a loss as to why performance has been impacted so severely.
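
As a rough illustration of the reasoning above, the sketch below compares
serialised host checks with parallelised service checks during an outage.
Every figure in it is a hypothetical assumption, not a measurement: the
number of hosts down, the time a check against an unreachable host blocks
before giving up, the number of attempts per host, and the concurrency
limit. The point it tries to show is that a 1-ping host check is only
sub-second when the host answers; against a dead host it may wait for a
timeout, and those waits add up when checks run one at a time.

```python
def serialised_host_check_seconds(hosts_down, seconds_per_check, attempts_per_host):
    """Wall-clock time if every host check runs one after another."""
    return hosts_down * seconds_per_check * attempts_per_host


def parallel_service_check_seconds(checks, seconds_per_check, max_concurrent):
    """Rough wall-clock time if checks run in batches of max_concurrent."""
    batches = -(-checks // max_concurrent)  # ceiling division
    return batches * seconds_per_check


if __name__ == "__main__":
    # All values below are assumptions chosen for illustration only.
    hosts_down = 200          # hosts unreachable during the outage
    blocked_seconds = 10      # time one check waits on a dead host
    attempts = 3              # host check attempts before declaring DOWN
    concurrency = 50          # service checks allowed to run at once

    serial = serialised_host_check_seconds(hosts_down, blocked_seconds, attempts)
    parallel = parallel_service_check_seconds(hosts_down, blocked_seconds, concurrency)
    print(f"serialised host checks:      {serial} s")
    print(f"parallelised service checks: {parallel} s")
```

With these assumed numbers the serialised path takes 6000 seconds against
40 seconds for the parallel one, so the host checks could dominate even
though each sends a single ping.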

Maybe I need to abandon the service checks altogether and just have a
host check.  I'm reluctant to do this because I get very useful
information from 5 pings, i.e. packet loss and high RTA, which is
particularly handy for checking volatile links such as ADSL.  Maybe that
is the trade-off: fast host checking with no useful stats, or slow host
checking with useful stats.
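
To make the trade-off concrete, here is a minimal sketch of how packet
loss and round trip average (RTA) fall out of a set of ping samples. The
function name and the sample values are hypothetical, and this is not how
check_ping itself is implemented. It only shows that one ping can report
0% or 100% loss and nothing in between, whereas five pings can expose
partial loss and a latency spike.

```python
def summarise(rtts_ms):
    """Summarise ping samples.

    rtts_ms holds one entry per ping sent: the round trip time in
    milliseconds, or None when no reply came back.
    Returns (packet loss in percent, average RTT in ms or None).
    """
    sent = len(rtts_ms)
    replies = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (sent - len(replies)) / sent
    rta_ms = sum(replies) / len(replies) if replies else None
    return loss_pct, rta_ms


if __name__ == "__main__":
    # Hypothetical samples from a flaky ADSL link: one lost packet, one spike.
    print(summarise([42.0, None, 45.0, 300.0, 43.0]))
    # A single ping to the same link can only say "up" or "down".
    print(summarise([42.0]))
    print(summarise([None]))
```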

regards,
Aidan





_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users