Host List Sorting issues - WAN links with lots of hosts

Ben Sykes ben.sykes at transpire.com.au
Thu Mar 21 01:30:51 CET 2013


I think I answered my own question.

The host_inter_check_delay_method setting just needs to be set to an explicit
delay longer than the value the smart method calculates (currently 1.5 sec,
because of our reasonably high host count of 600-700) so that the checks are
staggered over a longer period.

http://nagios.sourceforge.net/docs/3_0/configmain.html#host_inter_check_delay_method
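
As a rough illustration of the arithmetic, here is a minimal Python sketch that estimates how long the initial round of host checks is spread over for a given delay, and formats the matching nagios.cfg line. The host count of 650 and the candidate delays of 3.0 and 5.0 seconds are assumptions picked for illustration, and the model only covers the initial scheduling pass, not how Nagios reschedules checks afterwards.

```python
def spread_window_seconds(host_count: int, delay_seconds: float) -> float:
    """Time from the first scheduled host check to the last one when
    consecutive checks are spaced delay_seconds apart."""
    if host_count < 1:
        return 0.0
    return (host_count - 1) * delay_seconds


def directive_line(delay_seconds: float) -> str:
    """Format a nagios.cfg line that sets an explicit delay in seconds."""
    return f"host_inter_check_delay_method={delay_seconds:.2f}"


if __name__ == "__main__":
    hosts = 650  # assumed midpoint of the 600-700 hosts mentioned above
    # 1.5 is the current smart value; 3.0 and 5.0 are illustrative only.
    for delay in (1.5, 3.0, 5.0):
        window = spread_window_seconds(hosts, delay)
        print(f"{directive_line(delay)}  spreads checks over about "
              f"{window / 60:.1f} minutes")
```

The resulting window should be compared against the host check interval in use: a window much longer than the interval probably defeats the purpose of the delay.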

It's not ideal, as it will still check the devices sequentially in alphabetical
order, so each site's devices are checked in roughly the same time period, but
it will at least give each device more of a chance to get its ICMP replies
back before the next one is pinged.
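
To make the trade-off concrete, the following Python sketch assigns check offsets to a sorted host list and reports the time window in which each site's devices are checked. It is a simplified model, not the Nagios scheduler: the hostnames, the 4-character branch prefix, and the use of Python's default string ordering are all assumptions for illustration.

```python
from collections import defaultdict


def site_windows(hostnames, delay_seconds, prefix_len):
    """Assign check offsets in sorted order and return, per site prefix,
    a tuple of (first_offset, last_offset, host_count)."""
    offsets = defaultdict(list)
    for position, name in enumerate(sorted(hostnames)):
        offsets[name[:prefix_len]].append(position * delay_seconds)
    return {site: (min(t), max(t), len(t)) for site, t in offsets.items()}


if __name__ == "__main__":
    # Hypothetical hostnames: a 4-character branch ID, then a device name.
    hosts = [f"{branch}-{device}"
             for branch in ("b001", "b002", "b003")
             for device in ("ap1", "rtr1", "sw1", "sw2")]
    for delay in (1.5, 5.0):
        print(f"delay {delay}s")
        windows = site_windows(hosts, delay, prefix_len=4)
        for site, (first, last, count) in sorted(windows.items()):
            print(f"  {site}: {count} hosts checked between "
                  f"t={first:.1f}s and t={last:.1f}s")
```

In this model a larger delay leaves each site's devices adjacent in the schedule but widens the gap between consecutive pings on the same link, which is the effect described above.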

Ben

On Thu, Mar 21, 2013 at 9:41 AM, Ben Sykes <ben.sykes at transpire.com.au> wrote:

> Hi All,
>
> Long time user, first time poster on the list.
>
> I have a fairly large distributed monitoring setup currently in pilot
> that's monitoring a variety of devices at remote branch sites.
>
> The hostnames at all these sites are very similar since the naming
> standard includes the branch ID at the start of the hostname.
>
> What we are seeing is that Nagios' scheduler uses a sorted list to drive
> the host check scheduling decisions, which means all or most of the
> devices at a particular site are checked at once. With all those ICMP
> packets going down a long thin WAN link that may be close to 100%
> utilisation, all the devices at the site appear to go down at once, then
> come back up as soon as the next check runs in a more staggered manner.
>
> I have checked the source code: the host list Nagios uses is sorted after
> the config files are read, and the scheduler routine simply walks the
> linked list of hosts and adds them to the schedule.
>
> My ideas to solve it...
>
> - Modify check_icmp with a wrapper script or similar that adds a random
> delay to the ping check to avoid the mass of packets (OK but will still
> lead to events where all the randomness adds up)
> - Modify the Nagios source code and recompile to remove the sorting of
> host lists (suboptimal)
> - Increase the thresholds for ping timeouts etc (doesn't really let us
> track latency of each site as it's then affected by the ping grouping)
>
> Any ideas from the community that'd be useful?
>
> Thanks
>
> Ben Sykes
>
>
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
> reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
>