Host List Sorting issues - WAN links with lots of hosts

Ben Sykes ben.sykes at transpire.com.au
Wed Mar 20 23:41:40 CET 2013


Hi All,

Long time user, first time poster on the list.

I have a fairly large distributed monitoring setup currently in pilot
that's monitoring a variety of devices at remote branch sites.

The hostnames at all these sites are very similar since the naming standard
includes the branch ID at the start of the hostname.

What we are seeing is that Nagios' scheduler uses a sorted list to drive
its host check scheduling decisions, which means all or most of the devices
at a particular site are checked at once. With all those ICMP packets going
down a long, thin WAN link that may already be close to 100% utilisation,
all the devices at the site appear to go down at once, then come back up as
soon as the next check runs in a more staggered manner.

I have checked the source code: the host list Nagios uses is sorted after
the config files are read, and the scheduler routine simply walks the
linked list of hosts and adds them to the schedule.
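
To illustrate the effect, here is a minimal sketch in Python. It is a
simplified model, not the actual Nagios scheduler code: it assumes each
host gets the next slot in list order with a fixed spacing between slots.
The hostnames, the 50 sites x 20 devices layout and the 0.25 second spacing
are all made up for the example.

```python
import random


def schedule(hosts, inter_check_delay):
    """Assign each host a start time by walking the list in order."""
    return {host: i * inter_check_delay for i, host in enumerate(hosts)}


def site_span(times, site_prefix):
    """Seconds between the first and last check start for one site."""
    starts = [t for host, t in times.items() if host.startswith(site_prefix)]
    return max(starts) - min(starts)


# Hypothetical naming standard: branch ID first, then the device name.
hosts = [f"br{site:03d}-dev{n:02d}"
         for site in range(1, 51)
         for n in range(1, 21)]

# Sorted list: every device at a site lands in adjacent slots.
sorted_times = schedule(sorted(hosts), inter_check_delay=0.25)

# Shuffled list: the same devices are spread across the whole schedule.
shuffled = hosts[:]
random.Random(42).shuffle(shuffled)
shuffled_times = schedule(shuffled, inter_check_delay=0.25)

sorted_span = site_span(sorted_times, "br007-")
shuffled_span = site_span(shuffled_times, "br007-")
print(f"sorted: {sorted_span:.2f}s  shuffled: {shuffled_span:.2f}s")
```

In this model the 20 devices at one site are all checked within a few
seconds when the list is sorted, while a shuffled list spreads the same
checks over a much longer window.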

My ideas to solve it...

- Modify check_icmp with a wrapper script or similar that adds a random
  delay to the ping check to avoid the mass of packets (OK, but there will
  still be occasions where the random delays happen to line up)
- Modify the Nagios source code and recompile to remove the sorting of the
  host list (suboptimal)
- Increase the thresholds for ping timeouts etc. (doesn't really let us
  track the latency of each site, as it is then affected by the ping
  grouping)
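
For the first idea, a wrapper could look roughly like the sketch below. The
plugin path and the maximum delay are assumptions for illustration only;
the delay would need to stay well below the host check timeout configured
in Nagios, otherwise the check is killed before check_icmp even starts.

```python
#!/usr/bin/env python3
import os
import random
import sys
import time

# Assumed plugin location; adjust for the local install.
CHECK_ICMP = "/usr/local/nagios/libexec/check_icmp"

# Assumed upper bound on the random delay, in seconds.
MAX_DELAY = 10.0


def pick_delay(max_delay, rng=random):
    """Return a random delay between 0 and max_delay seconds."""
    return rng.uniform(0.0, max_delay)


def main(argv):
    """Sleep for a random interval, then hand over to check_icmp."""
    time.sleep(pick_delay(MAX_DELAY))
    # Replace this process with check_icmp so that its output and exit
    # code pass straight through to Nagios.
    os.execv(CHECK_ICMP, [CHECK_ICMP] + argv[1:])


# When installed as the wrapper script, finish with:
# main(sys.argv)
```

The check command would then point at the wrapper instead of check_icmp,
with the same arguments. Since check_icmp times the round trip itself, the
sleep should not inflate the reported RTA, although the check's overall
execution time as seen by Nagios will include it.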

Any ideas from the community that'd be useful?

Thanks

Ben Sykes
