Excessive Latency (solved!) Nagios v1.2

Steve Shipway s.shipway at auckland.ac.nz
Wed Apr 6 23:48:50 CEST 2005


From reading the various postings here, I have now solved my latency
problems.  As another user pointed out (sorry, I don't remember the
name), the culprit is host checks.

As soon as any host is down -- even if the problem is acknowledged or the
host is in scheduled downtime -- latency starts to creep up, and it keeps
rising until there are *no* hosts down.  This is because host checks are
high priority and run serially rather than in parallel.
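To see why a single down host can stall the queue, here is a rough back-of-envelope model. It assumes (my assumption, not measured) that every attempt against a down host runs to a 30-second host check timeout, and that the host is retried back to back up to its max_check_attempts value before the scheduler moves on. serial_block_seconds is a hypothetical helper, not part of Nagios.

```python
# Back-of-envelope model, NOT Nagios code. Assumes host checks run one at
# a time and that each attempt against a down host takes the full timeout.
def serial_block_seconds(down_hosts: int, max_check_attempts: int,
                         timeout_s: int = 30) -> int:
    """Worst-case seconds the scheduler is tied up by one round of
    checking `down_hosts` down hosts, retrying each one serially."""
    return down_hosts * max_check_attempts * timeout_s


for attempts in (10, 2):
    # 10 attempts -> 300 s per down host; 2 attempts -> 60 s
    print(f"max_check_attempts={attempts}: "
          f"{serial_block_seconds(1, attempts)} s per down host")
```

Under those assumptions, each down host costs up to five minutes of blocked scheduling per round with the old template value of 10, versus one minute with 2. With several hosts down at once, that easily outpaces the check interval, which would explain latency that only recovers once nothing is down.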

We have more than 400 hosts in our Nagios, so it seems something is always
in scheduled downtime.  The latency was getting up to 10 minutes after a
long weekend.

To fix it, I did the following:
1) Changed max_check_attempts in the default host template from 10 to 2.
This helped substantially and is the main fix.
2) When a host is expected to be down for a long time (a day or more),
disabled host checks for that host, even if it is in scheduled downtime.
However, it seems that the host checks sometimes get run anyway...
3) Tried to educate users to tell me when a host is being decommissioned,
rather than just disabling alerts for it and forgetting about it.  This is
the hardest part.
4) Added freshness checks for many services.  This doesn't help much,
because the low-priority queue is not being processed while the backlog of
high-priority host checks remains.
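Step 2 can be scripted instead of clicked through the CGIs. Below is a minimal sketch, assuming the external command pipe is at /usr/local/nagios/var/rw/nagios.cmd (check command_file in your nagios.cfg) and that check_external_commands=1 is set. disable_host_check is a hypothetical helper that writes a DISABLE_HOST_CHECK command in the "[timestamp] COMMAND;arguments" format.

```python
import time
from typing import Optional

# Assumed path: use the value of command_file from your nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def disable_host_check(host: str, command_file: str = COMMAND_FILE,
                       now: Optional[int] = None) -> str:
    """Write a DISABLE_HOST_CHECK external command for `host`; return the line."""
    if now is None:
        now = int(time.time())  # external commands carry a Unix timestamp
    line = f"[{now}] DISABLE_HOST_CHECK;{host}\n"
    # Opening in "w" mode mirrors `echo ... > nagios.cmd`; on the real
    # named pipe, open() blocks until Nagios is reading from it.
    with open(command_file, "w") as cmd:
        cmd.write(line)
    return line


if __name__ == "__main__":
    import sys
    # Usage: python disable_host_check.py host1 host2 ...
    for name in sys.argv[1:]:
        disable_host_check(name)
```

Because the write blocks on the pipe until Nagios reads it, run this on the Nagios server while Nagios is up. Re-enabling a host is the same line with ENABLE_HOST_CHECK.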

I looked into changing the Nagios code so that it doesn't obsess over host
checks and still processes *all* the lower-priority checks, but I'd rather
not modify the code, and I'm not 100% sure what to change!

Hope this helps other people in this situation...

Steve

---
Steve Shipway: ITSS, University of Auckland
Email: s.shipway at auckland.ac.nz  Web: http://www.steveshipway.org/  
** We can only discover new oceans when we have the **
** courage to lose sight of the shore.              **