[naemon-users] Host checks occurring too fast

Robert Brockway robert at timetraveller.org
Tue Jul 13 06:56:10 CEST 2021


Hi all.  I have the following settings specified in the prod-linux-server 
host template:

check_interval		3
max_check_attempts	5
retry_interval		3

In addition, interval_length is at its default value of 60, so both 
intervals above are measured in minutes.

Despite this, host checks are occurring too fast.  An example is below, but 
this has happened many times.  The problem is that Naemon is waking up 
staff during transient network failures.  Our infrastructure has 
redundancy, and hosts are configured to reboot in response to various 
failure modes.  The application is robust and copes with all of this fine.

As a result, I don't want anyone woken up until the host has been down for 
about 15 minutes.
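For reference, the timing those template settings should produce can be worked out directly. This is a minimal Python sketch assuming the usual Nagios/Naemon semantics for scheduled host checks: the first failed check is SOFT attempt 1, each further attempt follows retry_interval later, and the state goes HARD at attempt max_check_attempts. The function name seconds_to_hard is purely illustrative.

```python
# Sketch: expected time from a host going down to a HARD DOWN state,
# assuming standard Nagios/Naemon retry semantics for scheduled checks.
# Check execution time (e.g. the 10-second socket timeout) is ignored.

INTERVAL_LENGTH = 60  # seconds per "time unit" (interval_length default)


def seconds_to_hard(check_interval, retry_interval, max_check_attempts,
                    interval_length=INTERVAL_LENGTH):
    """Return (best, worst) seconds from the outage starting until HARD.

    best:  the outage begins just before a scheduled check, so attempt 1
           fails immediately and only the retries add delay.
    worst: the outage begins just after a scheduled check, so up to one
           full check_interval passes before attempt 1 fails.
    """
    retries = (max_check_attempts - 1) * retry_interval * interval_length
    return retries, retries + check_interval * interval_length


best, worst = seconds_to_hard(check_interval=3, retry_interval=3,
                              max_check_attempts=5)
print(f"HARD after {best // 60} to {worst // 60} minutes")
```

With the values from the prod-linux-server template this prints "HARD after 12 to 15 minutes", which is in line with the roughly 15 minutes wanted above.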

Example alerts for a down host, as displayed by Thruk (newest first):

[2021-07-11 18:52:44] HOST ALERT: bob;DOWN;HARD;5;CRITICAL - Socket timeout after 10 seconds
[2021-07-11 18:51:14] HOST ALERT: bob;DOWN;SOFT;4;CRITICAL - Socket timeout after 10 seconds
[2021-07-11 18:50:59] HOST ALERT: bob;DOWN;SOFT;3;CRITICAL - Socket timeout after 10 seconds
[2021-07-11 18:50:43] HOST ALERT: bob;DOWN;SOFT;2;CRITICAL - Socket timeout after 10 seconds
[2021-07-11 18:50:12] HOST ALERT: bob;DOWN;SOFT;1;CRITICAL - Socket timeout after 10 seconds
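To put numbers on "too fast", the timestamps of these alerts can be compared against the configured retry_interval. This is a minimal Python sketch assuming each alert is a single line of the form "[timestamp] HOST ALERT: host;state;state type;attempt;output", as shown above; the helper names parse and retry_gaps are hypothetical and exist only for this illustration.

```python
from datetime import datetime

# The HOST ALERT lines quoted above, re-ordered oldest first
# (Thruk displays them newest first).
ALERTS = """\
[2021-07-11 18:50:12] HOST ALERT: bob;DOWN;SOFT;1;CRITICAL - Socket timeout after 10 seconds
[2021-07-11 18:50:43] HOST ALERT: bob;DOWN;SOFT;2;CRITICAL - Socket timeout after 10 seconds
[2021-07-11 18:50:59] HOST ALERT: bob;DOWN;SOFT;3;CRITICAL - Socket timeout after 10 seconds
[2021-07-11 18:51:14] HOST ALERT: bob;DOWN;SOFT;4;CRITICAL - Socket timeout after 10 seconds
[2021-07-11 18:52:44] HOST ALERT: bob;DOWN;HARD;5;CRITICAL - Socket timeout after 10 seconds
"""

# retry_interval 3 x interval_length 60, from the host template.
EXPECTED_RETRY_SECONDS = 3 * 60


def parse(line):
    """Split one alert line into (timestamp, attempt number)."""
    stamp, _, rest = line.partition("] HOST ALERT: ")
    when = datetime.strptime(stamp.lstrip("["), "%Y-%m-%d %H:%M:%S")
    _host, _state, _state_type, attempt, _output = rest.split(";", 4)
    return when, int(attempt)


def retry_gaps(text):
    """Return (attempt, seconds since the previous attempt) for each retry."""
    events = [parse(line) for line in text.splitlines() if line.strip()]
    return [
        (cur[1], int((cur[0] - prev[0]).total_seconds()))
        for prev, cur in zip(events, events[1:])
    ]


for attempt, gap in retry_gaps(ALERTS):
    note = "" if gap >= EXPECTED_RETRY_SECONDS else "  <- shorter than retry_interval"
    print(f"attempt {attempt}: {gap:3d}s after previous check{note}")
```

For these alerts the gaps are 31, 16, 15 and 90 seconds, so the host went from SOFT 1 to HARD in about 2.5 minutes rather than the 4 x 3 = 12 minutes that retry_interval 3 and max_check_attempts 5 imply.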

NB: The hostname isn't really called 'bob'.

I thought host freshness checking might be the problem, so I turned it 
off, but that hasn't made a difference.  We don't currently have any 
passive checks, so I think it is safe to leave host freshness checking off.

I'm going to set first_notification_delay to 10 minutes as a 
workaround.  Even a 10-minute delay will be a lot better than what is 
happening now.

Any help greatly appreciated.

Cheers,

Rob

