Nagios Core 3.2.3 host check retry interval

Chris Beattie cbeattie at geninfo.com
Tue Nov 16 21:59:38 CET 2010


I noticed something curious.  It looks like Nagios 3.2.3 is making
on-demand host checks faster than the retry_interval should allow.  The
interval_length is set to 60 and the retry_interval is set to 1.  Nagios
and the plugins were compiled from source on CentOS 5.5 x64.
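
To spell out the spacing I'm expecting, here is a minimal sketch of the arithmetic. It assumes retry_interval is counted in units of interval_length seconds. The function name is made up for illustration and is not part of Nagios. Whether on-demand checks are meant to honor this spacing is exactly what is in question here.

```python
def retry_spacing_seconds(interval_length, retry_interval):
    """Configured spacing between retries, in seconds.

    Assumption: retry_interval is expressed in interval_length-second units.
    """
    return interval_length * retry_interval


# Values from this setup: interval_length=60, retry_interval=1.
print(retry_spacing_seconds(60, 1))
```

With the settings above, this gives 60 seconds between retries.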

 

I'm not sure if this is related to Yu Watanabe's problem
(http://www.mail-archive.com/nagios-users@lists.sourceforge.net/msg34042.html)
because I didn't start having it until after I upgraded to 3.2.3.

 

Here are some alerts from October, when I was running Nagios 3.2.1.
There were service alerts too, but the host checks are never less than
one minute apart:

 

                --------------------------------------------------

                [10-10-2010 06:41:29] HOST ALERT: wwwhost;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 50.10 ms

                [10-10-2010 06:28:40] HOST ALERT: wwwhost;DOWN;HARD;3;PING CRITICAL - Packet loss = 100%

                [10-10-2010 06:27:29] HOST ALERT: wwwhost;DOWN;SOFT;2;PING CRITICAL - Packet loss = 100%

                [10-10-2010 06:26:19] HOST ALERT: wwwhost;DOWN;SOFT;1;PING CRITICAL - Packet loss = 100%

                --------------------------------------------------

 

Here are some from earlier this month, after I'd switched from check_ping
to check_icmp.  Again, there were service alerts, but the host checks
are still about a minute apart:

 

                --------------------------------------------------

                [11-07-2010 21:55:53] HOST ALERT: wwwhost;UP;SOFT;2;OK - 10.3.1.11: rta 4.480ms, lost 0%

                [11-07-2010 21:54:43] HOST ALERT: wwwhost;DOWN;SOFT;1;CRITICAL - 10.3.1.11: rta nan, lost 100%

                --------------------------------------------------

                [11-09-2010 23:40:15] HOST ALERT: wwwhost;UP;SOFT;2;OK - 10.3.1.11: rta 1.018ms, lost 0%

                [11-09-2010 23:39:15] HOST ALERT: wwwhost;DOWN;SOFT;1;CRITICAL - 10.3.1.11: rta 650.987ms, lost 80%

                --------------------------------------------------

 

On November 12th, I upgraded to Nagios 3.2.3 and the 1.4.15 plugins, and
got this later that evening.  The host checks were only about 20 seconds
apart:

 

                --------------------------------------------------

                [11-12-2010 23:46:43] SERVICE ALERT: wwwhost;Counter: IIS Web Connections;OK;SOFT;2;Web Sessions: 2

                [11-12-2010 23:45:14] HOST ALERT: wwwhost;UP;SOFT;2;OK - 10.3.1.11: rta 0.985ms, lost 0%

                [11-12-2010 23:44:53] HOST ALERT: wwwhost;DOWN;SOFT;1;CRITICAL - 10.3.1.11: rta 355.633ms, lost 80%

                [11-12-2010 23:44:44] SERVICE ALERT: wwwhost;Counter: IIS Web Connections;WARNING;SOFT;1;No data was received from host!

                --------------------------------------------------
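
The gap between the two host alerts above can be checked mechanically. What follows is only a rough sketch: the function names and the sample constant are made up for illustration, the 60-second threshold assumes interval_length=60 and retry_interval=1, and the log only records state changes, so the gaps between alerts approximate the spacing of the checks rather than measure it exactly.

```python
import re
from datetime import datetime

# Timestamp plus host name of a HOST ALERT entry; \s* tolerates mail-wrapped lines.
HOST_ALERT = re.compile(
    r"\[(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\] HOST ALERT:\s*([^;\s]+);"
)


def host_alert_gaps(log_text, host):
    """Return (earlier, later, seconds) for consecutive HOST ALERT entries of one host."""
    stamps = sorted(
        datetime.strptime(stamp, "%m-%d-%Y %H:%M:%S")
        for stamp, name in HOST_ALERT.findall(log_text)
        if name == host
    )
    return [(a, b, int((b - a).total_seconds())) for a, b in zip(stamps, stamps[1:])]


def short_gaps(log_text, host, min_seconds=60):
    """Keep only the gaps shorter than the expected retry spacing (assumed 60 s)."""
    return [gap for gap in host_alert_gaps(log_text, host) if gap[2] < min_seconds]


# The November 12th excerpt, one entry per line.
SAMPLE_LOG = """
[11-12-2010 23:46:43] SERVICE ALERT: wwwhost;Counter: IIS Web Connections;OK;SOFT;2;Web Sessions: 2
[11-12-2010 23:45:14] HOST ALERT: wwwhost;UP;SOFT;2;OK - 10.3.1.11: rta 0.985ms, lost 0%
[11-12-2010 23:44:53] HOST ALERT: wwwhost;DOWN;SOFT;1;CRITICAL - 10.3.1.11: rta 355.633ms, lost 80%
[11-12-2010 23:44:44] SERVICE ALERT: wwwhost;Counter: IIS Web Connections;WARNING;SOFT;1;No data was received from host!
"""

for earlier, later, seconds in short_gaps(SAMPLE_LOG, "wwwhost"):
    print(f"{earlier:%H:%M:%S} -> {later:%H:%M:%S}: {seconds}s")
```

For this excerpt it reports a single gap of 21 seconds, between 23:44:53 and 23:45:14.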

 

Two days later, it looked like it was behaving properly:

 

                --------------------------------------------------

                [11-14-2010 23:44:57] HOST ALERT: wwwhost;UP;SOFT;2;OK - 10.3.1.11: rta 1.338ms, lost 0%

                [11-14-2010 23:44:27] SERVICE ALERT: wwwhost;Service: Snare;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds

                [11-14-2010 23:44:27] SERVICE ALERT: wwwhost;Service: RServer3;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds

                [11-14-2010 23:43:34] HOST ALERT: wwwhost;DOWN;SOFT;1;CRITICAL - 10.3.1.11: rta 860.577ms, lost 80%

                [11-14-2010 23:43:22] SERVICE ALERT: wwwhost;Service: Epilog;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds

                --------------------------------------------------

                [11-14-2010 08:56:55] HOST ALERT: wwwhost;UP;SOFT;2;OK - 10.3.1.11: rta 2.633ms, lost 0%

                [11-14-2010 08:55:45] HOST ALERT: wwwhost;DOWN;SOFT;1;CRITICAL - 10.3.1.11: rta 518.822ms, lost 80%

                [11-14-2010 08:55:36] SERVICE ALERT: wwwhost;Counter: IIS Web Connections;WARNING;SOFT;1;No data was received from host!

                --------------------------------------------------

 

Last night, however, the host got rechecked at short intervals:

 

                --------------------------------------------------

                [11-15-2010 23:56:09] HOST ALERT: wwwhost;UP;SOFT;3;WARNING - 10.3.1.11: rta 89.448ms, lost 40%

                [11-15-2010 23:55:39] HOST ALERT: wwwhost;DOWN;SOFT;2;CRITICAL - 10.3.1.11: rta 984.594ms, lost 80%

                [11-15-2010 23:55:21] HOST ALERT: wwwhost;DOWN;SOFT;1;CRITICAL - 10.3.1.11: rta 738.100ms, lost 80%

                [11-15-2010 23:55:09] SERVICE ALERT: wwwhost;CPU;WARNING;SOFT;1;No data was received from host!

                [11-15-2010 23:54:00] HOST FLAPPING ALERT: wwwhost;STARTED; Host appears to have started flapping (23.0% change > 20.0% threshold)

                [11-15-2010 23:53:59] HOST ALERT: wwwhost;UP;HARD;1;WARNING - 10.3.1.11: rta 183.851ms, lost 60%

                [11-15-2010 23:53:29] HOST ALERT: wwwhost;DOWN;HARD;3;CRITICAL - 10.3.1.11: rta nan, lost 100%

                [11-15-2010 23:53:29] SERVICE ALERT: wwwhost;Counter: IIS Web Connections;CRITICAL;SOFT;2;CRITICAL - Socket timeout after 10 seconds

                [11-15-2010 23:53:09] HOST ALERT: wwwhost;DOWN;SOFT;2;CRITICAL - 10.3.1.11: rta 418.803ms, lost 80%

                [11-15-2010 23:52:49] HOST ALERT: wwwhost;DOWN;SOFT;1;CRITICAL - 10.3.1.11: rta 1618.861ms, lost 80%

                [11-15-2010 23:51:59] HOST ALERT: wwwhost;UP;SOFT;2;OK - 10.3.1.11: rta 0.893ms, lost 0%

                [11-15-2010 23:51:41] HOST ALERT: wwwhost;DOWN;SOFT;1;CRITICAL - 10.3.1.11: rta 724.543ms, lost 80%

                [11-15-2010 23:51:29] SERVICE ALERT: wwwhost;Counter: IIS Web Connections;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds

