Problem with availability calculation

Erik De Cock erik at xperimental.net
Wed Nov 12 12:19:58 CET 2003


Hello,

We have discovered a problem with the availability calculation in the web interface.

The problem arises because services are no longer checked once the host is down.
Normally this is not a problem, because Nagios detects that the service is down before it checks the host.
However, if the server comes up briefly a few times and goes down again, the logfile can end up showing the host as DOWN while a service is UP (and both stay that way for hours).
As a result, the final report shows the host as up 95% of the time and the service as up 99% of the time.

In my opinion, the availability algorithms should take host downtime into account when calculating service availability (even if this specific problem turns out to be caused by a misconfiguration on my part).
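
To illustrate the idea, here is a minimal sketch (not Nagios code) of how service availability could be adjusted for host downtime. It assumes the log has already been reduced to time intervals in seconds, and that the intervals within each list do not overlap one another. The function and parameter names are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the overlap of two intervals, or None if they do not overlap."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo < hi else None


def length(iv: Optional[Interval]) -> float:
    """Length of an interval; None counts as zero."""
    return 0.0 if iv is None else iv[1] - iv[0]


def adjusted_service_availability(
    service_ok: List[Interval],
    host_down: List[Interval],
    period: Interval,
) -> float:
    """Fraction of the report period during which the service was OK
    AND the host was not down.

    Assumption: intervals inside each list are non-overlapping.
    """
    total = period[1] - period[0]
    if total <= 0:
        raise ValueError("report period must have positive length")

    ok_time = 0.0
    for s in service_ok:
        clipped = intersect(s, period)
        if clipped is None:
            continue
        ok_time += length(clipped)
        # Subtract every part of this OK interval that overlaps host downtime.
        for h in host_down:
            ok_time -= length(intersect(clipped, h))
    return ok_time / total
```

For example, a service logged as OK for a whole 100-second period, on a host that was down from second 20 to second 30, would be reported as 90% available instead of 100%.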

As a short-term fix, is there a way to make Nagios keep checking services even while the host is down?

For your information, the complete logfile is included below. It is quite complex, and I can't say I find it very logical...

Thanks a lot,

Erik De Cock
IBM Belgium

PS: This logfile has been edited to turn all SOFT OKs into HARD OKs (s/OK;SOFT/OK;HARD/), but that's another problem...

Tue Oct 14 15:34:29 CEST 2003 HOST ALERT: server1;DOWN;SOFT;1;CRITICAL - Plugin timed out after 10 seconds
Tue Oct 14 15:34:39 CEST 2003 HOST ALERT: server1;DOWN;SOFT;2;CRITICAL - Plugin timed out after 10 seconds
Tue Oct 14 15:34:50 CEST 2003 HOST ALERT: server1;DOWN;HARD;3;CRITICAL - Plugin timed out after 10 seconds

Tue Oct 14 15:34:51 CEST 2003 SERVICE ALERT: server1;WINS;CRITICAL;HARD;1;Failed. WINS "1.2.3.4" failed to resolve "dc1", the domain controller(s) of "domain". Got "name_query failed to find name dc1#20"

Tue Oct 14 15:35:09 CEST 2003 SERVICE ALERT: server1;PING;CRITICAL;HARD;1;CRITICAL - Plugin timed out after 9 seconds
Wed Oct 15 11:56:39 CEST 2003 HOST ALERT: server1;UP;HARD;1;(No output!)
Wed Oct 15 11:56:39 CEST 2003 SERVICE ALERT: server1;PING;OK;HARD;1;(No output!)
Wed Oct 15 11:58:39 CEST 2003 SERVICE ALERT: server1;WINS;OK;HARD;1;(No output!)

Wed Oct 15 12:46:58 CEST 2003 HOST ALERT: server1;DOWN;HARD;3;CRITICAL - Plugin timed out after 10 seconds
Wed Oct 15 12:47:58 CEST 2003 HOST ALERT: server1;DOWN;HARD;3;CRITICAL - Plugin timed out after 10 seconds

Wed Oct 15 23:46:59 CEST 2003 HOST ALERT: server1;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 153.79 ms
Wed Oct 15 23:47:00 CEST 2003 SERVICE ALERT: server1;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 167.23 ms
Wed Oct 15 23:47:59 CEST 2003 SERVICE ALERT: server1;WINS;CRITICAL;SOFT;1;Failed. WINS "1.2.3.4" failed to resolve "dc1", the domain controller(s) of "domain". Got "name_query failed to find name dc1#20"
Wed Oct 15 23:48:58 CEST 2003 SERVICE ALERT: server1;WINS;CRITICAL;SOFT;2;Failed. WINS "1.2.3.4" failed to resolve "dc1", the domain controller(s) of "domain". Got "name_query failed to find name dc1#20"
Wed Oct 15 23:49:58 CEST 2003 SERVICE ALERT: server1;WINS;CRITICAL;SOFT;3;Failed. WINS "1.2.3.4" failed to resolve "dc1", the domain controller(s) of "domain". Got "name_query failed to find name dc1#20"
Wed Oct 15 23:50:58 CEST 2003 SERVICE ALERT: server1;WINS;CRITICAL;SOFT;4;Failed. WINS "1.2.3.4" failed to resolve "dc1", the domain controller(s) of "domain". Got "name_query failed to find name dc1#20"
Wed Oct 15 23:51:58 CEST 2003 SERVICE ALERT: server1;WINS;CRITICAL;SOFT;5;Failed. WINS "1.2.3.4" failed to resolve "dc1", the domain controller(s) of "domain". Got "name_query failed to find name dc1#20"

Thu Oct 16 00:08:08 CEST 2003 HOST ALERT: server1;DOWN;SOFT;1;CRITICAL - Plugin timed out after 10 seconds
Thu Oct 16 00:08:18 CEST 2003 HOST ALERT: server1;DOWN;SOFT;2;CRITICAL - Plugin timed out after 10 seconds
Thu Oct 16 00:08:28 CEST 2003 HOST ALERT: server1;DOWN;HARD;3;CRITICAL - Plugin timed out after 10 seconds
Thu Oct 16 00:08:30 CEST 2003 SERVICE ALERT: server1;PING;CRITICAL;HARD;1;CRITICAL - Plugin timed out after 9 seconds

Thu Oct 16 00:13:58 CEST 2003 HOST ALERT: server1;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 178.42 ms
Thu Oct 16 00:14:00 CEST 2003 SERVICE ALERT: server1;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 163.97 ms
Thu Oct 16 00:16:58 CEST 2003 SERVICE ALERT: server1;WINS;CRITICAL;SOFT;1;Failed. WINS "1.2.3.4" failed to resolve "dc1", the domain controller(s) of "domain". Got "name_query failed to find name dc1#20"
Thu Oct 16 00:18:08 CEST 2003 HOST ALERT: server1;DOWN;SOFT;1;CRITICAL - Plugin timed out after 10 seconds
Thu Oct 16 00:18:18 CEST 2003 HOST ALERT: server1;DOWN;SOFT;2;CRITICAL - Plugin timed out after 10 seconds
Thu Oct 16 00:18:28 CEST 2003 HOST ALERT: server1;DOWN;HARD;3;CRITICAL - Plugin timed out after 10 seconds
Thu Oct 16 00:20:08 CEST 2003 SERVICE ALERT: server1;PING;CRITICAL;HARD;1;CRITICAL - Plugin timed out after 9 seconds
Thu Oct 16 00:31:59 CEST 2003 HOST ALERT: server1;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 160.14 ms
Thu Oct 16 00:32:00 CEST 2003 SERVICE ALERT: server1;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 149.05 ms
Thu Oct 16 00:32:58 CEST 2003 SERVICE ALERT: server1;WINS;CRITICAL;SOFT;1;Failed. WINS "1.2.3.4" failed to resolve "dc1", the domain controller(s) of "domain". Got "name_query failed to find name dc1#20"
Thu Oct 16 00:33:58 CEST 2003 SERVICE ALERT: server1;WINS;CRITICAL;SOFT;2;Failed. WINS "1.2.3.4" failed to resolve "dc1", the domain controller(s) of "domain". Got "name_query failed to find name dc1#20"
Thu Oct 16 00:34:58 CEST 2003 SERVICE ALERT: server1;WINS;CRITICAL;SOFT;3;Failed. WINS "1.2.3.4" failed to resolve "dc1", the domain controller(s) of "domain". Got "name_query failed to find name dc1#20"
Thu Oct 16 00:35:58 CEST 2003 SERVICE ALERT: server1;WINS;CRITICAL;SOFT;4;Failed. WINS "1.2.3.4" failed to resolve "dc1", the domain controller(s) of "domain". Got "name_query failed to find name dc1#20"
Thu Oct 16 00:36:58 CEST 2003 SERVICE ALERT: server1;WINS;CRITICAL;SOFT;5;Failed. WINS "1.2.3.4" failed to resolve "dc1", the domain controller(s) of "domain". Got "name_query failed to find name dc1#20"
Thu Oct 16 00:47:08 CEST 2003 HOST ALERT: server1;DOWN;SOFT;1;CRITICAL - Plugin timed out after 10 seconds
Thu Oct 16 00:47:18 CEST 2003 HOST ALERT: server1;DOWN;SOFT;2;CRITICAL - Plugin timed out after 10 seconds
Thu Oct 16 00:47:28 CEST 2003 HOST ALERT: server1;DOWN;HARD;3;CRITICAL - Plugin timed out after 10 seconds
Thu Oct 16 00:47:29 CEST 2003 SERVICE ALERT: server1;PING;CRITICAL;HARD;1;CRITICAL - Plugin timed out after 9 seconds
Thu Oct 16 00:49:59 CEST 2003 HOST ALERT: server1;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 145.19 ms
Thu Oct 16 00:50:00 CEST 2003 SERVICE ALERT: server1;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 145.76 ms
Thu Oct 16 00:52:58 CEST 2003 SERVICE ALERT: server1;WINS;CRITICAL;SOFT;1;Failed. WINS "1.2.3.4" failed to resolve "dc1", the domain controller(s) of "domain". Got "name_query failed to find name dc1#20"
Thu Oct 16 00:53:58 CEST 2003 SERVICE ALERT: server1;WINS;CRITICAL;SOFT;2;Failed. WINS "1.2.3.4" failed to resolve "dc1", the domain controller(s) of "domain". Got "name_query failed to find name dc1#20"
Thu Oct 16 00:54:58 CEST 2003 SERVICE ALERT: server1;WINS;CRITICAL;SOFT;3;Failed. WINS "1.2.3.4" failed to resolve "dc1", the domain controller(s) of "domain". Got "name_query failed to find name dc1#20"
Thu Oct 16 00:55:58 CEST 2003 SERVICE ALERT: server1;WINS;CRITICAL;SOFT;4;Failed. WINS "1.2.3.4" failed to resolve "dc1", the domain controller(s) of "domain". Got "name_query failed to find name dc1#20"
Thu Oct 16 00:56:58 CEST 2003 SERVICE ALERT: server1;WINS;CRITICAL;SOFT;5;Failed. WINS "1.2.3.4" failed to resolve "dc1", the domain controller(s) of "domain". Got "name_query failed to find name dc1#20"
Thu Oct 16 01:56:18 CEST 2003 HOST ALERT: server1;DOWN;SOFT;1;CRITICAL - Plugin timed out after 10 seconds
Thu Oct 16 01:56:28 CEST 2003 HOST ALERT: server1;DOWN;SOFT;2;CRITICAL - Plugin timed out after 10 seconds
Thu Oct 16 01:56:39 CEST 2003 HOST ALERT: server1;DOWN;HARD;3;CRITICAL - Plugin timed out after 10 seconds
Thu Oct 16 01:56:40 CEST 2003 SERVICE ALERT: server1;PING;CRITICAL;HARD;1;CRITICAL - Plugin timed out after 9 seconds

Thu Oct 16 01:58:58 CEST 2003 HOST ALERT: server1;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 145.70 ms
Thu Oct 16 01:59:00 CEST 2003 SERVICE ALERT: server1;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 161.65 ms
Thu Oct 16 02:00:48 CEST 2003 SERVICE ALERT: server1;WINS;OK;HARD;1;Ok. Found controllers named "dc1" in response to "domain#1C" name queryfrom WINS named "1.2.3.4".




_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null




