What the...

Russell Scibetti russell at quadrix.com
Thu Oct 10 20:40:32 CEST 2002
The only time Nagios will stop running service checks at the 
normal_check_interval for a service is when that service has a 
servicedependency whose execution failure criteria are met.

Otherwise, service checks will continue as planned.  The way Nagios 
knows that a host has come back up is if any service on that host has 
recovered to OK.  While a host and its services are down, a scheduled 
service check won't go through all the retries (the service is already 
in a hard state, so there is no need to retry), but it will still check 
the service once.
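
As a rough illustration of the two paragraphs above, here is a toy model 
in Python. It is only a sketch of the behaviour as I described it, not 
Nagios's actual code; the Service class and the function name are made 
up for the example.

```python
from dataclasses import dataclass
from typing import Collection, Optional


@dataclass
class Service:
    state: str               # "OK", "WARNING", "UNKNOWN" or "CRITICAL"
    state_type: str          # "SOFT" or "HARD"
    max_check_attempts: int


def max_attempts_for_check(
    service: Service,
    master_state: Optional[str] = None,
    execution_failure_criteria: Collection[str] = (),
) -> int:
    """Upper bound on check attempts for one scheduled check (0 = skipped)."""
    # A servicedependency suppresses the check only while the master
    # service is in one of the listed failure states.
    if master_state is not None and master_state in execution_failure_criteria:
        return 0
    # Already in a hard problem state: check once, no retries.
    if service.state != "OK" and service.state_type == "HARD":
        return 1
    # Otherwise a failure may be retried up to max_check_attempts times.
    return service.max_check_attempts


down = Service(state="CRITICAL", state_type="HARD", max_check_attempts=3)
print(max_attempts_for_check(down))
print(max_attempts_for_check(down, "CRITICAL", {"CRITICAL"}))
```

The first call models a service that is already hard CRITICAL and has no 
dependency; the second models the same service with a master service 
that is CRITICAL and listed in the execution failure criteria.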

Also, do you have aggressive_host_checking enabled in your nagios.cfg? 
The only reason I can think of for the host check also occurring when 
the service check occurs is that you have that setting enabled. 
Otherwise a host will only get checked after the first service check 
failure (when the host is still up).
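
If it helps with reading the order of host and service checks in the 
log excerpts quoted below, here is a small Python sketch that splits a 
nagios.log line into its timestamp, entry type and fields. The function 
name is my own, and the sample line is one of the quoted entries with 
the mail line-wrapping removed. It assumes the "[epoch] TYPE: a;b;c" 
layout seen in those excerpts and will mis-split plugin output that 
itself contains a semicolon.

```python
from datetime import datetime, timezone


def parse_nagios_log_line(line):
    """Split '[epoch] TYPE: a;b;c' into (UTC datetime, type, fields)."""
    stamp, _, rest = line.partition("] ")
    # The bracketed number is seconds since the Unix epoch.
    when = datetime.fromtimestamp(int(stamp.lstrip("[")), tz=timezone.utc)
    # The text before the first colon is the entry type.
    entry_type, _, payload = rest.partition(":")
    fields = [field.strip() for field in payload.split(";")]
    return when, entry_type.strip(), fields


sample = (
    "[1034267934] SERVICE ALERT: testserver01.tcdsb.org;"
    "Misc Servers - Port Check 135;CRITICAL;HARD;1;"
    "Socket timeout after 2 seconds"
)
when, entry_type, fields = parse_nagios_log_line(sample)
print(when.isoformat(), entry_type, fields[0], fields[2], fields[3])
```

Sorting the parsed entries by timestamp and entry type makes it easier 
to see whether a host alert or a service alert was logged first.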

Hope this helps.

-Russell

Bishop, Dean wrote:

> First, sorry about the subject; I realize that it is inappropriate.  It 
> does, however, capture my initial response.
>
> We are in the midst of many nightmares concurrently: smoking servers, 
> irreplaceable data lost, network latency, cold lunch, sore finger, you 
> know, the whole gamut at once.
>
> apologies to all.
>
> Here is another entry from my logs.  Each host is dependent on the 
> previously numbered host (e.g. Marshall-McLuhan-0561SW2A_4-HS7 is the 
> parent of Marshall-McLuhan-0561SW2A_5-HS7, which is the parent of 
> Marshall-McLuhan-0561SW2A_6-HS7, etc.).
>
> Why, once Marshall-McLuhan-0561SW2A_14-HS7 is determined to be 
> UNREACHABLE (due to the failure of Marshall-McLuhan-0561SW2A_4-HS7), 
> is the service checked on Marshall-McLuhan-0561SW2A_14-HS7?
>
>
>
> [1034172479] HOST ALERT: Marshall-McLuhan-0561SW2A_14-HS7;DOWN;SOFT;1;CRITICAL - Plugin timed out after 18 seconds
> [1034172516] HOST ALERT: Marshall-McLuhan-0561SW2A_7-HS7;DOWN;SOFT;1;CRITICAL - Plugin timed out after 18 seconds
> [1034172552] HOST ALERT: Marshall-McLuhan-0561SW2A_6-HS7;DOWN;SOFT;1;CRITICAL - Plugin timed out after 18 seconds
> [1034172588] HOST ALERT: Marshall-McLuhan-0561SW2A_5-HS7;DOWN;SOFT;1;CRITICAL - Plugin timed out after 18 seconds
> [1034172624] HOST ALERT: Marshall-McLuhan-0561SW2A_4-HS7;DOWN;SOFT;1;CRITICAL - Plugin timed out after 18 seconds
> [1034172644] HOST ALERT: Marshall-McLuhan-0561SW2A_4-HS7;DOWN;HARD;2;CRITICAL - Plugin timed out after 18 seconds
> [1034172644] HOST NOTIFICATION: nagiosadmin;Marshall-McLuhan-0561SW2A_4-HS7;DOWN;host-notify-by-email;CRITICAL - Plugin timed out after 18 seconds
> [1034172645] HOST NOTIFICATION: Marco;Marshall-McLuhan-0561SW2A_4-HS7;DOWN;host-notify-by-email;CRITICAL - Plugin timed out after 18 seconds
> [1034172645] HOST NOTIFICATION: Kevin-NonCritical;Marshall-McLuhan-0561SW2A_4-HS7;DOWN;notify-by-epager;CRITICAL - Plugin timed out after 18 seconds
> [1034172645] HOST NOTIFICATION: Kevin;Marshall-McLuhan-0561SW2A_4-HS7;DOWN;host-notify-by-email;CRITICAL - Plugin timed out after 18 seconds
> [1034172646] HOST NOTIFICATION: Keith-NonCritical;Marshall-McLuhan-0561SW2A_4-HS7;DOWN;notify-by-epager;CRITICAL - Plugin timed out after 18 seconds
> [1034172646] HOST NOTIFICATION: Keith;Marshall-McLuhan-0561SW2A_4-HS7;DOWN;host-notify-by-email;CRITICAL - Plugin timed out after 18 seconds
> [1034172646] HOST NOTIFICATION: Ben;Marshall-McLuhan-0561SW2A_4-HS7;DOWN;host-notify-by-email;CRITICAL - Plugin timed out after 18 seconds
> [1034172647] HOST ALERT: Marshall-McLuhan-0561SW2A_5-HS7;UNREACHABLE;HARD;2;CRITICAL - Plugin timed out after 18 seconds
> [1034172647] HOST ALERT: Marshall-McLuhan-0561SW2A_6-HS7;UNREACHABLE;HARD;2;CRITICAL - Plugin timed out after 18 seconds
> [1034172647] HOST ALERT: Marshall-McLuhan-0561SW2A_7-HS7;UNREACHABLE;HARD;2;CRITICAL - Plugin timed out after 18 seconds
> [1034172647] HOST ALERT: Marshall-McLuhan-0561SW2A_14-HS7;UNREACHABLE;HARD;2;CRITICAL - Plugin timed out after 18 seconds
> [1034172647] SERVICE ALERT: Marshall-McLuhan-0561SW2A_14-HS7;Port Check-23;CRITICAL;HARD;1;Socket timeout after 10 seconds
>
>
> -----Original Message-----
> From: Bishop, Dean
> Sent: Thursday, October 10, 2002 1:04 PM
> To: 'nagios-users at lists.sourceforge.net'
> Subject: What the *&#( !!
> Importance: High
>
>
> Can someone explain this to me??
>
>
> Why in the world is the service for testserver01.tcdsb.org being 
> checked after the host has been determined down?
> Also, why is the host being checked before the service??
>
>
>
>
> [root@NMS var]# tail nagios.log -n 3000 | grep testserver01
>
> [1034266896] HOST ALERT: testserver01.tcdsb.org;UP;HARD;1;(Host assumed to be up)
> [1034266896] SERVICE ALERT: testserver01.tcdsb.org;Misc Servers - Port Check 135;OK;HARD;1;TCP OK - 0 second response time on port 135
> [1034267924] HOST ALERT: testserver01.tcdsb.org;DOWN;SOFT;1;CRITICAL - Plugin timed out after 8 seconds
> [1034267933] HOST ALERT: testserver01.tcdsb.org;DOWN;HARD;2;CRITICAL - Plugin timed out after 8 seconds
> [1034267933] HOST NOTIFICATION: nagiosadmin;testserver01.tcdsb.org;DOWN;host-notify-by-email;CRITICAL - Plugin timed out after 8 seconds
> [1034267934] HOST NOTIFICATION: Keith;testserver01.tcdsb.org;DOWN;host-notify-by-email;CRITICAL - Plugin timed out after 8 seconds
> [1034267934] SERVICE ALERT: testserver01.tcdsb.org;Misc Servers - Port Check 135;CRITICAL;HARD;1;Socket timeout after 2 seconds
> [1034268938] HOST ALERT: testserver01.tcdsb.org;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.61 ms
> [1034268938] HOST NOTIFICATION: nagiosadmin;testserver01.tcdsb.org;UP;host-notify-by-email;PING OK - Packet loss = 0%, RTA = 0.61 ms
> [1034268938] HOST NOTIFICATION: Keith;testserver01.tcdsb.org;UP;host-notify-by-email;PING OK - Packet loss = 0%, RTA = 0.61 ms
> [1034268938] SERVICE ALERT: testserver01.tcdsb.org;Misc Servers - Port Check 135;OK;HARD;1;TCP OK - 0 second response time on port 135
>
> [root@NMS var]#
>

-- 
Russell Scibetti
Quadrix Solutions, Inc.
http://www.quadrix.com
(732) 235-2335, ext. 7038

