service notification when host is down

Morris, Patrick patrick.morris at hp.com
Wed Feb 17 17:52:53 CET 2010


Samuel Bancal wrote:
> Nagios Core 3.2.0
> nagios-plugins-1.4.14
> Ubuntu server 8.04.3 LTS
>
> Hi,
>
> I'm having trouble configuring notifications for the case where a 
> server no longer responds to PING (ICMP).
> I don't understand why Nagios skips steps when it runs the 
> "ICMP" service check.
> Here is the config:
>
> define host{
>   use                    generic-server
>   host_name              server1
>   alias                  server1
>   address                the.ip.the.ip
>   hostgroups             prod-servers
>   contact_groups         group1
>   check_command          check-host-alive
>   check_period           24x7
>   check_interval         5
>   retry_interval         1
>   max_check_attempts     4
>   notification_period    24x7
>   notification_interval  60
>   notification_options   d,u,r
> }
>
> define service{
>   use                     generic-service
>   host_name               server1
>   service_description     ICMP
>   check_command           check_icmp!100.0,20%!500.0,60%
>   max_check_attempts      4
>   normal_check_interval   5
>   retry_check_interval    1
>   notification_options    w,u,c,r
>   notification_interval   60
>   notification_period     24x7
> }
> [...]
> define command{
>   command_name    check-host-alive
>   command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
> }
> define command{
>   command_name    check_icmp
>   command_line    $USER1$/check_icmp -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
> }
> [...]
>
> Here is an example of the history I get:
> Service Critical[2010-02-16 11:33:13] SERVICE ALERT: server1;ICMP;CRITICAL;SOFT;1;CRITICAL - the.ip.the.ip: rta nan, lost 100%
> Host Down[2010-02-16 11:33:43] HOST ALERT: server1;DOWN;SOFT;1;(Host Check Timed Out)
> Service Critical[2010-02-16 11:34:13] SERVICE ALERT: server1;ICMP;CRITICAL;HARD;1;CRITICAL - the.ip.the.ip: rta nan, lost 100%
> Host Down[2010-02-16 11:34:43] HOST ALERT: server1;DOWN;SOFT;2;(Host Check Timed Out)
> Host Down[2010-02-16 11:35:23] HOST ALERT: server1;DOWN;SOFT;3;(Host Check Timed Out)
> Host Down[2010-02-16 11:36:33] HOST ALERT: server1;DOWN;HARD;4;(Host Check Timed Out)
> Host Up[2010-02-16 11:37:43] HOST ALERT: server1;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.67 ms
> Service Ok[2010-02-16 11:39:13] SERVICE ALERT: server1;ICMP;OK;HARD;1;OK - the.ip.the.ip: rta 0.943ms, lost 0%
>
> Or later:
> Host Down[2010-02-16 11:42:03] HOST ALERT: server1;DOWN;SOFT;1;(Host Check Timed Out)
> Host Down[2010-02-16 11:43:13] HOST ALERT: server1;DOWN;SOFT;2;(Host Check Timed Out)
> Service Critical[2010-02-16 11:44:13] SERVICE ALERT: server1;ICMP;CRITICAL;HARD;1;CRITICAL - the.ip.the.ip: rta nan, lost 100%
> Host Down[2010-02-16 11:44:43] HOST ALERT: server1;DOWN;SOFT;3;(Host Check Timed Out)
> Host Up[2010-02-16 11:45:53] HOST ALERT: server1;UP;SOFT;4;PING OK - Packet loss = 0%, RTA = 0.64 ms
> Service Ok[2010-02-16 11:49:13] SERVICE ALERT: server1;ICMP;OK;HARD;1;OK - the.ip.the.ip: rta 0.948ms, lost 0%

If you're asking why Nagios runs a host check when it sees the service 
fail a check, that's normal behavior.

When a service check fails, the first thing Nagios will do is look to 
see if the service failed because the host is down.
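
To make the "jumping over steps" in the quoted history concrete, here is a rough sketch of the rule as I understand it from the Nagios 3 "State Types" documentation: a non-OK service result goes straight to HARD when the host is already known to be down or unreachable, and otherwise stays SOFT until max_check_attempts is reached. The function name and parameters are made up for illustration and this is a simplified model, not Nagios code.

```python
def service_state_type(host_is_up: bool, attempt: int, max_check_attempts: int) -> str:
    """State type for a non-OK service check result (simplified, hypothetical model)."""
    if not host_is_up:
        # Host already known to be DOWN/UNREACHABLE: the service gets no soft retries.
        return "HARD"
    # Host still believed UP: keep retrying until max_check_attempts is reached.
    return "HARD" if attempt >= max_check_attempts else "SOFT"


# Replaying the first history excerpt (max_check_attempts is 4 for the ICMP service).
# 11:33:13: first failed service check, host still believed UP.
print(service_state_type(host_is_up=True, attempt=1, max_check_attempts=4))   # SOFT
# 11:34:13: the host check has already reported DOWN (SOFT;1) by now.
# The attempt value here is an assumption; it is ignored when the host is not UP.
print(service_state_type(host_is_up=False, attempt=2, max_check_attempts=4))  # HARD
```

That would match the log: the service is SOFT at 11:33:13 while the host is still considered up, then HARD at 11:34:13 once the host check has reported DOWN, even though the host itself is only in a SOFT down state at that point.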
