Host and Service checks (was: Fail error message - more interesting)

Rob Nelson rob at capband.net
Tue Jun 24 15:27:29 CEST 2003


>Look at the service/host check logic carefully. If you have a service,
>there must be a host check behind it. Conversely, (IIRC) if you have a
>host check, it will always be in an assumed up state until a service
>check goes down, triggering the host check.
>
>Long and short of it is, each host needs both a host check and one or
>more service checks, otherwise you will find you have hosts that do not
>come back up even after the service check has been fixed.

Can you expand on this some? I think I understand what you're saying, but I 
want to be clear. Let's assume the following checks on "node1":

from hosts.cfg:
===========
define host{
         use                     generic-host
         host_name               node1.sitename.com
         alias                   Node 1
         address                 10.10.12.1
         check_command           check-host-alive
         max_check_attempts      10
         notification_interval   120
         notification_period     24x7
         notification_options    d,u,r
         }
===========

from services.cfg:
============
define service{
         use                     generic-service
         host_name               node1.sitename.com
         service_description     ping
         is_volatile             0
         check_period            24x7
         max_check_attempts      3
         normal_check_interval   5
         retry_check_interval    2
         contact_groups          contact-group1
         notification_interval   120
         notification_period     24x7
         notification_options    w,u,c,r
         check_command           check_ping!1000.0,40%!2000.0,80%
         }
============

And of course, from checkcommands.cfg:
=============
# 'check_ping' command definition
define command{
         command_name    check_ping
         command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 10
         }

# 'check-host-alive' command definition
define command{
         command_name    check-host-alive
         command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5 -t 30
         }
=============
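
To make the service definition above concrete, here is a minimal Python sketch of how the "ping" service's check_command would expand against the check_ping command_line. The expand_command helper is hypothetical (it is not part of Nagios), and the value used for $USER1$ is an assumption; the real value comes from your resource file.

```python
# Hypothetical helper, not part of Nagios: substitutes $MACRO$ placeholders
# in a command_line using the "name!arg1!arg2" form of check_command.
def expand_command(command_line, check_command, host_address, user1):
    # Everything after the command name becomes $ARG1$, $ARG2$, ...
    name, *args = check_command.split("!")
    macros = {"$USER1$": user1, "$HOSTADDRESS$": host_address}
    for i, arg in enumerate(args, start=1):
        macros[f"$ARG{i}$"] = arg
    for macro, value in macros.items():
        command_line = command_line.replace(macro, value)
    return command_line


CHECK_PING = "$USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 10"

expanded = expand_command(
    CHECK_PING,
    "check_ping!1000.0,40%!2000.0,80%",
    host_address="10.10.12.1",
    user1="/usr/local/nagios/libexec",  # assumed value of $USER1$
)
print(expanded)
```

With these inputs the warning thresholds (1000.0,40%) land in -w and the critical thresholds (2000.0,80%) in -c, while check-host-alive hard-codes its own, looser thresholds.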

I modified check-host-alive because we're monitoring wireless APs (which 
lower the priority of ICMP packets if they get even slightly busy) over a 
VPN. If I leave it at the default, the first ICMP packet from a host often 
gets dropped.



Could you or someone else elaborate on what exactly Nagios will do when it 
starts up, as in a timeline of host and service checks? Assume the host is 
up initially, drops to critical after an hour, rises to warning 30 minutes 
after that, and 30 minutes after that (2 hours from start), goes back to 
normal. I think I had some misconceptions about when host/service checks 
were performed but I just want to make sure I don't read you wrong and pick 
up more misconceptions :)
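
To pin the scenario down, here is a rough Python sketch of when the "ping" service check alone might run under the settings above. It is a simplified model, not Nagios's actual scheduler: it assumes intervals are in minutes, that the first check runs at minute 0, that a non-OK result is retried every retry_check_interval until max_check_attempts is reached (soft states), and that checks return to normal_check_interval once the state is hard. Host checks are deliberately not modeled, since their timing is the open question here.

```python
NORMAL_INTERVAL = 5  # normal_check_interval (assumed to be minutes)
RETRY_INTERVAL = 2   # retry_check_interval (assumed to be minutes)
MAX_ATTEMPTS = 3     # max_check_attempts of the service


def actual_state(minute):
    """State the ping check would report at a given minute of the scenario."""
    if minute < 60:
        return "OK"
    if minute < 90:
        return "CRITICAL"
    if minute < 120:
        return "WARNING"
    return "OK"


def simulate(end_minute=130):
    """Return (minute, state, state_type) for each modeled service check."""
    events = []
    t = 0
    attempt = 1           # current check attempt for an ongoing problem
    hard_problem = False  # True once a problem has reached a hard state
    while t <= end_minute:
        state = actual_state(t)
        if state == "OK":
            # Recovery from a soft problem is soft; otherwise it is hard.
            kind = "SOFT" if (attempt > 1 and not hard_problem) else "HARD"
            attempt = 1
            hard_problem = False
            step = NORMAL_INTERVAL
        elif hard_problem or attempt >= MAX_ATTEMPTS:
            # Problem confirmed (or already hard): back to the normal interval.
            kind = "HARD"
            hard_problem = True
            step = NORMAL_INTERVAL
        else:
            # Problem not yet confirmed: retry sooner.
            kind = "SOFT"
            attempt += 1
            step = RETRY_INTERVAL
        events.append((t, state, kind))
        t += step
    return events


events = simulate()
for minute, state, kind in events:
    if minute >= 55:
        print(f"t={minute:3d} min  {state:8s} {kind}")
```

In this model the outage at the one-hour mark is only treated as confirmed on the third consecutive failed check; whether and when check-host-alive runs alongside those retries is exactly what I'd like spelled out.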

Rob Nelson
Network Administrator, Capitol Broadband
C: 919-369-1874
rob at capband.net 


