parent/child setup not working

David Miller nagios at d.sparks.net
Sat Jan 6 02:51:28 CET 2007


Andy Shellam (Mailing Lists) wrote:
> If I understand it right, your host checks should not be scheduled, but
> your service checks are.
> So, every time a service requires checking and Nagios finds the 
> service is down, it checks the host to see if the host is down.  If it 
> is, then it suppresses notifications for the service and instead goes 
> into the host's notification handling.
That makes some sense: none of the hosts that have host checks send
unwanted notices.

I'm still missing something, though, unless the design is that:

host checks aren't executed if the parent is down, but if no host check 
is specified the host is still presumed up.

and

service checks are performed as long as the host is known or presumed to 
be up.

But that would make the parent relationship specification fairly useless.
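To make the two rules above concrete, here is a rough Python sketch. It is a hypothetical, simplified model of those rules only, not Nagios's actual code; `Host`, `host_state()` and `service_notifies()` are made-up names. Under these rules a child host with no check_command (like logweb1) is always presumed up, so its failing service still notifies while the parent is down, which is consistent with the alerts described in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


# Hypothetical, simplified model of the two rules hypothesized above.
# This is NOT Nagios source code; all names here are illustrative.
@dataclass
class Host:
    name: str
    check_command: Optional[str] = None  # None models "no host check defined"
    parents: List[str] = field(default_factory=list)
    reachable: bool = True  # what the host check WOULD report if it ran


def host_state(hosts: Dict[str, Host], name: str) -> str:
    """Return 'UP', 'DOWN' or 'UNREACHABLE' for a host under this model."""
    host = hosts[name]
    if host.check_command is None:
        # Rule 1: with no host check defined, the host is presumed up.
        return "UP"
    if host.reachable:
        return "UP"
    # A failed check is blamed on the path if any parent is not UP.
    if any(host_state(hosts, parent) != "UP" for parent in host.parents):
        return "UNREACHABLE"
    return "DOWN"


def service_notifies(hosts: Dict[str, Host], host_name: str) -> bool:
    """Rule 2: a failing service notifies only while its host is (presumed) UP."""
    return host_state(hosts, host_name) == "UP"


# The setup from this thread: pix has a host check, logweb1 does not.
hosts = {
    "pix": Host("pix", check_command="check-host-alive", reachable=False),
    "logweb1": Host("logweb1", parents=["pix"], reachable=False),
}
print(host_state(hosts, "pix"))            # DOWN
print(service_notifies(hosts, "logweb1"))  # True: no host check, presumed UP

# Same network failure, but logweb1 now has a host check of its own.
hosts["logweb1"] = Host("logweb1", check_command="check-host-alive",
                        parents=["pix"], reachable=False)
print(host_state(hosts, "logweb1"))        # UNREACHABLE
print(service_notifies(hosts, "logweb1"))  # False: service notice suppressed
```

In this model, giving the child its own host check is what changes the outcome: the failed check is blamed on the down parent, the child becomes UNREACHABLE, and the service notice is suppressed.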

>
> However, I'm not sure if this is the case for escalated service
> notifications. You have a notification_interval set; try commenting
> this out (or setting it to 0) and see if the same thing still
> happens.

It's a required field, so commenting it out doesn't work. I set it to
0, deleted the default route, and got the same result: three notices
that the pix was down, a five-minute wait, and then a notice that this
host was down.


--- David

>
> Andy.
>
>
> David Miller wrote:
>> Andy Shellam (Mailing Lists) wrote:
>>
>> Arghh! Sorry for the previous, content-free reply.
>>
>> The service entry is:
>>
>> define service{
>>        use                             generic-service         ; Name of service template to use
>>        hostgroup_name                  webservers
>>        service_description             Check Simple Webservers
>>        is_volatile                     0
>>        check_period                    24x7
>>        max_check_attempts              5
>>        normal_check_interval           5
>>        retry_check_interval            2
>>        contact_groups                  ops
>>        notification_interval           120
>>        notification_period             24x7
>>        notification_options            w,u,c,r
>>        check_command                   check_http
>>        }
>> But the point is, unless I'm missing something, that the service
>> should not be checked at all if the parent is down.
>>
>> Thanks!
>>
>> --- David
>>
>>> Hi David,
>>>
>>> I'm not clued up on parent/child relationships between hosts.
>>> However, one thing I believe might be happening is that the alert
>>> you've sent for the service might be a "reminder" notification that
>>> the service is still down. (Perhaps as a result of escalation
>>> settings?)
>>>
>>> I think this because the state line includes a duration, i.e.
>>> "CRITICAL for xxxxx" as opposed to just "CRITICAL".
>>>
>>> What's the definition for that service?
>>>
>>> Andy.
>>>
>>>
>>> David Miller wrote:
>>>> Hi;
>>>>
>>>> I'm not sure what I'm doing wrong.
>>>>
>>>> Running Nagios 2.5 on Debian stable. I have the Nagios server in
>>>> one data center monitoring about 30 servers in another data center.
>>>>
>>>> In the hosts.cfg file I have a gateway (firewall) defined:
>>>>
>>>> define host {
>>>>         use                     generic-host    ; Name of host template to use
>>>>         host_name               pix
>>>>         alias                   PIX
>>>>         address                 x.y.z.2
>>>>         check_command           check-host-alive
>>>>         max_check_attempts      1
>>>>         notification_interval   1
>>>>         notification_period     24x7
>>>>         notification_options    d,u,r
>>>>         }
>>>>
>>>>
>>>> I then use that as a parent to all the hosts I want to monitor in
>>>> the remote data center. Those have host entries like this:
>>>>
>>>>
>>>> define host {
>>>>         use                     generic-host    ; Name of host template to use
>>>>         host_name               logweb1
>>>>         alias                   Logweb1
>>>>         address                 logweb1.foo.com
>>>>         parents                 pix
>>>>         max_check_attempts      1
>>>>         active_checks_enabled   0
>>>>         notification_interval   1
>>>>         notification_period     24x7
>>>>         notification_options    d,r
>>>>         }
>>>>
>>>> As I read the documentation, when Nagios detects that host "pix" is
>>>> down, it won't check or report on host logweb1.
>>>>
>>>> However, if I break the network connection by deleting the default
>>>> route, I get three messages that the pix is down that look like
>>>> this:
>>>>
>>>> Subject: ** PROBLEM alert 1 - PIX host is DOWN **
>>>>
>>>> ***** Nagios  *****
>>>>
>>>> Notification Type: PROBLEM
>>>> Host: PIX
>>>> State: DOWN for 0d 0h 0m 0s
>>>> Address: 66.151.232.2
>>>> Info:
>>>>
>>>> CRITICAL - Network unreachable (x.y.z.2)
>>>>
>>>> Date/Time: Fri Jan 5 16:17:48 EST 2007
>>>>
>>>> ACK by: Comment:
>>>>
>>>> And a few minutes later I get a notice about the child server:
>>>>
>>>> Subject: ** PROBLEM alert 1 - Logweb1/Check Simple Webservers is CRITICAL **
>>>>
>>>> ***** Nagios  *****
>>>>
>>>> Notification Type: PROBLEM
>>>>
>>>> Service: Check Simple Webservers
>>>> Host: Logweb1
>>>> State: CRITICAL for 0d 0h 8m 6s
>>>> Address: logweb1.foo.com
>>>>
>>>> Info:
>>>>
>>>> Network is unreachable
>>>>
>>>> Date/Time: Fri Jan 5 16:29:28 EST 2007
>>>>
>>>> ACK by: Comment:
>>>>
>>>> What am I doing wrong?
>>>>
>>>> Thanks in advance,
>>>>
>>>> --- David
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------------- 
>>>>
>>>> Take Surveys. Earn Cash. Influence the Future of IT
>>>> Join SourceForge.net's Techsay panel and you'll get the chance to 
>>>> share your
>>>> opinions on IT & business topics through brief surveys - and earn cash
>>>> http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV 
>>>>
>>>> _______________________________________________
>>>> Nagios-users mailing list
>>>> Nagios-users at lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/nagios-users
>>>> ::: Please include Nagios version, plugin version (-v) and OS when 
>>>> reporting any issue. ::: Messages without supporting info will risk 
>>>> being sent to /dev/null