host-down notification can take 50 mins to be sent

stucky stucky101 at gmail.com
Fri Jun 15 10:37:11 CEST 2007


Guys

I'm trying the latest stable 2.x version (2.9). On top of the 2 default host
templates that already exist I added a 3rd one, since the documentation
states that there is no limit.

I added a host and started monitoring. When I took it down, it took between
2 and 5 mins for the host-down notification to come in.
However, later on I rebooted it again and this time nothing came in. The
nagios log showed nothing about wanting to send a notification either. The
box came back without any notification.
I took it down again later and waited. After 50 minutes I got a host-down
notification. When I brought the host back I almost immediately got a
host-up notification.

I removed one of the templates to change the recursion level of the host
templates from 3 to 2 and tried again. I did 3 tests and all came back fine
this time. I always got the notification within 5 minutes max.
Then I added the 3rd template back to see whether it had to do with that,
but now I can't reproduce the problem. I did 2 tests and both were fine.

I don't feel that I can trust nagios now though. I've been using it for a
few years now since version 1.2 and I've never seen this behaviour before.
However, I've also never used more than 1 host/service template. This time I
wanted to make more use of the object inheritance logic to shorten my cfg
but somehow I feel it causes problems.
How deep is the template recursion for most of you folks?

Here are the templates I was using when the 50 min delay happened

Hosts:

# Host templates

define host{
        name                            generic-host
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        failure_prediction_enabled      1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        notification_period             24x7
        register                        0
        }

define host{
        name                            generic-linux
        use                             generic-host
        check_period                    24x7
        max_check_attempts              10
        check_command                   check-host-alive
        notification_interval           120
        notification_options            d,u,r
        register                        0
        }

define host{
        name                            prod
        use                             generic-linux
        contact_groups                  sysadmins,psst
        register                        0
        }

define host{
        name                            nonprod
        use                             generic-linux
        contact_groups                  sysadmins
        register                        0
        }

Then I use either the prod or nonprod template for all my hosts.
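
To make the inheritance chain concrete, here is a rough sketch of what the
prod host template should resolve to once generic-linux and generic-host are
folded in, followed by a host that uses it. It assumes plain inheritance
where a local value overrides an inherited one. None of the three levels
sets the same directive twice, so nothing is actually overridden here. The
names prod-flattened and web01, the alias and the address are made up for
illustration and are not part of the config above.

# Sketch only: the prod chain written out as a single flat template.
# "prod-flattened" is a hypothetical name.
define host{
        name                            prod-flattened
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        failure_prediction_enabled      1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        notification_period             24x7
        check_period                    24x7
        max_check_attempts              10
        check_command                   check-host-alive
        notification_interval           120
        notification_options            d,u,r
        contact_groups                  sysadmins,psst
        register                        0
        }

# Hypothetical host using the prod template.
# host_name, alias and address are invented for this example.
define host{
        use                             prod
        host_name                       web01
        alias                           Example web server
        address                         192.0.2.10
        }

If the 3-level chain resolves correctly, a host defined either way should
behave the same, which makes the flat form a possible cross-check when
template depth is under suspicion.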

Same with services:

# Service templates

define service{
        name                            generic-service
        active_checks_enabled           1
        passive_checks_enabled          1
        parallelize_check               1
        obsess_over_service             1
        check_freshness                 0
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        failure_prediction_enabled      1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        is_volatile                     0
        register                        0
        }

define service{
        name                            generic-checks
        use                             generic-service
        check_period                    24x7
        max_check_attempts              4
        normal_check_interval           5
        retry_check_interval            1
        notification_options            w,u,c,r
        notification_interval           60
        notification_period             24x7
        register                        0
        }


define service{
        name                            prod
        use                             generic-checks
        contact_groups                  sysadmins,psst
        register                        0
        }

define service{
        name                            nonprod
        use                             generic-checks
        contact_groups                  sysadmins
        register                        0
        }

Here I also use prod or nonprod as templates for my services.
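
The same kind of sketch for services: the prod service template written out
flat, then a service that uses it. Again this assumes simple
child-overrides-parent inheritance, and again no directive is set at more
than one level. The name prod-flattened, the host web01 and the PING service
are made up, and the check_command line assumes a check_ping command defined
as in the stock sample config.

# Sketch only: the prod service chain as a single flat template.
# "prod-flattened" is a hypothetical name.
# normal_check_interval 5 and retry_check_interval 1 are in time units;
# they mean 5 minutes and 1 minute if interval_length is left at 60 seconds.
define service{
        name                            prod-flattened
        active_checks_enabled           1
        passive_checks_enabled          1
        parallelize_check               1
        obsess_over_service             1
        check_freshness                 0
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        failure_prediction_enabled      1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              4
        normal_check_interval           5
        retry_check_interval            1
        notification_options            w,u,c,r
        notification_interval           60
        notification_period             24x7
        contact_groups                  sysadmins,psst
        register                        0
        }

# Hypothetical service using the prod template.
# web01, PING and the check_ping arguments are assumptions.
define service{
        use                             prod
        host_name                       web01
        service_description             PING
        check_command                   check_ping!100.0,20%!500.0,60%
        }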

I'm going to test this more tomorrow, but I'm worried that if a host goes
down I might not get notified until 50 mins later, or maybe never, who knows?
It doesn't seem to behave the same way every time, but as far as I can see
the service checks run every 5 minutes, so within that time frame I should
get a notification.
Parallel checks are turned on as well.

Has anyone seen similar delays?

-- 
stucky