Host down notification but no host up notification?

stucky stucky101 at gmail.com
Wed Jun 13 03:35:59 CEST 2007


Guys

I'm testing Nagios 3.0a and I think there is a notification bug.

I have the following config:

define timeperiod{
        timeperiod_name 24x7
        alias           24 Hours A Day, 7 Days A Week
        sunday          00:00-24:00
        monday          00:00-24:00
        tuesday         00:00-24:00
        wednesday       00:00-24:00
        thursday        00:00-24:00
        friday          00:00-24:00
        saturday        00:00-24:00
        }

define contact{
        name                            generic-contact         ; The name of this contact template
        service_notification_period     24x7                    ; service notifications can be sent anytime
        host_notification_period        24x7                    ; host notifications can be sent anytime
        service_notification_options    w,u,c,r,f,s             ; send notifications for all service states, flapping events, and scheduled downtime events
        host_notification_options       d,u,r,f,s               ; send notifications for all host states, flapping events, and scheduled downtime events
        service_notification_commands   notify-service-by-email ; send service notifications via email
        host_notification_commands      notify-host-by-email    ; send host notifications via email
        register                        0                       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
        }

define contact{
        contact_name                    astuck
        use                             generic-contact
        alias                           SysAdmin1
        email                           {my email}
        }

define contactgroup{
        contactgroup_name       admins
        alias                   SysAdmins
        members                 astuck
        }

define host{
        name                            generic-host    ; The name of this host template
        notifications_enabled           1               ; Host notifications are enabled
        event_handler_enabled           1               ; Host event handler is enabled
        flap_detection_enabled          1               ; Flap detection is enabled
        failure_prediction_enabled      1               ; Failure prediction is enabled
        process_perf_data               1               ; Process performance data
        retain_status_information       1               ; Retain status information across program restarts
        retain_nonstatus_information    1               ; Retain non-status information across program restarts
        notification_period             24x7            ; Send host notifications at any time
        register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
        }

define host{
        name                            generic-linux
        use                             generic-host
        check_period                    24x7
        check_interval                  5
        retry_interval                  1
        max_check_attempts              10
        check_command                   check-host-alive
        notification_interval           120
        notification_options            d,u,r
        register                        0
        }

define host{
        name                            nonprod
        use                             generic-linux
        contact_groups                  admins
        register                        0
        }

define host{
        use                     nonprod
        host_name               lithium
        alias                   Oracle Dev 2
        address                 lithium
        }
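One way to sanity-check the expectation that every host notification should go out is to model just the notification_options filters from the config above. This Python sketch is a simplification, assuming only two filters matter: a host notification for a given state must be allowed both by the host's notification_options (d,u,r from generic-linux) and by the contact's host_notification_options (d,u,r,f,s from generic-contact). It ignores time periods, scheduled downtime, flapping and notification_interval, and the names STATE_TO_OPTION and passes_option_filters are invented for illustration.

```python
# Simplified sketch (assumption): models only the notification_options
# filters from the config above, not Nagios's full viability checks
# (no time periods, downtime, flapping or notification_interval).

# Host state -> option letter that must be enabled for that notification.
STATE_TO_OPTION = {"DOWN": "d", "UNREACHABLE": "u", "UP": "r"}  # UP = recovery

# Values copied from the config in this post.
HOST_NOTIFICATION_OPTIONS = set("d,u,r".split(","))              # generic-linux
CONTACT_HOST_NOTIFICATION_OPTIONS = set("d,u,r,f,s".split(","))  # generic-contact


def passes_option_filters(state: str) -> bool:
    """True if both the host and the contact allow notifications for this state."""
    option = STATE_TO_OPTION[state]
    return (option in HOST_NOTIFICATION_OPTIONS
            and option in CONTACT_HOST_NOTIFICATION_OPTIONS)


for state in ("DOWN", "UP"):
    print(state, passes_option_filters(state))
```

Both DOWN and UP pass here, which suggests the option settings alone should not block a host recovery notification.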

As far as I can see, I should get all host/service notifications 24/7. However,
when I reboot 'lithium' I get a host down notification, but when it comes back
I don't get anything.
I turned on notification debugging:

[1181695731.149796:032.0] ** Host Notification Attempt ** Host: 'lithium', Type: 0, Current State: 1, Last Notification: Wed Dec 31 16:00:00 1969
[1181695731.149852:032.0] Notification viability test passed.
[1181695731.149861:032.1] Current notification number: 1
[1181695731.149867:032.2] Creating list of contacts to be notified.
[1181695731.149873:032.1] Host notification will NOT be escalated.
[1181695731.149879:032.2] Adding contact 'astuck' to notification list.
[1181695731.149985:032.2] ** Attempting to notifying contact 'astuck'...
[1181695731.149994:032.2] ** Checking host notification viability for contact 'astuck'...
[1181695731.150005:032.2] ** Host notification viability for contact 'astuck' PASSED.
[1181695731.150014:032.2] ** Notifying contact 'astuck'
[1181695731.150071:032.2] Raw Command: /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
[1181695731.150078:032.2] Processed Command: /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: PROBLEM\nHost: lithium\nState: DOWN\nAddress: lithium\nInfo: (No output returned from host check)\n\nDate/Time: Tue Jun 12 17:48:51 PDT 2007\n" | /bin/mail -s "** PROBLEM Host Alert: lithium is DOWN **" {my email}
[1181695731.194505:032.0] No contacts were notified.  Next possible notification time: Tue Jun 12 19:48:51 2007
[1181695731.194527:032.0] 1 contacts were notified.
[1181695741.047809:032.0] ** Host Notification Attempt ** Host: 'lithium', Type: 0, Current State: 1, Last Notification: Tue Jun 12 17:48:51 2007
[1181695741.047834:032.1] Its not yet time to re-notify the contacts about this host problem...
[1181695741.047843:032.1] Next acceptable notification time: Tue Jun 12 19:48:51 2007
[1181695741.047850:032.0] Notification viability test failed.  No notification will be sent out.
[1181695751.160027:032.0] ** Host Notification Attempt ** Host: 'lithium', Type: 0, Current State: 1, Last Notification: Tue Jun 12 17:48:51 2007
[1181695751.160058:032.1] Its not yet time to re-notify the contacts about this host problem...
[1181695751.160068:032.1] Next acceptable notification time: Tue Jun 12 19:48:51 2007
[1181695751.160074:032.0] Notification viability test failed.  No notification will be sent out.
[1181695811.210449:032.0] ** Host Notification Attempt ** Host: 'lithium', Type: 0, Current State: 1, Last Notification: Tue Jun 12 17:48:51 2007
[1181695811.210479:032.1] Its not yet time to re-notify the contacts about this host problem...
[1181695811.210489:032.1] Next acceptable notification time: Tue Jun 12 19:48:51 2007
[1181695811.210495:032.0] Notification viability test failed.  No notification will be sent out.
[1181695821.068538:032.0] ** Host Notification Attempt ** Host: 'lithium', Type: 0, Current State: 1, Last Notification: Tue Jun 12 17:48:51 2007
[1181695821.068569:032.1] Its not yet time to re-notify the contacts about this host problem...
[1181695821.068580:032.1] Next acceptable notification time: Tue Jun 12 19:48:51 2007
[1181695821.068586:032.0] Notification viability test failed.  No notification will be sent out.
[1181695821.068895:032.0] ** Host Notification Attempt ** Host: 'lithium', Type: 0, Current State: 1, Last Notification: Tue Jun 12 17:48:51 2007
[1181695821.068915:032.1] Its not yet time to re-notify the contacts about this host problem...
[1181695821.068924:032.1] Next acceptable notification time: Tue Jun 12 19:48:51 2007
[1181695821.068931:032.0] Notification viability test failed.  No notification will be sent out.
[1181695831.174383:032.0] ** Host Notification Attempt ** Host: 'lithium', Type: 0, Current State: 1, Last Notification: Tue Jun 12 17:48:51 2007
[1181695831.174418:032.1] Its not yet time to re-notify the contacts about this host problem...
[1181695831.174427:032.1] Next acceptable notification time: Tue Jun 12 19:48:51 2007
[1181695831.174434:032.0] Notification viability test failed.  No notification will be sent out.
[1181695831.174731:032.0] ** Host Notification Attempt ** Host: 'lithium', Type: 0, Current State: 1, Last Notification: Tue Jun 12 17:48:51 2007
[1181695831.174745:032.1] Its not yet time to re-notify the contacts about this host problem...
[1181695831.174754:032.1] Next acceptable notification time: Tue Jun 12 19:48:51 2007
[1181695831.174760:032.0] Notification viability test failed.  No notification will be sent out.
[1181695851.144314:032.0] ** Host Notification Attempt ** Host: 'lithium', Type: 0, Current State: 1, Last Notification: Tue Jun 12 17:48:51 2007
[1181695851.144338:032.1] Its not yet time to re-notify the contacts about this host problem...
[1181695851.144347:032.1] Next acceptable notification time: Tue Jun 12 19:48:51 2007
[1181695851.144354:032.0] Notification viability test failed.  No notification will be sent out.
[1181696025.034559:032.0] ** Service Notification Attempt ** Host: 'lithium', Service: 'DISK USAGE /tmp', Type: 0, Current State: 0, Last Notification: Wed Dec 31 16:00:00 1969
[1181696025.034582:032.1] We shouldn't notify about this recovery.
[1181696025.034589:032.0] Notification viability test failed.  No notification will be sent out.
[1181696031.130428:032.0] ** Service Notification Attempt ** Host: 'lithium', Service: 'LOAD', Type: 0, Current State: 0, Last Notification: Wed Dec 31 16:00:00 1969
[1181696031.130452:032.1] We shouldn't notify about this recovery.
[1181696031.130460:032.0] Notification viability test failed.  No notification will be sent out.
[1181696031.131081:032.0] ** Service Notification Attempt ** Host: 'lithium', Service: 'DISK USAGE /usr/local', Type: 0, Current State: 0, Last Notification: Wed Dec 31 16:00:00 1969
[1181696031.131095:032.1] We shouldn't notify about this recovery.
[1181696031.131102:032.0] Notification viability test failed.  No notification will be sent out.
[1181696111.052735:032.0] ** Service Notification Attempt ** Host: 'lithium', Service: 'CFENVD', Type: 0, Current State: 0, Last Notification: Wed Dec 31 16:00:00 1969
[1181696111.052759:032.1] We shouldn't notify about this recovery.
[1181696111.052766:032.0] Notification viability test failed.  No notification will be sent out.
[1181696111.052971:032.0] ** Service Notification Attempt ** Host: 'lithium', Service: 'PERC CONTROLLER', Type: 0, Current State: 0, Last Notification: Wed Dec 31 16:00:00 1969
[1181696111.052984:032.1] We shouldn't notify about this recovery.
[1181696111.052992:032.0] Notification viability test failed.  No notification will be sent out.
[1181696111.053334:032.0] ** Service Notification Attempt ** Host: 'lithium', Service: 'CFEXECD', Type: 0, Current State: 0, Last Notification: Wed Dec 31 16:00:00 1969
[1181696111.053348:032.1] We shouldn't notify about this recovery.
[1181696111.053355:032.0] Notification viability test failed.  No notification will be sent out.
[1181696121.163710:032.0] ** Service Notification Attempt ** Host: 'lithium', Service: 'MEM', Type: 0, Current State: 0, Last Notification: Wed Dec 31 16:00:00 1969
[1181696121.163738:032.1] We shouldn't notify about this recovery.
[1181696121.163746:032.0] Notification viability test failed.  No notification will be sent out.
[1181696121.163984:032.0] ** Service Notification Attempt ** Host: 'lithium', Service: 'DISK USAGE /var', Type: 0, Current State: 0, Last Notification: Wed Dec 31 16:00:00 1969
[1181696121.163998:032.1] We shouldn't notify about this recovery.
[1181696121.164005:032.0] Notification viability test failed.  No notification will be sent out.
[1181696141.130999:032.0] ** Service Notification Attempt ** Host: 'lithium', Service: 'DISK USAGE /', Type: 0, Current State: 0, Last Notification: Wed Dec 31 16:00:00 1969
[1181696141.131023:032.1] We shouldn't notify about this recovery.
[1181696141.131031:032.0] Notification viability test failed.  No notification will be sent out.
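The "Next acceptable notification time" in the log can be reproduced by hand: notification_interval is counted in units of interval_length, so notification_interval 120 from the generic-linux template puts the next re-notification 120 × 60 s = 2 hours after the 17:48:51 problem notification. This Python sketch assumes interval_length is at its default of 60 seconds and that the server clock is US Pacific time, which is consistent with the PDT date in the processed command. It also shows that "Last Notification: Wed Dec 31 16:00:00 1969" on the service entries is just Unix time 0 rendered in Pacific standard time, i.e. no notification had ever been sent for those services.

```python
from datetime import datetime, timedelta, timezone

# Assumptions: interval_length left at its default of 60 seconds in
# nagios.cfg, and the server on US Pacific time (PDT = UTC-7 in June,
# PST = UTC-8 in December), matching the dates printed in the log.
INTERVAL_LENGTH = 60          # seconds per interval unit
NOTIFICATION_INTERVAL = 120   # from the generic-linux host template
PDT = timezone(timedelta(hours=-7))
PST = timezone(timedelta(hours=-8))
FMT = "%a %b %d %H:%M:%S %Y"  # same layout as the debug log

# Epoch timestamp of the first host notification attempt in the log.
last_notification = datetime.fromtimestamp(1181695731, tz=PDT)
next_notification = last_notification + timedelta(
    seconds=NOTIFICATION_INTERVAL * INTERVAL_LENGTH)

print(last_notification.strftime(FMT))  # Tue Jun 12 17:48:51 2007
print(next_notification.strftime(FMT))  # Tue Jun 12 19:48:51 2007

# "Last Notification" on the service entries: Unix time 0 in Pacific time,
# meaning no notification was ever sent for that service.
print(datetime.fromtimestamp(0, tz=PST).strftime(FMT))  # Wed Dec 31 16:00:00 1969
```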

Clearly, Nagios decided that I shouldn't get a host up notification. I just
don't understand why. From the log files I'd say the following logic takes
place:

1. Host goes down - service check fails.
2. Nagios checks to see if the host is down - YES.
3. Because of step 2, no service notifications are sent.
4. A host down notification is sent instead.
5. Host comes back.
6. Service checks start recovering - no service recovery notifications are
   sent, since no service problem notifications were sent in the first place.
7. Host is assumed to be up since the services are up.
8. Hence - no host up notification.
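The hypothesized sequence can be made concrete with a toy replay. This Python sketch encodes only the guesses in the list above; it is not Nagios code, and whether Nagios 3.0a really behaves this way is exactly the open question. Its rules: service problems are suppressed while the host is DOWN, a recovery is sent only if a matching problem notification went out, and a service recovery silently marks the host UP unless a host check runs first. The simulate function and the (kind, ok) event format are invented for illustration; the on-demand host check triggered by a failing service is listed before that service result.

```python
from typing import List, Tuple

# Hypothetical event format: (kind, ok), kind is "host_check" or "service".
Event = Tuple[str, bool]


def simulate(events: List[Event]) -> List[str]:
    """Replay events under the hypothesized rules; return notifications sent."""
    host_down = False
    host_problem_sent = False
    service_problem_sent = False
    sent: List[str] = []

    for kind, ok in events:
        if kind == "host_check":
            if not ok and not host_down:
                # Steps 2 and 4: host check says DOWN, host problem goes out.
                host_down = True
                host_problem_sent = True
                sent.append("HOST PROBLEM")
            elif ok and host_down:
                # A real host check sees the host come back: notify recovery.
                host_down = False
                if host_problem_sent:
                    sent.append("HOST RECOVERY")
                host_problem_sent = False
        elif kind == "service":
            if not ok:
                # Step 3: service problems are suppressed while the host is down.
                if not host_down:
                    service_problem_sent = True
                    sent.append("SERVICE PROBLEM")
            else:
                # Step 6: a recovery is only sent if a problem was notified.
                if service_problem_sent:
                    sent.append("SERVICE RECOVERY")
                service_problem_sent = False
                # Steps 7-8: host silently assumed UP, no host recovery sent.
                if host_down:
                    host_down = False
                    host_problem_sent = False
    return sent


# Reboot where the services recover before any host check runs.
print(simulate([("host_check", False), ("service", False), ("service", True)]))
# Reboot where a host check happens to run before the services recover.
print(simulate([("host_check", False), ("service", False),
                ("host_check", True), ("service", True)]))
```

Under these assumptions the first replay yields only the host problem, while the second also yields the host recovery, so whether the up notification arrives would depend on the ordering of checks around the reboot.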

At first I thought my host up notification might not make it through one of
the notification filters, but according to the log there is NO host check
after the reboot, and therefore there is no host notification attempt.
It looks to me like a design bug, but I want to make sure I'm not getting this
wrong. It just doesn't make sense to me that I wouldn't be notified about a
host coming back. I understand the part about the services.

INTERESTING: I have rebooted a few times, and it appears that sometimes I do
get host up notifications but most of the time I don't, so it seems to depend
on exactly when the reboot occurs.
Also, I turned off flap detection globally, but it made no difference.

Has anyone seen this behaviour?
-- 
stucky
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null

