Notification Problem

Shaun Martin smartin at akazaresearch.com
Tue Sep 2 19:01:33 CEST 2008


Hi Marc,

Thanks for looking at my problem; it is driving me nuts why one host
alerts and the other does not. I have provided the requested info below.

> Please post --
> the service template definition
define service{
        name                            generic-service ; The 'name' of this service template
        active_checks_enabled           1               ; Active service checks are enabled
        passive_checks_enabled          1               ; Passive service checks are enabled/accepted
        parallelize_check               1               ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1               ; We should obsess over this service (if necessary)
        check_freshness                 0               ; Default is to NOT check service 'freshness'
        notifications_enabled           1               ; Service notifications are enabled
        event_handler_enabled           1               ; Service event handler is enabled
        flap_detection_enabled          1               ; Flap detection is enabled
        failure_prediction_enabled      1               ; Failure prediction is enabled
        process_perf_data               1               ; Process performance data
        retain_status_information       1               ; Retain status information across program restarts
        retain_nonstatus_information    1               ; Retain non-status information across program restarts
        is_volatile                     0               ; The service is not volatile
        check_period                    24x7            ; The service can be checked at any time of the day
        max_check_attempts              3               ; Re-check the service up to 3 times in order to determine its final (hard) state
        normal_check_interval           10              ; Check the service every 10 minutes under normal conditions
        retry_check_interval            2               ; Re-check the service every two minutes until a hard state can be determined
        contact_groups                  admins          ; Notifications get sent out to everyone in the 'admins' group
        notification_options            w,u,c,r         ; Send notifications about warning, unknown, critical, and recovery events
        notification_interval           60              ; Re-notify about service problems every hour
        notification_period             24x7            ; Notifications can be sent out at any time
        register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }


# Local service definition template - This is NOT a real service, just a template!

define service{
        name                            local-service   ; The name of this service template
        use                             generic-service ; Inherit default values from the generic-service definition
        max_check_attempts              4               ; Re-check the service up to 4 times in order to determine its final (hard) state
        normal_check_interval           5               ; Check the service every 5 minutes under normal conditions
        retry_check_interval            1               ; Re-check the service every minute until a hard state can be determined
        register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }
define service{
        name                            local-service-isovera   ; The name of this service template
        use                             generic-service         ; Inherit default values from the generic-service definition
        max_check_attempts              3                       ; Re-check the service up to 4 times in order to determine its final (hard) state
        normal_check_interval           5                       ; Check the service every 5 minutes under normal conditions
        retry_check_interval            1                       ; Re-check the service every minute until a hard state can be determined
        register                        0                       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        contact_groups                  isovera
        }
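
To make the inheritance in the three service templates above easier to follow, here is a minimal Python sketch of how the `use` chain could resolve into effective values. It is only an illustration, not how Nagios itself is implemented: the `TEMPLATES` dictionary and the `resolve` helper are made up for this example, only a handful of directives are copied in, and it assumes simple single inheritance where a locally set directive overrides the inherited one.

```python
# Hypothetical model of the service templates above (subset of directives only).
TEMPLATES = {
    "generic-service": {
        "max_check_attempts": "3",
        "normal_check_interval": "10",
        "retry_check_interval": "2",
        "contact_groups": "admins",
        "notification_options": "w,u,c,r",
        "notification_interval": "60",
        "notification_period": "24x7",
        "notifications_enabled": "1",
    },
    "local-service": {
        "use": "generic-service",
        "max_check_attempts": "4",
        "normal_check_interval": "5",
        "retry_check_interval": "1",
    },
    "local-service-isovera": {
        "use": "generic-service",
        "max_check_attempts": "3",
        "normal_check_interval": "5",
        "retry_check_interval": "1",
        "contact_groups": "isovera",
    },
}


def resolve(name, templates=TEMPLATES):
    """Return the effective directives for `name`, following its `use` chain."""
    definition = templates[name]
    parent = definition.get("use")
    # Start from the parent's effective values, if there is a parent.
    effective = dict(resolve(parent, templates)) if parent else {}
    # Locally set directives override inherited ones.
    effective.update({k: v for k, v in definition.items() if k != "use"})
    return effective


if __name__ == "__main__":
    for name in ("local-service", "local-service-isovera"):
        eff = resolve(name)
        print(name, eff["contact_groups"], eff["max_check_attempts"])
```

Under those assumptions, services built on local-service notify the admins group, while services built on local-service-isovera notify the isovera group; both inherit notification_options w,u,c,r from generic-service.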


> the service definition
> the host template definition
define host{
        name                            generic-host    ; The name of this host template
        notifications_enabled           1               ; Host notifications are enabled
        event_handler_enabled           1               ; Host event handler is enabled
        flap_detection_enabled          1               ; Flap detection is enabled
        failure_prediction_enabled      1               ; Failure prediction is enabled
        process_perf_data               1               ; Process performance data
        retain_status_information       1               ; Retain status information across program restarts
        retain_nonstatus_information    1               ; Retain non-status information across program restarts
        notification_period             24x7            ; Send host notifications at any time
        register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
        }


define host{
        name                            linux-server-isovera ; The name of this host template
        use                             generic-host    ; This template inherits other values from the generic-host template
        check_period                    24x7            ; By default, Linux hosts are checked round the clock
        check_interval                  5               ; Actively check the host every 5 minutes
        retry_interval                  1               ; Schedule host check retries at 1 minute intervals
        max_check_attempts              10              ; Check each Linux host 10 times (max)
        check_command                   check-host-alive ; Default command to check Linux hosts
        notification_period             24x7            ; Linux admins hate to be woken up, so we only notify during the day
                                                        ; Note that the notification_period variable is being overridden from
                                                        ; the value that is inherited from the generic-host template!
        notification_interval           120             ; Resend notifications every 2 hours
        notification_options            d,u,r           ; Only send notifications for specific host states
        contact_groups                  isovera         ; Notifications get sent to the admins by default
        register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
        }


> the host and service definition
###############################################################################
###############################################################################
#
# HOST DEFINITION
#
###############################################################################
###############################################################################

# Define a host for the local machine

define host{
        use                     linux-server
        host_name               www.localhost.org
        alias                   www.localhost.org
        address                 198.xxx.xxx.xx
        }


###############################################################################
###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################
###############################################################################


# Define a service to "ping" the local machine


# Define a service to check the disk space of the root partition
# on the local machine.  Warning if < 20% free, critical if
# < 10% free space on partition.

define service{
        use                             local-service-isovera
        host_name                       www.localhost.org
        service_description             / Partition
    check_command            check_nrpe!check_disk1
        }

define service{
        use                             local-service-isovera
        host_name                       www.localhost.org
        service_description             /usr Partition
    check_command            check_nrpe!check_disk4
        }

define service{
        use                             local-service-isovera
        host_name                       www.localhost.org
        service_description             /var Partition
    check_command            check_nrpe!check_disk5
        }
define service{
        use                             local-service-isovera
        host_name                       www.localhost.org
        service_description             /var/run Partition
    check_command            check_nrpe!check_disk9
        }
define service{
        use                             local-service-isovera
        host_name                       www.localhost.org
        service_description             /opt Partition
    check_command            check_nrpe!check_disk8
        }
define service{
        use                             local-service-isovera
        host_name                       www.localhost.org
        service_description             /tmp Partition
    check_command            check_nrpe!check_disk3
        }
define service{
        use                             local-service-isovera
        host_name                       www.localhost.org
        service_description             /home Partition
    check_command            check_nrpe!check_disk6
        }

# Define a service to check the number of currently logged in
# users on the local machine.  Warning if > 20 users, critical
# if > 50 users.

define service{
        use                             local-service
        host_name                       www.localhost.org
        service_description             Current Users
    check_command            check_nrpe!check_users
        }


# Define a service to check the number of currently running procs
# on the local machine.  Warning if > 250 processes, critical if
# > 400 users.
define service{
        use                             local-service-isovera
        host_name                       www.localhost.org
        service_description             Local Processes
    check_command            check_nrpe!check_local_procs
        }

define service{
        use                             local-service-isovera
        host_name                       www.localhost.org
        service_description             Total Processes
    check_command            check_nrpe!check_total_procs
        }

define service{
        use                             local-service-isovera
        host_name                       www.localhost.org
        service_description             Zombie Processes
    check_command            check_nrpe!check_zombie_procs
        }


# Define a service to check the load on the local machine.

define service{
        use                             local-service-isovera
        host_name                      www.localhost.org
        service_description             Current Load
    check_command            check_nrpe!check_local_load
        }



# Define a service to check the swap usage on the local machine.
# Critical if less than 10% of swap is free, warning if less than 20% is free

define service{
        use                             local-service
        host_name                       www.localhost.org
        service_description             Memory Usage
    check_command            check_nrpe!check_mem
        }

define service{
        use                             local-service-isovera
        host_name                       www.localhost.org
        service_description             Swap Usage
    check_command            check_nrpe!check_local_swap
        }




# Define a service to check SSH on the local machine.
# Disable notifications for this service by default, as not all users may have SSH enabled.

define service{
        use                             local-service-isovera
        host_name                       www.localhost.org
        service_description             SSH
    check_command            check_ssh
        }



# Define a service to check HTTP on the local machine.
# Disable notifications for this service by default, as not all users may have HTTP enabled.

define service{
        use                             local-service-isovera
        host_name                       www.localhost.org
        service_description             HTTP
    check_command            check_http
        }


define service{
        use                             local-service-isovera
        host_name                       www.localhost.org
        service_description             Check Domain
    check_command            check_domain!biosciednet.org
    }


> the contactgroup definition

define contactgroup{
        contactgroup_name       isovera
        alias                   Nagios Administrators
        members                 sdemi,nagiosadmin
        }

define contactgroup{
        contactgroup_name       admins
        alias                   Nagios Administrators
        members                 nagiosadmin
        }
> the contact definition

define contact{
        name                            generic-contact         ; The name of this contact template
        service_notification_period     24x7                    ; service notifications can be sent anytime
        host_notification_period        24x7                    ; host notifications can be sent anytime
        service_notification_options    w,u,c,r,f,s             ; send notifications for all service states, flapping events, and scheduled downtime events
        host_notification_options       d,u,r,f,s               ; send notifications for all host states, flapping events, and scheduled downtime events
        service_notification_commands   notify-service-by-email ; send service notifications via email
        host_notification_commands      notify-host-by-email    ; send host notifications via email
        register                        0                       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
        }

define contact{
        contact_name                    sdemi
        use                             generic-contact
        alias                           Isovera Admin2
        email                           sdemi at isovera.com
        }

define contact{
        contact_name                    nagiosadmin             ; Short name of user
        use                             generic-contact         ; Inherit default values from generic-contact template (defined above)
        alias                           Nagios Admin            ; Full name of user

        email                           smartin at akazaresearch.com    ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
        }




> the Service State Information from the web gui when it should have
> sent a notification (click on the service name)
Service State Information
Current Status:    WARNING (for 3d 22h 36m 52s)
Status Information:    USERS WARNING - 1 users currently logged in
Performance Data:    users=1;1;10;0
Current Attempt:    1/4  (HARD state)
Last Check Time:    09-02-2008 12:55:00
Check Type:    ACTIVE
Check Latency / Duration:    0.205 / 0.186 seconds
Next Scheduled Check:      09-02-2008 13:00:00
Last State Change:    08-29-2008 14:22:33
Last Notification:    N/A (notification 0)
Is This Service Flapping?    NO (0.00% state change)
In Scheduled Downtime?    NO
Last Update:    09-02-2008 12:59:16  ( 0d 0h 0m 9s ago)
Active Checks:    ENABLED
Passive Checks:    ENABLED
Obsessing:    ENABLED
Notifications:    ENABLED
Event Handler:    ENABLED
Flap Detection:    ENABLED

> any nagios.log entries for the service when it should have sent a
> notification

That is the thing: if I click on Notifications, it does not even report
sending one. So I know it is not an email issue, as Nagios never even tries
to send a notification. The weirdest part is that I have other hosts using
the exact same templates that do send, and log that they send, notifications
for hosts and services. This box only sends and logs host notifications.
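
For anyone who wants to double-check the log rather than the web GUI, a small filter along these lines could pull out the alert and notification entries for one service. This is only a sketch: the helper name `entries_for_service` is made up, the sample lines are invented for illustration and are not from my real nagios.log, and it assumes the usual `[epoch] TYPE: field;field;...` layout of nagios.log entries.

```python
import re

# Matches lines such as "[1220034153] SERVICE ALERT: host;service;...".
LINE_RE = re.compile(r"^\[(\d+)\] ([A-Z ]+): (.*)$")


def entries_for_service(lines, host, service):
    """Yield (timestamp, entry_type, fields) for log lines about one service."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        timestamp, entry_type, rest = match.groups()
        # Note: plugin output containing ';' is split too; only the leading fields are used.
        fields = rest.split(";")
        # Alert lines start with host;service.
        if entry_type == "SERVICE ALERT" and fields[:2] == [host, service]:
            yield int(timestamp), entry_type, fields
        # Notification lines carry the contact name first, then host;service.
        elif entry_type == "SERVICE NOTIFICATION" and fields[1:3] == [host, service]:
            yield int(timestamp), entry_type, fields


# Made-up sample lines for illustration only; they are NOT from the real log.
SAMPLE = [
    "[1220034153] SERVICE ALERT: www.localhost.org;Current Users;WARNING;HARD;1;USERS WARNING - 1 users currently logged in",
    "[1220034200] SERVICE ALERT: other.example.org;SSH;CRITICAL;SOFT;1;Connection refused",
    "[1220034300] SERVICE NOTIFICATION: nagiosadmin;other.example.org;Current Users;WARNING;notify-service-by-email;USERS WARNING - 3 users currently logged in",
]

if __name__ == "__main__":
    for ts, kind, fields in entries_for_service(SAMPLE, "www.localhost.org", "Current Users"):
        print(ts, kind, fields[-1])
```

With the invented sample above, only the SERVICE ALERT line for this host is returned and no SERVICE NOTIFICATION line, which is the same pattern I am describing for this box.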

Thanks for all your help.

Thanks,
Shaun



On 8/29/08 6:00 PM, "Marc Powell" <marc at ena.com> wrote:

> 
> On Aug 29, 2008, at 1:48 PM, Shaun Martin wrote:
> 
>> Hi All,
>> 
>> So I am using all templates and I just added a new host today, using
>> the same service and host templates as all my other hosts. Those
>> other hosts send me notifications when a service hits warning or
>> critical. My new host only seems to send out host notifications and
>> not service notifications. Like I said, the same template is used for
>> services on this new host and on my old hosts. I am using the nrpe
>> agent; the only real difference about this machine is that it is Sun OS,
>> but check_nrpe -c returns the same values as it does on a Linux
>> box, so I do not think that is the issue. Also, I noticed that even though
>> the service has been checked many times with a warning or critical
>> status, the current attempt never progresses off of one. I do a "check
>> service now", wait for it to finish, and the attempt is still 1/4,
>> so I do not know if that is the underlying issue. I have restarted
>> Nagios after every configuration change, and I did a ps to make sure
>> there was no instance running before starting it back up. Since I am
>> using templates, I do not know what my problem is, as every other
>> service notifies except services on this host, which is using the
>> same host and service templates. I am about to pull my hair out; any
>> help would be appreciated.
> 
> 
> You'll have to provide more detailed information. The above is too
> vague to figure out what the underlying issue is. You've covered some
> of the bases though.
> 
> Please post --
> the service template definition
> the service definition
> the host template definition
> the host definition
> the contactgroup definition
> the contact definition
> the Service State Information from the web gui when it should have
> sent a notification (click on the service name)
> any nagios.log entries for the service when it should have sent a
> notification
> 
> --
> Marc
> -------------------------------------------------------------------------
> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
> Build the coolest Linux based applications with Moblin SDK & win great prizes
> Grand prize is a trip for two to an Open Source event anywhere in the world
> http://moblin-contest.org/redirect.php?banner_id=100&url=/
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting
> any issue. 
> ::: Messages without supporting info will risk being sent to /dev/null

-- 
Shaun Martin
Systems Administrator
Akaza Research
smartin at akazaresearch.com
Office: (617) 621-8585 x 13
Cell: (978) 360-3402
www.akazaresearch.com <http://www.akazaresearch.com/>
www.openclinica.org <http://www.openclinica.org/> 

