Host monitoring

Grant Lowe glowe-rphTv4pjVZMJGwgDXS7ZQA at public.gmane.org
Fri Oct 24 22:19:57 CEST 2008


Hi Andy,

How about this, to provide a clue, but maybe add confusion, too: I have four UNIX boxes that I'm monitoring.  Two have the ongoing problem that I've asked your help with; they send email once an hour.  One box sends me email a few times an hour.  The fourth box is just fine and never sends me anything; it's always up and has no reason to notify me.




----- Original Message ----
From: Andy Shellam <andy-lists-tHksloyy3C0+BY+yDiVN7w at public.gmane.org>
To: Grant Lowe <glowe-rphTv4pjVZMJGwgDXS7ZQA at public.gmane.org>
Cc: nagiosplug-help-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f at public.gmane.org; nagios-user Mailinglist <nagios-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f at public.gmane.org>
Sent: Thursday, October 23, 2008 1:48:33 PM
Subject: Re: [Nagiosplug-help] Host monitoring

Hi Grant,

That is weird - according to that log file, Nagios hasn't notified you 
at all today (it should say HOST/SERVICE NOTIFICATION for every 
notification it sends out.)
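
A minimal sketch of that check, assuming the log entries follow the "[epoch] TYPE: details" layout seen in this thread. The function name count_entry_types and the regular expression are illustrative, not part of Nagios; the two sample lines are copied from the archive log quoted further down.

```python
import re

# Assumed entry layout: "[epoch] ENTRY TYPE: details" (as seen in this thread).
ENTRY = re.compile(r"\[(\d+)\] ([A-Z ]+): (.*)")

def count_entry_types(lines):
    """Return a dict mapping each entry type (e.g. 'SERVICE ALERT') to its count."""
    counts = {}
    for line in lines:
        match = ENTRY.search(line)  # search() tolerates a leading "file:" prefix from grep
        if match:
            entry_type = match.group(2)
            counts[entry_type] = counts.get(entry_type, 0) + 1
    return counts

sample = [
    "[1222095978] SERVICE ALERT: blarney;ping;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.38 ms",
    "[1222096228] SERVICE ALERT: blarney;ssh;OK;HARD;1;SSH OK -  (protocol 1.5)",
]

counts = count_entry_types(sample)
# Entries whose type ends in NOTIFICATION are the ones Nagios actually sent out.
notifications = sum(n for t, n in counts.items() if t.endswith("NOTIFICATION"))
print(counts, "notifications sent:", notifications)
```

If the notification count is zero while the alert count is not, the log agrees with what the GUI reports.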

However, your services are alerting on every OK result - if you convert 
the timestamps for your ping service you'll notice it's every 5 minutes 
- which I'm guessing is your service check interval.
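
The timestamp conversion can be sketched as below. The epoch values are the ping entries from the archive log quoted further down; the variable names are only illustrative.

```python
from datetime import datetime, timezone

# Epoch timestamps of the "blarney;ping" SERVICE ALERT entries in the quoted log.
ping_timestamps = [
    1222095978, 1222096278, 1222096578, 1222096878,
    1222097178, 1222097478, 1222097778, 1222098078,
]

# Seconds between consecutive ping entries.
intervals = [b - a for a, b in zip(ping_timestamps, ping_timestamps[1:])]
print("intervals in minutes:", [i / 60 for i in intervals])

# Human-readable form of each timestamp (UTC).
for ts in ping_timestamps:
    print(datetime.fromtimestamp(ts, tz=timezone.utc).isoformat())
```

Every gap comes out at 300 seconds, which matches the five-minute spacing described above.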

I have absolutely no idea why Nagios thinks that an OK state is an alert 
though.  Does anyone with more experience than me have any ideas?  
(Copied in to nagios-users as it seems more an issue with Nagios than 
the plugins.)  It could be something dead simple but I'm not seeing it!

Thanks,

Andy

Grant Lowe wrote:
> Hi Andy,
>
> This is peculiar.  I look at the GUI and it says, from the first day I installed Nagios:
>
> Alert Notifications
> File: /usr/local/nagios/var/archives/nagios-09-23-2008-00.log
> Notification detail level for all hosts: All notifications, All service notifications, All host notifications, Service custom, Service acknowledgements, Service warning, Service unknown, Service critical, Service recovery, Service flapping, Host custom, Host acknowledgements, Host down, Host unreachable, Host recovery, Host flapping
> Older Entries First:
> Host | Service | Type | Time | Contact | Notification Command | Information
> No notifications have been recorded in this archived log file
>
> But if I look at the file in question, I see this:
>
> nagios-09-23-2008-00.log:[1222095978] SERVICE ALERT: blarney;ping;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.38 ms
> nagios-09-23-2008-00.log:[1222096228] SERVICE ALERT: blarney;ssh;OK;HARD;1;SSH OK -  (protocol 1.5)
> nagios-09-23-2008-00.log:[1222096278] SERVICE ALERT: blarney;ping;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.28 ms
> nagios-09-23-2008-00.log:[1222096578] SERVICE ALERT: blarney;ping;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.61 ms
> nagios-09-23-2008-00.log:[1222096878] SERVICE ALERT: blarney;ping;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.29 ms
> nagios-09-23-2008-00.log:[1222097178] SERVICE ALERT: blarney;ping;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.27 ms
> nagios-09-23-2008-00.log:[1222097478] SERVICE ALERT: blarney;ping;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.31 ms
> nagios-09-23-2008-00.log:[1222097778] SERVICE ALERT: blarney;ping;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.35 ms
> nagios-09-23-2008-00.log:[1222098078] SERVICE ALERT: blarney;ping;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 3.22 ms
>
> This is weird.  Does this help you to help me?
>
>
> ----- Original Message ----
>
> From: Andy Shellam <andy-lists-tHksloyy3C0+BY+yDiVN7w at public.gmane.org>
> To: Grant Lowe <glowe-rphTv4pjVZMJGwgDXS7ZQA at public.gmane.org>
> Cc: nagiosplug-help-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f at public.gmane.org
> Sent: Thursday, October 23, 2008 11:31:21 AM
> Subject: Re: [Nagiosplug-help] Host monitoring
>
> Hi Grant,
>
> Use the Nagios GUI - it's the "Alert History" option in the Reporting 
> menu - navigate back to when you first received the e-mails for that 
> host and see what the status change was like.  e.g. here's a sample from 
> mine when my co-lo host's router had a reboot overnight:
>
> [22-10-2008 01:29:41] HOST ALERT: Telehouse Router 2;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.34 ms
> [22-10-2008 01:29:31] HOST ALERT: Sydney;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 21.60 ms
> [22-10-2008 01:26:41] HOST ALERT: Sydney;UNREACHABLE;HARD;3;PING CRITICAL - Packet loss = 100%
> [22-10-2008 01:26:31] HOST ALERT: Telehouse Router 2;DOWN;HARD;3;CHECK_NRPE: Socket timeout after 10 seconds.
> [22-10-2008 01:25:51] HOST ALERT: Sydney;UNREACHABLE;SOFT;2;PING CRITICAL - Packet loss = 100%
>
> You could also look at the Event Log option for the same time period 
> which will also list the notifications Nagios sent out:
>
> [22-10-2008 01:29:41] HOST NOTIFICATION: Andy Shellam;Telehouse Router 2;UP;notify-host-problem;PING OK - Packet loss = 0%, RTA = 0.34 ms
> [22-10-2008 01:29:41] HOST ALERT: Telehouse Router 2;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.34 ms
> [22-10-2008 01:29:31] HOST NOTIFICATION: Andy Shellam;Sydney;UP;notify-host-problem;PING OK - Packet loss = 0%, RTA = 21.60 ms
> [22-10-2008 01:29:31] HOST ALERT: Sydney;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 21.60 ms
> [22-10-2008 01:26:41] HOST NOTIFICATION: Andy Shellam;Sydney;UNREACHABLE;notify-host-problem;PING CRITICAL - Packet loss = 100%
> [22-10-2008 01:26:41] HOST ALERT: Sydney;UNREACHABLE;HARD;3;PING CRITICAL - Packet loss = 100%
> [22-10-2008 01:26:31] HOST NOTIFICATION: Andy Shellam;Telehouse Router 2;DOWN;notify-host-problem;CHECK_NRPE: Socket timeout after 10 seconds.
> [22-10-2008 01:26:31] HOST ALERT: Telehouse Router 2;DOWN;HARD;3;CHECK_NRPE: Socket timeout after 10 seconds.
> [22-10-2008 01:25:51] HOST ALERT: Sydney;UNREACHABLE;SOFT;2;PING CRITICAL - Packet loss = 100%
>
> Andy
>
> Grant Lowe wrote:
>> Hey Andy,
>>
>> Which file in /usr/local/nagios/var should I be looking at?  Is it the log file from the archives directory?  If so, then what sort of string should I be looking for?  
>>
>> grant
>>
>>
>> ----- Original Message ----
>> From: Andy Shellam <andy-lists-tHksloyy3C0+BY+yDiVN7w at public.gmane.org>
>> To: Grant Lowe <glowe-rphTv4pjVZMJGwgDXS7ZQA at public.gmane.org>
>> Cc: nagiosplug-help-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f at public.gmane.org
>> Sent: Tuesday, October 21, 2008 2:42:43 PM
>> Subject: Re: [Nagiosplug-help] Host monitoring
>>
>> Hi Grant,
>>
>> That's what I was afraid of!  Your mail commands are using the 
>> $NOTIFICATIONTYPE$ macro which is where your PROBLEM text comes from - 
>> in that command definition you can customise the template of the mail 
>> that goes out.
>>
>> Unfortunately I have no idea why Nagios is classing a host up state as a 
>> problem, instead of a recovery.  Can you review the history of that 
>> host/service shortly before the alert got to you using the Nagios "Alert 
>> History" GUI?
>>
>> What version of Nagios is this on?
>>
>> Thanks,
>>
>> Andy
>>
>> Grant Lowe wrote:
>>> Ok, Andy.  Here they are.
>>>
>>> # 'notify-by-email' command definition
>>> define command{
>>>        command_name    notify-by-email
>>>        command_line    /usr/bin/printf "%b" "***** Nagios @VERSION@ *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | @MAIL_PROG@ -s "** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
>>>         }
>>>
>>> # 'notify-host-by-email' command definition
>>> define command{
>>>         command_name    notify-host-by-email
>>>         command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
>>>         }
>>>
>>> # 'notify-service-by-email' command definition
>>> define command{
>>>         command_name    notify-service-by-email
>>>         command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
>>>         }
>>>
>>> Thanks, Andy!
>>>
>>>
>>>
>>> ----- Original Message ----
>>> From: Andy Shellam <andy-lists-tHksloyy3C0+BY+yDiVN7w at public.gmane.org>
>>> To: Grant Lowe <glowe-rphTv4pjVZMJGwgDXS7ZQA at public.gmane.org>
>>> Cc: nagiosplug-help-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f at public.gmane.org
>>> Sent: Tuesday, October 21, 2008 1:32:09 PM
>>> Subject: Re: [Nagiosplug-help] Host monitoring
>>>
>>> Hi Grant,
>>>
>>> Your contact has the commands "notify-service-by-email" and 
>>> "notify-host-by-email" set for the notifications.  These should be 
>>> present in your commands.cfg file, so we need to see the command_line 
>>> definitions for each of these commands - this is the server command-line 
>>> that is executed to send you the notifications.
>>>
>>> Regards,
>>>
>>> Andy
>>>
>>> Grant Lowe wrote:
>>>> Hi Andy,
>>>>
>>>> Here's the generic-contact from the template:
>>>>
>>>> define contact{
>>>>         name                            generic-contact         ; The name of this contact template
>>>>         service_notification_period     24x7                    ; service notifications can be sent anytime
>>>>         host_notification_period        24x7                    ; host notifications can be sent anytime
>>>>         service_notification_options    w,u,c,r,f,s             ; send notifications for all service states, flapping events, and scheduled downtime events
>>>>         host_notification_options       d,u,r,f,s               ; send notifications for all host states, flapping events, and scheduled downtime events
>>>>         service_notification_commands   notify-service-by-email ; send service notifications via email
>>>>         host_notification_commands      notify-host-by-email    ; send host notifications via email
>>>>         register                        0                       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
>>>>         }
>>>>
>>>> As far as command_line definitions that use this one, there aren't any that I can see.  Unless I'm missing something. 
>>>>
>>>>
>>>>
>>>> ----- Original Message ----
>>>> From: Andy Shellam <andy-lists-tHksloyy3C0+BY+yDiVN7w at public.gmane.org>
>>>> To: Grant Lowe <glowe-rphTv4pjVZMJGwgDXS7ZQA at public.gmane.org>
>>>> Cc: nagiosplug-help-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f at public.gmane.org
>>>> Sent: Tuesday, October 21, 2008 12:18:18 PM
>>>> Subject: Re: [Nagiosplug-help] Host monitoring
>>>>
>>>> Hi Grant,
>>>>
>>>> OK, these notification commands aren't defined for your contact - can you 
>>>> post the definition of the generic-contact contact template, as well as 
>>>> the command_line definitions for the notification commands attached to 
>>>> that template?
>>>>
>>>> Andy
>>>>
>>>> Grant Lowe wrote:
>>>>> Hi Andy,
>>>>>
>>>>> Here's the contact info for me in Nagios:
>>>>>
>>>>> define contact{
>>>>>         contact_name                    nagiosadmin                     ; Short name of user
>>>>>         use                             generic-contact         ; Inherit default values from generic-contact template (defined above)
>>>>>         alias                           Nagios Admin            ; Full name of user
>>>>>
>>>>>         email                          glowe-rphTv4pjVZMJGwgDXS7ZQA at public.gmane.org      ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
>>>>>         }
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ----- Original Message ----
>>>>> From: Andy Shellam <andy-lists-tHksloyy3C0+BY+yDiVN7w at public.gmane.org>
>>>>> To: Grant Lowe <glowe-rphTv4pjVZMJGwgDXS7ZQA at public.gmane.org>
>>>>> Cc: nagiosplug-help-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f at public.gmane.org
>>>>> Sent: Tuesday, October 21, 2008 11:09:56 AM
>>>>> Subject: Re: [Nagiosplug-help] Host monitoring
>>>>>
>>>>> Hi Grant,
>>>>>
>>>>> What is your definition of the _contact_ glowe?  That definition should 
>>>>> have a service/host notification command attached to it; please send 
>>>>> those commands' command_line definitions.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Andy
>>>>>
>>>>> Grant Lowe wrote:
>>>>>> Hi Andy,
>>>>>>
>>>>>> Here's my host definition:
>>>>>>
>>>>>> define host {
>>>>>> host_name                         myhost
>>>>>> alias                             myhost
>>>>>> display_name                      My Host
>>>>>> address                           172.20.8.215
>>>>>> hostgroups                        solaris-servers
>>>>>> check_command                     check-host-alive
>>>>>> initial_state                     o
>>>>>> max_check_attempts                5
>>>>>> check_interval                    3
>>>>>> retry_interval                    3600
>>>>>> active_checks_enabled             0
>>>>>> passive_checks_enabled            1
>>>>>> check_period                      24x7
>>>>>> obsess_over_host                  0
>>>>>> check_freshness                   0
>>>>>> event_handler_enabled             0
>>>>>> flap_detection_enabled            0
>>>>>> flap_detection_options            o,d,u
>>>>>> process_perf_data                 1
>>>>>> retain_status_information         1
>>>>>> retain_nonstatus_information      0
>>>>>> contacts                          glowe
>>>>>> notification_interval             300
>>>>>> notification_period               24x7
>>>>>> notification_options              d,u,r,f,s
>>>>>> notifications_enabled             1
>>>>>> stalking_options
>>>>>> }
>>>>>>
>>>>>>
>>>>>> Here's my service definition:
>>>>>>
>>>>>> define service{
>>>>>> host_name                       blarney
>>>>>> hostgroup_name                  solaris-servers
>>>>>> service_description             Ping
>>>>>> check_command                   check_ping!200.0,20%!600.0,60%
>>>>>> max_check_attempts              5
>>>>>> notification_interval           60
>>>>>> check_period                    24x7
>>>>>> }
>>>>>>
>>>>>> Thanks for the help!
>>>>>>
>>>>>>
>>>>>> ----- Original Message ----
>>>>>> From: Andy Shellam <andy-lists-tHksloyy3C0+BY+yDiVN7w at public.gmane.org>
>>>>>> To: Grant Lowe <glowe-rphTv4pjVZMJGwgDXS7ZQA at public.gmane.org>
>>>>>> Cc: nagiosplug-help-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f at public.gmane.org
>>>>>> Sent: Monday, October 20, 2008 1:44:21 PM
>>>>>> Subject: Re: [Nagiosplug-help] Host monitoring
>>>>>>
>>>>>> Hi Grant,
>>>>>>
>>>>>> Have a look at your contact definition, at the service and host 
>>>>>> notification commands - look those up in your commands.cfg (or whatever 
>>>>>> your command file is) and that should point to a command_line that sends 
>>>>>> the e-mail (using /bin/mail or similar.)  It may even be a shell 
>>>>>> script.  Either way, we'd need to see your command definition to try and 
>>>>>> work out what's going on here.
>>>>>>
>>>>>> Andy
>>>>>>
>>>>>> Grant Lowe wrote:
>>>>>>> Hi Andy,
>>>>>>>
>>>>>>> I'm looking at all the command definitions and nothing is in there that I can see about retaining PROBLEM data.  I do have the notifications set to 60 minutes and that's when I receive the email.  But it always says PROBLEM in the email I receive.  Maybe that's the problem?  Is there a way to set it to a different string?  Or is that opening up a can of worms?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ----- Original Message ----
>>>>>>> From: Andy Shellam <andy-lists-tHksloyy3C0+BY+yDiVN7w at public.gmane.org>
>>>>>>> To: Grant Lowe <glowe-rphTv4pjVZMJGwgDXS7ZQA at public.gmane.org>
>>>>>>> Cc: nagiosplug-help-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f at public.gmane.org
>>>>>>> Sent: Monday, October 20, 2008 11:34:05 AM
>>>>>>> Subject: Re: [Nagiosplug-help] Host monitoring
>>>>>>>
>>>>>>> Hi Grant,
>>>>>>>
>>>>>>> What are your notification options for the host, and your notification 
>>>>>>> command?  It's possible that the host(s) in question have gone down and 
>>>>>>> Nagios is reporting a return to an UP state, but your 
>>>>>>> notification command is hard-coded to say PROBLEM.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Andy
>>>>>>>
>>>>>>> Grant Lowe wrote:
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> Another question for you all.  On some hosts, I keep on getting a notification that reads:
>>>>>>>>
>>>>>>>> ** PROBLEM Host Alert: myserver is UP **
>>>>>>>>
>>>>>>>> I'm trying to figure out why Nagios is generating these errors, when the host is obviously up.  Thanks!
>>>>>>>>
>>>>>>>>
>>>>>>>> -------------------------------------------------------------------------
>>>>>>>> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
>>>>>>>> Build the coolest Linux based applications with Moblin SDK & win great prizes
>>>>>>>> Grand prize is a trip for two to an Open Source event anywhere in the world
>>>>>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/
>>>>>>>> _______________________________________________
>>>>>>>> Nagiosplug-help mailing list
>>>>>>>> Nagiosplug-help-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f at public.gmane.org
>>>>>>>> https://lists.sourceforge.net/lists/listinfo/nagiosplug-help
>>>>>>>> ::: Please include plugins version (-v) and OS when reporting any issue. 
>>>>>>>> ::: Messages without supporting info will risk being sent to /dev/null

