Limiting mail notification for clock drift

Marc Powell marc at ena.com
Wed Nov 12 17:21:32 CET 2008
On Nov 12, 2008, at 9:03 AM, Kenneth Holter wrote:

> I think I found one of the causes for the excessive notifications -
> the notification_interval was set to 0. For almost all our services
> we have a high number (such as 1440), but this current value of zero
> must have made its way into the code somehow.

It should be left at 0 to get the behavior you're expecting. With
notification_interval 0, a notification for a given state (WARNING,
CRITICAL, or OK/RECOVERY) is sent only once per problem. That's what
you want. When it is set to 1440, you'll receive repeat notifications
every 24 hours until recovery.

http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html#service

"notification_interval: This directive is used to define the number
of "time units" to wait before re-notifying a contact that this
service is still in a non-OK state. Unless you've changed the
interval_length directive from the default value of 60, this number
will mean minutes. If you set this value to 0, Nagios will not
re-notify contacts about problems for this service - only one problem
notification will be sent out."
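
To make the difference concrete, here is a rough Python sketch of the timing the quoted text describes. It is a simplified model, not Nagios code: the function name and parameters are made up for illustration, and it ignores notification_period, escalations, acknowledgements and everything else that can affect real notifications.

```python
def notification_offsets(problem_minutes, notification_interval, interval_length=60):
    """Offsets (minutes after the first problem notification) at which
    problem notifications would go out, under a simplified model.

    Hypothetical helper for illustration only. interval_length is the
    number of seconds in one "time unit" (default 60, i.e. minutes).
    """
    offsets = [0.0]  # the initial problem notification
    if notification_interval <= 0:
        return offsets  # 0 means: never re-notify for this problem
    step = notification_interval * interval_length / 60.0  # time units -> minutes
    t = step
    while t <= problem_minutes:
        offsets.append(t)
        t += step
    return offsets


three_days = 3 * 24 * 60  # a problem that stays unresolved for 72 hours

print(notification_offsets(three_days, 0))     # [0.0]
print(notification_offsets(three_days, 1440))  # [0.0, 1440.0, 2880.0, 4320.0]
```

With 0 there is one problem notification for the whole outage; with 1440 there is one more every 24 hours until recovery.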

> This is my current definition:
>  define service {
>         use                             generic-service
>         service_description             Current load for virtual servers
>         servicegroups                   performance
>         hostgroups                      virtual-servers
>         is_volatile                     0
>         check_period                    24x7
>         max_check_attempts              3
>         normal_check_interval           5
>         retry_check_interval            1
>         notification_interval           0
>         notification_period             24x7
>         check_command                   check_remote_load!/home/nagios/.ssh/id_rsa!nagios!13.0,8.0,3.0!15.0,10.0,5.0
> }

This isn't the entire service definition: no contacts are specified,
there are no notification_options, and we can't see what's in the
template. That's why I suggested getting that (and more) from
objects.cache. You might also want to include the log entries for the
service and its notifications to illustrate the problem.

> Setting this attribute to a higher value should make me get far
> fewer notifications, but I will still get more or less duplicate
> notifications. Help on avoiding this would be appreciated.

Setting it higher makes it worse: you'd be telling Nagios that you
_should_ get repeat notifications at least once a day. ;)

>
> Regarding your specific questions: I think the service definition  
> above answers most of them.

Not really, it's missing the key bits of information about who gets  
notified for what, when and how.

> We don't run multiple Nagios daemons on the same machine.

I'm sure you don't intentionally, but it can happen accidentally and
cause strange issues. Actively make sure that you don't have multiple
nagios daemons running right now.
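
If you'd rather script that check than eyeball ps output, something along these lines might do. It is only a sketch: it assumes a Linux-style ps that accepts "-eo pid,comm" and that the daemon's command name is exactly "nagios" (adjust if yours differs). Both function names are made up for illustration. Run it after stopping Nagios; anything it still lists is a leftover.

```python
import subprocess


def find_processes(ps_output, process_name="nagios"):
    """Return PIDs from 'ps -eo pid,comm' output whose command name matches.

    Hypothetical helper; process_name="nagios" is an assumption about
    how the daemon shows up in ps on your system.
    """
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the "PID COMMAND" header
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[1].strip() == process_name:
            pids.append(int(fields[0]))
    return pids


def leftover_nagios_pids():
    """Ask ps for the process list and return PIDs still named 'nagios'."""
    result = subprocess.run(
        ["ps", "-eo", "pid,comm"],
        capture_output=True, text=True, check=True,
    )
    return find_processes(result.stdout)


# Invented sample output, only to show the parsing:
sample = "  PID COMMAND\n    1 init\n 2211 nagios\n 2290 sshd\n 4032 nagios\n"
print(find_processes(sample))  # [2211, 4032]
```

Calling leftover_nagios_pids() on the monitoring host after a stop should return an empty list; if it doesn't, those are the processes to kill before restarting.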

> I didn't find anything unusual in objects.cache.

This was a request so that we could see what was there and help you
with your problem. What may not look unusual to you may look plainly
unusual to us; otherwise you wouldn't be asking for help. To make any
progress on this issue, you should --

- Verify you do not have multiple nagios daemons running right now.  
Stop nagios, use ps to see if any remain, kill them and restart  
nagios. Let us know if you had anything you needed to kill.

If you didn't have multiple instances or you did and the problem  
continues after doing the above,

- Gather the following information from objects.cache
	- The definition for the service experiencing this problem
	- The definition for the host running the service experiencing
	  this problem
	- The definition for the contact experiencing this problem
	- The definition for that contact's notification command
- Gather the following information from nagios.log
	- entries related to the service experiencing the problem when it
	  initially goes down and subsequent checks during that outage
	- entries related to notifications for the service during this time.
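
If pulling those pieces out by hand is tedious, a small Python sketch like the following could collect them. It rests on assumptions: that objects.cache consists of "define <type> { ... }" blocks with one "directive value" pair per line, and that nagios.log marks the relevant entries with "SERVICE ALERT:" and "SERVICE NOTIFICATION:" followed by semicolon-separated fields. The function names are made up, and the host name, contact name and sample lines are invented purely to exercise the code.

```python
import re


def parse_definitions(cache_text):
    """Parse 'define <type> { ... }' blocks into (type, {directive: value}) pairs."""
    definitions = []
    current_type, current = None, None
    for raw in cache_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = re.match(r"define\s+(\w+)\s*\{", line)
        if match:
            current_type, current = match.group(1), {}
        elif line == "}" and current is not None:
            definitions.append((current_type, current))
            current_type, current = None, None
        elif current is not None:
            parts = line.split(None, 1)  # directive, then the rest as its value
            current[parts[0]] = parts[1] if len(parts) > 1 else ""
    return definitions


def find_definitions(definitions, obj_type, **wanted):
    """Return definitions of obj_type whose directives equal the wanted values."""
    return [d for t, d in definitions
            if t == obj_type and all(d.get(k) == v for k, v in wanted.items())]


def log_lines_for_service(log_text, host, service):
    """Return alert and notification log lines that mention host;service;."""
    needle = f"{host};{service};"
    return [line for line in log_text.splitlines()
            if ("SERVICE ALERT:" in line or "SERVICE NOTIFICATION:" in line)
            and needle in line]


# Invented sample data (hypothetical host "vm01" and contact "admin1").
SAMPLE_CACHE = """
define service {
\thost_name\tvm01
\tservice_description\tCurrent load for virtual servers
\tnotification_interval\t0
\t}
define contact {
\tcontact_name\tadmin1
\t}
"""
SAMPLE_LOG = """\
[1226506800] SERVICE ALERT: vm01;Current load for virtual servers;CRITICAL;HARD;3;load high
[1226506800] SERVICE NOTIFICATION: admin1;vm01;Current load for virtual servers;CRITICAL;notify;load high
[1226506900] SERVICE ALERT: vm02;Current load for virtual servers;OK;HARD;3;load ok
"""

defs = parse_definitions(SAMPLE_CACHE)
print(find_definitions(defs, "service", host_name="vm01"))
for line in log_lines_for_service(SAMPLE_LOG, "vm01", "Current load for virtual servers"):
    print(line)
```

Against the real files you would read objects.cache and nagios.log from wherever your installation keeps them and pass their contents in, then paste the matching definitions and log lines into your reply.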

--
Marc


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null