Limiting mail notification for clock drift

Kenneth Holter kenneho.ndu at gmail.com
Thu Nov 13 12:59:42 CET 2008


Thank you for your patience. As you can see, I'm not very experienced with
Nagios (I just recently set it up). :)

I've reset the notification_interval to 0. I got confused for a moment because
most of our other services are set to high numbers such as 1440, but you're
right - 0 is what I need.
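
Just so I have it written down, the difference comes down to this one directive
in the service definition. This is only a minimal sketch based on Marc's
explanation below (assuming the default interval_length of 60, so the value is
in minutes); everything else stays as it is:

   define service {
           use                     generic-service
           service_description     NTP 1.2.3.4
           notification_interval   0     ; one notification per problem state, no repeats
           ; with 1440 here instead, Nagios would re-notify every 24 hours until recovery
   }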

I've extracted the information you asked for from objects.cache, but before I
take up space with that, let me show you what I found...

I examined the log file and found this:


   [root at server2 archives]# grep -i server1 nagios-11-12-2008-00.log | grep -i ntp | grep -i notification
   [1226375094] SERVICE NOTIFICATION: nagios;server1;NTP 1.2.3.4;CRITICAL;notify-service-by-email;NTP CRITICAL: Offset -419.5424475 secs
   [1226379294] SERVICE NOTIFICATION: nagios;server1;NTP 1.2.3.4;OK;notify-service-by-email;NTP OK: Offset 0.1932080023 secs
   [1226381814] SERVICE NOTIFICATION: nagios;server1;NTP 1.2.3.4;WARNING;notify-service-by-email;NTP WARNING: Offset 0.5295080024 secs
   [1226382714] SERVICE NOTIFICATION: nagios;server1;NTP 1.2.3.4;OK;notify-service-by-email;NTP OK: Offset 0.06872100243 secs
   [1226387933] SERVICE NOTIFICATION: nagios;server1;NTP 1.2.3.4;WARNING;notify-service-by-email;NTP WARNING: Offset -0.8496964976 secs
   [1226391833] SERVICE NOTIFICATION: nagios;server1;NTP 1.2.3.4;CRITICAL;notify-service-by-email;NTP CRITICAL: Offset -6.277684498 secs
   [1226394233] SERVICE NOTIFICATION: nagios;server1;NTP 1.2.3.4;WARNING;notify-service-by-email;NTP WARNING: Offset -2.576738498 secs
   [1226394533] SERVICE NOTIFICATION: nagios;server1;NTP 1.2.3.4;CRITICAL;notify-service-by-email;NTP CRITICAL: Offset -7.470661498 secs
   [1226397533] SERVICE NOTIFICATION: nagios;server1;NTP 1.2.3.4;WARNING;notify-service-by-email;NTP WARNING: Offset -

It does indeed seem like the notifications are being sent out just as I was
aiming for. But because there were so many notifications and my mailbox is a
mess, I didn't notice this at the time.

So to me it seems like there is a lot of flapping going on, and I'll read up
on the subject and see what I come up with.
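
From what I've read so far, flap detection is what I should look at, and
something like the following is what I'm thinking of trying for the NTP
service. This is only a sketch: the thresholds are guesses I'll have to tune,
and flap_detection_options needs Nagios 3:

   define service {
           use                        generic-service
           service_description        NTP 1.2.3.4
           flap_detection_enabled     1
           low_flap_threshold         5.0    ; % state change below which flapping ends
           high_flap_threshold        20.0   ; % state change above which flapping starts
           flap_detection_options     w,c    ; only count WARNING and CRITICAL transitions
   }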

Thanks for the help. I've learned a lot more about Nagios.


On 11/12/08, Marc Powell <marc at ena.com> wrote:
>
>
> On Nov 12, 2008, at 9:03 AM, Kenneth Holter wrote:
>
> > I think I found one of the causes for the excessive notifications -
> > the notification_interval was set to 0. For almost all our services
> > we have a high number (such as 1440), but this current value of zero
> > must have made its way into the code somehow.
>
> It should be left at 0 to get the behavior you're expecting. With
> notification_interval set to 0, a notification for a {WARNING | CRITICAL |
> OK/RECOVERY} state will only be sent once per problem. That's what you
> want. When set to 1440, you'll receive repeat notifications every 24
> hours until recovery.
>
> http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html#service
>
> "notification_interval:         This directive is used to define the number
> of "time units" to wait before re-notifying a contact that this
> service is still in a non-OK state. Unless you've changed the
> interval_length directive from the default value of 60, this number
> will mean minutes. If you set this value to 0, Nagios will not re-
> notify contacts about problems for this service - only one problem
> notification will be sent out."
>
> > This is my current definition:
> >  define service {
> >         use                             generic-service
> >         service_description             Current load for virtual servers
> >         servicegroups                   performance
> >         hostgroups                      virtual-servers
> >         is_volatile                     0
> >         check_period                    24x7
> >         max_check_attempts              3
> >         normal_check_interval           5
> >         retry_check_interval            1
> >         notification_interval           0
> >         notification_period             24x7
> >         check_command                   check_remote_load!/home/nagios/.ssh/id_rsa!nagios!13.0,8.0,3.0!15.0,10.0,5.0
> > }
>
> This isn't the entire service definition (no contacts specified, no
> notification_options, whatever is in the template, etc.). That's why I
> suggested getting that (and more) from objects.cache. You might also
> want to include the log entries for the service and its notifications to
> illustrate the problem.
>
> > Setting this attribute to a higher value should get me a lot fewer
> > notifications, but I will still get more or less duplicate
> > notifications. Help on avoiding this would be appreciated.
>
> You just made it worse - by raising it you're saying you _should_ get
> repeat notifications at least once a day. ;)
>
> >
> > Regarding your specific questions: I think the service definition
> > above answers most of them.
>
> Not really, it's missing the key bits of information about who gets
> notified for what, when and how.
>
> > We don't run multiple Nagios daemons on the same machine.
>
> I'm sure you don't intentionally but it can happen accidentally and
> cause strange issues. Actively make sure that you don't have multiple
> nagios daemons running right now.
>
> > I didn't find anything unusual in objects.cache.
>
> This was a request so that we could see what was there and help you
> with your problem. What may not seem unusual to you may be plainly
> unusual to us; otherwise you wouldn't be asking for help. To make any
> progress on this issue, you should --
>
> - Verify you do not have multiple nagios daemons running right now.
> Stop nagios, use ps to see if any remain, kill them and restart
> nagios. Let us know if you had anything you needed to kill.
>
> If you didn't have multiple instances or you did and the problem
> continues after doing the above,
>
> - Gather the following information from objects.cache
>        - The definition for the service experiencing this problem
>        - The definition for the host running the service experiencing this
> problem
>        - The definition for the contact experiencing this problem
>        - The definition for that contact's notification command
> - Gather the following information from nagios.log
>        - entries related to the service experiencing the problem when it
> initially goes down and subsequent checks during that outage
>        - entries related to notifications for the service during this time.
>
> --
> Marc