first_notification_delay - notification may be sent too early

Andreas Ericsson ae at op5.se
Thu May 20 12:01:21 CEST 2010


On 05/20/2010 11:19 AM, Paweł Małachowski wrote:
> Hello,
> 
> 
> according to manual:
> 
> first_notification_delay: This directive is used to define the number of
> "time units" to wait before sending out the first problem notification when
> this host enters a non-UP state. Unless you've changed the interval_length
> directive from the default value of 60, this number will mean minutes. If
> you set this value to 0, Nagios will start sending out notifications
> immediately.
> 
> 
> 
> However, it may send the notification earlier, because the delay is counted
> from the last UP state, not from the first non-UP state.
> 
> Example:
> passive check, first_notification_delay set to 60, passive checks reported
> every 5 minutes:
> 00 minute - reported OK
> 05 minute - reported OK - the last OK status
> 10 minute - reported DOWN - notification won't be sent (delay in progress)
> 65 minute - notification sent (55 minutes after DOWN, not 60)
> 70 minute - no notification sent
> 
> If passive checks are reported less frequently, e.g. once per hour, things get
> even worse. :)
> 
> 
> 
> Code snippet illustrating this behaviour:
> 
> /* checks viability of sending a host notification */
> int check_host_notification_viability(host *hst, int type, int options){
> [...]
>          if(type==NOTIFICATION_NORMAL && hst->current_notification_number==0 && hst->current_state!=HOST_UP &&
>             (current_time < (time_t)((hst->last_time_up==(time_t)0L) ? program_start
>                                      : hst->last_time_up + (hst->first_notification_delay*interval_length)))){
>                  log_debug_info(DEBUGL_NOTIFICATIONS,1,"Not enough time has elapsed since the host changed to a non-UP state (or since program start), so we shouldn't notify about this problem yet.\n");
>                  return ERROR;
>                  }
> 
> Probably using "last_state_change" instead of "last_time_up" would be better (haven't tried).
> 

It used to be last_hard_state_change. I don't quite see why it was
changed, apart from a dubious comment right on top of the code
about not delaying recovery notifications, but that seems totally
bogus, since it already checks that current state isn't HOST_UP.

The same goes for services, btw. You could try changing it to use
last_hard_state_change instead of the current mess. If it works as
advertised when you do, I'll make the adjustment to the Nagios core
so that the change goes into the next release.
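
To make the difference concrete, here is a minimal sketch of the two
baselines. It is not the Nagios source: struct host_times and both
function names are hypothetical stand-ins, and it assumes the host's hard
state change coincides with the first DOWN report (minute 10 in the
example above).

```c
#include <time.h>

/* Hypothetical stand-in for the few host fields the check looks at. */
struct host_times {
    time_t last_time_up;             /* time of the last UP result */
    time_t last_hard_state_change;   /* time the host last changed hard state */
    double first_notification_delay; /* in "time units" */
};

/* Earliest notification time as the quoted check computes it.
 * As in the quoted expression, a host that has never been UP
 * falls back to program_start with no delay added. */
static time_t earliest_notify_current(const struct host_times *h,
                                      time_t program_start,
                                      int interval_length)
{
    if (h->last_time_up == (time_t)0)
        return program_start;
    return h->last_time_up
         + (time_t)(h->first_notification_delay * interval_length);
}

/* Earliest notification time if the delay is counted from
 * last_hard_state_change instead (the change suggested above). */
static time_t earliest_notify_proposed(const struct host_times *h,
                                       int interval_length)
{
    return h->last_hard_state_change
         + (time_t)(h->first_notification_delay * interval_length);
}
```

With last_time_up at minute 5, last_hard_state_change at minute 10, a
delay of 60 and interval_length of 60, the first function yields minute
65 (55 minutes after DOWN, as reported) and the second yields minute 70.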

Thanks.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

------------------------------------------------------------------------------

_______________________________________________
Nagios-devel mailing list
Nagios-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-devel

