freshness_threshold bug - big problem

Rodney Ramos rodneyra at gmail.com
Mon Dec 20 17:52:46 CET 2010


Hi, Jochen,

Thank you again. I think you found where the problem is, namely the
base/checks.c::is_host_result_fresh() code.

I changed the lines in checks.c as follows:

========================================================
FROM: Lines 2439 - 2440

        if(temp_host->freshness_threshold==0)
                freshness_threshold=(temp_host->check_interval*interval_length)+temp_host->latency+additional_freshness_latency;

TO:

        if(temp_host->freshness_threshold==0){
                if(temp_host->state_type==HARD_STATE || temp_host->current_state==STATE_OK)
                        freshness_threshold=(temp_host->check_interval*interval_length)+temp_host->latency+additional_freshness_latency;
                else
                        freshness_threshold=(temp_host->retry_interval*interval_length)+temp_host->latency+additional_freshness_latency;
                }
========================================================

It works as expected: my retry interval is 1 minute, and hosts now take
about 2 minutes to move from one SOFT state to the next.

The logs are:

BEFORE the changes:

[1292854105] Warning: The results of host 'host1' are stale by 0d 0h 0m 1s (threshold=0d 0h 5m 15s).  I'm forcing an immediate check of the host.
[1292857824] Warning: The results of host 'host1' are stale by 0d 0h 0m 1s (threshold=0d 0h 5m 15s).  I'm forcing an immediate check of the host.
[1292859117] Warning: The results of host 'host1' are stale by 0d 0h 0m 52s (threshold=0d 0h 5m 15s).  I'm forcing an immediate check of the host.


AFTER the changes:

[1292859297] Warning: The results of host 'host1' are stale by 0d 0h 0m 59s (threshold=0d 0h 1m 38s).  I'm forcing an immediate check of the host.
[1292859417] Warning: The results of host 'host1' are stale by 0d 0h 0m 31s (threshold=0d 0h 1m 22s).  I'm forcing an immediate check of the host.
[1292859597] Warning: The results of host 'host1' are stale by 0d 0h 0m 47s (threshold=0d 0h 1m 44s).  I'm forcing an immediate check of the host.
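
To make the arithmetic behind these thresholds explicit, here is a minimal
standalone sketch of the patched selection logic. It is not the actual Nagios
code: struct host_sketch and host_freshness_threshold() are hypothetical names,
the numeric values of the state constants are assumed, and interval_length=60
and additional_freshness_latency=15 are assumed settings that happen to
reproduce the 5m 15s threshold seen before the change (5*60 + 0 + 15 = 315).

```c
/* Assumed values for this sketch; Nagios defines constants with these names. */
#define STATE_OK   0
#define SOFT_STATE 0
#define HARD_STATE 1

/* Hypothetical, cut-down stand-in for the Nagios host structure. */
struct host_sketch {
    int    freshness_threshold; /* 0 means "auto-calculate" */
    int    state_type;          /* SOFT_STATE or HARD_STATE */
    int    current_state;       /* STATE_OK or a non-OK state */
    double check_interval;      /* in units of interval_length */
    double retry_interval;      /* in units of interval_length */
    double latency;             /* seconds */
};

/* Returns the freshness threshold in seconds, following the patched logic:
   a user-supplied threshold wins; otherwise use check_interval for HARD or
   OK states and retry_interval for SOFT non-OK states. */
int host_freshness_threshold(const struct host_sketch *h,
                             int interval_length,
                             int additional_freshness_latency)
{
    double interval;

    if (h->freshness_threshold != 0)
        return h->freshness_threshold;

    if (h->state_type == HARD_STATE || h->current_state == STATE_OK)
        interval = h->check_interval;
    else
        interval = h->retry_interval;

    return (int)(interval * interval_length + h->latency
                 + additional_freshness_latency);
}
```

Under those assumptions, a SOFT non-OK host with retry_interval=1 and a latency
of 23 seconds gets 60 + 23 + 15 = 98 seconds, i.e. the 1m 38s of the first
AFTER line. The latency value is back-calculated from the log, not measured.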

So, I'd like to know the developers' opinion. Is it a bug or not? Do you
intend to change the source code? If not, I will have to reapply this change
myself every time a new Nagios version is released.

Thanks a lot,
Rodney.


On Fri, Dec 17, 2010 at 10:07 AM, Jochen Bern <Jochen.Bern at linworks.de> wrote:

> On 12/17/2010 12:10 PM, Rodney Ramos wrote:
> > Then I understood that you confirm the problem
>
> I confirm that my 3.2.3 autodetermines the host's freshness threshold as
> check_interval+additional_freshness_latency, even in SOFT non-OK cases,
> when active checks would use retry_interval instead.
>
> I'm not calling it a "problem" yet, though, because the specifics you
> quote (apparently from a local copy of the docs ?) are absent from the
> docs at http://nagios.sourceforge.net/docs/3_0/freshness.html .
>
> Nonetheless, when I compare base/checks.c::is_host_result_fresh() to
> base/checks.c::is_service_result_fresh(), it seems that the latter
> *does* do the if-then-else you describe, while it's absent from the former:
>
> [...]
> /* tests whether or not a service's check results are fresh */
> int is_service_result_fresh(service *temp_service, time_t current_time, int log_this){
> [...]
>   /* use user-supplied freshness threshold or auto-calculate a freshness threshold to use? */
>   if(temp_service->freshness_threshold==0){
>      if(temp_service->state_type==HARD_STATE || temp_service->current_state==STATE_OK)
>         freshness_threshold=(temp_service->check_interval*interval_length)+temp_service->latency+additional_freshness_latency;
>      else
>         freshness_threshold=(temp_service->retry_interval*interval_length)+temp_service->latency+additional_freshness_latency;
>      }
>   else
>      freshness_threshold=temp_service->freshness_threshold;
> [...]
> /* checks to see if a hosts's check results are fresh */
> int is_host_result_fresh(host *temp_host, time_t current_time, int log_this){
> [...]
>   /* use user-supplied freshness threshold or auto-calculate a freshness threshold to use? */
>   if(temp_host->freshness_threshold==0)
>      freshness_threshold=(temp_host->check_interval*interval_length)+temp_host->latency+additional_freshness_latency;
>   else
>      freshness_threshold=temp_host->freshness_threshold;
> [...]
>
> I have no idea whether that's intentional, though ...
>
> > 18:56:13 Warning: The results of host 'Unfresh' are stale by 0d 0h 0m 59s
> >   (threshold=0d 0h 15m 17s). I'm forcing an immediate check of the host.
> > 18:56:23 HOST ALERT: Unfresh;DOWN;SOFT;2;(null)
> >
> > --> It's wrong. It should be about 18:42:05, 2 minutes after the SOFT1, as
> > your retry_interval is 2 minutes.
> >
> > 19:28:13 Warning: The results of host 'Unfresh' are stale by 0d 0h 0m 39s
> >   (threshold=0d 0h 15m 18s). I'm forcing an immediate check of the host.
> > 19:28:23 HOST ALERT: Unfresh;DOWN;SOFT;3;CRITICAL: All life functions
> > terminated
> >
> > --> It's wrong. It should be about 18:58:23, 2 minutes after the SOFT2, as
> > your retry_interval is 2 minutes.
>
> (You missed the spurious *second* SOFT2 between these two, which upends
> the prediction of "correct" check times even further ...)
>
> P.S. to my previous mail: I also noted that, in spite of the config
> saying "initial_state o", the host was listed as PENDING in the CGIs
> after the first reload. Is that expected behaviour?
>
> Kind regards,
>                                                                J. Bern
> --
> Jochen Bern, Systemingenieur --- LINworks GmbH <http://www.LINworks.de/>
> Postfach 100121, 64201 Darmstadt | Robert-Koch-Str. 9, 64331 Weiterstadt
> PGP (1024D/4096g) FP = D18B 41B1 16C0 11BA 7F8C DCF7 E1D5 FAF4 444E 1C27
> Tel. +49 6151 9067-231, Zentr. -0, Fax -299 - Amtsg. Darmstadt HRB 85202
> Unternehmenssitz Weiterstadt, Geschäftsführer Metin Dogan, Oliver Michel
>