freshness check bug?

admin at jpk236.com admin at jpk236.com
Wed May 11 18:39:01 CEST 2005


Bryan,

A freshness_threshold of 60 seconds might be a little unrealistic. The
default value for the threshold is 300 seconds (5 minutes).

If you want almost real-time stats, which appears to be what you're
going for, you may want to try NRPE or check_by_ssh as an alternative
method of doing distributed monitoring.
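
For what it's worth, the spacing between the entries in the quoted nagios.log excerpt can be worked out from the epoch timestamps. The snippet below is only a rough sketch: the timestamps are copied from the log, while the labels and the `gaps` helper are made up for illustration and are not part of Nagios.

```python
# Epoch timestamps copied from the quoted nagios.log excerpt.
# The labels are informal descriptions, not Nagios terminology.
events = [
    (1115822708, "finished daemonizing"),
    (1115822828, "first stale warning"),
    (1115822838, "service alert and event handler"),
    (1115822948, "second stale warning"),
]


def gaps(entries):
    """Return (label, seconds since the previous entry) for each entry after the first."""
    return [
        (label, ts - entries[i][0])
        for i, (ts, label) in enumerate(entries[1:])
    ]


for label, delta in gaps(events):
    print(f"{label}: +{delta}s")
```

By this arithmetic the two stale warnings are 120 seconds apart, which may be worth comparing against the 60-second service_freshness_check_interval and freshness_threshold in the quoted config. The sketch says nothing about why the later checks did not appear.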

  - Justin Kulikowski
	[ http://www.jpk236.com ]

Bryan Loniewski wrote:
> While trying to set up failover in a distributed environment, I came
> across the following problem (bug?) involving freshness checking.
> 
> Note: The host that this is set up on is NOT receiving any passive
> checks while I am testing the freshness checking, so the results are
> always stale, forcing the freshness check every time.
> 
> Note2: Relevant config snippets are under my .sig
> 
> Configuring (passive) service freshness checking to execute an
> eventhandler works correctly for 1 or 2 iterations, BUT no more than
> that. It seems to stop checking the freshness after at most 3
> iterations and stops executing the eventhandler after at most 2
> iterations. I've replicated this behavior (too) many times and the
> results are inconsistent.
> 
> Below is the output of my nagios log:
> 
> <snip nagios.log>
> [1115822708] Finished daemonizing... (New PID=15941)
> [1115822828] Warning: The results of service 'PROCS-NAGIOS' on host 'csstest2' are stale by 60 seconds (threshold=60 seconds).  I'm forcing an immediate check of the service.
> [1115822838] SERVICE ALERT: csstest2;PROCS-NAGIOS;CRITICAL;SOFT;1;CRITICAL
> [1115822838] SERVICE EVENT HANDLER: csstest2;PROCS-NAGIOS;CRITICAL;SOFT;1;slave-failover
> [1115822948] Warning: The results of service 'PROCS-NAGIOS' on host 'csstest2' are stale by 60 seconds (threshold=60 seconds).  I'm forcing an immediate check of the service.
> 
> Notice the freshness check ran ONLY 2 times when it should have run 5
> (if you look at my config options below) and the eventhandler ran ONLY
> 1 time, when it should have run 3 times.
> 
> Can anyone verify (disprove) this behavior? Am I missing something?
> 
> _________________________
> Bryan Loniewski
> Rutgers University
> NBCS - Systems Programmer
> 
> <snip nagios.cfg>
> check_service_freshness=1
> service_freshness_check_interval=60
> <snip>
> 
> <snip objects.cfg>
> define service{
>          name                            generic-service
>          parallelize_check               1
>          obsess_over_service             1
>          check_freshness                 0
>          freshness_threshold             60
>          notifications_enabled           1
>          event_handler_enabled           1
>          flap_detection_enabled          1
>          failure_prediction_enabled      1
>          process_perf_data               1
>          retain_status_information       1
>          retain_nonstatus_information    1
>          is_volatile                     0
>          max_check_attempts              5
>          normal_check_interval           2
>          retry_check_interval            1
>          check_period                    24x7
>          contact_groups                  super-admins
>          notification_interval           3
>          notification_period             24x7
>          register                        0
> }
> define service{
>          use                             generic-service
>          name                            generic-passive-service
>          active_checks_enabled           0
>          passive_checks_enabled          1
>          register                        0
> }
> define service{
>          use                             generic-passive-service
>          host_name                       csstest2
>          service_description             PROCS-NAGIOS
>          check_freshness                 1
>          freshness_threshold             60
>          check_command                   check_dummy!2
>          event_handler                   slave-failover
> }
> define command{
>         command_name    check_dummy
>         command_line    $USER1$/check_dummy $ARG1$
> }
> define command{
>         command_name    slave-failover
>         command_line    $USER2$/failover $SERVICESTATE$ $SERVICESTATETYPE$
> }
> <snip>
> 
> 
> -------------------------------------------------------
> This SF.Net email is sponsored by Oracle Space Sweepstakes
> Want to be the first software developer in space?
> Enter now for the Oracle Space Sweepstakes!
> http://ads.osdn.com/?ad_id=7393&alloc_id=16281&op=click
> _______________________________________________
> Nagios-devel mailing list
> Nagios-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-devel

