freshness check bug?

admin at jpk236.com
Wed May 11 19:26:59 CEST 2005


Bryan,
	You never mentioned, and I forgot to ask: what method are you using to
send the passive checks from the distributed monitored servers to your
central server?  NSCA?  If so, are those servers configured correctly to
send the data, and is the central server configured correctly to receive
it?
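
In case it helps when checking the sending side, here is a minimal sketch of
the line format a passive service result takes when it is fed to send_nsca,
assuming NSCA is what you are using. The function name and the sample plugin
output are my own inventions; the host and service names are taken from your
config.

```python
def format_nsca_service_result(host, service, return_code, output):
    """Build one send_nsca input line for a passive service check.

    Fields are tab-separated and the line is newline-terminated:
    host, service description, return code, plugin output.
    """
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return code must be 0 (OK), 1 (WARNING), "
                         "2 (CRITICAL) or 3 (UNKNOWN)")
    fields = (host, service, str(return_code), output)
    for field in fields:
        # A stray tab or newline would shift the fields on the receiving end.
        if "\t" in field or "\n" in field:
            raise ValueError("fields must not contain tabs or newlines")
    return "\t".join(fields) + "\n"


# Hypothetical example: an OK result for the service in the quoted config.
line = format_nsca_service_result("csstest2", "PROCS-NAGIOS", 0, "PROCS OK")
print(repr(line))
```

A line like that is what gets piped into send_nsca (with -H pointing at the
central server and -c at the send_nsca config file). If nothing arrives on
the central side, the results will stay stale no matter what threshold is set.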

  - Justin Kulikowski
	[ http://www.jpk236.com ]
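
P.S. To make the timing in the quoted log easier to read, here is a small
sketch that converts the epoch timestamps into offsets from daemon start.
The log text is copied from your first message (with the wrapped lines
rejoined); the function name is my own.

```python
import re

LOG = """\
[1115822708] Finished daemonizing... (New PID=15941)
[1115822828] Warning: The results of service 'PROCS-NAGIOS' on host 'csstest2' are stale by 60 seconds (threshold=60 seconds).  I'm forcing an immediate check of the service.
[1115822838] SERVICE ALERT: csstest2;PROCS-NAGIOS;CRITICAL;SOFT;1;CRITICAL
[1115822838] SERVICE EVENT HANDLER: csstest2;PROCS-NAGIOS;CRITICAL;SOFT;1;slave-failover
[1115822948] Warning: The results of service 'PROCS-NAGIOS' on host 'csstest2' are stale by 60 seconds (threshold=60 seconds).  I'm forcing an immediate check of the service.
"""


def seconds_since_start(log_text):
    """Return (offset_seconds, message) pairs relative to the first entry."""
    entries = []
    for raw in log_text.splitlines():
        match = re.match(r"\[(\d+)\]\s+(.*)", raw)
        if match:
            entries.append((int(match.group(1)), match.group(2)))
    if not entries:
        return []
    start = entries[0][0]
    return [(stamp - start, message) for stamp, message in entries]


for offset, message in seconds_since_start(LOG):
    # Truncate the message so each entry fits on one line.
    print(f"+{offset:4d}s  {message[:48]}")
```

Read that way, the two staleness warnings land at +120s and +240s, with the
alert and the event handler both at +130s.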

Bryan Loniewski wrote:
> Regardless of what freshness_threshold I pick (as long as it's not too
> unrealistic), I just want clarification on whether a bug exists. (By the
> way, where do you see that the default freshness threshold is 300 sec?)
> Anyway, I just increased the threshold to 180 seconds, and the only thing
> in my nagios.log was:
> 
> [1115831032] Finished daemonizing... (New PID=16154)
> [1115831272] Warning: The results of service 'PROCS-NAGIOS' on host 'csstest2' are stale by 60 seconds (threshold=180 seconds).  I'm forcing an immediate check of the service.
> 
> So it did not even execute my eventhandler once? I'm getting very 
> inconsistent results!
> 
> NRPE and check_by_ssh are not acceptable methods for distributed
> monitoring in our environment.
> 
> Thanks for the comments... Justin
> 
> _________________________
> Bryan Loniewski
> Rutgers University
> NBCS - Systems Programmer
> 
> On Wed, 11 May 2005, admin at jpk236.com wrote:
> 
>> Bryan,
>>     A freshness_threshold of 60 seconds might be a little unrealistic.
>> The default value for the threshold is 300 seconds (5 minutes).
>>     If you want almost real-time stats, which appears to be what
>> you're going for, perhaps you want to try NRPE or check_by_ssh as an
>> alternative method of doing distributed monitoring.
>>
>> - Justin Kulikowski
>>     [ http://www.jpk236.com ]
>>
>> Bryan Loniewski wrote:
>>
>>> While trying to set up failover in a distributed environment, I came
>>> across the following problem (bug?) involving freshness checking.
>>>
>>> Note: The host that this is set up on is NOT receiving any passive
>>> checks while I am testing the freshness checking, so the results are
>>> always stale, forcing the freshness check every time.
>>>
>>> Note2: Relevant config snippets are under my .sig
>>>
>>> Configuring (passive) service freshness checking to execute an
>>> eventhandler works correctly for 1 or 2 iterations, BUT no more than
>>> that. It seems to stop checking the freshness after at most 3
>>> iterations and stops executing the eventhandler after at most 2
>>> iterations. I've replicated this behavior (too) many times and the
>>> results are inconsistent.
>>>
>>> Below is the output of my nagios log:
>>>
>>> <snip nagios.log>
>>> [1115822708] Finished daemonizing... (New PID=15941)
>>> [1115822828] Warning: The results of service 'PROCS-NAGIOS' on host 'csstest2' are stale by 60 seconds (threshold=60 seconds).  I'm forcing an immediate check of the service.
>>> [1115822838] SERVICE ALERT: csstest2;PROCS-NAGIOS;CRITICAL;SOFT;1;CRITICAL
>>> [1115822838] SERVICE EVENT HANDLER: csstest2;PROCS-NAGIOS;CRITICAL;SOFT;1;slave-failover
>>> [1115822948] Warning: The results of service 'PROCS-NAGIOS' on host 'csstest2' are stale by 60 seconds (threshold=60 seconds).  I'm forcing an immediate check of the service.
>>>
>>> Notice the freshness check ran ONLY 2 times when it should have run 5
>>> (if you look at my config options below), and the eventhandler ran ONLY
>>> 1 time when it should have run 3 times.
>>>
>>> Can anyone verify (disprove) this behavior? Am I missing something?
>>>
>>> _________________________
>>> Bryan Loniewski
>>> Rutgers University
>>> NBCS - Systems Programmer
>>>
>>> <snip nagios.cfg>
>>> check_service_freshness=1
>>> service_freshness_check_interval=60
>>> <snip>
>>>
>>> <snip objects.cfg>
>>> define service{
>>>          name                            generic-service
>>>          parallelize_check               1
>>>          obsess_over_service             1
>>>          check_freshness                 0
>>>          freshness_threshold             60
>>>          notifications_enabled           1
>>>          event_handler_enabled           1
>>>          flap_detection_enabled          1
>>>          failure_prediction_enabled      1
>>>          process_perf_data               1
>>>          retain_status_information       1
>>>          retain_nonstatus_information    1
>>>          is_volatile                     0
>>>          max_check_attempts              5
>>>          normal_check_interval           2
>>>          retry_check_interval            1
>>>          check_period                    24x7
>>>          contact_groups                  super-admins
>>>          notification_interval           3
>>>          notification_period             24x7
>>>          register                        0
>>> }
>>> define service{
>>>          use                             generic-service
>>>          name                            generic-passive-service
>>>          active_checks_enabled           0
>>>          passive_checks_enabled          1
>>>          register                        0
>>> }
>>> define service{
>>>          use                             generic-passive-service
>>>          host_name                       csstest2
>>>          service_description             PROCS-NAGIOS
>>>          check_freshness                 1
>>>          freshness_threshold             60
>>>          check_command                   check_dummy!2
>>>          event_handler                   slave-failover
>>> }
>>> define command{
>>>         command_name    check_dummy
>>>         command_line    $USER1$/check_dummy $ARG1$
>>> }
>>> define command{
>>>         command_name    slave-failover
>>>         command_line    $USER2$/failover $SERVICESTATE$ $SERVICESTATETYPE$
>>> }
>>> <snip>
>>>
>>>
>>> _______________________________________________
>>> Nagios-devel mailing list
>>> Nagios-devel at lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/nagios-devel
>>
>>

