Nagios 3.1.1 eats cpu like mad

Alessandro Ren alessandro.ren at opservices.com.br
Tue Jun 23 20:00:22 CEST 2009



On 6/23/2009 2:52 PM, Ethan Galstad wrote:
> Patch is in CVS now.  Can someone who was experiencing scheduling problems
> with the 3.0.6 release test the latest 3.1.2 release?  If the problem
> still persists, it's likely in one of the following functions in
> base/utils.c:
>
> check_time_against_period()
> get_next_valid_time()
>    

     This change fixed the bug where services were randomly scheduled
into 2010; now that bug will happen again. Of course, 100% CPU usage is
not an acceptable trade-off for fixing it.

     [s].

> These functions are more complicated now with the new timeperiod
> exceptions and date formats, so a bug could likely exist here.
>
> - Ethan Galstad
>
>
> Andreas Ericsson wrote:
>    
>> There's a bug in Nagios 3.1.1, making it eat all available CPU even
>> with a very small configuration (5 hosts, 12 service checks).
>>
>> I sort of introduced it, as I didn't fully test the impact of a patch
>> sent in before accepting it. Mea culpa, so I'll make sure to fix it.
>>
>> For some reason, the patch shown inline below makes Nagios consume
>> 100% CPU on my system. I don't know the reason for this, but I'll
>> investigate it and see how it can be fixed. I *think* it happens
>> because Nagios sees that "current_time" is valid and therefore
>> returns precisely that from get_next_valid_time(), which means it
>> pushes all the scheduled checks in front of it until enough time
>> has passed since the check was last *run* before actually executing
>> it. Obviously, that sucks major donkeyballs, so we really shouldn't
>> do that. I'll need to check that up a bit more closely before I can
>> say with 100% certainty that that's what's happening though.
>>
>> -8<--8<--8<-
>> commit 523e8c516df323a0bafe98ecb9222384fde62d6e
>> Author: Andreas Ericsson<ae at op5.se>
>> Date:   Fri May 22 01:38:28 2009 +0000
>>
>>      Fix service rescheduling on clock skew/timeperiod change
>>
>>      This patch ensures that services and hosts are never scheduled one
>>      year into the future and set to never be rescheduled again.
>>
>>      Previously, this could happen if the next preferred time happened
>>      to already be valid, but stops being so because of clock skew or
>>      someone changing the timeperiod definition between two Nagios
>>      restarts while retaining scheduling info.
>>
>>      Patch-sent-by: Ricardo Maraschini<ricardo.maraschini at opservices.com.br>
>>      Signed-off-by: Andreas Ericsson<ae at op5.se>
>>
>> diff --git a/base/checks.c b/base/checks.c
>> index 9d5c497..ef50a20 100644
>> --- a/base/checks.c
>> +++ b/base/checks.c
>> @@ -277,7 +277,7 @@ int run_scheduled_service_check(service *svc, int check_options, double latency)
>>   				preferred_time=current_time+((svc->check_interval<=0)?300:(svc->check_interval*interval_length));
>>
>>   			/* make sure we rescheduled the next service check at a valid time */
>> -			get_next_valid_time(preferred_time,&next_valid_time,svc->check_period_ptr);
>> +			get_next_valid_time(current_time,&next_valid_time,svc->check_period_ptr);
>>
>>   			/* the service could not be rescheduled properly - set the next check time for next year, but don't actually reschedule it */
>>   			if(time_is_valid==FALSE&&  next_valid_time==preferred_time){
>> @@ -2792,7 +2792,7 @@ int run_scheduled_host_check_3x(host *hst, int check_options, double latency){
>>   				preferred_time=current_time+((hst->check_interval<=0)?300:(hst->check_interval*interval_length));
>>
>>   			/* make sure we rescheduled the next host check at a valid time */
>> -			get_next_valid_time(preferred_time,&next_valid_time,hst->check_period_ptr);
>> +			get_next_valid_time(current_time,&next_valid_time,hst->check_period_ptr);
>>
>>   			/* the host could not be rescheduled properly - set the next check time for next year, but don't actually reschedule it */
>>   			if(time_is_valid==FALSE&&  next_valid_time==preferred_time){
>> -8<--8<--8<-
>>
>>
>>      

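To make the hypothesis quoted above concrete, here is a minimal toy model of it in C. It is not Nagios code: toy_period, toy_next_valid_time() and toy_count_handled() are hypothetical names invented for this sketch, and the once-per-second polling loop is an assumption standing in for the real event loop. It only shows that when the "next valid time" is computed from current_time inside a valid period, the check is due again on the very next pass, whereas computing it from preferred_time spaces the checks one interval apart.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for a timeperiod: valid for start <= t < end. */
typedef struct {
    time_t start;
    time_t end;
} toy_period;

/* Toy analogue of get_next_valid_time(): if `from` is already inside the
 * period it is returned unchanged, otherwise the period start is returned.
 * (A time after the period yields (time_t)-1; the model never reaches it.) */
static time_t toy_next_valid_time(time_t from, const toy_period *p)
{
    if (from >= p->start && from < p->end)
        return from;
    if (from < p->start)
        return p->start;
    return (time_t)-1;
}

/* Count how often one check is handled during `window` simulated seconds,
 * polling once per simulated second.
 * use_current == 0: reschedule from preferred_time (before the patch).
 * use_current != 0: reschedule from current_time (after the patch). */
long toy_count_handled(int use_current, long interval, long window)
{
    assert(interval > 0 && window > 0);

    /* A period that stays valid for the whole simulation. */
    const toy_period always = { 0, (time_t)(window + interval + 1) };
    time_t next_check = 0;
    long handled = 0;

    for (time_t now = 0; now < (time_t)window; now++) {
        if (next_check > now)
            continue;               /* not due yet: the loop can idle */

        handled++;                  /* the check is picked up again */

        time_t preferred = now + (time_t)interval;
        next_check = toy_next_valid_time(use_current ? now : preferred,
                                         &always);
    }
    return handled;
}
```

With a 300-second interval over a 3600-second window, toy_count_handled(0, 300, 3600) returns 12, while toy_count_handled(1, 300, 3600) returns 3600: the check is picked up on every single pass. In a real event loop that does not sleep between passes, that pattern would look like constant CPU use, which is consistent with (but does not prove) the explanation suggested above.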