Max concurrent checks - spreading the next_time

Ton Voon ton.voon at opsera.com
Thu Jun 11 09:44:47 CEST 2009


On 10 Jun 2009, at 09:52, Andreas Ericsson wrote:

> Ton Voon wrote:
>> I propose that instead of setting next_time = next_time +
>> check_interval, a random factor is added, maybe something like:
>>
>> next_time = now + max(5, min(int(rand(15)),
>> int(rand(retry_interval*interval_length))))
>>
>> This means that the next check has been moved at least 5 seconds away
>> from now (to overcome the temporary load due to the number of concurrent
>> service checks), with a maximum of 15 seconds away (or less if the
>> retry_interval is lower).
>>
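
For illustration, the "at least 5, at most 15 seconds, or less if the retry window is shorter" behaviour described above could be sketched in C roughly as follows. This is only a sketch, not Nagios code: jitter_delay() and jittered_next_time() are hypothetical names, retry_window stands for retry_interval*interval_length in seconds, and the delay is drawn uniformly from the allowed range rather than by applying the quoted max/min expression literally (which would return exactly 5 for many draws).

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper, not a Nagios function: pick a delay in seconds
 * between min_delay and max_delay (inclusive), capped by retry_window
 * when that window is shorter than max_delay. */
static int jitter_delay(int min_delay, int max_delay, int retry_window)
{
    assert(min_delay >= 0 && max_delay >= min_delay);

    int upper = max_delay;
    if (retry_window < upper)
        upper = retry_window;
    if (upper < min_delay)
        upper = min_delay;  /* never schedule closer than min_delay */

    /* rand() % n has a slight modulo bias, which is harmless here */
    return min_delay + rand() % (upper - min_delay + 1);
}

/* Hypothetical wrapper using the 5 and 15 second bounds from the proposal */
static time_t jittered_next_time(time_t now, int retry_window)
{
    return now + jitter_delay(5, 15, retry_window);
}
```

With a 60 second retry window the delay falls between 5 and 15 seconds; with a 3 second window it collapses to the 5 second minimum. The caller is assumed to have seeded the generator with srand() once at startup.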

> I can't help but think that something like this could have been quite
> easily resolved with a round-robin scheduling queue, where items requested
> to be queued would simply get inserted within 5 seconds of the requested
> time where there are the most free slots. The prng idea will probably
> work just as well though, and I'm fairly certain you could just use
>
>  next_time = service->check_interval - 7 + (*service->description & 0xf);
>
> to get a distribution almost equally good without having to bother
> about the PRNG-business. This would yield 7 seconds +-, which is
> probably good enough.
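
As a rough illustration of the PRNG-free variant quoted above, the offset term can be isolated in a small C helper. description_offset() is a hypothetical name, not part of Nagios; it only reproduces the `(*description & 0xf) - 7` part of the expression, which takes the low four bits of the first character and so yields a fixed value between -7 and +8 seconds for each service.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: derive a deterministic offset in the range
 * -7..+8 seconds from the low four bits of the first character of
 * the service description. */
static int description_offset(const char *description)
{
    assert(description != NULL);
    return (int)((unsigned char)description[0] & 0xf) - 7;
}
```

For example, a description starting with 'H' (0x48) gives an offset of +1, while one starting with 'P' (0x50) gives -7. Because only the first character is used, services whose descriptions begin with the same letter always receive the same offset, so how well the checks spread out depends on how the services are named.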

I notice that rand() is already used elsewhere in Nagios, so I will go
with that instead.

Ton

