High CPU consumption by java and redis-server

Rahul Amaram rahul.amaram at vizury.com
Thu Sep 18 05:47:02 CEST 2014


Using indices has brought down the CPU usage of the java process 
considerably. But won't indices create a problem if data is missing in 
between? For example, if data for 2-3 hours is missing, won't the 
indices be offset by that time? Also, does bischeck support using 
multiple redis instances so that multiple cores can be exploited?

Regards,
Rahul.
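
The index arithmetic in the quoted reply below, and the gap concern 
raised above, can be illustrated with a short sketch. It assumes the 
cache behaves like a dense list with the newest value at index 0 and one 
entry per successful collection, with nothing stored for a missed run. 
That is an assumption for illustration only, not confirmed bischeck 
behaviour, and the function names are hypothetical.

```python
# Sketch only: assumes a dense cache list, newest value at index 0,
# one entry per successful run and no placeholder for missed runs.

def offset_to_index(offset_hours: int, interval_seconds: int) -> int:
    """Convert a time offset such as -24H into the equivalent list index."""
    return (offset_hours * 3600) // interval_seconds


def actual_age_hours(index: int, interval_seconds: int,
                     missing_seconds: int = 0) -> float:
    """Age of the entry at `index` when `missing_seconds` of data were
    never stored somewhere between now and that entry."""
    return (index * interval_seconds + missing_seconds) / 3600


INTERVAL = 120  # seconds between generated values, as stated in the thread

if __name__ == "__main__":
    for hours in (24, 96, 168, 336, 504, 672):
        print(f"-{hours}H -> index {offset_to_index(hours, INTERVAL)}")

    # With a 3 hour gap inside the last day, index 720 no longer points
    # at the value from 24 hours ago but at one from 27 hours ago.
    print(actual_age_hours(720, INTERVAL, missing_seconds=3 * 3600))
```

Under these assumptions a gap shifts every index behind it by the length 
of the gap, so an index-based threshold trades exact alignment in time 
for lookup speed.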

On Thursday 18 September 2014 01:58 AM, Anders Håål wrote:
> Okay. 4-6 seconds is absolutely too much, and it may be related to how 
> the query by time is implemented. The current search is a fairly 
> brute-force way of finding the "right" time. It's not searching 
> linearly, but there is no "index" on time. Searching by index is much 
> quicker, whereas the cost of a query by time depends on the size of 
> the list. With your 6 values the search has to be done 6 times over a 
> list of 5000 items. The future idea I mentioned is a sort of index for 
> the timestamp, using a sorted set.
> What I would recommend is to use an index instead and see how that 
> affects the performance. Since you use an interval of 120 sec, -24H 
> will be the same as index 720, -96H will be the same as index 2880, 
> etc.
> I will try to find the time to set up an equivalent test environment. 
> Keep me updated on your investigation.
> Anders
>
> On 09/17/2014 09:18 PM, Rahul Amaram wrote:
>> When it comes to your last finding I have no explanation. Just to 
>> understand: you compare using -24H with -10080M (-168H). Would it not 
>> be better to compare -24H and -1440M? I have to get back to you on 
>> this, but I would need to get the result when running in cacheCli 
>> since you get the time it takes, 
>> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_installation_and_administration_guide.html#toc-Section-4.4.
>>
>> This was a typo. I was talking about -168H and -10080M. Also, I used 
>> "bischeck cli.CacheCli" to check this. I re-ran it now, and I am not 
>> finding much difference between the two (it takes about 4-6 seconds 
>> to retrieve the value).
>>
>> Regarding the other points, I have to get back to you. On a side 
>> note, I have upgraded from redis-server 2.6 to 2.8, just to rule out 
>> any version-related performance issues.
>>
>> Thanks,
>> Rahul.
>>
>>
>> On Thursday 18 September 2014 12:19 AM, Anders Håål wrote:
>>> Hi Rahul,
>>> Looking at your threshold, this means that you will retrieve at most 
>>> 6 values, which should not be that "hard" even if it's a time-based 
>>> query. Using an index is faster and is something we will look into 
>>> in the future.
>>> Since you run the query every 120 sec, it means that you currently 
>>> have at least 5040 items in the cache for each service, which does 
>>> not sound too bad. With 10 services, that is at least 50000 in total.
>>> What I would like you to check is the following:
>>> - If you connect to bischeck with a JMX client you can see all the 
>>> different timers 
>>> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_installation_and_administration_guide.html#toc-Chapter-5. 
>>> The ones related to threshold are interesting to start with, but 
>>> check all the different timers to see whether any of them has a long 
>>> execution time.
>>> - Since it's the redis-server that is consuming a high level of CPU, 
>>> it's interesting to see the configuration for redis, like the amount 
>>> of memory allocated. If redis needs to swap, that's not good.
>>> - Please check the redis log files.
>>> - You can also connect to redis with redis-cli and run the command 
>>> "monitor" to get a real-time listing of the commands executed 
>>> against redis.
>>> - Also check with top the percentage of %wa, waiting for I/O. How 
>>> much memory do you have on the server? Is it only running bischeck 
>>> and redis?
>>> - How much CPU is bischeck consuming? Do you see any peaks?
>>> - Also check the bischeck log for any ERROR or WARN entries.
>>> - And finally, has this been the behavior from the beginning or has 
>>> it increased over time? What happens if you restart bischeck (not 
>>> reload)?
>>>
>>> Try to collect some more info so we can try to determine where the 
>>> issue lies.
>>>
>>> When it comes to your last finding I have no explanation. Just to 
>>> understand: you compare using -24H with -10080M (-168H). Would it not 
>>> be better to compare -24H and -1440M? I have to get back to you on 
>>> this, but I would need to get the result when running in cacheCli 
>>> since you get the time it takes, 
>>> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_installation_and_administration_guide.html#toc-Section-4.4.
>>>
>>>
>>> Regards
>>> Anders
>>>
>>>
>>>
>>>
>>>
>>> On 09/17/2014 07:13 PM, Rahul Amaram wrote:
>>>> Hi,
>>>> I am observing very high CPU consumption by the java process and 
>>>> redis-server. redis-server, being single-threaded, is itself taking 
>>>> 100% CPU. I have about 10 hosts, with about 10 services each (with 
>>>> one service item per service). The time interval for generating a 
>>>> value is 120s. The threshold that I have defined is:
>>>>
>>>> avg($$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-24H],$$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-96H],$$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-168H],$$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-336H],$$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-504H],$$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-672H]) 
>>>>
>>>>
>>>> However, currently no more than 3 values are available.
>>>>
>>>> I am already running this on a c3.xlarge machine (4 cores) and the 
>>>> load average is quite often > 4, resulting in delays in generating 
>>>> values. Any pointers to what could be causing the high load would 
>>>> be much appreciated.
>>>>
>>>> On a slightly different note, while using cli.CacheCli, retrieving 
>>>> the value of a service item one week back using hours (-24H) is 
>>>> considerably faster than retrieving it using minutes (-10080M). 
>>>> Again, why does bischeck behave this way?
>>>>
>>>> Thanks,
>>>> Rahul.
>>>>
>>>
>>>
>>
>>
>
>

