High CPU consumption by java and redis-server

Anders Håål anders.haal at ingby.com
Wed Sep 17 20:49:06 CEST 2014


Hi Rahul,
Looking at your threshold, you will retrieve at most 6 values, which 
should not be that "hard" even though it is a time-based query. Using an 
index is faster, and that is something we will look into in the future.
Since you run the query every 120 sec, you currently have at least 5040 
items in the cache for each service, which does not sound too bad. With 
10 services that is at least 50000 in total.
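
The item counts above can be reproduced with a little arithmetic. This is only a rough sketch: it assumes one value is cached per 120 s run, that nothing has been purged, and that the 5040 figure corresponds to history reaching back to -168H (the third offset in the threshold, matching the 3 values currently available). The function name is made up for illustration.

```python
# Rough estimate of cached items per service item.
# Assumption: one value is stored per run and nothing has been purged.
INTERVAL_SECONDS = 120  # schedule interval from the original post


def items_for_hours(hours, interval_seconds=INTERVAL_SECONDS):
    """Number of values collected over `hours` at one value per interval."""
    return hours * 3600 // interval_seconds


per_item_now = items_for_hours(168)    # history back to -168H
per_item_full = items_for_hours(672)   # once -672H becomes available
ten_items = 10 * per_item_now          # ten service items

print(per_item_now, per_item_full, ten_items)
```

Under the same assumptions, the cache would grow to roughly four times the current size per service item once the full -672H history exists.
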
What I would like you to check is the following:
- If you connect to bischeck with a JMX client you can see all the 
different timers, see 
http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_installation_and_administration_guide.html#toc-Chapter-5. 
The ones related to thresholds are interesting to start with, but check 
all the timers to see whether any of them has a long execution time.
- Since it is redis-server that is consuming a high level of CPU, it 
would be interesting to see the redis configuration, such as the amount 
of memory allocated. If redis needs to swap, that is not good.
- Please check the redis log files.
- You can also connect to redis with redis-cli and run the "monitor" 
command to get a real-time listing of the commands executed against redis.
- Also check with top the percentage of %wa, time spent waiting for I/O. 
How much memory do you have on the server? Is it only running bischeck 
and redis?
- How much CPU is bischeck consuming? Do you see any peaks?
- Also check the bischeck log to see any ERROR or WARN.
- And finally, has this been the behavior from the beginning or has it 
increased over time? What happens if you restart bischeck (not reload)?
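
Two of the redis checks in the list above can be partly automated. The sketch below assumes you have saved the text output of "redis-cli info memory" and "redis-cli monitor" yourself. The function names are hypothetical and nothing here is part of bischeck. The first helper compares used_memory_rss with used_memory (an RSS well below used_memory suggests that part of the data set has been swapped out). The second counts which commands show up most often in a monitor capture.

```python
import re
from collections import Counter


def parse_info(text):
    """Parse `redis-cli info` output: key:value lines, '#' section headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields


def rss_ratio(fields):
    """used_memory_rss / used_memory; well below 1 hints at swapping."""
    return int(fields["used_memory_rss"]) / int(fields["used_memory"])


# A MONITOR line looks like:
#   1339518083.107412 [0 127.0.0.1:60866] "keys" "*"
MONITOR_LINE = re.compile(r'^\d+\.\d+ \[\d+ [^\]]+\] "([^"]+)"')


def tally_commands(lines):
    """Count how often each command name appears in MONITOR output."""
    counts = Counter()
    for line in lines:
        match = MONITOR_LINE.match(line.strip())
        if match:
            counts[match.group(1).lower()] += 1
    return counts
```

With a capture saved to a file, tally_commands(open(path)).most_common(10) would list the ten most frequent commands. MONITOR itself adds load on the server, so it is best run only for a short period.
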

Try to collect some more info so we can try to determine where the issue 
lies.

When it comes to your last finding, I have no explanation. Just to 
understand: you compare -24H with -10080M (-168H)? Would it not be 
better to compare -24H and -1440M? I will have to get back to you on 
this, but I would need the result from running the query in CacheCli, 
since there you get the time it takes, 
http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_installation_and_administration_guide.html#toc-Section-4.4.
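
To make the unit comparison concrete, the small sketch below converts the offsets mentioned in this thread to minutes. It only handles the H and M suffixes that appear here, and the function name is made up for illustration. It says nothing about how bischeck parses these offsets internally.

```python
UNIT_MINUTES = {"M": 1, "H": 60}  # only the units used in this thread


def offset_minutes(offset):
    """Convert an offset such as '-24H' or '-10080M' to minutes back in time."""
    unit = offset[-1]
    if not offset.startswith("-") or unit not in UNIT_MINUTES:
        raise ValueError("unsupported offset: " + offset)
    return int(offset[1:-1]) * UNIT_MINUTES[unit]


for a, b in [("-10080M", "-168H"), ("-1440M", "-24H"), ("-10080M", "-24H")]:
    print(a, b, offset_minutes(a) == offset_minutes(b))
```

The first two pairs point at the same moment in time, so any timing difference between them would come from the unit used. The last pair reaches back a different distance, so its timings are not directly comparable.
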


Regards
Anders





On 09/17/2014 07:13 PM, Rahul Amaram wrote:
> Hi,
> I am observing very high CPU consumption by the java process and 
> redis-server. redis-server, being single-threaded, is itself taking 
> 100% CPU. I have about 10 hosts, with about 10 services each (with one 
> service item per service). The time interval for generation of values 
> is 120s. The threshold that I have defined is:
>
> avg($$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-24H],$$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-96H],$$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-168H],$$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-336H],$$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-504H],$$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-672H]) 
>
>
> However, currently no more than 3 values are available.
>
> I am already running this on a c3.xlarge machine (4 cores) and the 
> load average is quite often > 4, resulting in delays in the generation 
> of values. Any pointers to what could be causing the high load would 
> be much appreciated.
>
> On a slightly different note, while using cli.CacheCli, retrieving the 
> value of a service item one week back using hours (-24H) is 
> considerably faster than retrieving it using minutes (-10080M). Again, 
> why does bischeck behave this way?
>
> Thanks,
> Rahul.
>

