Specifying the retention period

Anders Håål anders.haal at ingby.com
Mon Sep 8 12:39:52 CEST 2014


Glad that it clarified how to configure the cache section. I will write a
blog post on this in the meantime, until we have updated
documentation. I agree with you that the structure of the configuration
is a bit "heavy", so ideas and input are appreciated.

Regarding Redis TTL, this is a Redis feature we do not use. The TTL
mentioned in my mail is managed by Bischeck. A Redis TTL applies to a
whole key, so it cannot expire individual nodes in a Redis linked list.

Currently the Bischeck installer should work on Ubuntu, Red Hat/CentOS
and Debian. There are currently no plans to make distribution packages
like rpm or deb. I know that op5 (www.op5.com), which bundles Bischeck,
makes a Bischeck rpm. It would be super if anyone would like to do this
for the project.
When it comes to packaging we have done a bit of work to create Docker
containers, but it's still experimental.

I also encourage you, if you think Bischeck supports your monitoring
effort, to write a small testimonial that we can put on the site.
Regards
Anders

On 09/08/2014 11:30 AM, Rahul Amaram wrote:
> Thanks Anders. This explains precisely why my data was getting purged
> after 16 hours (30 values per hour * 16 hours = 480). It would be great
> if you could update the documentation with this info. The entire setup
> and configuration takes time to get the hang of, and detailed
> documentation would be very helpful.
>
> Also, another quick question. Right now, I believe the Redis TTL is set
> to 2000 seconds. Does this mean that if I don't receive data for a
> particular serviceitem (or service or host) for 2000 seconds, the data
> related to it is lost?
>
> Also, any plans for bundling this with distributions such as Debian?
>
> Regards,
> Rahul.
>
>
> On Monday 08 September 2014 02:04 PM, Anders Håål wrote:
>> Hi Rahul,
>> Thanks for the question and feedback on the documentation. Great to
>> hear that you think Bischeck is awesome. If you do not understand how
>> it works by reading the documentation you are probably not alone, and
>> we should consider it a documentation bug.
>>
>> In 1.0.0 we introduced the concepts that you are asking about, and they
>> are really two different, independent features.
>>
>> Let's start with cache purging.
>> Collected monitoring data, metrics, are kept in the cache (Redis from
>> 1.0.0) as linked lists. There is one linked list per service
>> definition, like host1-service1-serviceitem1. Prior to 1.0.0 all the
>> linked lists had the same size, defined by the property
>> lastStatusCacheSize. In 1.0.0 we made that configurable so it
>> can be defined per service definition.
>> To enable individual cache configurations we added a section called
>> <cache> in the serviceitem section of bischeck.xml. Like many
>> other configuration options in 1.0.0, the cache section can either hold
>> the specific values or point to a template that can be shared.
>> To manage the size of the cache, or to be more specific the linked
>> list size, we defined the <purge> section. The purge section can have
>> two different configurations. The first defines the max size of
>> the cache linked list.
>> <cache>
>>   <purge>
>>    <maxcount>1000</maxcount>
>>   </purge>
>> </cache>
>>
>> The second option is to define the “time to live” for the metrics in
>> the cache.
>> <cache>
>>   <purge>
>>    <offset>10</offset>
>>    <period>D</period>
>>   </purge>
>> </cache>
>> In the above example we set the time to live to 10 days, so any
>> metrics older than this period will be removed. The period can have
>> the following values:
>> H - hours
>> D - days
>> W - weeks
>> Y - years
>>
>> The two options are mutually exclusive. You have to choose one for each
>> serviceitem or cache template.
>>
>> If no cache directive is defined for a serviceitem, the property
>> lastStatusCacheSize will be used. Its default value is 500.
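>>
>> As a rough sketch of how to size this, assume a check that runs every
>> 2 minutes, i.e. 30 values per hour (the rate is only an assumption for
>> the example). The default of 500 values then covers about
>> 500 / 30 = 16.7 hours, and a month of 31 days would need
>> 30 * 24 * 31 = 22320 values. There is no month period in the list
>> above, so with the time based option about a month of data could be
>> kept by using 5 weeks:
>> <cache>
>>   <purge>
>>    <offset>5</offset>
>>    <period>W</period>
>>   </purge>
>> </cache>
>> The count based alternative for the same assumed rate would be
>> <maxcount>22320</maxcount> in the <purge> section instead.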
>>
>> Hopefully this explains the cache purging.
>>
>> The next question was related to aggregations, which have nothing to do
>> with purging but are configured in the same <cache> section. The
>> idea with aggregations was to create an automatic way to aggregate
>> metrics on the level of an hour, day, week and month. The aggregation
>> functions currently supported are average, max and min.
>> Let's say you have a service definition of the format
>> host1-service1-serviceitem1. When you enable an average (avg)
>> aggregation you will automatically get the following new service
>> definitions:
>> host1-service1/H/avg-serviceitem1
>> host1-service1/D/avg-serviceitem1
>> host1-service1/W/avg-serviceitem1
>> host1-service1/M/avg-serviceitem1
>>
>> The configuration you need to achieve the above average aggregations is:
>> <cache>
>>   <aggregate>
>>     <method>avg</method>
>>   </aggregate>
>> </cache>
>>
>> If you would like to combine it with the purging described above, your
>> configuration would look like:
>> <cache>
>>   <aggregate>
>>     <method>avg</method>
>>   </aggregate>
>>
>>   <purge>
>>    <offset>10</offset>
>>    <period>D</period>
>>   </purge>
>> </cache>
>>
>> The new aggregated service definitions,
>> host1-service1/H/avg-serviceitem1, etc, will have their own cache
>> entries and can be used in threshold configurations and virtual
>> services like any other service definitions. For example in a
>> threshold hours section we could define
>>
>> <hours hoursID="2">
>>
>>   <hourinterval>
>>     <from>09:00</from>
>>     <to>12:00</to>
>>     <threshold>host1-service1/H/avg-serviceitem1[0]*0.8</threshold>
>>   </hourinterval>
>>   ...
>>
>> This would mean that the threshold is 0.8 times the average value of
>> host1-service1-serviceitem1 for the period of the last hour.
>> Aggregations are calculated hourly, daily, weekly and monthly.
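>>
>> The other aggregation levels are referenced the same way. A minimal
>> sketch, assuming index [0] is the most recent aggregate as in the hourly
>> example (the hoursID, the interval and the factor 1.2 are made up for
>> illustration):
>> <hours hoursID="3">
>>   <hourinterval>
>>     <from>13:00</from>
>>     <to>17:00</to>
>>     <threshold>host1-service1/D/avg-serviceitem1[0]*1.2</threshold>
>>   </hourinterval>
>> </hours>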
>>
>> By default, weekend metrics are not included in the aggregation
>> calculation. They can be included by setting
>> <useweekend>true</useweekend>:
>>
>> <cache>
>>   <aggregate>
>>     <method>avg</method>
>>     <useweekend>true</useweekend>
>>   </aggregate>
>>   ….
>> </cache>
>>
>> This will create aggregated service definitions with the following
>> naming standard:
>> host1-service1/H/avg/weekend-serviceitem1
>> host1-service1/D/avg/weekend-serviceitem1
>> host1-service1/W/avg/weekend-serviceitem1
>> host1-service1/M/avg/weekend-serviceitem1
>>
>> You can also have multiple entries like:
>> <cache>
>>   <aggregate>
>>     <method>avg</method>
>>     <useweekend>true</useweekend>
>>   </aggregate>
>>   <aggregate>
>>     <method>max</method>
>>   </aggregate>
>>   ….
>> </cache>
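>>
>> Assuming max follows the same naming standard as avg (this is an
>> inference from the names above, not something checked against a
>> running Bischeck), the two entries would give these service
>> definitions:
>> host1-service1/H/avg/weekend-serviceitem1
>> host1-service1/D/avg/weekend-serviceitem1
>> host1-service1/W/avg/weekend-serviceitem1
>> host1-service1/M/avg/weekend-serviceitem1
>> host1-service1/H/max-serviceitem1
>> host1-service1/D/max-serviceitem1
>> host1-service1/W/max-serviceitem1
>> host1-service1/M/max-serviceitem1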
>>
>> So how long will the aggregated values be kept in the cache? By
>> default we save
>> Hourly aggregations for 25 hours
>> Daily aggregations for 7 days
>> Weekly aggregations for 5 weeks
>> Monthly aggregations for 1 month
>>
>> These values can be overridden, but they cannot be lower than the
>> defaults. Below is an example where we save the aggregations for
>> 168 hours, 60 days and 53 weeks.
>> <cache>
>>   <aggregate>
>>     <method>avg</method>
>>     <useweekend>true</useweekend>
>>     <retention>
>>       <period>H</period>
>>       <offset>168</offset>
>>     </retention>
>>     <retention>
>>       <period>D</period>
>>       <offset>60</offset>
>>     </retention>
>>     <retention>
>>       <period>W</period>
>>       <offset>53</offset>
>>     </retention>
>>   </aggregate>
>>   ….
>> </cache>
>>
>> I hope this makes it a bit less confusing. What is clear to me is that
>> we need to improve the documentation in this area.
>>
>> Looking forward to your feedback.
>> Anders
>>
>> On 09/08/2014 06:02 AM, Rahul Amaram wrote:
>>> Hi,
>>> I am trying to set up the bischeck plugin for our organization. I have
>>> configured most of it except for the cache retention period. Here
>>> is what I want - I want to store every value which has been generated
>>> during the past 1 month. The reason is that my threshold is currently
>>> calculated as the average of the metric value during the past 4 weeks at
>>> the same time of the day.
>>>
>>> So, how do I define the cache template for this? If I don't define any
>>> cache template, for how many days is the data kept?
>>> Also, how does the aggregate function work and what does the purge
>>> Maxitems signify?
>>>
>>> I've gone through the documentation but it wasn't clear. Looking forward
>>> to a response.
>>>
>>> Bischeck is one awesome plugin. Keep up the great work.
>>>
>>> Regards,
>>> Rahul.
>>>
>>
>>
>
>


-- 

Ingby<http://www.ingby.com>

IngbyForge<http://gforge.ingby.com>

bischeck - dynamic and adaptive thresholds for Nagios 
<http://www.bischeck.org>

anders.haal at ingby.com<mailto:anders.haal at ingby.com>

Software through engineering creativity and competence

Ingenjörsbyn
Box 531
101 30 Stockholm
Sweden
www.ingby.com <http://www.ingby.com/>
Mobile: +46 70 575 35 46
Phone: +46 75 75 75 090
Fax:  +46 75 75 75 091


