Specifying the retention period

Anders Håål anders.haal at ingby.com
Mon Sep 8 17:09:23 CEST 2014


Great if you can make a Debian package, and I understand that you cannot 
commit to a timeline. The best thing would be to integrate it into our 
build process, where we use ant.

If the purging is based on time, then data could be removed from the 
cache while the server is down, since the logic is based on time 
relative to now. To avoid this you should increase the purge time before 
you start bischeck. And just a comment on your last sentence: the Redis 
TTL is never used :)
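
For example, if a serviceitem normally purges metrics older than 10 days 
and you expect a couple of days of downtime, you could raise the offset 
before stopping the server. A sketch using the <purge> syntax described 
further down in this thread (the offset value is just an illustration):

<cache>
  <purge>
    <offset>14</offset>
    <period>D</period>
  </purge>
</cache>

Once bischeck has been running again for a while, the offset can be 
lowered back to its normal value.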
Anders

On 09/08/2014 02:09 PM, Rahul Amaram wrote:
> I would be more than happy to give you guys a testimonial. However, we
> have just taken this live and would like to see how it performs before I
> write one.
>
> Also, if time permits, I'll try to bundle this for Debian (I'm a Debian
> maintainer). I can't commit to a timeline right away though :).
>
> Also, just to make things explicitly clear: I understand that the below
> serviceitem TTL has nothing to do with the Redis TTL. But if I stop my
> bischeck server for a day or two, would any of my metrics get lost? Or
> would I have to increase the Redis TTL for this?
>
> Regards,
> Rahul.
>
> On Monday 08 September 2014 04:09 PM, Anders Håål wrote:
>> Glad that it clarified how to configure the cache section. I will make
>> a blog post on this in the meantime, until we have updated
>> documentation. I agree with you that the structure of the
>> configuration is a bit "heavy", so ideas and input are appreciated.
>>
>> Regarding the Redis TTL, this is a Redis feature we do not use. The TTL
>> mentioned in my mail is managed by bischeck. A Redis TTL applies to a
>> whole key, so it cannot expire individual nodes in a Redis linked list.
>>
>> Currently the bischeck installer should work for Ubuntu, RedHat/CentOS
>> and Debian. There are currently no plans to make distribution packages
>> like rpm or deb. I know op5 (www.op5.com), which bundles Bischeck,
>> makes a bischeck rpm. It would be super if anyone would like to do
>> this for the project.
>> When it comes to packaging we have done a bit of work to create Docker
>> containers, but it's still experimental.
>>
>> I also encourage you, if you think bischeck supports your monitoring
>> effort, to write a small testimonial that we can put on the site.
>> Regards
>> Anders
>>
>> On 09/08/2014 11:30 AM, Rahul Amaram wrote:
>>> Thanks Anders. This explains precisely why my data was getting purged
>>> after 16 hours (30 values per hour * 16 hours = 480, which runs up
>>> against the default cache size of 500). It would be great if you could
>>> update the documentation with this info. The entire setup and
>>> configuration takes time to get a hold of, and detailed documentation
>>> would be very helpful.
>>>
>>> Also, another quick question. Right now, I believe the Redis TTL is set
>>> to 2000 seconds. Does this mean that if I don't receive data for a
>>> particular serviceitem (or service or host) for 2000 seconds, the data
>>> related to it is lost?
>>>
>>> Also, any plans for bundling this with distributions such as Debian?
>>>
>>> Regards,
>>> Rahul.
>>>
>>>
>>> On Monday 08 September 2014 02:04 PM, Anders Håål wrote:
>>>> Hi Rahul,
>>>> Thanks for the question and feedback on the documentation. Great to
>>>> hear that you think Bischeck is awesome. If you do not understand how
>>>> it works by reading the documentation you are probably not alone, and
>>>> we should consider it a documentation bug.
>>>>
>>>> In 1.0.0 we introduced the concepts you are asking about, and they are
>>>> really two different independent features.
>>>>
>>>> Let's start with cache purging.
>>>> Collected monitoring data, metrics, are kept in the cache (Redis from
>>>> 1.0.0) as linked lists. There is one linked list per service
>>>> definition, like host1-service1-serviceitem1. Prior to 1.0.0 all the
>>>> linked lists had the same size, defined with the property
>>>> lastStatusCacheSize, but in 1.0.0 we made it configurable per service
>>>> definition.
>>>> To enable individual cache configurations we added a section called
>>>> <cache> in the serviceitem section of bischeck.xml. Like many other
>>>> configuration options in 1.0.0, the cache section can either hold the
>>>> specific values or point to a template that can be shared.
>>>> To manage the size of the cache, or to be more specific the linked
>>>> list size, we defined the <purge> section. The purge section can have
>>>> two different configurations. The first defines the max size of the
>>>> cache linked list:
>>>> <cache>
>>>>   <purge>
>>>>     <maxcount>1000</maxcount>
>>>>   </purge>
>>>> </cache>
>>>>
>>>> The second option is to define the “time to live” for the metrics in
>>>> the cache:
>>>> <cache>
>>>>   <purge>
>>>>     <offset>10</offset>
>>>>     <period>D</period>
>>>>   </purge>
>>>> </cache>
>>>> In the above example we set the time to live to 10 days, so any
>>>> metrics older than this period will be removed. The period can have
>>>> the following values:
>>>> H - hours
>>>> D - days
>>>> W - weeks
>>>> Y - years
>>>>
>>>> The two options are mutually exclusive. You have to choose one for
>>>> each serviceitem or cache template.
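>>>>
>>>> As mentioned above, a <cache> section can also point to a shared
>>>> template instead of repeating the values inline. A hypothetical
>>>> sketch (the exact template element and attribute names may differ in
>>>> your bischeck.xml version); here the purge keeps roughly one month of
>>>> raw values:
>>>>
>>>> <cachetemplate templateID="oneMonth">
>>>>   <purge>
>>>>     <offset>31</offset>
>>>>     <period>D</period>
>>>>   </purge>
>>>> </cachetemplate>
>>>>
>>>> <!-- referenced from a serviceitem -->
>>>> <cache template="oneMonth"/>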
>>>>
>>>> If no cache directive is defined for a serviceitem, the property
>>>> lastStatusCacheSize will be used. Its default value is 500.
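>>>>
>>>> If you need a different global fallback, that property can be changed
>>>> in the bischeck properties configuration. A minimal sketch, assuming
>>>> a key/value property layout (verify against your installed
>>>> properties.xml):
>>>>
>>>> <property>
>>>>   <key>lastStatusCacheSize</key>
>>>>   <value>1000</value>
>>>> </property>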
>>>>
>>>> Hopefully this explains the cache purging.
>>>>
>>>> The next question was related to aggregations, which have nothing to
>>>> do with purging but are configured in the same <cache> section. The
>>>> idea with aggregations was to create an automatic way to aggregate
>>>> metrics on the level of an hour, day, week and month. The aggregation
>>>> functions currently supported are average, max and min.
>>>> Let's say you have a service definition of the format
>>>> host1-service1-serviceitem1. When you enable an average (avg)
>>>> aggregation you will automatically get the following new service
>>>> definitions:
>>>> host1-service1/H/avg-serviceitem1
>>>> host1-service1/D/avg-serviceitem1
>>>> host1-service1/W/avg-serviceitem1
>>>> host1-service1/M/avg-serviceitem1
>>>>
>>>> The configuration you need to achieve the above average aggregations is:
>>>> <cache>
>>>>   <aggregate>
>>>>     <method>avg</method>
>>>>   </aggregate>
>>>> </cache>
>>>>
>>>> If you would like to combine it with the purging described above,
>>>> your configuration would look like:
>>>> <cache>
>>>>   <aggregate>
>>>>     <method>avg</method>
>>>>   </aggregate>
>>>>
>>>>   <purge>
>>>>     <offset>10</offset>
>>>>     <period>D</period>
>>>>   </purge>
>>>> </cache>
>>>>
>>>> The new aggregated service definitions,
>>>> host1-service1/H/avg-serviceitem1, etc., will have their own cache
>>>> entries and can be used in threshold configurations and virtual
>>>> services like any other service definition. For example, in a
>>>> threshold hours section we could define:
>>>>
>>>> <hours hoursID="2">
>>>>   <hourinterval>
>>>>     <from>09:00</from>
>>>>     <to>12:00</to>
>>>>     <threshold>host1-service1/H/avg-serviceitem1[0]*0.8</threshold>
>>>>   </hourinterval>
>>>>   ...
>>>>
>>>> This would mean that we use the average value of
>>>> host1-service1-serviceitem1 for the period of the last hour ([0]
>>>> refers to the most recent entry in the cache). Aggregations are
>>>> calculated hourly, daily, weekly and monthly.
>>>>
>>>> By default, weekend metrics are not included in the aggregation
>>>> calculation. This can be enabled by setting
>>>> <useweekend>true</useweekend>:
>>>>
>>>> <cache>
>>>>   <aggregate>
>>>>     <method>avg</method>
>>>>     <useweekend>true</useweekend>
>>>>   </aggregate>
>>>>   ….
>>>> </cache>
>>>>
>>>> This will create aggregated service definitions with the following
>>>> naming standard:
>>>> host1-service1/H/avg/weekend-serviceitem1
>>>> host1-service1/D/avg/weekend-serviceitem1
>>>> host1-service1/W/avg/weekend-serviceitem1
>>>> host1-service1/M/avg/weekend-serviceitem1
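>>>>
>>>> These weekend-inclusive definitions can be referenced in thresholds
>>>> just like the earlier hours example, e.g. (the 0.8 factor is only an
>>>> illustration):
>>>>
>>>> <threshold>host1-service1/D/avg/weekend-serviceitem1[0]*0.8</threshold>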
>>>>
>>>> You can also have multiple entries like:
>>>> <cache>
>>>>   <aggregate>
>>>>     <method>avg</method>
>>>>     <useweekend>true</useweekend>
>>>>   </aggregate>
>>>>   <aggregate>
>>>>     <method>max</method>
>>>>   </aggregate>
>>>>   ….
>>>> </cache>
>>>>
>>>> So how long will the aggregated values be kept in the cache? By
>>>> default we save:
>>>> Hour aggregations for 25 hours
>>>> Daily aggregations for 7 days
>>>> Weekly aggregations for 5 weeks
>>>> Monthly aggregations for 1 month
>>>>
>>>> These values can be overridden, but they cannot be lower than the
>>>> defaults. Below is an example where we save the aggregations for
>>>> 168 hours, 60 days and 53 weeks:
>>>> <cache>
>>>>   <aggregate>
>>>>     <method>avg</method>
>>>>     <useweekend>true</useweekend>
>>>>     <retention>
>>>>       <period>H</period>
>>>>       <offset>168</offset>
>>>>     </retention>
>>>>     <retention>
>>>>       <period>D</period>
>>>>       <offset>60</offset>
>>>>     </retention>
>>>>     <retention>
>>>>       <period>W</period>
>>>>       <offset>53</offset>
>>>>     </retention>
>>>>   </aggregate>
>>>>   ….
>>>> </cache>
>>>>
>>>> I hope this makes it a bit less confusing. What is clear to me is that
>>>> we need to improve the documentation in this area.
>>>>
>>>> Looking forward to your feedback.
>>>> Anders
>>>>
>>>> On 09/08/2014 06:02 AM, Rahul Amaram wrote:
>>>>> Hi,
>>>>> I am trying to set up the bischeck plugin for our organization. I have
>>>>> configured most of it except for the cache retention period. Here is
>>>>> what I want: I want to store every value that has been generated
>>>>> during the past month. The reason is that my threshold is currently
>>>>> calculated as the average of the metric value during the past 4 weeks
>>>>> at the same time of the day.
>>>>>
>>>>> So, how do I define the cache template for this? If I don't define any
>>>>> cache template, for how many days is the data kept?
>>>>> Also, how does the aggregate function work, and what does the purge
>>>>> maxcount signify?
>>>>>
>>>>> I've gone through the documentation but it wasn't clear. Looking
>>>>> forward to a response.
>>>>>
>>>>> Bischeck is one awesome plugin. Keep up the great work.
>>>>>
>>>>> Regards,
>>>>> Rahul.
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>


-- 

Ingby <http://www.ingby.com>

IngbyForge <http://gforge.ingby.com>

bischeck - dynamic and adaptive thresholds for Nagios 
<http://www.bischeck.org>

anders.haal at ingby.com

Software through engineering creativity and competence

Ingenjörsbyn
Box 531
101 30 Stockholm
Sweden
www.ingby.com
Mobil: +46 70 575 35 46
Tele: +46 75 75 75 090
Fax:  +46 75 75 75 091


