Specifying the retention period

Anders Håål anders.haal at ingby.com
Mon Sep 8 10:34:10 CEST 2014


Hi Rahul,
Thanks for the question and feedback on the documentation. Great to hear 
that you think Bischeck is awesome. If you do not understand how it 
works by reading the documentation you are probably not alone, and we 
should consider it a documentation bug.

In 1.0.0 we introduced the concepts that you are asking about, and they 
are really two independent features.

Let's start with cache purging.
Collected monitoring data, the metrics, is kept in the cache (Redis from 
1.0.0) as linked lists. There is one linked list per service 
definition, like host1-service1-serviceitem1. Prior to 1.0.0 all the 
linked lists had the same size, defined by the property 
lastStatusCacheSize. In 1.0.0 we made the size configurable so that it 
can be defined per service definition.
To enable individual cache configurations we added a section called 
<cache> in the serviceitem section of bischeck.xml. Like many other 
configuration options in 1.0.0, the cache section can either hold the 
specific values or point to a template that can be shared.
To manage the size of the cache, or to be more specific the linked list 
size, we defined the <purge> section. The purge section can have one of 
two different configurations. The first defines the max size of the 
cache linked list.
<cache>
   <purge>
    <maxcount>1000</maxcount>
   </purge>
</cache>

The second option is to define the “time to live” for the metrics in 
the cache.
<cache>
   <purge>
    <offset>10</offset>
    <period>D</period>
   </purge>
</cache>
In the above example we set the time to live to 10 days, so any metrics 
older than this period will be removed. The period can have the 
following values:
H - hours
D - days
W - weeks
Y - years

The two options are mutually exclusive. You have to choose one for each 
serviceitem or cache template.

If no cache directive is defined for a serviceitem, the property 
lastStatusCacheSize will be used. Its default value is 500.
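
For the case in the original question, keeping every collected value 
for roughly the last month, a time-based purge could look like the 
sketch below. It only uses the elements described above. The offset of 
5 weeks is an illustrative assumption that leaves some margin for a 
threshold that looks 4 weeks back; adjust it to your needs.

```xml
<cache>
   <!-- Sketch: time-based purge, metrics older than 5 weeks are removed.
        The value 5 is an assumption chosen to cover about one month. -->
   <purge>
    <offset>5</offset>
    <period>W</period>
   </purge>
</cache>
```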

Hopefully this explains the cache purging.

The next question was related to aggregations, which have nothing to do 
with purging but are configured in the same <cache> section. The idea 
with aggregations is to automatically aggregate metrics on 
the level of an hour, day, week and month. The aggregation functions 
currently supported are average, max and min.
Let's say you have a service definition of the format 
host1-service1-serviceitem1. When you enable an average (avg) 
aggregation you will automatically get the following new service 
definitions:
host1-service1/H/avg-serviceitem1
host1-service1/D/avg-serviceitem1
host1-service1/W/avg-serviceitem1
host1-service1/M/avg-serviceitem1

The configuration you need to achieve the above average aggregations is:
<cache>
   <aggregate>
     <method>avg</method>
   </aggregate>
</cache>

If you would like to combine it with the purging described above, your 
configuration would look like:
<cache>
   <aggregate>
     <method>avg</method>
   </aggregate>

   <purge>
    <offset>10</offset>
    <period>D</period>
   </purge>
</cache>

The new aggregated service definitions, 
host1-service1/H/avg-serviceitem1 etc., have their own cache 
entries and can be used in threshold configurations and virtual services 
like any other service definition. For example, in a threshold hours 
section we could define:

<hours hoursID="2">
   <hourinterval>
     <from>09:00</from>
     <to>12:00</to>
     <threshold>host1-service1/H/avg-serviceitem1[0]*0.8</threshold>
   </hourinterval>
   ...

This means that the threshold is based on the average value of 
host1-service1-serviceitem1 for the last hour, multiplied by 0.8.
Aggregations are calculated hourly, daily, weekly and monthly.
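
The other aggregation levels should be usable in the same way. As a 
sketch, assuming that the daily average is addressed with the same 
syntax as the hourly one and that index [0] again selects the latest 
aggregated value, an hour interval based on the daily average could be 
written as:

```xml
<hourinterval>
  <from>09:00</from>
  <to>12:00</to>
  <!-- Assumption: same expression syntax as the hourly example,
       with D (day) instead of H (hour) in the aggregated name -->
  <threshold>host1-service1/D/avg-serviceitem1[0]*0.8</threshold>
</hourinterval>
```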

By default, weekend metrics are not included in the aggregation 
calculation. They can be included by setting 
<useweekend>true</useweekend>:

<cache>
   <aggregate>
     <method>avg</method>
     <useweekend>true</useweekend>
   </aggregate>
   ….
</cache>

This will create aggregated service definitions with the following 
naming standard:
host1-service1/H/avg/weekend-serviceitem1
host1-service1/D/avg/weekend-serviceitem1
host1-service1/W/avg/weekend-serviceitem1
host1-service1/M/avg/weekend-serviceitem1

You can also have multiple entries like:
<cache>
   <aggregate>
     <method>avg</method>
     <useweekend>true</useweekend>
   </aggregate>
   <aggregate>
     <method>max</method>
   </aggregate>
   ….
</cache>

So how long will the aggregated values be kept in the cache? By 
default we save:
Hourly aggregations for 25 hours
Daily aggregations for 7 days
Weekly aggregations for 5 weeks
Monthly aggregations for 1 month

These values can be overridden, but they cannot be lower than the 
defaults. Below is an example where we save the aggregations for 168 
hours, 60 days and 53 weeks.
<cache>
   <aggregate>
     <method>avg</method>
     <useweekend>true</useweekend>
     <retention>
       <period>H</period>
       <offset>168</offset>
     </retention>
     <retention>
       <period>D</period>
       <offset>60</offset>
     </retention>
     <retention>
       <period>W</period>
       <offset>53</offset>
     </retention>
   </aggregate>
   ….
</cache>

I hope this makes it a bit less confusing. What is clear to me is that 
we need to improve the documentation in this area.

Looking forward to your feedback.
Anders

On 09/08/2014 06:02 AM, Rahul Amaram wrote:
> Hi,
> I am trying to setup the bischeck plugin for our organization. I have
> configured most part of it except for the cache retention period. Here
> is what I want - I want to store every value which has been generated
> during the past 1 month. The reason being my threshold is currently
> calculated as the average of the metric value during the past 4 weeks at
> the same time of the day.
>
> So, how do I define the cache template for this? If I don't define any
> cache template, for how many days is the data kept?
> Also, how does the aggregate function work and what does the purge
> Maxitems signify?
>
> I've gone through the documentation but it wasn't clear. Looking forward
> to a response.
>
> Bischeck is one awesome plugin. Keep up the great work.
>
> Regards,
> Rahul.
>




