Specifying the retention period

Anders Håål anders.haal at ingby.com
Fri Sep 12 11:11:39 CEST 2014


Glad that it worked out. What is clear to me is that this topic is not 
that simple to understand with the current documentation, so this 
feedback from you is very valuable. I will add some additional blog posts 
on the topic and then get it into the documentation for the next major 
release. We will also need to figure out if this can be simplified.

Did you try the CacheCli?

Keep the feedback coming.
Anders

On 09/11/2014 11:39 PM, Rahul Amaram wrote:
> Ok. I figured out the problem. It was with my understanding. I have 
> useweekend set to true. So, instead of 
> $$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[23], I should 
> be using 
> $$HOSTNAME$$-$$SERVICENAME$$/H/avg/weekend-$$SERVICEITEMNAME$$[23] and 
> so on.
>
> Thanks for the awesome support.
>
> - Rahul.
>
> On Thursday 11 September 2014 11:43 AM, Anders Håål wrote:
>> Hi Rahul,
>> Now I have a backlog of questions :)
>> Okay, let's start with the last question.
>> - First verify that you have data in the cache. Use redis-cli or the 
>> Bischeck CacheCli, 
>> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_installation_and_administration_guide.html#toc-Section-4.4.
>> - Then there is an issue with null data. Let's say that one of the 
>> expressions you have returns null. Null is tricky, so in Bischeck you 
>> have to decide how to manage a null value. Look at 
>> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_configuration_guide.html#toc-Section-4.3. 
>>
>> - You can also check the logs and increase the log level to debug 
>> to get more info. Check out 
>> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_installation_and_administration_guide.html#toc-Section-3.2. 
>>
>>
>> The two following questions I will try to clarify better later, as I 
>> must run into a meeting, but the index on an hourly aggregate specifies 
>> a specific hour, like the avg, max or min for that hour. Index 0 means 
>> the last calculated hour, so if the time is 2:30, index 0 means the avg, 
>> max or min for the period 1:00 to 2:00.
>>
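A minimal sketch of the index rule described above, written in Python. It assumes that each index step on an /H/ aggregate moves back exactly one full clock hour with no gaps; hour_window is a hypothetical helper for illustration only, not part of Bischeck.

```python
from datetime import datetime, timedelta


def hour_window(now, index):
    """Return (start, end) of the hourly aggregate addressed by [index].

    Assumption: index 0 is the last completed clock hour and every
    further index step goes back one more hour.
    """
    if index < 0:
        raise ValueError("an index is a non-negative integer without a minus sign")
    # End of the last completed hour, e.g. 2:00 when the time is 2:30.
    last_full_hour_end = now.replace(minute=0, second=0, microsecond=0)
    end = last_full_hour_end - timedelta(hours=index)
    return end - timedelta(hours=1), end


now = datetime(2014, 9, 11, 2, 30)
start, end = hour_window(now, 0)    # 1:00 to 2:00 on the same day
print(start, end)
start, end = hour_window(now, 23)   # 2:00 to 3:00 on the previous day
print(start, end)
```

Under this assumption, with the time at 2:30, index 23 (not 24) addresses 2:00 to 3:00 on the previous day, and index 167 addresses the same hour one week back.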
>> These are good questions; we are glad to get your user's perspective 
>> on this.
>> Anders
>>
>> On 09/11/2014 07:19 AM, Rahul Amaram wrote:
>>> This doesn't help :(.
>>>
>>> <threshold>avg($$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[23],$$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[167],$$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[335])</threshold> 
>>>
>>>
>>> - Rahul.
>>>
>>> On Thursday 11 September 2014 10:45 AM, Rahul Amaram wrote:
>>>> Also, let us say, that the current time is 2.30 and that I want the 
>>>> average of all the values between 2.00 and 3.00 the previous day, 
>>>> I'd probably have to use
>>>>
>>>> $$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[23]
>>>>
>>>> rather than
>>>>
>>>> $$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[24]
>>>>
>>>> Am I right ?
>>>>
>>>> Thanks,
>>>> Rahul.
>>>>
>>>> On Thursday 11 September 2014 10:39 AM, Rahul Amaram wrote:
>>>>> Ok. So would 
>>>>> $$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[24] refer 
>>>>> to the average of all the values ONLY in the 24th hour before 
>>>>> the current time?
>>>>>
>>>>> On Thursday 11 September 2014 10:30 AM, Anders Håål wrote:
>>>>>> Hi Amaram,
>>>>>> I think you just need to remove the minus sign when using the 
>>>>>> aggregates. Minus is used for time, as in back in time, and just an 
>>>>>> integer without a minus sign or a time indicator is an index. Check out 
>>>>>> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_configuration_guide.html#toc-Chapter-4. 
>>>>>>
>>>>>> You can also use redis-cli to explore the data in the cache. The 
>>>>>> key in Redis is the same as the service definition.
>>>>>> Anders
>>>>>>
>>>>>> On 09/11/2014 06:38 AM, Rahul Amaram wrote:
>>>>>>> Ok. I am facing another issue. I have been running bischeck with 
>>>>>>> the aggregate function for more than a day. I am using the below 
>>>>>>> threshold function.
>>>>>>>
>>>>>>> <threshold>avg($$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[-24],$$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[-168],$$HOSTNAME$$-$$SERVICENAME$$/H/avg-$$SERVICEITEMNAME$$[-336])</threshold> 
>>>>>>>
>>>>>>>
>>>>>>> and it doesn't seem to work. I am expecting that the first 
>>>>>>> aggregate value should be available.
>>>>>>>
>>>>>>> Instead if I use the below threshold function (I know this is 
>>>>>>> not related to aggregate)
>>>>>>>
>>>>>>> avg($$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-24H],$$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-168H],$$HOSTNAME$$-$$SERVICENAME$$-$$SERVICEITEMNAME$$[-336H]) 
>>>>>>>
>>>>>>>
>>>>>>> the threshold is calculated fine, which is just the first value 
>>>>>>> as the remaining two values are not in cache.
>>>>>>>
>>>>>>> How can I debug why aggregate is not working?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Rahul.
>>>>>>>
>>>>>>> On Wednesday 10 September 2014 04:53 PM, Anders Håål wrote:
>>>>>>>> Thanks - got the ticket.
>>>>>>>> I will update progress on the bug ticket, but it's good that the 
>>>>>>>> workaround works.
>>>>>>>> Anders
>>>>>>>>
>>>>>>>> On 09/10/2014 01:20 PM, Rahul Amaram wrote:
>>>>>>>>> That indeed seems to be the problem. Using count rather than 
>>>>>>>>> period seems to address the issue. Raised a ticket -
>>>>>>>>> http://gforge.ingby.com/gf/project/bischeck/tracker/?action=TrackerItemEdit&tracker_item_id=259 
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Rahul.
>>>>>>>>>
>>>>>>>>> On Wednesday 10 September 2014 04:02 PM, Anders Håål wrote:
>>>>>>>>>> This looks like a bug. Could you please report it on
>>>>>>>>>> http://gforge.ingby.com/gf/project/bischeck/tracker/ in the Bugs
>>>>>>>>>> tracker? You need an account, but it's just a sign-up and you 
>>>>>>>>>> get an email confirmation.
>>>>>>>>>> Can you try to use maxcount for purging instead as a 
>>>>>>>>>> workaround? Just calculate your maxcount based on the 
>>>>>>>>>> scheduling interval you use.
>>>>>>>>>> Anders
>>>>>>>>>>
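A rough sketch of the maxcount calculation suggested above, in Python. The function names are hypothetical, and the 120-second scheduling interval is an assumption derived from the "30 values per hour" figure mentioned elsewhere in this thread; substitute your own interval.

```python
import math


def maxcount_for(retention_hours, interval_seconds):
    """Number of list entries needed to cover retention_hours of metrics."""
    return math.ceil(retention_hours * 3600 / interval_seconds)


def hours_covered(maxcount, interval_seconds):
    """How many hours of metrics a list of maxcount entries can hold."""
    return maxcount * interval_seconds / 3600


# Assumed interval: one value every 120 s, i.e. 30 values per hour.
interval = 120

# Entries needed to keep 30 days of raw metrics.
print(maxcount_for(30 * 24, interval))

# Hours covered by 480 entries at the same interval.
print(hours_covered(480, interval))
```

The resulting number would go into the <maxcount> element of the <purge> section in place of the <offset>/<period> pair.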
>>>>>>>>>> On 09/10/2014 12:17 PM, Rahul Amaram wrote:
>>>>>>>>>>> Following up on the earlier topic, I am seeing the below 
>>>>>>>>>>> errors related
>>>>>>>>>>> to cache purge. Any idea on what might be causing this? I 
>>>>>>>>>>> don't see any
>>>>>>>>>>> other errors in log related to metrics.
>>>>>>>>>>>
>>>>>>>>>>> 2014-09-10 12:12:00.001 ; INFO ; DefaultQuartzScheduler_Worker-5 ; com.ingby.socbox.bischeck.configuration.CachePurgeJob ; CachePurge purging 180
>>>>>>>>>>> 2014-09-10 12:12:00.003 ; INFO ; DefaultQuartzScheduler_Worker-5 ; com.ingby.socbox.bischeck.configuration.CachePurgeJob ; CachePurge executed in 1 ms
>>>>>>>>>>> 2014-09-10 12:12:00.003 ; ERROR ; DefaultQuartzScheduler_Worker-5 ; org.quartz.core.JobRunShell ; Job DailyMaintenance.CachePurge threw an unhandled Exception: java.lang.NullPointerException: null
>>>>>>>>>>>          at com.ingby.socbox.bischeck.cache.provider.redis.LastStatusCache.trim(LastStatusCache.java:1250)
>>>>>>>>>>>          at com.ingby.socbox.bischeck.configuration.CachePurgeJob.execute(CachePurgeJob.java:140)
>>>>>>>>>>>
>>>>>>>>>>> 2014-09-10 12:12:00.003 ; ERROR ; DefaultQuartzScheduler_Worker-5 ; org.quartz.core.ErrorLogger ; Job (DailyMaintenance.CachePurge threw an exception.org.quartz.SchedulerException: Job threw an unhandled exception.
>>>>>>>>>>>          at org.quartz.core.JobRunShell.run(JobRunShell.java:224)
>>>>>>>>>>>          at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
>>>>>>>>>>> Caused by: java.lang.NullPointerException: null
>>>>>>>>>>>          at com.ingby.socbox.bischeck.cache.provider.redis.LastStatusCache.trim(LastStatusCache.java:1250)
>>>>>>>>>>>          at com.ingby.socbox.bischeck.configuration.CachePurgeJob.execute(CachePurgeJob.java:140)
>>>>>>>>>>>
>>>>>>>>>>> Here is my cache configuration:
>>>>>>>>>>>
>>>>>>>>>>>      <cache>
>>>>>>>>>>>        <aggregate>
>>>>>>>>>>>          <method>avg</method>
>>>>>>>>>>>          <useweekend>true</useweekend>
>>>>>>>>>>>          <retention>
>>>>>>>>>>>            <period>H</period>
>>>>>>>>>>>            <offset>720</offset>
>>>>>>>>>>>          </retention>
>>>>>>>>>>>          <retention>
>>>>>>>>>>>            <period>D</period>
>>>>>>>>>>>            <offset>30</offset>
>>>>>>>>>>>          </retention>
>>>>>>>>>>>        </aggregate>
>>>>>>>>>>>
>>>>>>>>>>>        <purge>
>>>>>>>>>>>          <offset>30</offset>
>>>>>>>>>>>          <period>D</period>
>>>>>>>>>>>        </purge>
>>>>>>>>>>>      </cache>
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Rahul.
>>>>>>>>>>> On Monday 08 September 2014 08:39 PM, Anders Håål wrote:
>>>>>>>>>>>> Great if you can make a Debian package, and I understand 
>>>>>>>>>>>> that you cannot commit to a timeline. The best thing would be 
>>>>>>>>>>>> to integrate it into our build process, where we use ant.
>>>>>>>>>>>>
>>>>>>>>>>>> If the purging is based on time, then it could happen that 
>>>>>>>>>>>> data is removed from the cache, since the logic is based on 
>>>>>>>>>>>> time relative to now. To avoid it you should increase the 
>>>>>>>>>>>> purge time before you start bischeck. And just a comment on 
>>>>>>>>>>>> your last sentence: Redis TTL is never used :)
>>>>>>>>>>>> Anders
>>>>>>>>>>>>
>>>>>>>>>>>> On 09/08/2014 02:09 PM, Rahul Amaram wrote:
>>>>>>>>>>>>> I would be more than happy to give you guys a testimonial. 
>>>>>>>>>>>>> However, we have just taken this live and would like to see 
>>>>>>>>>>>>> its performance before I give a testimonial.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, if time permits, I'll try to bundle this for Debian 
>>>>>>>>>>>>> (I'm a Debian maintainer). I can't commit on a timeline 
>>>>>>>>>>>>> right away though :).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, just to make things explicitly clear: I understand 
>>>>>>>>>>>>> that the service item ttl below has nothing to do with Redis 
>>>>>>>>>>>>> TTL. But if I stop my bischeck server for a day or two, 
>>>>>>>>>>>>> would any of my metrics get lost? Or would I have to 
>>>>>>>>>>>>> increase the Redis TTL for this?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Rahul.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Monday 08 September 2014 04:09 PM, Anders Håål wrote:
>>>>>>>>>>>>>> Glad that it clarified how to configure the cache 
>>>>>>>>>>>>>> section. I will make a blog post on this in the meantime, 
>>>>>>>>>>>>>> until we have updated documentation. I agree with you that 
>>>>>>>>>>>>>> the structure of the configuration is a bit "heavy", so 
>>>>>>>>>>>>>> ideas and input are appreciated.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regarding Redis TTL, this is a Redis feature we do not 
>>>>>>>>>>>>>> use. The ttl mentioned in my mail is managed by bischeck. 
>>>>>>>>>>>>>> Redis TTL does not work on individual nodes in a Redis 
>>>>>>>>>>>>>> linked list.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Currently the bischeck installer should work for Ubuntu, 
>>>>>>>>>>>>>> Red Hat/CentOS and Debian. There are currently no plans to 
>>>>>>>>>>>>>> make distribution packages like rpm or deb. I know that op5 
>>>>>>>>>>>>>> (www.op5.com), which bundles Bischeck, makes a bischeck 
>>>>>>>>>>>>>> rpm. It would be super if there is anyone who would like to 
>>>>>>>>>>>>>> do this for the project.
>>>>>>>>>>>>>> When it comes to packaging, we have done a bit of work to 
>>>>>>>>>>>>>> create Docker containers, but it's still experimental.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I also encourage you, if you think bischeck supports your 
>>>>>>>>>>>>>> monitoring effort, to write a small testimonial that we can 
>>>>>>>>>>>>>> put on the site.
>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>> Anders
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 09/08/2014 11:30 AM, Rahul Amaram wrote:
>>>>>>>>>>>>>>> Thanks Anders. This explains precisely why my data was 
>>>>>>>>>>>>>>> getting purged after 16 hours (30 values per hour * 16 
>>>>>>>>>>>>>>> hours = 480). It would be great if you could update the 
>>>>>>>>>>>>>>> documentation with this info. The entire setup and 
>>>>>>>>>>>>>>> configuration itself takes time to get a hold on, and 
>>>>>>>>>>>>>>> detailed documentation would be very helpful.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also, another quick question. Right now, I believe the 
>>>>>>>>>>>>>>> Redis TTL is set to 2000 seconds. Does this mean that if I 
>>>>>>>>>>>>>>> don't receive data for a particular serviceitem (or 
>>>>>>>>>>>>>>> service or host) for 2000 seconds, the data related to it 
>>>>>>>>>>>>>>> is lost?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also, any plans for bundling this with distributions 
>>>>>>>>>>>>>>> such as Debian?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Rahul.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Monday 08 September 2014 02:04 PM, Anders Håål wrote:
>>>>>>>>>>>>>>>> Hi Rahul,
>>>>>>>>>>>>>>>> Thanks for the question and feedback on the 
>>>>>>>>>>>>>>>> documentation. Great to hear that you think Bischeck is 
>>>>>>>>>>>>>>>> awesome. If you do not understand how it works by reading 
>>>>>>>>>>>>>>>> the documentation, you are probably not alone, and we 
>>>>>>>>>>>>>>>> should consider it a documentation bug.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In 1.0.0 we introduced the concepts that you are asking 
>>>>>>>>>>>>>>>> about, and they are really two different, independent 
>>>>>>>>>>>>>>>> features.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Let's start with cache purging.
>>>>>>>>>>>>>>>> Collected monitoring data, metrics, are kept in the 
>>>>>>>>>>>>>>>> cache (Redis from 1.0.0) as linked lists. There is one 
>>>>>>>>>>>>>>>> linked list per service definition, like 
>>>>>>>>>>>>>>>> host1-service1-serviceitem1. Prior to 1.0.0 all the 
>>>>>>>>>>>>>>>> linked lists had the same size, which was defined with 
>>>>>>>>>>>>>>>> the property lastStatusCacheSize. But in 1.0.0 we made 
>>>>>>>>>>>>>>>> that configurable so it could be defined per service 
>>>>>>>>>>>>>>>> definition.
>>>>>>>>>>>>>>>> To enable individual cache configurations we added a 
>>>>>>>>>>>>>>>> section called <cache> in the serviceitem section of the 
>>>>>>>>>>>>>>>> bischeck.xml. Like many other configuration options in 
>>>>>>>>>>>>>>>> 1.0.0, the cache section can either have specific values 
>>>>>>>>>>>>>>>> or point to a template that can be shared.
>>>>>>>>>>>>>>>> To manage the size of the cache, or to be more specific 
>>>>>>>>>>>>>>>> the linked list size, we defined the <purge> section. The 
>>>>>>>>>>>>>>>> purge section can have two different configurations. The 
>>>>>>>>>>>>>>>> first is defining the max size of the cache linked list.
>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>   <purge>
>>>>>>>>>>>>>>>>    <maxcount>1000</maxcount>
>>>>>>>>>>>>>>>>   </purge>
>>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The second option is to define the “time to live” for the 
>>>>>>>>>>>>>>>> metrics in the cache.
>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>   <purge>
>>>>>>>>>>>>>>>>    <offset>10</offset>
>>>>>>>>>>>>>>>>    <period>D</period>
>>>>>>>>>>>>>>>>   </purge>
>>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>> In the above example we set the time to live to 10 
>>>>>>>>>>>>>>>> days, so any metrics older than this period will be 
>>>>>>>>>>>>>>>> removed. The period can have the following values:
>>>>>>>>>>>>>>>> H - hours
>>>>>>>>>>>>>>>> D - days
>>>>>>>>>>>>>>>> W - weeks
>>>>>>>>>>>>>>>> Y - years
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The two options are mutually exclusive. You have to 
>>>>>>>>>>>>>>>> choose one for each serviceitem or cache template.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If no cache directive is defined for a serviceitem, the 
>>>>>>>>>>>>>>>> property lastStatusCacheSize will be used. Its default 
>>>>>>>>>>>>>>>> value is 500.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hopefully this explains the cache purging.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The next question was related to aggregations, which 
>>>>>>>>>>>>>>>> have nothing to do with purging, but are configured in 
>>>>>>>>>>>>>>>> the same <cache> section. The idea with aggregations was 
>>>>>>>>>>>>>>>> to create an automatic way to aggregate metrics on the 
>>>>>>>>>>>>>>>> level of an hour, day, week and month. The aggregation 
>>>>>>>>>>>>>>>> functions currently supported are average, max and min.
>>>>>>>>>>>>>>>> Let's say you have a service definition of the format 
>>>>>>>>>>>>>>>> host1-service1-serviceitem1. When you enable an average 
>>>>>>>>>>>>>>>> (avg) aggregation you will automatically get the 
>>>>>>>>>>>>>>>> following new service definitions:
>>>>>>>>>>>>>>>> host1-service1/H/avg-serviceitem1
>>>>>>>>>>>>>>>> host1-service1/D/avg-serviceitem1
>>>>>>>>>>>>>>>> host1-service1/W/avg-serviceitem1
>>>>>>>>>>>>>>>> host1-service1/M/avg-serviceitem1
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The configuration you need to achieve the above average 
>>>>>>>>>>>>>>>> aggregations is:
>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>   <aggregate>
>>>>>>>>>>>>>>>>     <method>avg</method>
>>>>>>>>>>>>>>>>   </aggregate>
>>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If you would like to combine it with the purging 
>>>>>>>>>>>>>>>> described above, your configuration would look like:
>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>   <aggregate>
>>>>>>>>>>>>>>>>     <method>avg</method>
>>>>>>>>>>>>>>>>   </aggregate>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   <purge>
>>>>>>>>>>>>>>>>    <offset>10</offset>
>>>>>>>>>>>>>>>>    <period>D</period>
>>>>>>>>>>>>>>>>   </purge>
>>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The new aggregated service definitions, 
>>>>>>>>>>>>>>>> host1-service1/H/avg-serviceitem1, etc., will have their 
>>>>>>>>>>>>>>>> own cache entries and can be used in threshold 
>>>>>>>>>>>>>>>> configurations and virtual services like any other 
>>>>>>>>>>>>>>>> service definitions. For example, in a threshold hours 
>>>>>>>>>>>>>>>> section we could define
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> <hours hoursID="2">
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   <hourinterval>
>>>>>>>>>>>>>>>>     <from>09:00</from>
>>>>>>>>>>>>>>>>     <to>12:00</to>
>>>>>>>>>>>>>>>>     <threshold>host1-service1/H/avg-serviceitem1[0]*0.8</threshold>
>>>>>>>>>>>>>>>>   </hourinterval>
>>>>>>>>>>>>>>>>   ...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This would mean that we use the average value for 
>>>>>>>>>>>>>>>> host1-service1-serviceitem1 for the period of the last 
>>>>>>>>>>>>>>>> hour. Aggregations are calculated hourly, daily, weekly 
>>>>>>>>>>>>>>>> and monthly.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> By default, weekend metrics are not included in the 
>>>>>>>>>>>>>>>> aggregation calculation. This can be enabled by setting 
>>>>>>>>>>>>>>>> <useweekend>true</useweekend>:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>   <aggregate>
>>>>>>>>>>>>>>>>     <method>avg</method>
>>>>>>>>>>>>>>>>     <useweekend>true</useweekend>
>>>>>>>>>>>>>>>>   </aggregate>
>>>>>>>>>>>>>>>>   ….
>>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This will create aggregated service definitions with 
>>>>>>>>>>>>>>>> the following naming convention:
>>>>>>>>>>>>>>>> host1-service1/H/avg/weekend-serviceitem1
>>>>>>>>>>>>>>>> host1-service1/D/avg/weekend-serviceitem1
>>>>>>>>>>>>>>>> host1-service1/W/avg/weekend-serviceitem1
>>>>>>>>>>>>>>>> host1-service1/M/avg/weekend-serviceitem1
>>>>>>>>>>>>>>>>
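The naming pattern in the two lists above can be sketched as a small Python function. This is only an illustration of the pattern shown in this mail: aggregate_key is a hypothetical helper, and "min" as a method name is an assumption (only avg and max appear in the configuration examples).

```python
PERIODS = ("H", "D", "W", "M")      # hour, day, week, month
METHODS = ("avg", "max", "min")     # "min" spelling is assumed


def aggregate_key(host, service, item, period, method, use_weekend=False):
    """Build the aggregated service definition name for one period."""
    if period not in PERIODS:
        raise ValueError("unknown period: " + period)
    if method not in METHODS:
        raise ValueError("unknown method: " + method)
    # With <useweekend>true</useweekend> the name gets a /weekend part.
    suffix = "/weekend" if use_weekend else ""
    return f"{host}-{service}/{period}/{method}{suffix}-{item}"


for period in PERIODS:
    print(aggregate_key("host1", "service1", "serviceitem1", period, "avg",
                        use_weekend=True))
```

The point to note is that the /weekend part is part of the name itself, so a threshold expression has to reference the /weekend name whenever useweekend is set to true.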
>>>>>>>>>>>>>>>> You can also have multiple entries like:
>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>   <aggregate>
>>>>>>>>>>>>>>>>     <method>avg</method>
>>>>>>>>>>>>>>>>     <useweekend>true</useweekend>
>>>>>>>>>>>>>>>>   </aggregate>
>>>>>>>>>>>>>>>>   <aggregate>
>>>>>>>>>>>>>>>>     <method>max</method>
>>>>>>>>>>>>>>>>   </aggregate>
>>>>>>>>>>>>>>>>   ….
>>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So how long will the aggregated values be kept in the 
>>>>>>>>>>>>>>>> cache? By default we save
>>>>>>>>>>>>>>>> Hourly aggregations for 25 hours
>>>>>>>>>>>>>>>> Daily aggregations for 7 days
>>>>>>>>>>>>>>>> Weekly aggregations for 5 weeks
>>>>>>>>>>>>>>>> Monthly aggregations for 1 month
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> These values can be overridden, but they cannot be lower 
>>>>>>>>>>>>>>>> than the defaults. Below you have an example where we 
>>>>>>>>>>>>>>>> save the aggregations for 168 hours, 60 days and 53 weeks.
>>>>>>>>>>>>>>>> <cache>
>>>>>>>>>>>>>>>>   <aggregate>
>>>>>>>>>>>>>>>>     <method>avg</method>
>>>>>>>>>>>>>>>>     <useweekend>true</useweekend>
>>>>>>>>>>>>>>>>     <retention>
>>>>>>>>>>>>>>>>       <period>H</period>
>>>>>>>>>>>>>>>>       <offset>168</offset>
>>>>>>>>>>>>>>>>     </retention>
>>>>>>>>>>>>>>>>     <retention>
>>>>>>>>>>>>>>>>       <period>D</period>
>>>>>>>>>>>>>>>>       <offset>60</offset>
>>>>>>>>>>>>>>>>     </retention>
>>>>>>>>>>>>>>>>     <retention>
>>>>>>>>>>>>>>>>       <period>W</period>
>>>>>>>>>>>>>>>>       <offset>53</offset>
>>>>>>>>>>>>>>>>     </retention>
>>>>>>>>>>>>>>>>   </aggregate>
>>>>>>>>>>>>>>>>   ….
>>>>>>>>>>>>>>>> </cache>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I hope this makes it a bit less confusing. What is 
>>>>>>>>>>>>>>>> clear to me is
>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> we need to improve the documentation in this area.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Looking forward to your feedback.
>>>>>>>>>>>>>>>> Anders
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 09/08/2014 06:02 AM, Rahul Amaram wrote:
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>> I am trying to set up the bischeck plugin for our 
>>>>>>>>>>>>>>>>> organization. I have configured most of it except for 
>>>>>>>>>>>>>>>>> the cache retention period. Here is what I want: I want 
>>>>>>>>>>>>>>>>> to store every value which has been generated during 
>>>>>>>>>>>>>>>>> the past 1 month. The reason is that my threshold is 
>>>>>>>>>>>>>>>>> currently calculated as the average of the metric value 
>>>>>>>>>>>>>>>>> during the past 4 weeks at the same time of the day.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So, how do I define the cache template for this? If I 
>>>>>>>>>>>>>>>>> don't define any cache template, for how many days is 
>>>>>>>>>>>>>>>>> the data kept?
>>>>>>>>>>>>>>>>> Also, how does the aggregate function work and what 
>>>>>>>>>>>>>>>>> does the purge Maxitems signify?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I've gone through the documentation but it wasn't 
>>>>>>>>>>>>>>>>> clear. Looking forward to a response.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Bischeck is one awesome plugin. Keep up the great work.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>> Rahul.
>>>>>>>>>>>>>>>>>


-- 

Ingby<http://www.ingby.com>

IngbyForge<http://gforge.ingby.com>

bischeck - dynamic and adaptive thresholds for Nagios <http://www.bischeck.org>

anders.haal at ingby.com<mailto:anders.haal at ingby.com>

Software through engineering creativity and competence

Ingenjörsbyn
Box 531
101 30 Stockholm
Sweden
www.ingby.com <http://www.ingby.com/>
Mobil: +46 70 575 35 46
Tele: +46 75 75 75 090
Fax:  +46 75 75 75 091


