Possibility to avoid certain values which are way too deviant while calculating threshold

Rahul Amaram rahul.amaram at vizury.com
Mon Apr 20 15:12:54 CEST 2015


Done. I couldn't raise a feature request via the web interface, so I 
sent a mail instead.

Thanks,
Rahul.

On Sunday 28 December 2014 11:44 AM, anders.haal at ingby.com wrote:
> I have looked into the topic a little more, and I think outlier 
> detection, which Rahul pointed out, is an important capability.
> I think we should try to get something like the MAD approach 
> into the next version.
> @Rahul, please make a feature request on this topic.
> Anders
>
>
>
> On 12/17/2014 09:57 PM, Anders Håål wrote:
>> Sorry, I forgot the link: 
>> http://stats.stackexchange.com/questions/38001/detecting-outliers-using-standard-deviations
>>
>>
>> The problem is not writing the code; it is finding the logic that 
>> determines which numbers to remove from the data set. What counts as 
>> a deviation from the normal variation in the set?
>>
>> Googling a bit more, I found these definitions, which may be 
>> applicable to your use case, including one based on stdev:
>>
>> *Mean and Standard Deviation Method*
>>
>> For this outlier detection method, the mean and standard deviation 
>> of the residuals are calculated and compared. If a value is a certain 
>> number of standard deviations away from the mean, that data point is 
>> identified as an outlier. The specified number of standard deviations 
>> is called the threshold. The default value is 3.
>>
>> This method can fail to detect outliers because the outliers increase 
>> the standard deviation. The more extreme the outlier, the more the 
>> standard deviation is affected.
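The masking effect is easy to reproduce. Here is a minimal Python sketch of the mean and standard deviation method; the function name and the six daily values are hypothetical. With only n points, no value can lie more than (n - 1)/√n sample standard deviations from the mean, which for n = 6 is about 2.04, so the default threshold of 3 can never flag anything in a six-day window.

```python
import statistics


def mean_sd_outliers(values, threshold=3.0):
    """Return the values more than `threshold` sample standard deviations
    away from the mean (the mean and standard deviation method)."""
    if len(values) < 2:
        return []  # a sample standard deviation needs at least two points
    mu = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    if sd == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mu) / sd > threshold]


# Hypothetical history: six daily values, the last one far too deviant.
daily = [100, 102, 98, 101, 99, 1000]
print(mean_sd_outliers(daily))                 # []  (z of 1000 is only ~2.04)
print(mean_sd_outliers(daily, threshold=2.0))  # [1000]
```

The extreme value inflates the standard deviation it is measured against, which is exactly the failure mode described above.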
>>
>> *Median and Median Absolute Deviation Method (MAD)*
>>
>> For this outlier detection method, the median of the residuals is 
>> calculated. Then, the difference is calculated between each 
>> historical value and this median. These differences are expressed as 
>> their absolute values, and a new median is calculated and multiplied 
>> by an empirically derived constant to yield the median absolute 
>> deviation (MAD). If a value is a certain number of MADs away from the 
>> median of the residuals, that value is classified as an outlier. The 
>> default threshold is 3 MAD.
>>
>> This method is generally more effective than the mean and standard 
>> deviation method for detecting outliers, but it can be too aggressive 
>> in classifying values that are not really extremely different. Also, 
>> if more than 50% of the data points have the same value, MAD is 
>> computed to be 0, so any value different from the residual median is 
>> classified as an outlier.
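A minimal Python sketch of the MAD method. It assumes the commonly used constant 1.4826 as the "empirically derived constant" (it makes the MAD comparable to a standard deviation for normally distributed data); the function name and data are hypothetical. The zero-MAD branch reproduces the caveat above.

```python
import statistics

# Assumed value for the "empirically derived constant": 1.4826 makes the
# MAD comparable to a standard deviation for normally distributed data.
MAD_SCALE = 1.4826


def mad_outliers(values, threshold=3.0):
    """Return the values more than `threshold` scaled MADs from the median."""
    med = statistics.median(values)
    mad = MAD_SCALE * statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # MAD is 0 only when more than half the values equal the median;
        # as the caveat says, every value that differs from it is flagged.
        return [v for v in values if v != med]
    return [v for v in values if abs(v - med) / mad > threshold]


daily = [100, 102, 98, 101, 99, 1000]  # hypothetical data
print(mad_outliers(daily))             # [1000]
print(mad_outliers([5, 5, 5, 5, 6]))   # [6]  (zero-MAD case)
```

Unlike the standard deviation, the median and MAD barely move when the extreme value is added, so 1000 is flagged even with only six points.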
>>
>> *Median and Interquartile Deviation Method (IQD)*
>>
>> For this outlier detection method, the median of the residuals is 
>> calculated, along with the 25th percentile and the 75th percentile. 
>> The difference between the 25th and 75th percentile is the 
>> interquartile deviation (IQD). Then, the difference is calculated 
>> between each historical value and the residual median. If the 
>> historical value is a certain number of IQDs away from the median of 
>> the residuals, that value is classified as an outlier. The default 
>> threshold is 2.22, which for normally distributed data is equivalent 
>> to 3 standard deviations or MADs.
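Where 2.22 comes from, assuming normally distributed residuals with mean μ and standard deviation σ: the quartiles lie about 0.6745σ on either side of the mean, and the median absolute deviation of a normal distribution is also about 0.6745σ (the 1.4826 scale factor is the usual, assumed choice of constant):

```latex
\[
Q_1 \approx \mu - 0.6745\,\sigma, \qquad Q_3 \approx \mu + 0.6745\,\sigma
\quad\Longrightarrow\quad
\mathrm{IQD} = Q_3 - Q_1 \approx 1.349\,\sigma
\]
\[
3\,\sigma \approx \frac{3}{1.349}\,\mathrm{IQD} \approx 2.22\,\mathrm{IQD},
\qquad
\mathrm{MAD} = 1.4826 \cdot \operatorname{median}_i |x_i - \tilde{x}|
\approx 1.4826 \cdot 0.6745\,\sigma \approx \sigma
\]
```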
>>
>> This method is somewhat susceptible to influence from extreme 
>> outliers, but less so than the mean and standard deviation method. 
>> Box plots are based on this approach. The median and interquartile 
>> deviation method can be used for both symmetric and asymmetric data.
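A matching sketch of the median and IQD method, using `statistics.quantiles` (Python 3.8 or later) for the 25th and 75th percentiles. Percentile conventions differ between tools, so the "inclusive" interpolation used here is an assumption, as is the handling of a zero IQD, which mirrors the MAD case; the function name and data are hypothetical.

```python
import statistics


def iqd_outliers(values, threshold=2.22):
    """Return the values more than `threshold` IQDs from the median."""
    med = statistics.median(values)
    # 25th and 75th percentiles; the "inclusive" interpolation is an assumption.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqd = q3 - q1
    if iqd == 0:
        # Assumed handling, mirroring the zero-MAD case.
        return [v for v in values if v != med]
    return [v for v in values if abs(v - med) / iqd > threshold]


daily = [100, 102, 98, 101, 99, 1000]  # hypothetical data
print(iqd_outliers(daily))             # [1000]
```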
>>
>> If you find a method that you think could work, we could implement it 
>> together and you can verify it with your data. Can you say anything 
>> about the data collected?
>> Anders
>>
>> On 12/17/2014 09:25 PM, Rahul Amaram wrote:
>>> Hi Anders,
>>>
>>> I would like to remove the outlier and calculate the mean of the 
>>> remaining elements. Do you have any suggestion apart from writing 
>>> my own custom math function? Also, I don't think you included the 
>>> link.
>>>
>>> Thanks,
>>> Rahul.
>>>
>>> On Thursday 18 December 2014 12:55 AM, Anders Håål wrote:
>>>> Hi Rahul,
>>>> It's possible, but the question is which algorithm to use. A 
>>>> second question is what you would do with the remaining set: 
>>>> calculate a mean?
>>>> Excluding a deviant value sounds close to what is called an 
>>>> outlier, http://en.wikipedia.org/wiki/Outlier. There are a number 
>>>> of mathematical solutions to this problem, but I'm not sure which 
>>>> would be applicable or correct. Check this link for a discussion 
>>>> of the topic, where one approach uses the standard deviation, 
>>>> although from the discussion it does not sound like a 
>>>> statistically correct approach.
>>>>
>>>> If you or anyone else on this list finds a good approach, I'm more 
>>>> than happy to try it. In Bischeck it's possible to plug in your own 
>>>> functions, as described in 
>>>> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_installation_and_administration_guide.html#toc-Section-6.2 
>>>> so you can easily do your own testing. Using the cache browser CLI 
>>>> http://www.bischeck.org/wp-content/uploads/2014/06/Bischeck_installation_and_administration_guide.html#toc-Section-4.4 
>>>> you can easily test your function.
>>>>
>>>> Anders
>>>>
>>>>
>>>> On 12/17/2014 03:40 PM, Rahul Amaram wrote:
>>>>> Hi,
>>>>>
>>>>> I have a quick question. Let's say we calculate the threshold 
>>>>> from the values of the past six days, one value per day, and one 
>>>>> of those six values is far too deviant. Is it possible to exclude 
>>>>> this deviant value when calculating the threshold?
>>>>>
>>>>> Thanks,
>>>>> Rahul.
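For the six-day case in this question, removing the outlier and averaging the rest (as Rahul asks in his follow-up) could look like the following sketch. It assumes MAD-based filtering with a threshold of 3 and the 1.4826 constant; `robust_baseline` is a hypothetical name, not a Bischeck API, and plugging it into Bischeck as a custom function is not shown.

```python
import statistics


def robust_baseline(history, threshold=3.0, scale=1.4826):
    """Hypothetical helper (not a Bischeck API): mean of `history` after
    dropping values more than `threshold` scaled MADs from the median."""
    med = statistics.median(history)
    mad = scale * statistics.median([abs(v - med) for v in history])
    if mad == 0:
        # Degenerate case: keep only the values equal to the median.
        kept = [v for v in history if v == med]
    else:
        kept = [v for v in history if abs(v - med) / mad <= threshold]
    return statistics.mean(kept)


last_six_days = [100, 102, 98, 101, 99, 1000]  # hypothetical data
print(robust_baseline(last_six_days))   # mean of the five kept values: 100
print(statistics.mean(last_six_days))   # plain mean, pulled up to 250
```

Whether the mean or the median of the kept values makes the better baseline, and which threshold to use, would still need to be verified against real data.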
>>>>
>>>>
>>>
>>
>>
>> -- 
>>
>> Ingby<http://www.ingby.com>
>>
>> IngbyForge<http://gforge.ingby.com>
>>
>> bischeck - dynamic and adaptive thresholds for Nagios<http://www.bischeck.org>
>>
>> anders.haal at ingby.com<mailto:anders.haal at ingby.com>
>>
>> Software through engineering creativity and competence
>>
>> Ingenjörsbyn
>> Box 531
>> 101 30 Stockholm
>> Sweden
>> www.ingby.com  <http://www.ingby.com/>
>> Mobile: +46 70 575 35 46
>> Phone: +46 75 75 75 090
>> Fax:  +46 75 75 75 091
>>
>

