









Abstract

The aim of this paper is to show that the presence of one possible type of outliers is not connected with heavy tails of the distribution. On the contrary, the typical situation for the appearance of outliers is the case of compactly supported distributions.
Key words: outliers; heavy tailed distributions; compact support
[a Department of Probability and Mathematical Statistics, Charles University, Prague, Czech Republic, e-mail: lev.klebanov@mff.cuni.cz. b Czech Technical University in Prague, Prague, Czech Republic.]

In this paper we revise the concept of outliers or, more precisely, outliers of the first type (in the terminology of [12]). We find the contemporary notion rather vague, which motivates us to dispute carefully its meaning and its connection with the tail behavior of the distribution. Let us start by looking closely at the definition of an outlier. Usually it is similar to the one given in the popular Internet encyclopedia Wikipedia: "In statistics, an outlier is an observation point that is distant from other observations. An outlier may be due to variability in the measurement or it may indicate experimental error; the latter are sometimes excluded from the data." Obviously, this definition is given in neither a mathematically nor a statistically correct way. In particular, we find the description of "the point being distant" from other observations rather confusing. A little better seems to be the definition given on the NIST site: "An outlier is an observation that lies an abnormal distance from other values in a random sample from a population. In a sense, this definition leaves it up to the analyst (or a consensus process) to decide what will be considered abnormal. Before abnormal observations can be singled out, it is necessary to characterize normal observations." However, it has similar drawbacks. In our opinion, it is essential to specify a measurement unit for the distance under consideration and, mainly, the definition of that distance itself. We therefore conclude that the term outlier in such a setup depends highly on the choice of topology and geometry of the space in which we consider our experiment.

In the same manner, we find the term "experimental error" equally misleading. Suppose an outlier is an observation that is not connected to the particular experiment, so that it will not appear in the next experiment. However, statistics is devoted to repeated experiments, and such observations would automatically be excluded from further experiments and study. Now consider the possibility that such "distant" observations keep appearing in the repetitions of our experimental study. In that case, we need to keep the observations attributed to the experiment, and it is therefore misleading to label them "errors". For example, the triggering event for the occurrence of such observations may be caused by the design of the particular experiment, i.e. the way the experiment is designed does not capture the nature of the corresponding applied problem. As a result, some observations may appear as a natural phenomenon alongside the considered problem. However, there are no mathematical or statistical tools to recognize such a situation, and so we are left to conclude that such observations are in contradiction with the mathematical model chosen to describe the practical problem under study. Of course, if some observations contradict one model, they may be in good agreement with another model. We thus conclude that the notion of outliers is model sensitive, i.e. an outlier needs to be associated with a concrete mathematical or statistical model. Based on our initial discussion, let us give the following definition.
Definition 1.1. Consider a mathematical model of an experiment on some real phenomenon. We say that an observation is an outlier for this particular model if it is "in contradiction" with the model, i.e. either it is impossible to obtain such an observation under the assumption that the model holds, or the probability of obtaining it when the model is true is extremely low. If this probability is very small yet non-zero, and we denote it by β, we call the relevant observation a β-outlier.
Definition 1.1 gives a precise sense to the second part of the Wikipedia definition.

2 Outliers of the first kind

Let X, X_1, ..., X_n be independent identically distributed (i.i.d.) random variables. Denote by

x̄_n = (1/n) ∑_{j=1}^{n} X_j,   s_n^2 = (1/n) ∑_{j=1}^{n} (X_j − x̄_n)^2
their empirical mean and empirical variance, respectively. Let k > 0 be a fixed number, and let us estimate the following probability:

p_n = IP{|X − x̄_n|/s_n > k}. (2.1)
Definition 2.1. We say that the distribution of X produces outliers of the first kind if the probability (2.1) is high (say, higher than for the normal distribution).
Indeed, if one has a model based on the Gaussian distribution, then a value of p_n greater than for the normal case contradicts the model, and the corresponding observations appear to be outliers in the sense of our Definition 1.1. Such an approach was used in financial mathematics to show that the Gaussian distribution provides a bad model for the corresponding data (see, for example, [3, 1]). The observations X_j for which the inequality |X_j − x̄_n|/s_n > k holds appear to be outliers for the Gaussian model. In some financial models their presence was considered an argument for the existence of heavy tails of the real distributions. Unfortunately, this is not so (see [8, 9, 11]).
Theorem 2.1. (see [11]) Suppose that X_1, X_2, ..., X_n is a sequence of i.i.d. random variables belonging to the domain of attraction of a strictly stable random variable with index of stability α ∈ (0, 2). Then
lim_{n→∞} p_n = 0. (2.2)
Proof. Since X_j, j = 1, ..., n, belong to the domain of attraction of a strictly stable random variable with index α < 2, it is also true that X_1^2, ..., X_n^2 belong to the domain of attraction of a one-sided stable distribution with index α/2. If α ∈ (1, 2), the mean a = IEX_1 is finite, so that x̄_n = a + o(1) almost surely, and s_n → ∞ as n → ∞. We have

IP{|X_1 − x̄_n| > k s_n} = IP{X_1 > k s_n + x̄_n} + IP{X_1 < −k s_n + x̄_n} = IP{X_1 > k s_n + a + o(1)} + IP{X_1 < −k s_n + a + o(1)} → 0 as n → ∞.

In the general case,

s_n^2 = (1/n) ∑_{j=1}^{n} X_j^2 − x̄_n^2 ∼ n^{2/α − 1} Z (1 + o(1)),

where Z has a one-sided positive stable distribution with index α/2. We have

IP{|X_1 − x̄_n| > k s_n} = IP{(X_1 − x̄_n)^2 > k^2 s_n^2} = IP{(X_1 − x̄_n)^2 > k^2 n^{2/α − 1} Z (1 + o(1))} → 0 as n → ∞.
From this theorem it follows that (for sufficiently large n) many heavy-tailed distributions will not produce any outliers of the first kind. Moreover, we now see that the presence of outliers of the first kind is in contradiction with many models having heavy-tailed distributions, in particular with models involving stable distributions. By the way, the word variability is not defined precisely either; high variability may denote something different from high standard deviation. Theorem 2.1 shows that in many situations distributions with infinite variance do not produce outliers for sufficiently large values of the sample size n. This means we may restrict ourselves to the case of distributions having a finite second moment. Therefore, instead of (2.1), it is better to consider the corresponding characteristic of the general distribution
p(κ; X) = IP{|X − IEX|/σX > κ}, (2.3)
where IEX is the expectation of the random variable X and σ_X is its standard deviation. We shall say that the expression p(κ; X) gives the probability of having outliers at the level κ. It is clear that if the random variable X has a finite second moment, then p_n from (2.1) converges to p(κ; X) defined by (2.3). To see this, it is sufficient to apply the law of large numbers to the sequences X_j and (X_j − IEX_j)^2, j = 1, 2, ....
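This convergence, together with Theorem 2.1, is easy to observe numerically. Below is a minimal simulation sketch (assuming Python with NumPy; the sample size, level k, and seed are arbitrary illustrative choices) comparing the empirical fraction of observations with |X_j − x̄_n|/s_n > k for a Cauchy sample (strictly stable, α = 1) and a standard normal sample:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 1_000_000, 3.0

def outlier_fraction(x, k):
    # empirical analogue of p_n = P{|X - x_bar_n|/s_n > k}
    return float(np.mean(np.abs(x - x.mean()) / x.std() > k))

frac_cauchy = outlier_fraction(rng.standard_cauchy(n), k)   # heavy tails, alpha = 1
frac_normal = outlier_fraction(rng.standard_normal(n), k)   # light tails

print(frac_cauchy, frac_normal)
```

For large n the Cauchy sample shows fewer such "outliers" than the normal one: the huge empirical standard deviation swallows the tail observations, in agreement with (2.2).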
Now we see that distributions with compact support may have a higher probability of outliers at the same level than distributions with non-compact support. Therefore, the idea that the presence of outliers is connected with the heaviness of the distributional tails is wrong. Let us now consider symmetric distributions with compact support. The most interesting problem is to describe the distributions having the maximal possible probability of outliers. It appears that this problem has already been considered in the literature (see, for example, [6]). Let us formulate the corresponding results below in a form suitable for us. Let X be a random variable whose distribution function has a compact support. Because the probability
p(κ; X) = IP{|X − IEX|/σX ≥ κ}
is location and scale invariant, we can suppose that IEX = 0 and IP{|X| ≤ 1} = 1.
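The location and scale invariance is immediate from the definition, since replacing X by cX + b (c > 0) rescales both |X − IEX| and σ_X by c. A quick numerical sanity check (a sketch assuming NumPy; the exponential sample and the constants are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(size=200_000)   # any sample with finite variance
kappa = 2.0

def p_kappa(x, kappa):
    # empirical p(kappa; X) = P{|X - EX|/sigma_X >= kappa}
    return float(np.mean(np.abs(x - x.mean()) / x.std() >= kappa))

p1 = p_kappa(x, kappa)
p2 = p_kappa(5.0 * x - 3.0, kappa)   # affine image of the same sample
print(p1, p2)   # the two empirical probabilities coincide (up to floating point)
```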
Example 2.1. Consider a random variable X taking the three values −1, 0, 1 with probabilities p/2, 1 − p, p/2, respectively. It is easy to see that IEX = 0 and σ_X = √p. Therefore, p(κ; X) = p for any κ ∈ (1, 1/√p].

In particular, for κ = 1/√p we obtain the greatest value of the outlier level, which appears with probability p. Conversely, if κ > 1 is fixed, then we can define p = 1/κ^2, and the random variable from Example 2.1 will possess outliers at level κ with probability p. From the Selberg inequality (see, for example, [6]) it follows that a random variable with outliers at level ≥ κ appearing with probability ≥ 1/κ^2 coincides with that given in Example 2.1 up to location and scale parameters. In other words, the random variable with the corresponding outlier properties is essentially unique. However, it seems that random variables similar to that of Example 2.1 appear very rarely in applications. Are there more practical examples with a "large enough" number of outliers? The corresponding example is based on the Gauss inequality (see [6]).
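The claim of Example 2.1 can be checked directly. The following sketch (assuming NumPy; p = 0.04 is an arbitrary illustrative choice, giving κ = 1/√p = 5) computes p(κ; X) exactly for the three-point distribution:

```python
import numpy as np

p = 0.04
values = np.array([-1.0, 0.0, 1.0])
probs = np.array([p / 2, 1.0 - p, p / 2])   # distribution of Example 2.1

mean = float(np.dot(values, probs))                          # = 0
sigma = float(np.sqrt(np.dot(values**2, probs) - mean**2))   # = sqrt(p) = 0.2

kappa = 1.0 / np.sqrt(p)   # the largest admissible level, here kappa = 5
p_out = float(probs[np.abs(values - mean) / sigma >= kappa].sum())
print(p_out)   # equals p = 0.04: a 5-sigma outlier occurs with probability 0.04
```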
Example 2.2. Consider a random variable X which takes the value zero with probability 1 − p (p ∈ (0, 1)) and coincides with a random variable U, uniformly distributed over [−1, 1], with probability p. It is clear that IEX = 0 and σ_X = √(p/3). Therefore,

IP{|X| ≥ κσ_X} = p(1 − κ√(p/3)).

Let us maximize this expression with respect to p ∈ [0, 1], assuming that κ > 2/√3. The maximum is

max_p IP{|X| ≥ κσ_X} = 4/(9κ^2),

which is attained for p = 4/(3κ^2).
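The maximization in Example 2.2 is elementary calculus, but it can also be checked on a grid. A small sketch (assuming NumPy; κ = 5 is an arbitrary level satisfying κ > 2/√3):

```python
import numpy as np

kappa = 5.0
p_grid = np.linspace(0.0, 1.0, 1_000_001)

# P{|X| >= kappa * sigma_X} = p * (1 - kappa * sqrt(p/3)) for Example 2.2
f = p_grid * (1.0 - kappa * np.sqrt(p_grid / 3.0))

p_star = float(p_grid[np.argmax(f)])
f_max = float(f.max())
print(p_star, f_max)
# maximum 4/(9 kappa^2) = 4/225, attained near p = 4/(3 kappa^2) = 4/75
```

For p beyond 3/κ^2 the formula goes negative (the true probability there is zero), so the interior maximum is unaffected.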
From the Gauss inequality (see [6]) it follows that this is the upper bound in the class of unimodal distributions with finite variance. The extremal distribution constructed in Example 2.2 is unique up to location and scale parameters. Of course, the bound on the probability of having outliers at level κ is 9/4 times smaller for Example 2.2 than for Example 2.1. However, this probability is not too small compared, say, with the Gaussian distribution. The distribution from Example 2.2 also looks not too "practical" in view of the presence of an "essential" mass at zero. However, it is clear that we may replace this mass by a peak of a density near the origin. Of course, the probability of the presence of the corresponding outliers then becomes slightly smaller, but not essentially so. Let us give some simulation results for such a case. We simulated a sample of size n = 500 from a mixture of two uniform distributions. The first component of the mixture followed the uniform law on the interval [−0.1, 0.1] and had weight 71/75. The second component followed the uniform distribution on the interval [−1, 1] and had weight 4/75. The sample points are shown in Figure 1 as blue points. Vertical lines are situated at the positions −5σ and 5σ. We see that some elements of the sample (at least 6 of them, that is, 1.2%) are outside the interval [−5σ, 5σ].
Figure 1: A sample of 500 points from a mixture of two uniform distributions. Vertical lines are situated at positions −5σ and 5σ.
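The Figure 1 experiment is easy to reproduce. A sketch (assuming NumPy; the seed is arbitrary, the weights and intervals are those stated above):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
w2 = 4 / 75   # weight of the wide component, uniform on [-1, 1]

# sample from the mixture: U[-0.1, 0.1] w.p. 71/75, U[-1, 1] w.p. 4/75
wide = rng.random(n) < w2
x = np.where(wide, rng.uniform(-1.0, 1.0, n), rng.uniform(-0.1, 0.1, n))

# exact standard deviation of the mixture (variance of U[-a, a] is a^2/3)
sigma = np.sqrt((71 / 75) * 0.1**2 / 3 + w2 * 1.0**2 / 3)

n_out = int(np.sum(np.abs(x) > 5 * sigma))
print(n_out, 5 * sigma)   # a handful of points fall outside [-5 sigma, 5 sigma]
```

Here 5σ ≈ 0.723 < 1, and the expected fraction of points outside [−5σ, 5σ] is (4/75)(1 − 5σ) ≈ 1.5%, of the same order as the 1.2% observed in Figure 1.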
It is clear that we observe some outliers for a smooth enough distribution having compact support. To understand the "typical form" of a distribution with a non-small probability of outliers at level κ, let us consider another simulation.
Consider one more situation. Let X be a random variable having the standard Gaussian distribution, and let A be a random variable with the density function

q(a; α) = α / (a + 1)^{α+1},

where a ∈ [0, ∞) and α > 2 is a parameter. Consider the ratio Y = X/A. Its cumulative distribution function has the form

G(x; α) = ∫_0^∞ Φ(ax) q(a; α) da.

It is clear that with growing α the tails of the random variable A become thinner; this implies that the tails of Y become heavier. Computer calculations show that the probability of the event {Y ≥ κσ(α)} for κ = 5 has the following form as a function of α.
Figure 4: Probability of the event {Y ≥ 5σ} as a function of α ∈ (2, 10] for the scale mixture of the Gaussian distribution by means of a random variable A having a Pareto distribution.
From Figure 4 we see that the probability of outliers at the level κ = 5 decreases with increasing α, that is, as the tails become heavier. We can now conclude that the opinion of a direct dependence between the probability of outliers and the heaviness of the tails is wrong and probably has to be replaced by the opposite one, that is, an inverse dependence.
3 Ostensible heavy tails
Let us try to understand what structure of a distribution leads to a high probability of outliers, and why the general opinion consists in a direct dependence between this probability and the heaviness of the tails. Unfortunately, we cannot provide mathematically precise results in this connection, but we will give an intuitively clear explanation based on the results obtained above. The examples given above lead to the following. A distribution with a high probability of outliers has the properties:
Let us consider the use of robust estimators from a formal mathematical point of view, and discuss the problem of robust estimation of a location parameter or, more precisely, of a center of symmetry. Formally, the quality of robust estimators is guaranteed only asymptotically, that is, for large sample size n. However, the presence of outliers asymptotically means that the tails are not heavy (see Theorem 2.1), i.e. that the variance is finite. Therefore, the law of large numbers is valid, and the sample mean is asymptotically consistent. This means we do not need any robust estimator. Of course, the statement above holds asymptotically only, and it is not clear how large n should be; however, it is equally unclear (in precise terms) for which n a robust estimator is good enough. Clearly, Example 2.1 shows that outliers may essentially change the quality of the sample mean for not-too-large sample sizes. This means we need estimators with nice non-asymptotic properties. Our point of view is that a theory of robust, non-asymptotically oriented estimators is needed. Some initial aspects of such a theory are given in [7]; however, more study is needed, and we are planning to consider the corresponding results in another paper. Let us note that the classical approach to robust estimators is intuitively oriented toward another definition of outliers. Probably the definitions given in [12] and [13] correspond more closely to the classical theory of robust estimation.
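The small-sample effect mentioned above can be illustrated by a simple Monte Carlo sketch (assuming NumPy; the mixture weight p = 0.2, the sample size n = 15, and the seed are all arbitrary illustrative choices in the spirit of Example 2.2): for a symmetric "spike plus wide uniform" distribution centered at zero, the sample median estimates the symmetry center far better than the sample mean at small n.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 15, 20_000
p = 0.2   # weight of the wide uniform component

# reps samples of size n from: 0 w.p. 1-p, U[-1, 1] w.p. p (true center = 0)
wide = rng.random((reps, n)) < p
x = np.where(wide, rng.uniform(-1.0, 1.0, (reps, n)), 0.0)

mse_mean = float(np.mean(x.mean(axis=1) ** 2))
mse_median = float(np.mean(np.median(x, axis=1) ** 2))
print(mse_mean, mse_median)   # the median is far more accurate at this n
```

This does not settle the non-asymptotic question raised above, of course; it only shows that at small n the choice of estimator matters even for compactly supported distributions.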
Acknowledgment
The work was partially supported by Grant GA ČR 16-03708S.
References
[1] Szymon Borak, Adam Misiorek, Rafal Weron (2010). Models for Heavy-tailed Asset Returns, SFB 649 Discussion Paper 2010-049, http://sfb649.wiwi.hu-berlin.de, ISSN 1860-5664, SFB 649, Humboldt-Universität zu Berlin, Spandauer Straße 1, D-10178 Berlin, 1–40.
[2] H.A. David, H.N. Nagaraja (2003). Order Statistics, John Wiley & Sons.
[3] Ernst Eberlein and Ulrich Keller (1995). Hyperbolic Distributions in Finance, Institut für Mathematische Stochastik, Universität Freiburg, 1–24.
[4] Frank R. Hampel, Elvezio M. Ronchetti, Peter J. Rousseeuw and Werner A. Stahel (1986). Robust Statistics: The Approach Based on Influence Functions, John Wiley & Sons, New York, Chichester, Brisbane, Toronto, Singapore.
[5] D.M. Hawkins (1980). Identification of Outliers. Springer Science+Business Media, B.V.
[6] Samuel Karlin and William Studden (1966). Tchebycheff Systems: With Applications in Analysis and Statistics, Interscience Publishers.
[7] Lev B. Klebanov, Svetlozar T. Rachev, Frank J. Fabozzi (2009) Robust and Non-Robust Models in Statistics, Nova Science Publishers, Inc., New York.
[8] Lev B. Klebanov, Irina Volchenkova (2015). Heavy Tailed Distributions in Finance: Reality or Myth? Amateurs Viewpoint. arXiv 1507.07735v1, 1–17.
[9] Lev B. Klebanov (2016). No Stable Distributions in Finance, please! arXiv 1601.00566v2, 1–9.
[10] Lev B. Klebanov (2016). Big Outliers Versus Heavy Tails: what to use? arXiv 1611.05410v1, 1–14.
[11] Lev B. Klebanov, Gregory Temnov, Ashot V. Kakosyan (2016). Some Contra-Arguments for the Use of Stable Distributions in Financial Modeling, arXiv 1602.00256v1, 1–9.
[12] Lev B. Klebanov, Jaromir Antoch, Andrea Karlova and Ashot V. Kakosyan (2017) Outliers and related problems, arXiv 1701.06642v1, 1–16.
[13] Lev B. Klebanov, Ashot V. Kakosyan, and Andrea Karlova (2016). Out- liers, the Law of Large Numbers, Index of Stability and Heavy Tails, arXiv 1612.09265v1, 1–5.