









Abstract

The aim of this paper is to show that the presence of one possible type of outliers is not connected with heavy tails of the distribution. On the contrary, the typical situation for the appearance of outliers is the case of compactly supported distributions.
Key words: outliers; heavy tailed distributions; compact support
[a Department of Probability and Mathematical Statistics, Charles University, Prague, Czech Republic, e-mail: lev.klebanov@mff.cuni.cz. b Czech Technical University in Prague, Prague, Czech Republic.]

In this paper we revise the concept of outliers or, more precisely, outliers of the first type (in the terminology of [12]). We find the contemporary notion rather vague, which motivates us to dispute carefully its meaning and its connection with the tail behavior of the distribution. Let us start by looking closely at the definition of an outlier. Usually it is similar to the one given in the popular Internet encyclopedia Wikipedia: "In statistics, an outlier is an observation point that is distant from other observations. An outlier may be due to variability in the measurement or it may indicate experimental error; the latter are sometimes excluded from the data." Obviously, this definition is given in neither a mathematically nor a statistically correct way. In particular, we find the description of "the point being distant" from other observations rather confusing. A little better seems to be the definition given on the NIST site: "An outlier is an observation that lies an abnormal distance from other values in a random sample from a population. In a sense, this definition leaves it up to the analyst (or a consensus process) to decide what will be considered abnormal. Before abnormal observations can be singled out, it is necessary to characterize normal observations." However, it has similar drawbacks. In our opinion, it is essential to specify a measurement unit for the distance under consideration and, mainly, the definition of that distance itself. We therefore conclude that the term outlier in such a setup depends highly on the choice of topology and geometry of the space in which we consider our experiment.

In the same manner, we find the term "experimental error" equally misleading. Suppose an outlier is an observation that is not connected to the particular experiment, so that it will not appear in the next experiment. However, statistics is devoted to repeated experiments, and such observations would automatically be excluded from further experiments and study. Now consider the possibility that such "distant" observations keep appearing in the repetitions of our experimental study. In that case, we need to keep the observations attributed to the experiment, and it is therefore misleading to label them "errors". For example, the triggering event for the occurrence of such observations may be caused by the design of the particular experiment, i.e. the way the experiment is designed does not capture the nature of the corresponding applied problem. As a result, some observations may appear as a natural phenomenon alongside the considered problem. However, there are no mathematical or statistical tools to recognize such a situation, and so we are left to conclude that such observations are in contradiction with the mathematical model chosen to describe the practical problem under study. Of course, if some observations contradict one model, they may be in good agreement with another model. We thus conclude that the notion of outliers is model sensitive, i.e. an outlier needs to be associated with a concrete mathematical or statistical model. Based on our initial discussion, let us give the following definition.
Definition 1.1. Consider a mathematical model of an experiment on some real phenomenon. We say that an observation is an outlier for this particular model if it is "in contradiction" with the model, i.e. either it is impossible to obtain such an observation under the assumption that the model holds, or the probability of obtaining it when the model is true is extremely low. If this probability is very small yet non-zero, and we denote it by β, we call the relevant observation a β-outlier.
Definition 1.1 gives a precise sense to the second part of the Wikipedia definition.

2 Outliers of the first kind

Let X, X_1, ..., X_n be independent identically distributed (i.i.d.) random variables. Denote by

x̄_n = (1/n) ∑_{j=1}^{n} X_j,   s_n^2 = (1/n) ∑_{j=1}^{n} (X_j − x̄_n)^2
their empirical mean and empirical variance, respectively. Let k > 0 be a fixed number, and let us estimate the following probability:

p_n = IP{|X − x̄_n|/s_n > k}. (2.1)
Definition 2.1. We say that the distribution of X produces outliers of the first kind if the probability (2.1) is high (say, higher than for the normal distribution).
Indeed, if one has a model based on the Gaussian distribution, then a value of p_n greater than for the normal case contradicts the model, and the corresponding observations appear to be outliers in the sense of our Definition 1.1. Such an approach was used in financial mathematics to show that the Gaussian distribution provides a bad model for the corresponding data (see, for example, [3, 1]). The observations X_j for which the inequality |X_j − x̄_n|/s_n > k holds appear to be outliers for the Gaussian model. In some financial models their presence was considered an argument for the existence of heavy tails of the real distributions. Unfortunately, this is not so (see [8, 9, 11]).
Theorem 2.1. (see [11]) Suppose that X_1, X_2, ..., X_n is a sequence of i.i.d. random variables belonging to the domain of attraction of a strictly stable random variable with index of stability α ∈ (0, 2). Then
lim_{n→∞} p_n = 0. (2.2)
Proof. Since X_j, j = 1, ..., n, belong to the domain of attraction of a strictly stable random variable with index α < 2, it is also true that X_1^2, ..., X_n^2 belong to the domain of attraction of a one-sided stable distribution with index α/2. If α ∈ (1, 2), the mean a = IEX_1 is finite, so that x̄_n = a + o(1) almost surely, and s_n → ∞ as n → ∞. We have

IP{|X_1 − x̄_n| > k s_n} = IP{X_1 > k s_n + x̄_n} + IP{X_1 < −k s_n + x̄_n} = IP{X_1 > k s_n + a + o(1)} + IP{X_1 < −k s_n + a + o(1)} → 0 as n → ∞.

In the general case,

s_n^2 = (1/n) ∑_{j=1}^{n} X_j^2 − x̄_n^2 ∼ n^{2/α − 1} Z (1 + o(1)),

where Z has a one-sided positive stable distribution with index α/2. We have

IP{|X_1 − x̄_n| > k s_n} = IP{(X_1 − x̄_n)^2 > k^2 s_n^2} = IP{(X_1 − x̄_n)^2 > k^2 n^{2/α − 1} Z (1 + o(1))} → 0 as n → ∞.
From this theorem it follows that (for sufficiently large n) many heavy-tailed distributions will not produce any outliers of the first kind. Moreover, we now see that the presence of outliers of the first kind is in contradiction with many models having heavy-tailed distributions, in particular with models involving stable distributions. By the way, the word variability is not defined precisely either; high variability may denote something different from high standard deviation. Theorem 2.1 shows that in many situations distributions with infinite variance do not produce outliers for sufficiently large values of the sample size n. This means we may restrict ourselves to the case of distributions having a finite second moment. Therefore, instead of (2.1), it is better to consider the corresponding characteristic of the general distribution
p(κ; X) = IP{|X − IEX|/σX > κ}, (2.3)
where IEX is the expectation of the random variable X and σ_X is its standard deviation. We shall say that the expression p(κ; X) gives the probability of having outliers at the level κ. It is clear that if the random variable X has a finite second moment, then p_n from (2.1) converges to p(κ; X) defined by (2.3). To see this, it is sufficient to apply the law of large numbers to the sequences X_j and (X_j − IEX_j)^2, j = 1, 2, ....
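This convergence, together with Theorem 2.1, is easy to observe numerically. Below is a minimal simulation sketch (assuming Python with NumPy; the sample size, level k, and seed are arbitrary illustrative choices) comparing the empirical fraction of observations with |X_j − x̄_n|/s_n > k for a Cauchy sample (strictly stable, α = 1) and a standard normal sample:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 1_000_000, 3.0

def outlier_fraction(x, k):
    # empirical analogue of p_n = P{|X - x_bar_n|/s_n > k}
    return float(np.mean(np.abs(x - x.mean()) / x.std() > k))

frac_cauchy = outlier_fraction(rng.standard_cauchy(n), k)   # heavy tails, alpha = 1
frac_normal = outlier_fraction(rng.standard_normal(n), k)   # light tails

print(frac_cauchy, frac_normal)
```

For large n the Cauchy sample shows fewer such "outliers" than the normal one: the huge empirical standard deviation swallows the tail observations, in agreement with (2.2).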
Now we see that distributions with compact support may have a higher probability of outliers at the same level than distributions with non-compact support. Therefore, the idea that the presence of outliers is connected with the heaviness of the distributional tails is wrong. Let us now consider symmetric distributions with compact support. The most interesting problem is to describe the distributions having the maximal possible probability of outliers. It appears that this problem has already been considered in the literature (see, for example, [6]). Let us formulate the corresponding results below in a form suitable for us. Let X be a random variable whose distribution function has a compact support. Because the probability
p(κ; X) = IP{|X − IEX|/σX ≥ κ}
is location and scale invariant, we can suppose that IEX = 0 and IP{|X| ≤ 1} = 1.
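The location and scale invariance is immediate from the definition, since replacing X by cX + b (c > 0) rescales both |X − IEX| and σ_X by c. A quick numerical sanity check (a sketch assuming NumPy; the exponential sample and the constants are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(size=200_000)   # any sample with finite variance
kappa = 2.0

def p_kappa(x, kappa):
    # empirical p(kappa; X) = P{|X - EX|/sigma_X >= kappa}
    return float(np.mean(np.abs(x - x.mean()) / x.std() >= kappa))

p1 = p_kappa(x, kappa)
p2 = p_kappa(5.0 * x - 3.0, kappa)   # affine image of the same sample
print(p1, p2)   # the two empirical probabilities coincide (up to floating point)
```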
Example 2.1. Consider a random variable X taking the three values −1, 0, 1 with probabilities p/2, 1 − p, p/2, respectively. It is easy to see that IEX = 0 and σ_X = √p. Therefore, p(κ; X) = p for any κ ∈ (1, 1/√p].

In particular, for κ = 1/√p we obtain the greatest value of the outlier level, which appears with probability p. Conversely, if κ > 1 is fixed, then we can define p = 1/κ^2, and the random variable from Example 2.1 will possess outliers at level κ with probability p. From the Selberg inequality (see, for example, [6]) it follows that a random variable with outliers at level ≥ κ appearing with probability ≥ 1/κ^2 coincides with that given in Example 2.1 up to location and scale parameters. In other words, the random variable with the corresponding outlier properties is essentially unique. However, it seems that random variables similar to that of Example 2.1 appear very rarely in applications. Are there more practical examples with a "large enough" number of outliers? The corresponding example is based on the Gauss inequality (see [6]).
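The claim of Example 2.1 can be checked directly. The following sketch (assuming NumPy; p = 0.04 is an arbitrary illustrative choice, giving κ = 1/√p = 5) computes p(κ; X) exactly for the three-point distribution:

```python
import numpy as np

p = 0.04
values = np.array([-1.0, 0.0, 1.0])
probs = np.array([p / 2, 1.0 - p, p / 2])   # distribution of Example 2.1

mean = float(np.dot(values, probs))                          # = 0
sigma = float(np.sqrt(np.dot(values**2, probs) - mean**2))   # = sqrt(p) = 0.2

kappa = 1.0 / np.sqrt(p)   # the largest admissible level, here kappa = 5
p_out = float(probs[np.abs(values - mean) / sigma >= kappa].sum())
print(p_out)   # equals p = 0.04: a 5-sigma outlier occurs with probability 0.04
```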
Example 2.2. Consider a random variable X which takes the value zero with probability 1 − p (p ∈ (0, 1)) and coincides with a random variable U, uniformly distributed over [−1, 1], with probability p. It is clear that IEX = 0 and σ_X = √(p/3). Therefore,

IP{|X| ≥ κσ_X} = p(1 − κ√(p/3)).

Let us maximize this expression with respect to p ∈ [0, 1], assuming that κ > 2/√3. The maximum is

max_p IP{|X| ≥ κσ_X} = 4/(9κ^2),

which is attained for p = 4/(3κ^2).
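The maximization in Example 2.2 is elementary calculus, but it can also be checked on a grid. A small sketch (assuming NumPy; κ = 5 is an arbitrary level satisfying κ > 2/√3):

```python
import numpy as np

kappa = 5.0
p_grid = np.linspace(0.0, 1.0, 1_000_001)

# P{|X| >= kappa * sigma_X} = p * (1 - kappa * sqrt(p/3)) for Example 2.2
f = p_grid * (1.0 - kappa * np.sqrt(p_grid / 3.0))

p_star = float(p_grid[np.argmax(f)])
f_max = float(f.max())
print(p_star, f_max)
# maximum 4/(9 kappa^2) = 4/225, attained near p = 4/(3 kappa^2) = 4/75
```

For p beyond 3/κ^2 the formula goes negative (the true probability there is zero), so the interior maximum is unaffected.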
From the Gauss inequality (see [6]) it follows that this is the upper bound in the class of unimodal distributions with finite variance. The extremal distribution constructed in Example 2.2 is unique up to location and scale parameters. Of course, the bound on the probability of having outliers at level κ is 9/4 times smaller for Example 2.2 than for Example 2.1. However, this probability is not too small compared, say, with the Gaussian distribution. The distribution from Example 2.2 also looks not too "practical" in view of the presence of an "essential" mass at zero. However, it is clear that we may replace this mass by a peak of a density near the origin. Of course, the probability of the presence of the corresponding outliers then becomes slightly smaller, but not essentially so. Let us give some simulation results for such a case. We simulated a sample of size n = 500 from a mixture of two uniform distributions. The first component of the mixture followed the uniform law on the interval [−0.1, 0.1] and had weight 71/75. The second component followed the uniform distribution on the interval [−1, 1] and had weight 4/75. The sample points are shown in Figure 1 as blue points. Vertical lines are situated at the positions −5σ and 5σ. We see that some elements of the sample (at least 6 of them, that is, 1.2%) are outside the interval [−5σ, 5σ].
Figure 1: A sample of 500 points from a mixture of two uniform distributions. Vertical lines are situated at positions −5σ and 5σ.
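The Figure 1 experiment is easy to reproduce. A sketch (assuming NumPy; the seed is arbitrary, the weights and intervals are those stated above):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
w2 = 4 / 75   # weight of the wide component, uniform on [-1, 1]

# sample from the mixture: U[-0.1, 0.1] w.p. 71/75, U[-1, 1] w.p. 4/75
wide = rng.random(n) < w2
x = np.where(wide, rng.uniform(-1.0, 1.0, n), rng.uniform(-0.1, 0.1, n))

# exact standard deviation of the mixture (variance of U[-a, a] is a^2/3)
sigma = np.sqrt((71 / 75) * 0.1**2 / 3 + w2 * 1.0**2 / 3)

n_out = int(np.sum(np.abs(x) > 5 * sigma))
print(n_out, 5 * sigma)   # a handful of points fall outside [-5 sigma, 5 sigma]
```

Here 5σ ≈ 0.723 < 1, and the expected fraction of points outside [−5σ, 5σ] is (4/75)(1 − 5σ) ≈ 1.5%, of the same order as the 1.2% observed in Figure 1.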
It is clear that we observe some outliers for a smooth enough distribution having compact support. To understand the "typical form" of a distribution with a non-small probability of outliers at level κ, let us consider another simulation.
Consider one more situation. Let X be a random variable having the standard Gaussian distribution, and let A be a random variable with the density function

q(a; α) = α / (a + 1)^{α+1},

where a ∈ [0, ∞) and α > 2 is a parameter. Consider the ratio Y = X/A. Its cumulative distribution function has the form

G(x; α) = ∫_0^∞ Φ(ax) q(a; α) da.

It is clear that with growing α the tails of the random variable A become thinner; this implies that the tails of Y become heavier. Computer calculations show that the probability of the event {Y ≥ κσ(α)} for κ = 5 has the following form as a function of α.
Figure 4: Probability of the event {Y ≥ 5σ} as a function of α ∈ (2, 10] for the scale mixture of the Gaussian distribution by means of a random variable A having a Pareto distribution.
From Figure 4 we see that the probability of outliers at the level κ = 5 decreases with increasing α, that is, as the tails become heavier. We can now conclude that the opinion of a direct dependence between the probability of outliers and the heaviness of the tails is wrong and probably has to be replaced by the opposite one, that is, an inverse dependence.
3 Ostensible heavy tails
Let us try to understand what structure of a distribution leads to a high probability of outliers, and why the general opinion consists in a direct dependence between this probability and the heaviness of the tails. Unfortunately, we cannot provide mathematically precise results in this connection, but we will give an intuitively clear explanation based on the results obtained above. The examples given above lead to the following. A distribution with a high probability of outliers has the properties:
Let us consider the use of robust estimators from a formal mathematical point of view, and discuss the problem of robust estimation of a location parameter or, more precisely, of a center of symmetry. Formally, the quality of robust estimators is guaranteed only asymptotically, that is, for large sample size n. However, the presence of outliers asymptotically means that the tails are not heavy (see Theorem 2.1), i.e. that the variance is finite. Therefore, the law of large numbers is valid, and the sample mean is asymptotically consistent. This means we do not need any robust estimator. Of course, the statement above holds asymptotically only, and it is not clear how large n should be; however, it is equally unclear (in precise terms) for which n a robust estimator is good enough. Clearly, Example 2.1 shows that outliers may essentially change the quality of the sample mean for not-too-large sample sizes. This means we need estimators with nice non-asymptotic properties. Our point of view is that a theory of robust, non-asymptotically oriented estimators is needed. Some initial aspects of such a theory are given in [7]; however, more study is needed, and we are planning to consider the corresponding results in another paper. Let us note that the classical approach to robust estimators is intuitively oriented toward another definition of outliers. Probably the definitions given in [12] and [13] correspond more closely to the classical theory of robust estimation.
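The small-sample effect mentioned above can be illustrated by a simple Monte Carlo sketch (assuming NumPy; the mixture weight p = 0.2, the sample size n = 15, and the seed are all arbitrary illustrative choices in the spirit of Example 2.2): for a symmetric "spike plus wide uniform" distribution centered at zero, the sample median estimates the symmetry center far better than the sample mean at small n.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 15, 20_000
p = 0.2   # weight of the wide uniform component

# reps samples of size n from: 0 w.p. 1-p, U[-1, 1] w.p. p (true center = 0)
wide = rng.random((reps, n)) < p
x = np.where(wide, rng.uniform(-1.0, 1.0, (reps, n)), 0.0)

mse_mean = float(np.mean(x.mean(axis=1) ** 2))
mse_median = float(np.mean(np.median(x, axis=1) ** 2))
print(mse_mean, mse_median)   # the median is far more accurate at this n
```

This does not settle the non-asymptotic question raised above, of course; it only shows that at small n the choice of estimator matters even for compactly supported distributions.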
Acknowledgment
The work was partially supported by Grant GA ČR 16-03708S.
References
[1] Szymon Borak, Adam Misiorek, Rafal Weron (2010). Models for Heavy-tailed Asset Returns, SFB 649 Discussion Paper 2010-049, http://sfb649.wiwi.hu-berlin.de, ISSN 1860-5664, SFB 649, Humboldt-Universität zu Berlin, Spandauer Straße 1, D-10178 Berlin, 1–40.
[2] H.A. David, H.N. Nagaraja (2003). Order Statistics, John Wiley & Sons.
[3] Ernst Eberlein and Ulrich Keller (1995). Hyperbolic Distributions in Finance, Institut für Mathematische Stochastik, Universität Freiburg, 1–24.
[4] Frank R. Hampel, Elvezio M. Ronchetti, Peter J. Rousseeuw and Werner A. Stahel (1986). Robust Statistics: The Approach Based on Influence Functions, John Wiley & Sons, New York, Chichester, Brisbane, Toronto, Singapore.
[5] D.M. Hawkins (1980). Identification of Outliers. Springer Science+Business Media, B.V.
[6] Samuel Karlin and William Studden (1966). Tchebycheff Systems: With Applications in Analysis and Statistics, Interscience Publishers.
[7] Lev B. Klebanov, Svetlozar T. Rachev, Frank J. Fabozzi (2009) Robust and Non-Robust Models in Statistics, Nova Science Publishers, Inc., New York.
[8] Lev B. Klebanov, Irina Volchenkova (2015). Heavy Tailed Distributions in Finance: Reality or Myth? Amateurs Viewpoint. arXiv 1507.07735v1, 1–17.
[9] Lev B. Klebanov (2016). No Stable Distributions in Finance, please! arXiv 1601.00566v2, 1–9.
[10] Lev B. Klebanov (2016). Big Outliers Versus Heavy Tails: what to use? arXiv 1611.05410v1, 1–14.
[11] Lev B. Klebanov, Gregory Temnov, Ashot V. Kakosyan (2016). Some Contra-Arguments for the Use of Stable Distributions in Financial Modeling, arXiv 1602.00256v1, 1–9.
[12] Lev B. Klebanov, Jaromir Antoch, Andrea Karlova and Ashot V. Kakosyan (2017) Outliers and related problems, arXiv 1701.06642v1, 1–16.
[13] Lev B. Klebanov, Ashot V. Kakosyan, and Andrea Karlova (2016). Out- liers, the Law of Large Numbers, Index of Stability and Heavy Tails, arXiv 1612.09265v1, 1–5.