Measuring Central Tendency & Dispersion: Mean, Median, Mode, Percentiles, Quartiles, Range, Study notes of Statistics

City College of San Francisco (CCSF), Statistics

Prof. Asatar P. Bair

These notes cover measures of central tendency and dispersion, including the sample mean, median, mode, percentiles, quartiles, range, interquartile range, population variance, sample variance, standard deviation, Chebyshev's theorem, and the empirical rule. Central tendency describes where the values in a dataset cluster, while dispersion measures the spread or variability of the data. Examples and formulas are given for each measure.

Typology: Study notes

Pre 2010

Uploaded on 08/19/2009

koofers-user-myt 🇺🇸


Econ 5

Introduction to Statistics

Asatar Bair, Ph.D.

Department of Economics

City College of San Francisco

abair@ccsf.edu

Lectures on Chapter 3

Measures of Location

Mean
Median
Mode
Percentiles
Quartiles

Mean

The arithmetic mean is one measure of

location. It is a measure of central

tendency or an average. It is sometimes

referred to as “the average” or “the

mean” – although there are other

means and averages.

Central tendency

Central tendency is an important concept;

we want to know whether any particular values are more commonly observed, and whether the data are clustered in a certain range.

Sample Mean

If the data are from a sample, the mean is denoted by:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

The summation sign $\sum$ means add up the values:

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$$

Population Mean

If the data are from a population, the mean is denoted by:

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

Home Prices

The following is a sample of the prices of 20 homes in San Francisco. The data are in ascending order, in thousands of dollars.

  420    455    465    472    512
  514    554    575    580    600
  625    630    670    670    810
1,250  1,480  2,700  3,400  4,500
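As a quick check on how a few very expensive homes pull the mean upward, the following Python sketch applies the sample mean formula to the 20 prices and counts how many homes are priced below the mean. The variable names are illustrative, and the last price is assumed to be 4,500, the largest value used in the range calculation later in these notes.

```python
# Home prices in thousands of dollars (sample of n = 20).
# Assumption: the last value is 4500, the largest value used elsewhere in the notes.
home_prices = [
    420, 455, 465, 472, 512,
    514, 554, 575, 580, 600,
    625, 630, 670, 670, 810,
    1250, 1480, 2700, 3400, 4500,
]

n = len(home_prices)
sample_mean = sum(home_prices) / n  # x-bar = (sum of x_i) / n

# Count how many homes cost less than the mean price.
below_mean = sum(1 for price in home_prices if price < sample_mean)

print(f"n = {n}")
print(f"sample mean = {sample_mean:.1f} (thousands of dollars)")
print(f"homes priced below the mean: {below_mean} of {n}")
```

With these values, most of the homes fall below the mean, which is one sign that the data are skewed to the right.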

Distribution of wealth is skewed

[Figure: bar chart of wealth in 2004 held by the bottom 90% and the top 1%; vertical axis from $0 to $20 trillion]

Average wealth, bottom 50% = $22,300

Avg. wealth, 50-90% = $313,

Average wealth, top 1% = $15 mil

Source: http://www.federalreserve.gov/pubs/feds/2006/200613/200613pap.pdf

Bi-polar (bimodal) distributions

Say there’s a land populated exclusively by gnomes (height: 1.5-2.5 ft.) and giants (height: 9-11 ft.);

the mean (6 ft.) does not accurately describe the central tendencies of the population data.

[Figure: a gnome and a giant]

Mean with high variance

Rolling a six-sided die;

possible outcomes: 1, 2, 3, 4, 5, 6

mean = 3.5

does this mean you will roll a 3.5?

No: the mean is not a central tendency in the sense that you are more likely to observe the mean value.

Average does not mean “normal”

Most cultures propagate strong pressures to

conform to prevailing standards of behavior,

appearance, etc.

Standards change, leading to a constant

search for what is normal;

“Most Americans have an above-average

number of legs”;

number of Americans with 3+ legs = 0

number of Americans with 1 or 0 legs > 0

this means the mean number of legs < 2

[Cartoon caption: “I’m above average!”]

Median

The median of a data set is the value in

the middle when the data items are

arranged in ascending order.

If there is an odd number of items, the

median is the value of the middle item.

If there is an even number of items, the

median is the midpoint of the values for

the middle two items.
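The two cases in this definition can be sketched in a few lines of Python. The function name `median` and the sample inputs are illustrative choices, not part of the original notes.

```python
def median(values):
    """Return the middle value, or the midpoint of the two middle values."""
    ordered = sorted(values)  # arrange the data in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd number of items: the median is the middle item.
        return ordered[mid]
    # Even number of items: midpoint of the two middle items.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 5]))     # odd count: middle item of 1, 5, 7
print(median([7, 1, 5, 3]))  # even count: midpoint of 3 and 5
```

Sorting first matters: the rule refers to positions in the ordered data, not the order in which the values were recorded.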

Mode and central tendency

The mode is rarely a good measure of

central tendency;

it fails when a data set is large and there is

either no mode or there are many repeated

values, which may make the mode less

meaningful.
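The mode is the value that occurs most often in a data set. A minimal Python sketch, using a hypothetical helper `modes` and the home price sample (last value assumed to be 4,500), shows both a data set with a single mode and one where every value ties:

```python
from collections import Counter


def modes(values):
    """Return all values tied for the highest frequency, and that frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top), top


# Home prices in thousands of dollars; the last value (4500) is an assumption.
home_prices = [
    420, 455, 465, 472, 512,
    514, 554, 575, 580, 600,
    625, 630, 670, 670, 810,
    1250, 1480, 2700, 3400, 4500,
]

print(modes(home_prices))   # one value is repeated
print(modes([1, 2, 3, 4]))  # no repeats: every value ties for "most frequent"
```

In the home price sample the only repeated price appears just twice, so the mode says little about where the data are centered.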

Percentiles

The p-th percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.

Percentiles

To find the p-th percentile of a data set:

- Arrange the data in ascending order.
- Compute index i, the position of the p-th percentile:

i = (p/100) n

Percentiles

- If i is not an integer, round up: the p-th percentile is the value in the position of the next integer, e.g. i = 3.2 rounds up to position 4.
- If i is an integer, the p-th percentile is the midpoint of the values in positions i and (i + 1).
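The index rule can be sketched as a short Python function. This is a minimal illustration of the rule as stated in these notes; the function name `percentile` is hypothetical, the last home price is assumed to be 4,500, and statistical software often uses other interpolation conventions, so library results may differ slightly.

```python
import math


def percentile(values, p):
    """p-th percentile using the index rule i = (p / 100) * n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)  # step 1: ascending order
    n = len(ordered)
    i = p * n / 100           # step 2: index of the p-th percentile
    if i != int(i):
        # i is not an integer: round up and take the value in that position.
        position = math.ceil(i)
        return ordered[position - 1]  # positions are 1-based, lists are 0-based
    # i is an integer: midpoint of the values in positions i and i + 1.
    i = int(i)
    return (ordered[i - 1] + ordered[i]) / 2


# Home prices in thousands of dollars; the last value (4500) is an assumption.
home_prices = [
    420, 455, 465, 472, 512,
    514, 554, 575, 580, 600,
    625, 630, 670, 670, 810,
    1250, 1480, 2700, 3400, 4500,
]

for p in (25, 50, 78, 90):
    print(p, percentile(home_prices, p))
```

The only subtlety is the shift between the 1-based positions used in the rule and Python's 0-based list indices.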

Example: Home prices

90th Percentile:

i = (p/100) n = (90/100) 20 = 18

Since i is an integer, take the midpoint of the 18th and 19th data values:

(2,700 + 3,400) / 2 = 3,050

Example: Home prices

78th Percentile:

i = (p/100) n = (78/100) 20 = 15.6

Round up to 16 (even if it were 15.1 we’d round up), so the 78th percentile is the 16th data value: 1,250.

Quartiles

Quartiles are specific percentiles
- First Quartile = 25th Percentile
- Second Quartile = 50th Percentile = Median
- Third Quartile = 75th Percentile

For the home price data:
- 25th percentile: (512 + 514) / 2 = 513
- 50th percentile: (600 + 625) / 2 = 612.5
- 75th percentile: (810 + 1,250) / 2 = 1,030

Measuring variability

The concept of variability is an important one in statistics and probability, for it’s one of the foundations of the concept of risk and uncertainty;

the minimum variability would be a set of numbers that does not change at all, essentially the same number repeated;

when the data do vary, we want to know: how much?

because we’re not accustomed to thinking about variability, the number can be hard to interpret.

Range

The simplest measure of variability:

Range = Largest value - Smallest value

very sensitive to extreme highs and lows

Range of home prices = 4,500 - 420 = 4,080

Interquartile range

This solves the problem of high and low extreme values, by considering the difference between the third and first quartiles:

IQR = Q3 - Q1

IQR of home prices = 1,030 - 513 = 517
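To see why the range is called sensitive to extremes while the IQR is not, the following sketch recomputes both for the home price sample and then drops the single most expensive home. The quartile values 513 and 1,030 are taken from the quartile example in these notes rather than recomputed, and the last price is assumed to be 4,500.

```python
# Home prices in thousands of dollars; the last value (4500) is an assumption.
home_prices = [
    420, 455, 465, 472, 512,
    514, 554, 575, 580, 600,
    625, 630, 670, 670, 810,
    1250, 1480, 2700, 3400, 4500,
]

# Range = largest value - smallest value.
data_range = max(home_prices) - min(home_prices)

# IQR = Q3 - Q1, using the quartiles worked out in the notes.
q1, q3 = 513, 1030
iqr = q3 - q1

# Range after removing the single most expensive home.
ordered = sorted(home_prices)
range_without_max = ordered[-2] - ordered[0]

print("range:", data_range)
print("IQR:", iqr)
print("range without the largest value:", range_without_max)
```

Removing one observation shrinks the range by more than a quarter, whereas the IQR depends only on values near the middle of the ordered data.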

Population Variance

If the data are from a population, the variance is denoted by:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Sample Variance

If the data are from a sample, the variance is denoted by:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Sample variance: alternative formula

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$$
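As a sanity check that the definition and the alternative formula give the same number, this sketch computes the sample variance of the home prices both ways and compares the results with `statistics.variance` from the Python standard library. The variable names are illustrative and the last price is assumed to be 4,500.

```python
from statistics import variance

# Home prices in thousands of dollars; the last value (4500) is an assumption.
home_prices = [
    420, 455, 465, 472, 512,
    514, 554, 575, 580, 600,
    625, 630, 670, 670, 810,
    1250, 1480, 2700, 3400, 4500,
]

n = len(home_prices)
x_bar = sum(home_prices) / n

# Definition: sum of squared deviations from the mean, divided by n - 1.
s2_definition = sum((x - x_bar) ** 2 for x in home_prices) / (n - 1)

# Alternative formula: (sum of x_i^2 - n * x_bar^2) / (n - 1).
s2_shortcut = (sum(x * x for x in home_prices) - n * x_bar ** 2) / (n - 1)

print(s2_definition)
print(s2_shortcut)
print(variance(home_prices))  # library value, also divides by n - 1
```

The alternative formula needs only the running totals of x and x squared, which is convenient by hand, though it can lose precision in floating point when the mean is large relative to the spread.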

Standard deviation

Population standard deviation = $\sigma = \sqrt{\sigma^2}$

Sample standard deviation = $s = \sqrt{s^2}$

Sample Variance and Sample Standard Deviation

A common question is, why divide by (n - 1)? Why not divide by n?

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
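One standard answer is that dividing by n - 1 makes the sample variance an unbiased estimator of the population variance: averaged over all possible samples, it equals the population variance, while dividing by n gives a value that is too small on average. The sketch below illustrates this by brute force, assuming a hypothetical three-value population and samples of size 2 drawn with replacement; exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # hypothetical tiny population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance

n = 2
# Every ordered sample of size n drawn with replacement (9 samples here).
samples = list(product(population, repeat=n))


def sum_sq_dev(sample):
    """Sum of squared deviations of a sample from its own mean."""
    x_bar = Fraction(sum(sample), len(sample))
    return sum((x - x_bar) ** 2 for x in sample)


# Average of the two candidate estimators over all equally likely samples.
avg_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)

print("population variance:     ", sigma2)
print("average when dividing by n - 1:", avg_div_n_minus_1)
print("average when dividing by n:    ", avg_div_n)
```

The deviations are measured from the sample mean, which sits closer to the sample's own values than the population mean does; dividing by n - 1 compensates for that.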

Using Excel

The output for the home price data looks like this:

[Figure: Excel output for the home price data]

Coefficient of Variation

This statistic looks at the standard deviation in relation to the mean:

$$\text{Coefficient of variation} = \frac{\text{Standard deviation}}{\text{Mean}} \times 100$$
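A short sketch of this calculation for the home price data follows. It assumes the sample standard deviation (dividing by n - 1) is the one intended, and that the last price is 4,500; the notes do not state either explicitly.

```python
from statistics import mean, stdev

# Home prices in thousands of dollars; the last value (4500) is an assumption.
home_prices = [
    420, 455, 465, 472, 512,
    514, 554, 575, 580, 600,
    625, 630, 670, 670, 810,
    1250, 1480, 2700, 3400, 4500,
]

x_bar = mean(home_prices)
s = stdev(home_prices)  # sample standard deviation (n - 1 in the denominator)

# Coefficient of variation: standard deviation as a percentage of the mean.
cv = s / x_bar * 100

print(f"mean = {x_bar:.1f}, s = {s:.1f}, CV = {cv:.1f}%")
```

Because the units cancel, the coefficient of variation can be compared across data sets measured on different scales.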

Distribution shape

Looking at the shape of the frequency

histogram gives us information about the

central tendency or tendencies of the data

There are many kinds of histograms:

symmetric, skewed, uniform, and bi-polar

are some examples

Symmetric Distribution

[Histogram: classes 1-5, 6-10, 11-15, 16-20, 21-25, 26-30, 31-35]

Skewed-right Distribution

[Histogram: classes 1-5, 6-10, 11-15, 16-20, 21-25, 26-30, 31-35]

Look at the “tail” of the distribution; a longer tail on the right side means the distribution is skewed right.

Skewed-left Distribution

[Histogram: classes 1-5, 6-10, 11-15, 16-20, 21-25, 26-30, 31-35]

Uniform Distribution

[Histogram: classes 1-5, 6-10, 11-15, 16-20, 21-25, 26-30, 31-35]

Bi-polar Distribution

[Histogram: classes 1-5, 6-10, 11-15, 16-20, 21-25, 26-30, 31-35]

Chebyshev’s Theorem

The proportion of data values within z standard deviations of the mean is at least equal to:

$$1 - \frac{1}{z^2}$$

for all z > 1

Chebyshev’s Theorem

For z = 2, at least 0.75, or 75% of the data values are within 2 standard deviations of the mean (above or below):

$$1 - \frac{1}{z^2} = 1 - \frac{1}{2^2} = 1 - \frac{1}{4} = 0.75$$
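The bound can be compared with what actually happens in a data set. The sketch below, with hypothetical helper names `chebyshev_bound` and `share_within`, evaluates the bound 1 - 1/z^2 and the observed share of home prices within z sample standard deviations of the mean. The last price is assumed to be 4,500.

```python
from statistics import mean, stdev


def chebyshev_bound(z):
    """Minimum proportion of values within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("the bound is only informative for z > 1")
    return 1 - 1 / z ** 2


def share_within(values, z):
    """Observed proportion of values within z standard deviations of the mean."""
    m, s = mean(values), stdev(values)
    return sum(1 for x in values if abs(x - m) <= z * s) / len(values)


# Home prices in thousands of dollars; the last value (4500) is an assumption.
home_prices = [
    420, 455, 465, 472, 512,
    514, 554, 575, 580, 600,
    625, 630, 670, 670, 810,
    1250, 1480, 2700, 3400, 4500,
]

for z in (2, 3):
    print(z, chebyshev_bound(z), share_within(home_prices, z))
```

The observed shares sit above the bound even for this strongly skewed sample; the bound is a guaranteed minimum, not an estimate of the actual share.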

Chebyshev’s Theorem

The incredible thing about

Chebyshev’s theorem is that it

holds for all distributions,

regardless of the shape.

Chebyshev’s Theorem

A more formal version:

Let c be any constant greater than zero.

For any distribution of X with mean μ and variance σ²:

$$P(|X - \mu| \ge c) \le \frac{\sigma^2}{c^2}$$
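The formal version and the "within z standard deviations" version are the same statement. As a sketch, assuming X has mean μ and a finite standard deviation σ > 0, start from the inequality P(|X − μ| ≥ c) ≤ σ²/c² and choose c = zσ:

```latex
\begin{aligned}
P(|X - \mu| \ge z\sigma) &\le \frac{\sigma^2}{(z\sigma)^2} = \frac{1}{z^2}, \\
P(|X - \mu| < z\sigma) &= 1 - P(|X - \mu| \ge z\sigma) \ge 1 - \frac{1}{z^2}.
\end{aligned}
```

For z ≤ 1 the right-hand side is zero or negative, so the bound says nothing useful, which is why the theorem is stated for z > 1.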

Empirical rule

We can use a different rule for data which

seem to have the bell-shaped, or normal

distribution;

about 68% of the data are within 1

standard deviation from the mean;

about 95% are within 2;

nearly all are within 3.

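Empirical rule

[Figure: bell-shaped curve showing 68% of the data within 1 standard deviation of the mean, 95% within 2, and nearly 100% within 3]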

Outliers

Since the over whelming majority (89--

almost 100%) of the data lie within 3

standard deviations of the mean, it’s good

to review any data points with z-scores

less than -3 or greater than 3;

such points may be outliers;

they could be errors, which should be

removed;

they could be evidence of something unusual

in the data.
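A z-score measures how many standard deviations a value lies from the mean, z = (x - mean) / standard deviation. The following sketch applies the review rule above to the home prices, assuming the sample standard deviation is used and that the last price is 4,500; the variable names are illustrative.

```python
from statistics import mean, stdev

# Home prices in thousands of dollars; the last value (4500) is an assumption.
home_prices = [
    420, 455, 465, 472, 512,
    514, 554, 575, 580, 600,
    625, 630, 670, 670, 810,
    1250, 1480, 2700, 3400, 4500,
]

m = mean(home_prices)
s = stdev(home_prices)  # sample standard deviation

# z-score: distance from the mean in units of standard deviations.
z_scores = [(x - m) / s for x in home_prices]

# Flag values more than 3 standard deviations from the mean for review.
flagged = [x for x, z in zip(home_prices, z_scores) if abs(z) > 3]

print("largest z-score:", round(max(z_scores), 2))
print("values to review:", flagged)
```

A flagged value is a prompt to look more closely, not proof of an error; in skewed data such as home prices, a very large value can be perfectly genuine.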

Five number summary

A quick way of summarizing the data is to consider the following five numbers:

Smallest value          420
First quartile (Q1)     513
Second quartile (Q2)    612.5
Third quartile (Q3)     1,030
Largest value           4,500

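Correlation Coefficient

This measure solves the problem of units that plagues the covariance:

Sample: $r_{xy} = \frac{s_{xy}}{s_x s_y}$

Population: $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$

where $s_{xy}$ and $\sigma_{xy}$ are the sample and population covariances of x and y.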

Correlation Coefficient

One benefit of the correlation coefficient is that it also tells us about the strength of the linear relationship between x and y;

Correlation coefficient      Relationship between x and y
0                            none
1                            perfect positive
-1                           perfect negative
positive, close to zero      weak positive
negative, close to zero      weak negative

Positive linear relationship

[Scatter plot] Covariance and correlation coefficient are positive

Negative linear relationship

[Scatter plot] Covariance and correlation coefficient are negative

No linear relationship

[Scatter plot] Covariance and correlation coefficient are zero

A perfect positive linear relationship

[Scatter plot] Correlation coefficient = 1
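These cases can be reproduced numerically. The sketch below assumes the usual sample definitions, covariance as the sum of cross-deviations divided by n - 1 and correlation as covariance divided by the product of the two sample standard deviations; the function names and the small data sets are hypothetical.

```python
from math import sqrt


def sample_covariance(xs, ys):
    """Sum of (x - x_bar)(y - y_bar), divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """Covariance divided by the product of the sample standard deviations."""
    s_x = sqrt(sample_covariance(xs, xs))  # covariance of x with itself is its variance
    s_y = sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]     # perfect positive linear relationship
falling = [10 - 3 * x for x in xs]   # perfect negative linear relationship
rescaled = [100 * y for y in rising] # same relationship, different units

print(sample_correlation(xs, rising))
print(sample_correlation(xs, falling))
print(sample_covariance(xs, rising), sample_covariance(xs, rescaled))
print(sample_correlation(xs, rescaled))
```

Changing the units of y multiplies the covariance by the same factor but leaves the correlation coefficient unchanged, which is the units problem the correlation coefficient is meant to solve.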

Weighted mean

Sometimes the arithmetic mean does not give us an accurate measure of the central tendency of a data set, because certain values occur with much greater frequency than others;

for example, with this data, the arithmetic mean cost would be $3, but the overwhelming majority of the time, the cost is $5.

Cost    Purchases
$1      50
$2      120
$3      120
$4      110
$5      1500

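Weighted mean

To overcome this problem, we use the weighted mean:

$$\bar{x} = \frac{\sum_i w_i x_i}{\sum_i w_i}$$

where $w_i$ is the weight given to observation $x_i$ (here, the number of purchases at each cost).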
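For the cost and purchase data above, the weighted mean uses the number of purchases at each cost as the weight: the sum of weight times cost, divided by the sum of the weights. A minimal Python sketch, with illustrative variable names:

```python
costs = [1, 2, 3, 4, 5]                # cost in dollars
purchases = [50, 120, 120, 110, 1500]  # number of purchases at each cost (the weights)

# Unweighted arithmetic mean of the five listed costs.
simple_mean = sum(costs) / len(costs)

# Weighted mean: sum of (weight * value) divided by the sum of the weights.
weighted_mean = sum(w * x for w, x in zip(purchases, costs)) / sum(purchases)

print(f"simple mean   = ${simple_mean:.2f}")
print(f"weighted mean = ${weighted_mean:.2f}")
```

The weighted mean lands much closer to $5 than the simple mean does, reflecting that most purchases were made at that cost.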
