Computing Formula for Standard Deviation - Statistical Methods | STA 100, Study notes of Data Analysis & Statistical Methods

State University of New York Polytechnic - Utica-Rome Data Analysis & Statistical Methods

Prof. William Thistleton

Material Type: Notes; Professor: Thistleton; Class: Statistical Methods; Subject: Statistics; University: SUNY Institute of Technology at Utica-Rome; Term: Unknown 1989;

Typology: Study notes

Pre 2010

Uploaded on 08/09/2009



Prof. Thistleton STA100 Statistical Methods Lecture 4

Text Sections: Chapter 3 Sections 2

Computing Formula for Standard Deviation

So far we have seen how to describe a data set with just a few numbers: the mean and the median tell you where the data are centered and give insight into the data set with just one number. If you also know the standard deviation or the interquartile range (IQR), you know how spread out the data are. As a reminder, suppose you have the following toy data set, and assume the data come from a sample (not the whole population):

100 112 121 95 97

You can easily calculate the sample mean as

$$\bar{x} = \frac{\sum x}{n} = \frac{100 + 112 + 121 + 95 + 97}{5} = \frac{525}{5} = 105 \qquad (1)$$

Note a few things: we’ve used the symbol “x bar”, $\bar{x}$, since our data are drawn from a sample. If they had come from a population, we would have used the Greek letter $\mu$, or “mu”.

Also, since they come from a sample, we have denoted the number of data points (the sample size) as “little” or “lower case” $n$.

Finally, just as our textbook does, I’ve used the Greek letter “upper case sigma”, or $\Sigma$, to indicate that we are adding some numbers up. What we are saying is that we have several numbers where the first is 100, the second is 112, and so on. A compact way to write this, which clearly indicates which number is first, second, all the way to fifth, is to say

$$x_1 = 100, \quad x_2 = 112, \quad x_3 = 121, \quad x_4 = 95, \quad x_5 = 97.$$

Then, since these are all individual data points from the variable $x$, we write “sum of the x’s” as $\sum x$ (in full, $\sum_{i=1}^{5} x_i$). Greek letter sigma for sum.
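Equation (1) and the sigma notation translate directly into a few lines of code. This is a minimal Python sketch, assuming the five toy values are stored in a list named `data` (a name chosen here for illustration). Python counts positions from 0, so `data[0]` plays the role of $x_1$.

```python
# Toy sample from the notes: x_1 = 100, x_2 = 112, ..., x_5 = 97
data = [100, 112, 121, 95, 97]

n = len(data)  # sample size, "little" n

# Sigma notation: add up x_i for i = 1, ..., n (Python indexes from 0)
total = 0
for i in range(n):
    total += data[i]

x_bar = total / n  # sample mean, "x bar"

print(n, total, x_bar)  # 5 525 105.0
```

In everyday code the loop is usually replaced by the built-in `sum(data)`, which computes the same $\sum x$.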

Some people like to draw a “dot plot” showing the data on a horizontal scale:

[Figure: dot plot of the five data points (100, 112, 121, 95, 97) on a horizontal axis marked 80, 90, 100, 110, 120]

If you draw a small triangle under the line at the point 105, you can see that the (arithmetic) mean shows where a collection of numbers would “balance”, or, if you prefer, where it has its “center of mass”. (Just think of kids on a seesaw.)

Now let’s see how spread out the data are by calculating the standard deviation. Remember that to do this we calculate the mean (105), then calculate the deviations from the mean, square them, and so on.

Data points, x   Deviations, x - x̄   Deviations^2, (x - x̄)^2
100              100 - 105 = -5       (100 - 105)^2 = (-5)^2 = 25
112              112 - 105 = 7        (112 - 105)^2 = (7)^2 = 49
121              121 - 105 = 16       (121 - 105)^2 = (16)^2 = 256
95               95 - 105 = -10       (95 - 105)^2 = (-10)^2 = 100
97               97 - 105 = -8        (97 - 105)^2 = (-8)^2 = 64
Sums: 525        0                    494
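The table can be rebuilt programmatically as a check on the arithmetic. This sketch recomputes each deviation and squared deviation and then the three column sums. The deviations add up to exactly 0, which is the “balance point” property of the mean described earlier.

```python
data = [100, 112, 121, 95, 97]
x_bar = sum(data) / len(data)  # 105.0

deviations = [x - x_bar for x in data]  # x - x̄
squared = [d ** 2 for d in deviations]  # (x - x̄)^2

# One row per data point: x, deviation, squared deviation
for x, d, sq in zip(data, deviations, squared):
    print(f"{x:>5} {d:>7.1f} {sq:>7.1f}")

# Column sums: 525, then 0 (the mean balances the data), then 494
print(sum(data), sum(deviations), sum(squared))
```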

Once we add up all the squared deviations, we take their average. As we noted before, some books at this point just divide by the number of data points, $n$. Our book (and many others) divides by $n - 1$ when the data come from a sample. They do this for the following reason. Remember that we usually form a sample because we can’t get to all the data in the population. If you would like to know the average number of text messages sent by a 12-year-old each day, you can’t get this information for all kids in America (not even Verizon can do this!) but will have to find a sample of, say, 100 kids and work from that. Dividing by $n$ tends to underestimate the true population variability, so we boost the estimate a little by dividing by a slightly smaller number, $n - 1$. We can be more technical about this later. Thus the sample variance is

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{494}{5 - 1} = 123.5$$

and the sample standard deviation (the “average” deviation) is

$$s = \sqrt{s^2} = \sqrt{123.5} \approx 11.11$$
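Python’s standard `statistics` module makes the same distinction between the two divisors: `statistics.variance` and `statistics.stdev` divide by $n - 1$ (sample), while `statistics.pvariance` and `statistics.pstdev` divide by $n$ (population). The sketch below compares them on the toy data. It also checks the result against the shortcut form $s^2 = \frac{\sum x^2 - (\sum x)^2/n}{n - 1}$, a standard algebraic rearrangement that needs only the column sums; treating it as the “computing formula” of the lecture title is an assumption here.

```python
import math
import statistics

data = [100, 112, 121, 95, 97]
n = len(data)
x_bar = sum(data) / n

ss = sum((x - x_bar) ** 2 for x in data)  # sum of squared deviations, 494

s2_sample = ss / (n - 1)  # divide by n - 1: 494 / 4 = 123.5
s2_pop = ss / n           # divide by n: 494 / 5 = 98.8 (smaller)
s = math.sqrt(s2_sample)  # sample standard deviation, about 11.11

# The standard library follows the same conventions
assert math.isclose(statistics.variance(data), s2_sample)
assert math.isclose(statistics.pvariance(data), s2_pop)
assert math.isclose(statistics.stdev(data), s)

# Shortcut form: only sum(x) and sum(x^2) are needed
sum_x = sum(data)                  # 525
sum_x2 = sum(x * x for x in data)  # 55619
s2_shortcut = (sum_x2 - sum_x ** 2 / n) / (n - 1)
assert math.isclose(s2_shortcut, s2_sample)

print(s2_sample, s2_pop, round(s, 2))
```

The shortcut form avoids computing every deviation first, which is convenient when working from a table of $x$ and $x^2$ by hand.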

Chebyshev's Theorem.

Pafnuty Chebyshev was a 19th-century Russian mathematician who is probably most famous for the following idea: given any set of numbers you can think of, the standard deviation gives us a way of organizing the data in our mind. Consider the following data, expressed as a dot plot.

[Figure: dot plot of six data points (10, 25, 30, 40, 45, 55) on a horizontal axis marked from 5 to 70 in steps of 5]

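We can calculate the mean as

$$\bar{x} = \frac{\sum x}{n} = \frac{10 + 25 + 30 + 40 + 45 + 55}{6} = \frac{205}{6} \approx 34.16667$$

Put a marker on your graph at $\bar{x} \approx 34.17$.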

Now calculate the standard deviation by filling in the missing figures in the table:

x        x^2
10       100
25       ___
30       ___
40       1600
45       ___
55       3025
Sums: ___    ___

And obtain

$$s = \sqrt{\frac{\sum x^2 - (\sum x)^2 / n}{n - 1}} \approx 15.94261$$

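Now start to orient your data set by putting little markers one, two, and three standard deviations away from the mean on each side. This is just like walking one, two, or three yards in either direction to mark out a garden. This gives us

$$\bar{x} - s = 18.22406, \qquad \bar{x} + s = 50.10927$$

$$\bar{x} - 2s = 2.281456, \qquad \bar{x} + 2s = 66.05188$$

$$\bar{x} - 3s = -13.66115, \qquad \bar{x} + 3s = 81.99448$$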
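The second example can be checked the same way. This sketch assumes the six data values are 10, 25, 30, 40, 45, and 55 (read off the table above). It computes $s$ from the column sums $\sum x$ and $\sum x^2$, dividing by $n - 1$, and prints the intervals one, two, and three standard deviations from the mean.

```python
import math

data = [10, 25, 30, 40, 45, 55]
n = len(data)

sum_x = sum(data)                  # 205
sum_x2 = sum(x * x for x in data)  # 8275
x_bar = sum_x / n                  # about 34.16667

# Sample standard deviation from the column sums (divide by n - 1)
s = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))  # about 15.94261

# Markers k standard deviations either side of the mean
for k in (1, 2, 3):
    low, high = x_bar - k * s, x_bar + k * s
    print(f"{k} sd: {low:.5f} to {high:.5f}")
```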

If you mark your dot plot with these numbers, you should see that

[Figure: the dot plot of the six data points, horizontal axis marked from 5 to 70]

Four of your numbers (25, 30, 40, and 45) lie between 18.22406 and 50.10927. All of your numbers lie between 2.281456 and 66.05188.

Chebyshev told us that this isn’t a coincidence. Here’s what is always true: if you have a collection of numbers you are guaranteed to see

At least 75% (3/4) of your data within 2 standard deviations of the mean

At least 89% (8/9) of your data within 3 standard deviations of the mean

At least 93.75% (15/16) of your data within 4 standard deviations of the mean.

In general, for any $k > 1$, you will see at least

$$1 - \frac{1}{k^2}$$

of your data within $k$ standard deviations of the mean. Note: you will probably see more. This is a “worst case scenario”.
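Chebyshev’s guarantee can be compared with what a particular data set actually does. The sketch below counts the share of points within $k$ standard deviations and prints it next to the bound $1 - 1/k^2$; the function name `fraction_within` is hypothetical, chosen for illustration. It uses the sample standard deviation from the notes. Because dividing by $n - 1$ makes $s$ slightly larger than the divide-by-$n$ version, the intervals are a little wider, so the bound still holds.

```python
import statistics

def fraction_within(data, k):
    """Share of data points within k sample standard deviations of the mean."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # divides by n - 1, as in the notes
    inside = [x for x in data if abs(x - mean) <= k * s]
    return len(inside) / len(data)

data = [10, 25, 30, 40, 45, 55]
for k in (2, 3, 4):
    bound = 1 - 1 / k ** 2
    observed = fraction_within(data, k)
    print(f"k={k}: Chebyshev guarantees at least {bound:.2%}, observed {observed:.2%}")
```

For this small data set every point already lies within two standard deviations, well above the 75% worst case.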

The Normal Distribution

[Figure: histogram of IQ data titled “The Normal Distribution, $\mu = 100$, $\sigma = 15$”; horizontal axis marked from 25 to 160 in steps of 15]

IQ data tend to have the distribution shown in the histogram above. This is called the Normal or Gaussian distribution and has a characteristic bell shape. Many real-world data sets have this shape.

As you can see, Chebyshev is really very conservative. There is an Empirical Rule in our text that tells us that, when data are normally distributed:

Approximately 68.27% of data will lie within one standard deviation of the mean (between 85 and 115 in the IQ example above).

Approximately 95.45% of data will lie within two standard deviations of the mean (between 70 and 130 in the IQ example above).

Approximately 99.73% of data will lie within three standard deviations of the mean (between 55 and 145 in the IQ example above).

So that’s pretty much the whole show. Using the normal distribution, we will later compute that the chance that someone randomly selected from the population has an IQ as high as 180 is actually 1 in 20,741,279 (if you believe the model).
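The Empirical Rule percentages and the IQ figure both come from the normal model. This is a minimal sketch using only `math.erf` and `math.erfc`, assuming (as the 85 to 115 interval for one standard deviation suggests) an IQ mean of 100 and standard deviation of 15. The helper names `within_k_sd` and `normal_upper_tail` are invented here for illustration.

```python
import math

def within_k_sd(k):
    """P(|Z| < k) for a standard normal Z."""
    return math.erf(k / math.sqrt(2))

def normal_upper_tail(x, mu, sigma):
    """P(X >= x) for X ~ Normal(mu, sigma), via the complementary error function."""
    z = (x - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sd: {within_k_sd(k):.2%}")  # 68.27%, 95.45%, 99.73%

# IQ of 180 or more: z = (180 - 100) / 15, about 5.33 standard deviations
p = normal_upper_tail(180, mu=100, sigma=15)
print(f"P(IQ >= 180) is about 1 in {1 / p:,.0f}")
```

Using `erfc` rather than `1 - erf(...)` keeps the tiny upper-tail probability accurate, since subtracting two nearly equal numbers would lose most of its significant digits.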
