




















This document introduces measures of central tendency and gives examples of how to calculate the mean, median, and mode from given data sets. It also discusses the strengths and weaknesses of each measure and how they differ in handling skewed data.
Typology: Lecture notes
Questions such as: “how many calories do I eat per day?” or “how much time do I spend talking per day?” can be hard to answer because the answer will vary from day to day. It’s sometimes more sensible to ask “how many calories do I consume on a typical day?” or “on average, how much time do I spend talking per day?”.
In this section we will study three ways of measuring central tendency in data: the mean, the median and the mode. Each measure gives us a single value* that might be considered typical. Each measure has its own strengths and weaknesses.
(*: some exclusions apply)
A population of books, cars, people, polar bears, all games played by Babe Ruth throughout his career, etc., is the entire collection of those objects. For any given variable under consideration, each member of the population has a particular value of the variable associated with it, for example the number of home runs hit by Babe Ruth in each game he played during his career. These values are called data. We can apply our measures of central tendency to the entire population, to get a single value (maybe more than one for the mode) measuring central tendency for the entire population, or we can apply our measures to a subset, or sample, of the population, to get an estimate of the central tendency for the population.
The population mean of m numbers \(x_1, x_2, \ldots, x_m\) (the data for every member of a population of size m) is denoted by \(\mu\) and is computed as follows:
\[ \mu = \frac{x_1 + x_2 + \cdots + x_m}{m} \]
The sample mean of the numbers \(x_1, x_2, \ldots, x_n\) (data for a sample of size n from the population) is denoted by \(\bar{x}\) and is computed similarly:
\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]
Example Consider the following set of data, showing the number of times a sample of 5 students check their e-mail per day: 1, 3, 5, 5, 3.
Here \(n = 5\) and \(x_1 = 1\), \(x_2 = 3\), \(x_3 = 5\), \(x_4 = 5\) and \(x_5 = 3\).
Calculate the sample mean \(\bar{x}\).
\[ \bar{x} = \frac{1 + 3 + 5 + 5 + 3}{5} = \frac{17}{5} = 3.4 \]
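As a quick check on the arithmetic, the sample mean formula can be sketched in a few lines of Python. The function name `sample_mean` is an illustrative choice, not something from the notes.

```python
def sample_mean(data):
    """Return (x_1 + x_2 + ... + x_n) / n for a non-empty list of numbers."""
    if not data:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(data) / len(data)


# Number of e-mail checks per day for the sample of 5 students.
checks = [1, 3, 5, 5, 3]
print(sample_mean(checks))  # 3.4
```

The same function computes a population mean if it is given the data for every member of the population.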
We can calculate the mean above more efficiently by using frequencies. We can see from the calculation above that
\[ \bar{x} = \frac{1 \cdot 1 + 3 \cdot 2 + 5 \cdot 2}{5} = 3.4 \]
The frequency distribution for the data is:
Outcome   Frequency   Outcome × Frequency
0         1           0 × 1
1         2           1 × 2
2         8           2 × 8
3         4           3 × 4
4         5           4 × 5
Sum       20          50

\[ \bar{x} = \frac{50}{20} = 2.5 \]
In general: if the frequency/relative frequency table for our sample of size n looks like the one below (where the observations are denoted \(O_i\), the corresponding frequencies \(f_i\) and the relative frequencies \(f_i/n\)):

Observation   Frequency   Relative Frequency
O_i           f_i         f_i/n
O_1           f_1         f_1/n
O_2           f_2         f_2/n
O_3           f_3         f_3/n
...           ...         ...
O_R           f_R         f_R/n

then:
\[ \bar{x} = \frac{O_1 f_1 + O_2 f_2 + \cdots + O_R f_R}{n} \]
Alternatively we can use the relative frequencies, instead of dividing by the n at the end.
Outcome   Frequency   Relative Frequency   Outcome × Relative Frequency
O_i       f_i         f_i/n                O_i × f_i/n
O_1       f_1         f_1/n                O_1 × f_1/n
O_2       f_2         f_2/n                O_2 × f_2/n
O_3       f_3         f_3/n                O_3 × f_3/n
...       ...         ...                  ...
O_R       f_R         f_R/n                O_R × f_R/n
                                           SUM = \(\bar{x}\)
You can of course choose any method for calculation from the three methods listed above. The easiest method to use will depend on how the data is presented.
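The two table-based methods can be sketched in Python as follows. The table is stored as a dictionary mapping each observation to its frequency, which is an illustrative representation rather than anything prescribed by the notes. The data are the outcomes 0 to 4 with frequencies 1, 2, 8, 4, 5 from the frequency table above.

```python
import math


def mean_from_frequencies(table):
    """Sum O_i * f_i over the table, then divide by n = sum of the f_i."""
    n = sum(table.values())
    return sum(outcome * freq for outcome, freq in table.items()) / n


def mean_from_relative_frequencies(table):
    """Sum O_i * (f_i / n); no division by n is needed at the end."""
    n = sum(table.values())
    return sum(outcome * (freq / n) for outcome, freq in table.items())


# Outcome -> frequency, taken from the frequency table in the text.
table = {0: 1, 1: 2, 2: 8, 3: 4, 4: 5}

by_freq = mean_from_frequencies(table)
by_rel_freq = mean_from_relative_frequencies(table)
print(by_freq)  # 2.5
# The two methods agree up to floating-point rounding.
print(math.isclose(by_freq, by_rel_freq))  # True
```

The comparison uses `math.isclose` because the relative frequencies are stored as floating-point numbers, so the second method can differ from the first in the last few digits.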
Example The number of goals scored by the 32 teams in the 2014 World Cup are shown below:
18, 15, 12, 11, 10, 8, 7, 7, 6, 6, 6, 5, 5, 5, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1
Make a frequency table for the data and, taking the soccer teams who played in the World Cup as a population, calculate the population mean, \(\mu\).

Outcome   Frequency
1         3
2         4
3         5
4         6
5         3
6         3
7         2
8         1
10        1
11        1
12        1
15        1
18        1

\(\mu = ?\)
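One way to check a hand-built frequency table is to let Python do the tallying. The sketch below uses `collections.Counter` from the standard library to count the goals data and then applies the frequency formula for the mean. The variable names are illustrative.

```python
from collections import Counter

# Goals scored by each of the 32 teams, as listed in the example.
goals = [18, 15, 12, 11, 10, 8, 7, 7, 6, 6, 6, 5, 5, 5,
         4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1]

freq = Counter(goals)          # outcome -> frequency
m = sum(freq.values())         # population size
total = sum(outcome * f for outcome, f in freq.items())
mu = total / m

for outcome in sorted(freq):
    print(outcome, freq[outcome])
print(m, total, mu)  # 32 171 5.34375
```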
If we are given a histogram (showing frequencies) or a frequency table where the data is already grouped into categories, and we do not have access to the original data, we can still estimate the mean using the midpoints of the intervals which serve as categories for the data. Suppose there are k categories (shown as the bases of the rectangles) with midpoints \(m_1, m_2, \ldots, m_k\) respectively, and the frequencies of the corresponding intervals are \(f_1, f_2, \ldots, f_k\). Then the mean of the data set is approximately
\[ \frac{m_1 f_1 + m_2 f_2 + \cdots + m_k f_k}{n} \]
where \(n = f_1 + f_2 + \cdots + f_k\).
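The midpoint approximation can be sketched in Python as below. The function name `grouped_mean` and the small data set are hypothetical, chosen only to illustrate the formula; they do not come from the notes.

```python
def grouped_mean(bins, freqs):
    """Approximate the mean of grouped data.

    bins  -- list of (left, right) interval endpoints
    freqs -- list of frequencies, one per interval
    """
    if len(bins) != len(freqs):
        raise ValueError("need exactly one frequency per interval")
    n = sum(freqs)
    midpoints = [(left + right) / 2 for left, right in bins]
    return sum(m * f for m, f in zip(midpoints, freqs)) / n


# Hypothetical grouped data: three intervals of width 10.
bins = [(0, 10), (10, 20), (20, 30)]
freqs = [2, 5, 3]
print(grouped_mean(bins, freqs))  # 16.0
```

Here the midpoints are 5, 15 and 25, so the estimate is (5·2 + 15·5 + 25·3)/10 = 16.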
Example Approximate the mean for the set of data used to make the following histogram, showing the time (in seconds) spent waiting by a sample of customers at Gringotts Wizarding bank.
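[Histogram: frequency (vertical axis, ticks at 2, 4, 6, 8, 10, 12) against time spent waiting in seconds (horizontal axis), with bars over the intervals 50-100, 100-150, 150-200, 200-250, 250-300 and 300-350.]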
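midpoints: 75, 125, 175, 225, 275, 325
approximation of sample mean:
\[ \bar{x}_{\text{approx}} = \frac{75 f_1 + 125 f_2 + 175 f_3 + 225 f_4 + 275 f_5 + 325 f_6}{n} \]
where \(f_1, \ldots, f_6\) are the heights of the six bars and \(n = f_1 + \cdots + f_6\).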
This calculation only gives an approximation to the sample mean because we do not know the distribution of actual wait times within each bar (cf. the two histograms for Old Faithful eruption durations in the previous section’s slides).
We can calculate the minimum possible sample mean by assuming all the people in each bar are at the left-hand edge. For example, all 12 people in the first bar waited 50 seconds. This gives a result of \(\bar{x}_{\min} = 105\).
We can also calculate the maximum possible sample mean by assuming all the people in each bar are at the right-hand edge. This gives the result \(\bar{x}_{\max} = 155\). Notice that
\[ \bar{x}_{\text{approx}} = \frac{\bar{x}_{\min} + \bar{x}_{\max}}{2} = \frac{105 + 155}{2} = 130 \]
and the actual sample mean \(\bar{x}\) satisfies the inequalities
\[ \bar{x}_{\min} \le \bar{x} \le \bar{x}_{\max} \]
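The bounds can be sketched in Python as below. The bar heights are not recoverable from the text, so the frequencies used here are hypothetical: only the first value, 12, is stated in the notes, and the rest were chosen so that the result matches the quoted bounds of 105 and 155. The function name is also an illustrative choice.

```python
def grouped_mean_bounds(bins, freqs):
    """Return (minimum, midpoint estimate, maximum) for a grouped mean.

    The minimum puts every observation at the left edge of its interval,
    the maximum puts every observation at the right edge.
    """
    n = sum(freqs)
    x_min = sum(left * f for (left, _), f in zip(bins, freqs)) / n
    x_max = sum(right * f for (_, right), f in zip(bins, freqs)) / n
    x_approx = sum((left + right) / 2 * f
                   for (left, right), f in zip(bins, freqs)) / n
    return x_min, x_approx, x_max


# Intervals from the histogram's horizontal axis (seconds).
bins = [(50, 100), (100, 150), (150, 200),
        (200, 250), (250, 300), (300, 350)]
# HYPOTHETICAL bar heights: only the first (12) appears in the text.
freqs = [12, 10, 4, 2, 1, 1]

x_min, x_approx, x_max = grouped_mean_bounds(bins, freqs)
print(x_min, x_approx, x_max)  # 105.0 130.0 155.0
print(x_approx == (x_min + x_max) / 2)  # True
```

Because each midpoint is the average of its interval's two edges, the midpoint estimate always lies exactly halfway between the two bounds, whatever the frequencies are.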
Example The number of goals scored by the 32 teams in the 2014 World Cup are shown below:
18, 15, 12, 11, 10, 8, 7, 7, 6, 6, 6, 5, 5, 5, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1
Find the median of the above set of data.
The data is in descending order. There are 32 values and half of 32 is 16. The sixteenth element counting from the right is 4, and the sixteenth element counting from the left is also 4. The median is the average of these two middle values:
\[ \text{median} = \frac{4 + 4}{2} = 4 \]
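The counting procedure used here can be sketched in Python. The function below sorts the data, then returns the middle value when the number of values is odd and the average of the two middle values when it is even. It is a minimal illustration, not the only way to compute a median.

```python
def median(data):
    """Return the median of a non-empty list of numbers."""
    if not data:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


goals = [18, 15, 12, 11, 10, 8, 7, 7, 6, 6, 6, 5, 5, 5,
         4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1]
print(median(goals))  # 4.0
```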
Example A sample of 5 students were asked how much money they were carrying and the results are shown below:
$75, $2, $5, $0, $5.
Find the mean and median of the above set of data.
The data in ascending order is 0, 2, 5, 5, 75. The mean is
\[ \bar{x} = \frac{0 + 2 + 5 + 5 + 75}{5} = 17.4 \]
There are 5 = 2 · 3 − 1 numbers, so to find the median count in 3 from either end to get 5. Notice that the median gives us a more representative picture here, since the mean is skewed by the outlier $75.
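The contrast between the two measures can be reproduced with Python's standard `statistics` module, which provides `mean` and `median` functions. This is a small sketch using the five amounts from the example; the variable names are illustrative.

```python
import statistics

# Amounts of money (in dollars) carried by the 5 students.
money = [75, 2, 5, 0, 5]

mean_value = statistics.mean(money)
median_value = statistics.median(money)
print(mean_value)    # 17.4
print(median_value)  # 5

# Four of the five students carry less than the mean amount.
below_mean = [x for x in money if x < mean_value]
print(len(below_mean))  # 4
```

The last line illustrates why the mean is less representative here: the single value of 75 pulls the mean above what four of the five students carry.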