Lab 9: Correlation and Time Series Analysis, Lab Reports of Probability and Statistics

A lab exercise on correlation and time series analysis. It covers correlation as a measure of linear dependence between random variables, the computation of theoretical and sample correlations, and time series and autocorrelation. The lab includes several problems to be solved using Minitab software.

Typology: Lab Reports

Pre 2010

Uploaded on 08/08/2009

koofers-user-nxr-1

Lab 9. Correlation. Time series.

www.nmt.edu/~olegm/382/Lab9.pdf

Note: the menus and other things you will read or type on the computer are in italics. Attach the printouts whenever needed.

In this Lab, we explore correlation as a measure of linear dependence between random variables. We also discuss time series and the related concept of autocorrelation.

1 Correlation

When analyzing relationships between two or more random variables, the concept of correlation is important.

The theoretical correlation is computed using

$$\rho = \frac{\operatorname{Cov}(X, Y)}{\sqrt{V(X)\,V(Y)}} = \frac{E(XY) - E(X)\,E(Y)}{\sigma_X \sigma_Y}$$
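To see the definition at work, here is a small Python sketch that evaluates ρ for a hypothetical joint distribution of two 0/1 variables (the probabilities are made up for illustration and are not taken from the lab). With P(0,0) = P(1,1) = 0.4 and P(0,1) = P(1,0) = 0.1, we get E(X) = E(Y) = 0.5, E(XY) = 0.4, Cov(X, Y) = 0.15 and V(X) = V(Y) = 0.25, hence ρ = 0.6.

```python
import math

# Hypothetical joint pmf of (X, Y) on {0, 1} x {0, 1}; illustrative values only.
pmf = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def expect(g):
    """Expected value of g(X, Y) under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())

ex = expect(lambda x, y: x)
ey = expect(lambda x, y: y)
exy = expect(lambda x, y: x * y)
var_x = expect(lambda x, y: x * x) - ex ** 2
var_y = expect(lambda x, y: y * y) - ey ** 2

cov = exy - ex * ey                   # Cov(X, Y) = E(XY) - E(X)E(Y)
rho = cov / math.sqrt(var_x * var_y)  # rho = Cov(X, Y) / (sigma_X * sigma_Y)
print(f"Cov = {cov:.2f}, rho = {rho:.2f}")
```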

The sample correlation is obtained by replacing the expected values with sample averages ($\bar{X}$ and $\bar{Y}$ denote the sample means), that is

$$r = \frac{\sum_{i=1}^{n} X_i Y_i / n - \bar{X}\,\bar{Y}}{\sqrt{\left(\sum_{i=1}^{n} X_i^2 / n - \bar{X}^2\right)\left(\sum_{i=1}^{n} Y_i^2 / n - \bar{Y}^2\right)}}$$
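A minimal Python/NumPy sketch of the sample formula above, checked against NumPy's built-in `np.corrcoef` on a small made-up data set. The function name `sample_corr` is our own label, not part of any library.

```python
import numpy as np

def sample_corr(x, y):
    """Sample correlation r computed directly from the averages formula."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    num = np.sum(x * y) / n - x.mean() * y.mean()
    den = np.sqrt((np.sum(x ** 2) / n - x.mean() ** 2) *
                  (np.sum(y ** 2) / n - y.mean() ** 2))
    return num / den

# Small made-up data set, used only to check the formula.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r = sample_corr(x, y)
print(round(r, 4))
print(np.isclose(r, np.corrcoef(x, y)[0, 1]))
```

Here the numerator is 53/5 − 3·3 = 1.6 and each bracket under the root is 55/5 − 9 = 2, so r = 1.6/2 = 0.8. Because the numerator and denominator both divide by n, using n − 1 throughout gives the same r. The "sum of squares minus squared mean" form can lose precision when the values are large relative to their spread; routines such as `np.corrcoef` work with centered data instead.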

When the sample is large enough, we expect the sample correlation r to be close to the theoretical correlation ρ.
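The simulation below illustrates this claim under an assumed bivariate normal model. Taking Y = ρX + √(1 − ρ²)Z, with X and Z independent standard normals, gives Corr(X, Y) = ρ exactly, so the sample r should settle near ρ = 0.6 as n grows. The seed and sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
rho = 0.6  # assumed theoretical correlation for this illustration

for n in (10, 100, 1000, 100_000):
    x = rng.standard_normal(n)
    z = rng.standard_normal(n)
    # Var(Y) = rho^2 + (1 - rho^2) = 1 and Cov(X, Y) = rho, so Corr(X, Y) = rho.
    y = rho * x + np.sqrt(1 - rho ** 2) * z
    r = np.corrcoef(x, y)[0, 1]
    print(f"n = {n:6d}   r = {r:+.3f}")
```

For small n the printed r can wander well away from 0.6; its spread shrinks roughly like 1/√n.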

Problem 1

(a) To get a feel for correlation, make scatterplots of Y vs. X and compute the sample correlation (Stat → Basic → Correlation) for the 4 examples in the file corex.txt. Notice how the shape and orientation of the plots change with the value of r. Describe what you see.

(b) What is the correlation of X with itself?

(c) Compute the correlation between the temperature data for two cities: Tucson, AZ and Eugene, OR (file 2cities.csv). The temperatures are given in Celsius. Does the correlation change if we apply a linear function to X or Y, say, if we convert the temperatures to Fahrenheit (recall that °F = °C × 1.8 + 32)?

(d) If X and Y are independent, then their correlation is 0. The converse is not true. As an example, consider the data on fuel economy for the Ford Escort (Escort.txt). The variable X is speed in km/h, and Y is fuel consumption in liters per 100 km. Make a scatterplot. The correlation between speed and fuel consumption is close to 0; however, there is a clear relationship between the two. Correlation measures only the strength of the linear relationship.
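The points in (b) to (d) can be checked on synthetic data before turning to the lab files. This Python sketch uses made-up temperatures rather than 2cities.csv, and a parabola rather than the Escort data. It shows that r(X, X) = 1, that a change of units with positive slope (such as °C to °F) leaves r unchanged while a negative slope only flips its sign, and that Q = S² on values symmetric about 0 gives r = 0 even though Q is completely determined by S.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Synthetic temperatures in Celsius for two hypothetical cities.
x = rng.normal(20.0, 5.0, size=200)
y = 0.8 * x + rng.normal(0.0, 3.0, size=200)

# (b) A variable is perfectly correlated with itself.
r_self = np.corrcoef(x, x)[0, 1]

# (c) A linear change of units with positive slope leaves r unchanged;
#     a negative slope only flips its sign.
r_c = np.corrcoef(x, y)[0, 1]
r_f = np.corrcoef(x * 1.8 + 32, y * 1.8 + 32)[0, 1]  # both cities in Fahrenheit
r_neg = np.corrcoef(-x, y)[0, 1]

# (d) Zero correlation without independence: Q is an exact function of S.
s = np.linspace(-3.0, 3.0, 61)  # values symmetric about 0
q = s ** 2                      # exact but nonlinear dependence
r_quad = np.corrcoef(s, q)[0, 1]

print(f"(b) r(X, X)          = {r_self:.4f}")
print(f"(c) r in Celsius     = {r_c:.4f}")
print(f"    r in Fahrenheit  = {r_f:.4f}")
print(f"    r(-X, Y)         = {r_neg:.4f}")
print(f"(d) r(S, S^2)        = {r_quad:.4f}")
```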

Problem 2

Consider the data set in MercBass.txt. The data contain several environmental variables for a sample of Florida lakes, along with the average mercury content in bass caught there. Obtain the “matrix” scatterplot (Graph → Matrix plot) of all the continuous variables, and compute all pairwise correlations. Which variables seem to be related? Do the correlation values fully reflect the extent of the relationships between the variables?
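A pairwise correlation matrix is the numerical counterpart of the matrix plot. The sketch below uses three synthetic columns with hypothetical names v1, v2, v3 (they are not the columns of MercBass.txt). v2 depends linearly on v1 plus noise, while v3 = exp(−v1) depends on v1 exactly but along a curve, so |r(v1, v3)| stays below 1 even though the relationship is perfect.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n = 200

# Hypothetical variables standing in for lake measurements (not MercBass.txt).
v1 = rng.uniform(0.0, 4.0, n)
v2 = 2.0 * v1 + rng.normal(0.0, 1.0, n)  # roughly linear in v1
v3 = np.exp(-v1)                          # exact, but curved, function of v1

data = np.column_stack([v1, v2, v3])
names = ["v1", "v2", "v3"]
corr = np.corrcoef(data, rowvar=False)    # rowvar=False: columns are variables

for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"r({names[i]}, {names[j]}) = {corr[i, j]:+.3f}")
```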

2 Time series

Time series arise when the observations $X_t$ are taken at regular time intervals, and usually $X_t$ is correlated with $X_{t+1}$. This correlation of a series with its own past values is called autocorrelation. To compute it, find the correlation between the X column and a copy of itself shifted by 1; this is the lag 1 autocorrelation. Shifting by 2, 3, etc. shows how strongly $X_t$ is related to $X_{t+2}$, $X_{t+3}$, etc. Viewed as a function of the lag, the result is called the autocorrelation function (ACF). (What is the lag 0 autocorrelation?)
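Here is a Python sketch comparing the shift-the-column method with the standard sample ACF estimator, r_k = Σ(X_t − X̄)(X_{t+k} − X̄) / Σ(X_t − X̄)², which uses the overall mean and the full-series sum of squares at every lag. We believe this standard form is what Minitab's Autocorrelation command reports, but treat that as an assumption to check against its documentation. The test series is a simulated AR(1) process (assumed for illustration) whose theoretical lag-k autocorrelation is 0.7^k. For long series the two estimators agree closely; on a very short series such as 1, 2, 3, 4, 5 they differ a lot at lag 1 (1 for the shifted correlation versus 0.4 for the standard ACF).

```python
import numpy as np

def shifted_corr(x, lag):
    """Ordinary sample correlation between X_t and X_{t+lag} (shift-the-column method)."""
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[: len(x) - lag], x[lag:])[0, 1]

def acf(x, max_lag):
    """Standard sample ACF: overall mean and full-series sum of squares at every lag."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(d * d)
    return np.array([np.sum(d[: len(d) - k] * d[k:]) / denom
                     for k in range(max_lag + 1)])

# Simulated AR(1) series X_t = 0.7 X_{t-1} + e_t (an assumption for illustration).
rng = np.random.default_rng(seed=4)
phi, n = 0.7, 5000
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.standard_normal()

r_acf = acf(x, 4)
for k in range(5):
    print(f"lag {k}: shifted {shifted_corr(x, k):+.3f}   "
          f"ACF {r_acf[k]:+.3f}   theory {phi ** k:.3f}")

# On a very short series the two estimators can disagree noticeably.
toy = [1, 2, 3, 4, 5]
print(shifted_corr(toy, 1), acf(toy, 1)[1])
```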

Problem 3

The file TucsonAZ.csv contains temperature data for Tucson, AZ.

(a) Consider the first 30 values of the temp variable (copy them into a separate column). First, compute the autocorrelation manually for lags 1, ..., 4 by shifting the column by 1, ..., 4 rows. Then use Minitab’s ACF (Stat → Time Series → Autocorrelation) to produce the graph.

(b) Make a Time Series Plot for the entire temp variable. There’s a clearly defined seasonal trend. Which month is the hottest?

(c) The last column contains the residuals, that is, the observations with the seasonal trend subtracted. (We will later learn how to fit the trend function for this example.)

i. Use Calc → Make Patterned Data → Simple to obtain a column with the sequence 1, 2, 3, 4, 5 repeated as many times as needed to match the length of the data set.

ii. Use Data → Unstack and then

iii. With the 5 columns you got, use Calc → Row Statistics to find their sums.

Find the variance of the residuals themselves and then the variance of the 5-day batch sums. Do they agree with your computations in part (b)? (Assume that r equals the lag 1 autocorrelation you found in Problem 3.)
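For a stationary series with variance σ² and autocorrelations ρ_k, the variance of a sum of m consecutive values is Var(X_1 + ... + X_m) = σ²[m + 2 Σ_{k=1}^{m−1} (m − k) ρ_k]. If only the lag 1 autocorrelation r is nonzero, this reduces to σ²(5 + 8r) for m = 5. The derivation the lab refers to as part (b) is not shown here, so the following Python sketch uses the general formula with the sample ACF up to lag 4. It forms 5-value batch sums by reshaping, which mirrors the Patterned Data / Unstack / Row Statistics steps. It runs on simulated AR(1) residuals rather than the lab's residual column (an assumption); the helper names are our own.

```python
import numpy as np

def batch_sums(x, size=5):
    """Sums of consecutive, non-overlapping batches of `size` values."""
    x = np.asarray(x, dtype=float)
    m = len(x) // size * size  # drop an incomplete final batch, if any
    return x[:m].reshape(-1, size).sum(axis=1)

def acf(x, max_lag):
    """Standard sample ACF up to max_lag (overall mean, full sum of squares)."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(d * d)
    return np.array([np.sum(d[: len(d) - k] * d[k:]) / denom
                     for k in range(max_lag + 1)])

def predicted_var_of_sum(x, size=5):
    """sigma^2 * (m + 2 * sum_{k=1}^{m-1} (m - k) * rho_k), using sample estimates."""
    rho = acf(x, size - 1)
    sigma2 = np.var(x, ddof=1)
    return sigma2 * (size + 2 * sum((size - k) * rho[k] for k in range(1, size)))

# Simulated AR(1) "residuals" (assumption; substitute the lab's residual column).
rng = np.random.default_rng(seed=5)
n, phi = 100_000, 0.5
e = rng.standard_normal(n)
x = np.empty(n)
x[0] = e[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]

s = batch_sums(x)
var_x = np.var(x, ddof=1)
var_s = np.var(s, ddof=1)
print(f"variance of residuals:         {var_x:.3f}")
print(f"variance of 5-value sums:      {var_s:.3f}")
print(f"predicted from ACF, lags 1-4:  {predicted_var_of_sum(x):.3f}")
print(f"if residuals were independent: {5 * var_x:.3f}")
```

With positive autocorrelation the batch sums vary noticeably more than 5σ², which is what the independence assumption would predict.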