


A lab exercise on correlation and time series analysis. It covers the concept of correlation as a measure of linear dependency between random variables, the computation of theoretical and sample correlations, and a discussion of time series and autocorrelation. The lab includes several problems to be solved using Minitab software.
www.nmt.edu/~olegm/382/Lab9.pdf
Note: the menus and other things you will read or type on the computer are in italics. Attach the printouts whenever needed.
In this Lab, we explore correlation as a measure of linear dependency between random variables. We also discuss Time Series and the related concept of autocorrelation.
When analyzing relationships between two or more random variables, the concept of correlation is important.
The theoretical correlation is computed using
\[
\rho = \frac{\mathrm{Cov}(X, Y)}{\sqrt{V(X)\,V(Y)}} = \frac{E(XY) - E(X)\,E(Y)}{\sigma_X \sigma_Y}
\]
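The sample correlation is found by replacing the expected values with averages, that is,
\[
r = \frac{\sum_{i=1}^{n} X_i Y_i / n - \bar{X} \cdot \bar{Y}}{\sqrt{\left(\sum_{i=1}^{n} X_i^2 / n - \bar{X}^2\right)\left(\sum_{i=1}^{n} Y_i^2 / n - \bar{Y}^2\right)}}
\]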
Naturally, we would expect that, when the sample is large enough, the sample correlation r will be close to the theoretical correlation ρ.
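The sample formula can be checked numerically outside Minitab. The following is a minimal Python sketch, not part of the lab itself: the function name `sample_correlation` and the synthetic data are assumptions made for illustration. The data are built as Y = 0.6 X + 0.8 Z with X and Z independent standard normal, so the theoretical correlation is ρ = 0.6.

```python
import numpy as np

def sample_correlation(x, y):
    """Sample correlation r: expected values replaced by averages."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean(x * y) - x.mean() * y.mean()   # sum(XiYi)/n - Xbar*Ybar
    var_x = np.mean(x**2) - x.mean() ** 2        # sum(Xi^2)/n - Xbar^2
    var_y = np.mean(y**2) - y.mean() ** 2        # sum(Yi^2)/n - Ybar^2
    return cov / np.sqrt(var_x * var_y)

# Synthetic data (an assumption, not one of the lab files):
# Cov(X, Y) = 0.6 and V(X) = V(Y) = 1, so rho = 0.6.
rng = np.random.default_rng(0)
n = 100_000
x = rng.standard_normal(n)
y = 0.6 * x + 0.8 * rng.standard_normal(n)

r = sample_correlation(x, y)
print("r from the formula:", r)
print("r from np.corrcoef:", np.corrcoef(x, y)[0, 1])
```

Both printed values should agree with each other and, for a sample this large, lie close to 0.6.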
Problem 1
(a) To get a feel for correlation: make scatterplots of Y vs X and compute the sample correlation (Stat → Basic Statistics → Correlation) for 4 examples from the file corex.txt. Notice how the shape and orientation of the plots change according to the value of r. Describe what you see.
(b) What is the correlation of X with itself?
(c) Compute the correlation for temperature data between two cities: Tucson, AZ and Eugene, OR (file 2cities.csv). The temperatures are given in Celsius. Does the correlation change if we apply a linear function to X or Y, for example by converting the temperature to Fahrenheit (recall that °F = °C × 1.8 + 32)?
(d) If X and Y are independent, then their correlation is 0. The converse is not true. As an example, consider the data on fuel economy for a Ford Escort (Escort.txt). The variable X is speed in km/h, and Y is the fuel consumption in liters per 100 km. Make a scatterplot. The correlation between speed and fuel consumption is close to 0; however, there is a clear relationship between the two. In fact, the correlation reflects only the strength of the linear relationship.
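The point made in part (d), that zero correlation does not imply independence, can be illustrated with a small Python sketch. The data below are hypothetical, not the Escort measurements: Y is an exact function of X (Y = X²), yet the correlation is essentially 0 because the relationship is symmetric rather than linear.

```python
import numpy as np

# Hypothetical data: x is symmetric around 0 and y depends on x exactly.
x = np.linspace(-1.0, 1.0, 201)
y = x**2

# Over the whole range the rising and falling halves cancel out.
r_xy = np.corrcoef(x, y)[0, 1]

# Restricted to x >= 0 the relationship is monotone and nearly linear.
mask = x >= 0
r_half = np.corrcoef(x[mask], y[mask])[0, 1]

print("correlation over the full range:", r_xy)
print("correlation for x >= 0 only:   ", r_half)
```

The first value is numerically zero while the second is close to 1, even though the dependence of Y on X is the same in both cases.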
Problem 2
Consider the data set in MercBass.txt. The data contain several environmental variables for a sample of Florida lakes, and the average mercury content in bass caught there. Obtain the “matrix” scatterplot (Graph → Matrix plot) for all the continuous variables. Also, compute all pairwise correlations. Which variables seem to be related? Do the correlation values fully reflect the extent of relationship between the variables?
Time Series occur when the observations X_t are taken at regular time intervals, and usually X_t is correlated with X_{t+1}. This is called autocorrelation. To compute it, we just need to find the correlation of the X-column with itself, shifted by 1. This is the lag 1 autocorrelation. When you shift by 2, 3, etc., you find how much X_t is related to X_{t+2}, X_{t+3}, etc. The result is a function of the lag and is called the autocorrelation function (ACF). (What is the lag 0 autocorrelation?)
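The shifting procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the lab's Minitab workflow: the function names and the simulated series (a first-order autoregression with coefficient 0.7) are made up for the example. The second function is the usual ACF estimator, which uses the overall mean and the total sum of squares, so its values can differ slightly from the shifted-column correlations, especially for short series.

```python
import numpy as np

def lagged_correlation(x, lag):
    """Correlation of the column with itself shifted by `lag` positions."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.corrcoef(x[: n - lag], x[lag:])[0, 1]

def acf_estimate(x, lag):
    """Usual ACF estimator: overall mean, total sum of squares in the denominator."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    n = len(x)
    return np.sum(d[: n - lag] * d[lag:]) / np.sum(d**2)

# Simulated series (an assumption): X_t = 0.7 * X_{t-1} + noise.
rng = np.random.default_rng(1)
phi, n = 0.7, 5000
x = np.empty(n)
x[0] = rng.standard_normal()
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.standard_normal()

shifted = [lagged_correlation(x, k) for k in range(1, 5)]
standard = [acf_estimate(x, k) for k in range(1, 5)]
for k in range(4):
    print(f"lag {k + 1}: shifted = {shifted[k]:.3f}, ACF estimator = {standard[k]:.3f}")
```

For this simulated series the lag 1 value should be near 0.7 and later lags should decay toward 0; with only 30 observations, as in Problem 3(a), the two estimators may disagree more noticeably.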
Problem 3
The file TucsonAZ.csv contains temperature data for Tucson, AZ.
(a) Consider the first 30 values of temp variable (copy them into a separate column). First, compute autocorrelation manually for lags 1,...,4 by shifting the column by 1,...,4 spaces. Then, use Minitab’s ACF (Stat → Time Series → Autocorrelation) to produce the graph.
(b) Make a Time Series Plot for the entire temp variable. There’s a clearly defined seasonal trend. Which month is the hottest?
(c) The last column contains the residuals, that is, the observations with the seasonal trend subtracted. (We will later learn how to fit the trend function for this example.)
i. Use Calc → Make Patterned Data → Simple to obtain a column with the sequence 1, 2, 3, 4, 5 repeated as many times as needed to match the length of the data set.
ii. Use Data → Unstack and then
iii. With the 5 columns you got, use Calc → Row Statistics to find their sums.
Find the variance of the residuals themselves and then the variance of the 5-day batch sums. Do they agree with your computations in part (b)? (Assume that r equals the lag 1 autocorrelation you found in Problem 3.)
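The batching in steps i to iii amounts to cutting the residual column into consecutive blocks of 5 and summing each block. A rough Python equivalent is sketched below. The function name `batch_sums`, the placeholder values, and the choice to drop an incomplete final batch are all assumptions for illustration; they are not taken from the lab or from Minitab's behavior.

```python
import numpy as np

def batch_sums(values, batch_size=5):
    """Sum consecutive batches; an incomplete final batch is dropped (assumption)."""
    values = np.asarray(values, dtype=float)
    n_full = (len(values) // batch_size) * batch_size
    # Each row of the reshaped array holds one batch of consecutive observations.
    return values[:n_full].reshape(-1, batch_size).sum(axis=1)

# Placeholder values standing in for the residual column (not the real data).
residuals = np.arange(1.0, 13.0)   # 1, 2, ..., 12

sums = batch_sums(residuals)       # batches 1..5 and 6..10; 11 and 12 are dropped
var_residuals = residuals.var(ddof=1)   # sample variance of the residuals
var_sums = sums.var(ddof=1)             # sample variance of the batch sums

print("batch sums:", sums)
print("variance of residuals:", var_residuals)
print("variance of batch sums:", var_sums)
```

With the real residual column loaded in place of the placeholder, the two variances are the quantities the final question asks you to compare.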