Interpreting Multiple Regression: R-squared, Estimates, and Intervals, Lecture notes of Mathematical Statistics

Landmark College Mathematical Statistics

How to interpret the results of a multiple regression analysis, focusing on r-squared values, parameter estimates, and their confidence intervals. Using an example from a study on the proportion of pollen removed by queen and worker bumblebees, it discusses the meaning of r-squared and adjusted r-squared, the interpretation of intercepts and coefficients, and the calculation of confidence intervals for linear combinations of parameters. It also covers the standard error of the mean and the use of the correlation matrix and covariance matrix to find the standard error for a linear combination.

What you will learn

How do you interpret the intercept and coefficients in a multiple regression model?
What is the meaning of R-squared and adjusted R-squared in multiple regression?
How do you calculate confidence intervals for linear combinations of parameters in multiple regression?

Typology: Lecture notes

2021/2022

Uploaded on 09/12/2022

sheela_98 🇺🇸


Interpretation in Multiple Regression

Topics:

1. R-squared and Adjusted R-squared
2. Interpretation of parameter estimates
3. Linear combinations of parameter estimates
   - variance-covariance matrix
   - standard errors of combinations
   - standard error for the mean

We will use the final model from last time to illustrate these concepts.

Summaries of the model (least squares estimates, with standard errors given below in parentheses):

$$
\widehat{\operatorname{logit}}(\text{proportion}) = \underset{(0.38)}{-2.71} + \underset{(0.14)}{0.89}\,\log(\text{duration}) + \underset{(0.24)}{0.57}\,I
$$

$\hat{\sigma} = 0.65$ with 44 degrees of freedom

R-squared = 0.6068029

R-squared and Adjusted R-squared: The R-squared value means that 61% of the variation in the logit of proportion of pollen removed can be explained by the regression on log duration and the group indicator variable. Because R-squared values increase as we add more variables to the model, the adjusted R-squared is often used to summarize the fit, as it takes into account the number of variables in the model.

Adjusted R-squared = 1 - Mean Square Error / Total Mean Square

where Mean Square Error is $\hat{\sigma}^2$ from the regression model and the Total Mean Square is the sample variance of the response ($s_Y^2$, which is a good estimate of $\sigma^2$ if all the regression coefficients are 0). For this example,

Adjusted R-squared = 1 - 0.65^2 / 1.034 = 0.59.
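The adjusted R-squared arithmetic above can be checked with a short Python sketch. The first calculation uses only values quoted in the notes. The cross-check uses the common identity based on R-squared and sample size, and assumes n = 47 observations, which is inferred from 44 residual degrees of freedom plus 3 estimated coefficients and is not stated in the notes.

```python
# Values quoted in the notes.
sigma_hat = 0.65           # residual standard deviation (44 degrees of freedom)
total_mean_square = 1.034  # sample variance of the response, s_Y^2
r_squared = 0.6068029

# Adjusted R-squared as defined in the notes: 1 - MSE / Total Mean Square.
mean_square_error = sigma_hat ** 2
adj_r2 = 1 - mean_square_error / total_mean_square

# Cross-check with the identity 1 - (1 - R^2)(n - 1)/(n - k - 1).
# ASSUMPTION: n = 47 (44 residual df + 3 coefficients), k = 2 explanatory variables.
n, k = 47, 2
adj_r2_alt = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(round(adj_r2, 2), round(adj_r2_alt, 2))  # prints: 0.59 0.59
```

The two routes agree to two decimals; the small difference in later digits comes from the rounded inputs.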

Intercept: The intercept in a multiple regression model is the mean of the response when all of the explanatory variables take on the value 0. In this problem, this means that the dummy variable I = 0 (code = 1, which was the queen bumblebees) and log(duration) = 0, i.e. duration is 1 second. For queen bumblebees with visits of 1 second, we are 95% confident that the mean logit(proportion of pollen removed) is within -2.71 ± 2.02 × 0.38, that is, between -3.49 and -1.93. The Student t quantile 2.02 is based on 44 degrees of freedom; qt(.975, 44).
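As a rough check of the interval for the intercept, here is a minimal Python sketch of the estimate ± t × SE calculation. The helper name `t_interval` is made up for this illustration, and the quantile 2.02 is taken directly from the notes (the value of `qt(.975, 44)`) rather than recomputed.

```python
def t_interval(estimate, std_error, t_quantile):
    """Return (lower, upper) for estimate +/- t_quantile * std_error."""
    half_width = t_quantile * std_error
    return estimate - half_width, estimate + half_width

T_975_44 = 2.02  # Student t quantile quoted in the notes: qt(.975, 44)

# Intercept estimate and standard error from the model summary.
lower, upper = t_interval(-2.71, 0.38, T_975_44)
print(f"{lower:.2f} to {upper:.2f}")  # prints: -3.48 to -1.94
```

With the rounded inputs this gives -3.48 to -1.94; the notes report -3.49 to -1.93, presumably because they were computed from unrounded estimates.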

To convert back to the original units, we can take the inverse of the logit transformation: if x = logit(p) = log(p/(1-p)), then p = exp(x)/(1 + exp(x)). To get the confidence interval
for the proportion, just apply the inverse transformation. So for queen bumblebees with visits lasting 1 second, we are 95% confident that the mean proportion of pollen removed is between 0.03 and 0.13 [exp(-3.49)/(1 + exp(-3.49)) to exp(-1.93)/(1 + exp(-1.93))]. Note: while this is the interpretation of the intercept, we are extrapolating.

Regression Coefficients: Typically the coefficient of a variable is interpreted as the change in the mean response for a 1-unit change in the corresponding explanatory variable, with all other variables held constant. In some problems, holding all other variables fixed is impossible (e.g. a quadratic model, or the model with different slopes for queen and worker bees). For this example, the estimated coefficient of log(duration) is 0.89. Because we have taken the log transformation of duration, the coefficient is easier to interpret in terms of a doubling of duration (review page 208, chapter 8). A doubling of the duration of visit corresponds to a β1 log(2) change in the mean logit(proportion of pollen removed), estimated as 0.89 × log(2) = 0.62. The 95% confidence interval for β1 is 0.89 ± 2.02 × 0.14, or 0.61 to 1.17. The interval for a doubling of duration is obtained by multiplying this interval by log(2). So a 95% confidence interval for the change in the mean logit(proportion pollen removed) is 0.42 to 0.81. Further simplification is not possible.

Dummy variable coefficients: A 1-unit change for a dummy variable means going from level 0 to level 1, so the dummy variable coefficient is the amount by which the mean logit(proportion) for worker bees exceeds the mean logit(proportion) for queen bumblebees; i.e. the logit of the proportion of pollen removed for worker bees is 0.57 higher than the logit for queen bumblebees. A 95% confidence interval for this amount is 0.09 to 1.05. (This is the case for parallel regression lines; if we still had the interaction variable we could not make this statement, since the interaction dummy × log(duration) cannot be held constant.)

In the model derivation, we said that the intercept plus the dummy variable coefficient corresponds to the intercept for the worker bees, which is estimated as -2.71 + 0.57, or -2.14. This can be translated back to the original scale as we did with the intercept for the queen bumblebees. As this has a more interesting meaning, let's find a confidence interval for β0 + β2. To do this we need to find the standard error for a linear combination.
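The back-transformation and the two coefficient intervals discussed above can be reproduced with a few lines of Python. This is only a sketch using the rounded estimates, standard errors, and t quantile quoted in the notes; the function name `inv_logit` is chosen here for illustration.

```python
import math

def inv_logit(x):
    """Inverse of logit(p) = log(p / (1 - p))."""
    return math.exp(x) / (1 + math.exp(x))

T = 2.02  # t quantile on 44 degrees of freedom, as quoted in the notes

# Back-transform the intercept interval (-3.49, -1.93) to the proportion scale.
p_lower, p_upper = inv_logit(-3.49), inv_logit(-1.93)

# Effect of doubling the duration: multiply the estimate and its interval by log(2).
b1, se_b1 = 0.89, 0.14
doubling_effect = b1 * math.log(2)
doubling_ci = (math.log(2) * (b1 - T * se_b1), math.log(2) * (b1 + T * se_b1))

# Dummy variable coefficient (workers minus queens on the logit scale).
b2, se_b2 = 0.57, 0.24
dummy_ci = (b2 - T * se_b2, b2 + T * se_b2)

print(round(p_lower, 2), round(p_upper, 2))             # prints: 0.03 0.13
print(round(doubling_effect, 2))                        # prints: 0.62
print(tuple(round(v, 2) for v in doubling_ci))          # prints: (0.42, 0.81)
print(tuple(round(v, 2) for v in dummy_ci))             # prints: (0.09, 1.05)
```

Note that `math.log` is the natural logarithm, which matches the log(2) multiplier only if duration was logged with the natural log; the agreement of 0.89 × log(2) with the quoted 0.62 suggests that it was.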

Linear Combination of Parameters

To find the variance (and then the standard deviation) of the estimator of β0 + β2, we need to take into account the individual variances plus how the estimates vary together from sample to sample (their covariance). The variance of the sum is the sum of the variances plus 2 times the covariance. We can get the covariance from the correlation of the estimates: recall that the correlation is the covariance divided by the product of the standard deviations, so the covariance is the correlation times the product of the standard deviations. Since the standard deviations are unknown, we use the estimated covariance matrix calculated using the standard errors. In the Results options for Regression, check
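The conversion from correlation to covariance, and the resulting standard error of a sum of two estimates, might look like the Python sketch below. The standard errors 0.38 and 0.24 come from the model summary, but the correlation between the two estimates is not given in this excerpt, so the value -0.5 is a purely hypothetical placeholder; the function name `se_of_sum` is also made up for this illustration.

```python
import math

def se_of_sum(se_a, se_b, corr):
    """Standard error of (a + b) given each SE and the correlation of the estimates."""
    cov = corr * se_a * se_b                 # covariance = correlation * SE_a * SE_b
    var_sum = se_a ** 2 + se_b ** 2 + 2 * cov  # var(a + b) = var(a) + var(b) + 2 cov(a, b)
    return math.sqrt(var_sum)

se_b0, se_b2 = 0.38, 0.24  # standard errors of the intercept and dummy coefficient

# HYPOTHETICAL correlation, for illustration only; the real value comes from
# the correlation matrix of the estimates, which is not shown in the notes.
hypothetical_corr = -0.5

se_if_uncorrelated = se_of_sum(se_b0, se_b2, 0.0)
se_hypothetical = se_of_sum(se_b0, se_b2, hypothetical_corr)
print(round(se_if_uncorrelated, 3), round(se_hypothetical, 3))
```

A negative correlation shrinks the standard error of the sum relative to the uncorrelated case, which is why the covariance term cannot be ignored.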

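The variance of the mean at this point is found by

$$
\sum_{i=0}^{p} \sum_{j=0}^{p} \operatorname{cov}(\hat{\beta}_i, \hat{\beta}_j)\, C_i C_j
$$

which in this case simplifies to

$$
\operatorname{var}(\hat{\beta}_0) \cdot 1 + \operatorname{var}(\hat{\beta}_1) \log(2)^2 + 2 \operatorname{cov}(\hat{\beta}_0, \hat{\beta}_1) \cdot 1 \cdot \log(2) + 0.
$$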
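To see how the double sum collapses to the simplified expression, the following Python sketch evaluates both for the coefficient vector C = (1, log 2, 0). The diagonal of the covariance matrix uses the squared standard errors from the model summary, but every off-diagonal correlation below is a hypothetical placeholder, since the estimated covariance matrix is not reproduced in this excerpt.

```python
import math

def linear_combination_variance(cov, c):
    """Double sum over i, j of cov[i][j] * c[i] * c[j]."""
    p = len(c)
    return sum(cov[i][j] * c[i] * c[j] for i in range(p) for j in range(p))

se = [0.38, 0.14, 0.24]  # standard errors of beta0, beta1, beta2 from the notes

# HYPOTHETICAL correlations between the estimates (illustration only).
corr = [
    [1.0, -0.8, -0.3],
    [-0.8, 1.0, 0.1],
    [-0.3, 0.1, 1.0],
]
cov = [[corr[i][j] * se[i] * se[j] for j in range(3)] for i in range(3)]

# Coefficients of the linear combination: intercept 1, log(duration) = log 2, I = 0.
c = [1.0, math.log(2), 0.0]

general = linear_combination_variance(cov, c)
simplified = (cov[0][0] * 1
              + cov[1][1] * math.log(2) ** 2
              + 2 * cov[0][1] * 1 * math.log(2))

print(math.isclose(general, simplified))  # prints: True
```

Because C_2 = 0, every term involving the dummy coefficient drops out, which is the "+ 0" in the simplified formula.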

For more details see section 10.4.3 and exercises 21-23. This is how the se.fit variable is obtained. From the SE(mean) we can get the SE(prediction)

$$
\mathrm{SE}[\text{prediction}\{Y \mid X = x\}] = \sqrt{\hat{\sigma}^2 + \mathrm{SE}[\hat{\mu}\{Y \mid X = x\}]^2}
$$

To get a prediction interval, first calculate the prediction interval in the logit scale, then transform the interval using the inverse transformation applied to each endpoint of the interval. Putting this all together, we can find the estimates and prediction intervals in the original units.

[Figure: Proportion of Pollen Removed for Queen Bumblebees and Worker Honeybees. Horizontal axis: duration of visit (seconds); vertical axis: proportion pollen removed. Shown: observations (code=1 queens, code=2 workers), estimated means for queens and workers, and 95% prediction intervals for each group.]
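As an illustration of the prediction-interval recipe, the Python sketch below works through one point where the needed standard error is available in the notes: queens (I = 0) with a 1-second visit, where the estimated mean is the intercept and SE(mean) is therefore the intercept's standard error, 0.38. The resulting interval is derived here for illustration and is not a number reported in the notes.

```python
import math

def inv_logit(x):
    """Inverse of logit(p) = log(p / (1 - p))."""
    return math.exp(x) / (1 + math.exp(x))

sigma_hat = 0.65    # residual standard deviation from the notes
t_quantile = 2.02   # qt(.975, 44), as quoted in the notes
mean_logit = -2.71  # estimated mean logit for queens at duration = 1 second
se_mean = 0.38      # SE(mean) at this point equals SE of the intercept

# SE(prediction) = sqrt(sigma_hat^2 + SE(mean)^2)
se_prediction = math.sqrt(sigma_hat ** 2 + se_mean ** 2)

# Prediction interval on the logit scale.
logit_lower = mean_logit - t_quantile * se_prediction
logit_upper = mean_logit + t_quantile * se_prediction

# Back-transform each endpoint to the proportion scale.
prop_lower, prop_upper = inv_logit(logit_lower), inv_logit(logit_upper)
print(f"SE(prediction) = {se_prediction:.3f}")
print(f"proportion interval: {prop_lower:.3f} to {prop_upper:.3f}")
```

The prediction interval is wider than the confidence interval for the mean at the same point because it adds the residual variance to the variance of the estimated mean.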
