Econometrics Problem Set 3: Analyzing House Prices and Obesity, Assignments of Introduction to Econometrics

This problem set explores econometric concepts through practical applications. It analyzes the relationship between house prices and factors like square footage and number of bedrooms, demonstrating omitted variable bias and multicollinearity. It also examines the relationship between restaurant visits and obesity, highlighting the importance of considering potential confounding factors in statistical analysis.

Typology: Assignments

2023/2024

Uploaded on 10/28/2024

wentao-ma
Eco 240: Econometrics
Professor Mariyana Zapryanova
Problem Set 3
Due: October 24
1. For this problem set, use the data file "hprice1.dta" on Moodle. The data set hprice1.dta contains
data on house prices in dollars (price), the size of the house in square feet (sqrft), the number of
bedrooms (bdrms) and other variables.
a) Make a scatter plot of price against the square footage of the house and price against the # of
bedrooms. Describe the apparent relationships shown in each. Is there a positive or negative
correlation? Do the errors appear homoskedastic or possibly heteroskedastic?
There is a positive correlation in both graphs: price increases with square footage and with the
number of bedrooms.
The errors appear heteroskedastic in both cases: the variance of the error is larger at small and large
square footage than in the middle, and the variance of price is larger in the mid-range of bedrooms.
b) Write the population regression function for the bivariate model between price and square feet.
Run this regression in Stata, including robust standard errors, and report the results, including
interpreting the meaning of both estimated coefficients.
๐‘ƒ๐‘Ÿ๐‘–๐‘๐‘’๐‘–= ฮฒ0+ ฮฒ1๐‘ ๐‘ž๐‘Ÿ๐‘“๐‘ก + ๐‘ข๐‘–
=11204.14, a house with zero square footage would worth $11204.14
ฮฒ0
=140.2121, each additional square footage is associated with a price increase of the house by
ฮฒ1
140.211 dollars. The effect of square-footage on price is significant @95% confidence level.
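The regression in b) can be sketched outside Stata as follows. This is a minimal illustration, not the original analysis: the data are synthetic stand-ins generated with made-up parameters (hprice1.dta is not used), and `ols_robust` is a hypothetical helper name. It computes OLS with HC1 heteroskedasticity-robust standard errors, the variant that Stata's `robust` option reports.

```python
import numpy as np

def ols_robust(X, y):
    """Return OLS coefficients and HC1 robust standard errors.

    X must already contain a column of ones for the intercept.
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # "Meat" of the sandwich estimator: sum over i of u_i^2 * x_i x_i'
    meat = (X * resid[:, None] ** 2).T @ X
    # HC1 applies the n / (n - k) small-sample scaling
    cov = XtX_inv @ meat @ XtX_inv * n / (n - k)
    return beta, np.sqrt(np.diag(cov))

# Synthetic data (assumption): NOT the real hprice1.dta sample
rng = np.random.default_rng(0)
n = 500
sqrft = rng.uniform(1000, 4000, n)
# The error spread grows with sqrft, so errors are heteroskedastic by design
u = rng.normal(0, 1, n) * 10 * sqrft
price = 11000 + 140 * sqrft + u

X = np.column_stack([np.ones(n), sqrft])
beta, se = ols_robust(X, price)
print("intercept, slope:", beta)
print("robust SEs:", se)
```

The slope estimate should land close to the value of 140 used to generate the data, and the ratio of the slope to its robust standard error gives the t-statistic used to judge significance.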

c) Could there be omitted variables bias in this regression because we did not include #
bedrooms? Does # bedrooms meet our criteria to be a likely omitted variable? If we do not
include the # bedrooms as a variable in our regression, will we over or underestimate the
coefficient on sqrft? Why?

Yes. The number of bedrooms is likely a determinant of house price, so corr(ov, y) ≠ 0. The
number of bedrooms is likely correlated with the square footage of the house, so corr(ov, x) ≠ 0.
Also, the number of bedrooms cannot be explained completely by the square footage of the house,
so corr(ov, x) ≠ 1, which means # bdrms is not redundant. Therefore, the number of bedrooms is
likely an omitted variable.

The number of bedrooms is likely positively correlated with the price of the house (more
bedrooms, more expensive house) and positively correlated with square footage (a larger house has
more bedrooms). The sign of the bias is therefore (+)(+) = (+): there is a positive bias, and the
bivariate regression overestimates the coefficient on sqrft.
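The sign argument in c) can be checked with a small simulation. The sketch below uses a hypothetical data-generating process with made-up parameters (not the hprice1.dta sample) in which bedrooms raise price and are positively correlated with square footage, then compares the sqrft coefficient with and without bdrms in the model.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Hypothetical data-generating process (assumption, for illustration only)
sqrft = rng.normal(2000, 500, n)
bdrms = 1 + 0.001 * sqrft + rng.normal(0, 0.7, n)   # larger houses have more bedrooms
price = -19000 + 128 * sqrft + 15000 * bdrms + rng.normal(0, 30000, n)

def ols(X, y):
    """Least-squares coefficients for y regressed on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
short_fit = ols(np.column_stack([ones, sqrft]), price)        # bdrms omitted
long_fit = ols(np.column_stack([ones, sqrft, bdrms]), price)  # bdrms included

print("sqrft coefficient, bdrms omitted: ", short_fit[1])
print("sqrft coefficient, bdrms included:", long_fit[1])
```

With these assumed parameters the short regression's sqrft coefficient exceeds the long regression's by roughly 15000 × 0.001 = 15, which is the positive bias predicted by the (+)(+) reasoning.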

d) Confirm your logic from c) by including # bedrooms in the population regression model and then
estimating it in Stata. Write the model and interpret the magnitude and statistical significance of the
coefficients. (Don't forget to use robust SE)

$price_i = \beta_0 + \beta_1 sqrft_i + \beta_2 bdrms_i + u_i$

f) What is the expected increase in the price of a house with one more bedroom, given that the total square
footage of the house remains constant at 1000 sq ft? What is the expected increase in the price of a house
with one more bedroom, given that the total square footage of the house remains constant at 1140 sq ft?
Explain why these numbers are the same or different.

Price increase after adding one more bedroom for a house with 1000 sq ft, where x is the initial number of
bedrooms:

$\Delta price_i = [-19315 + 128.436 \cdot 1000 + 15198.19(x+1)] - [-19315 + 128.436 \cdot 1000 + 15198.19x]$
$= 15198.19 \cdot 1 = 15198.19 = \hat\beta_2$

Price increase after adding one more bedroom for a house with 1140 sq ft:

$\Delta price_i = [-19315 + 128.436 \cdot 1140 + 15198.19(x+1)] - [-19315 + 128.436 \cdot 1140 + 15198.19x]$
$= 15198.19 \cdot 1 = 15198.19 = \hat\beta_2$

The expected increase in house price is the same at both square footages and equals the coefficient on the
bedrooms variable ($\hat\beta_2$). The increase does not depend on square footage because the model is
linear with no interaction term: the effect of one more bedroom, holding sqrft fixed, is the same at every
value of sqrft and of the other variables in the model.

g) What is the expected increase in the price of a house with one more bedroom and with an additional 140
square feet of size? Compare your answer to the answer in f)

$\Delta price_i = [-19315 + 128.436 \cdot 1140 + 15198.19(x+1)] - [-19315 + 128.436 \cdot 1000 + 15198.19x]$
$= 140 \cdot 128.436 + 1 \cdot 15198.19 = 17981.04 + 15198.19 = 33179.23$

Adding a bedroom together with 140 square feet raises the expected price of the house by 33179.23 dollars,
which is larger than the 15198.19 dollars from adding one bedroom while keeping the size of the house the same.
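The arithmetic in f) and g) can be reproduced directly from the coefficients reported in the answers (-19315, 128.436, 15198.19). The function name `predicted_price` and the starting bedroom count of 3 are illustrative assumptions; the differences do not depend on the starting count.

```python
def predicted_price(sqrft, bdrms, b0=-19315.0, b1=128.436, b2=15198.19):
    """Fitted price from the two-regressor model, using the coefficients quoted above."""
    return b0 + b1 * sqrft + b2 * bdrms

bdrms = 3  # arbitrary starting number of bedrooms (assumption)

# Part f): one more bedroom, square footage held fixed
extra_bed_1000 = predicted_price(1000, bdrms + 1) - predicted_price(1000, bdrms)
extra_bed_1140 = predicted_price(1140, bdrms + 1) - predicted_price(1140, bdrms)

# Part g): one more bedroom and 140 more square feet
bed_and_space = predicted_price(1140, bdrms + 1) - predicted_price(1000, bdrms)

print(round(extra_bed_1000, 2), round(extra_bed_1140, 2), round(bed_and_space, 2))
```

The first two differences are identical because the sqrft term cancels, while the third adds 140 × 128.436 on top of the bedroom effect.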

h) We included the # of bedrooms in our regression because we were worried about it as an omitted
variable. Should we also be worried that sqrft and bdrms might cause problems due to imperfect
multicollinearity? Report the correlation between sqrft and bdrms and explain whether you think we should
include both in the model?

The correlation matrix shows that square footage and bedrooms are positively correlated, with a
correlation coefficient of 0.5315. This is imperfect multicollinearity, since the two independent variables
are correlated but not perfectly correlated. Because the correlation is well below 1, the two coefficients
can still be estimated separately, and the imperfect multicollinearity is moderate rather than severe.
Including both variables reduces omitted variable bias, at the cost of somewhat larger variance of the
coefficient estimates. I think we should include both variables, because a correlation of 0.5315 is not
strong enough to make the estimates too imprecise.
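One way to put a number on the variance trade-off, which the original answer does not do, is the variance inflation factor. As a rough sketch, assuming only these two regressors and homoskedastic errors, the VIF depends only on the reported correlation $r = 0.5315$:

```latex
\mathrm{VIF} = \frac{1}{1 - r^2} = \frac{1}{1 - 0.5315^2} = \frac{1}{1 - 0.2825} \approx 1.39
```

Under those assumptions the sampling variance of each slope is about 39% larger than it would be with uncorrelated regressors (standard errors about 18% larger), which is consistent with treating the multicollinearity here as moderate.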

i) An economist suggests that a possible omitted variable in our regression (from d)) is the level of air
pollution near this house. Could the level of air pollution meet our criteria to be a likely omitted variable?
What is the likely sign of the bias on the coefficient on square footage? Why? (Note you have no data here
so you just need to think about it. Hint: assume that houses are smaller in cities.)

Yes. The level of pollution can be an omitted variable since 1) pollution is likely negatively correlated
with house price, 2) pollution is likely negatively correlated with the independent variable square
footage, and 3) pollution is not perfectly explained by square footage.

Bias = $\beta_2 \delta_1$ = (-)(-) = (+), where $\beta_2$ here denotes the effect of pollution on price and
$\delta_1$ the relationship between pollution and square footage.

$\beta_2$ is negative because higher air pollution is likely associated with a lower house price: people want
to avoid houses with higher air pollution.

$\delta_1$ is likely negative because air pollution is worse in cities and houses in cities are smaller, which
means higher levels of air pollution are likely correlated with smaller houses (lower square footage).
Therefore, the correlation between air pollution and square footage should be negative.

Since both terms are negative, their product is positive. There is a positive bias, so the coefficient on
sqrft overestimates the effect of square footage on house price.
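The sign rule used in i) comes from the standard omitted variable bias formula. As a simplified sketch that ignores bdrms and treats pollution as the only omitted variable, suppose the true model is $price_i = \beta_0 + \beta_1 sqrft_i + \beta_2 pollution_i + u_i$ and pollution is left out. Then:

```latex
\hat\beta_1 \xrightarrow{\;p\;} \beta_1 + \beta_2 \delta_1,
\qquad
\delta_1 = \frac{\operatorname{cov}(sqrft_i,\; pollution_i)}{\operatorname{var}(sqrft_i)}
```

With $\beta_2 < 0$ and $\delta_1 < 0$ the product $\beta_2 \delta_1$ is positive, so the estimated coefficient on sqrft is pushed above $\beta_1$.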

2. A researcher wants to test whether eating out in restaurants is a key factor in the increasing
obesity among Americans. The researcher obtains access to individual data from a survey of
U.S. adults about their eating habits and obesity conducted by the Center for Disease Control.
Obesity is measured by the individual's BMI (Body Mass Index), which is the ratio
weight/height^2. Consider the model below:

$bmi_i = \beta_0 + \beta_1 restaurant_i + u_i$

where $bmi_i$ is the measured Body Mass Index of individual $i$ (in kg/m^2) and $restaurant_i$ is the
number of times the individual has eaten in a restaurant in the last month.

a) What sign do you expect for $\beta_1$?

I expect $\beta_1$ to be positive. Restaurants usually cook with a lot of sugar and oil, which increases the
calories in the food, so the number of times an individual has eaten in a restaurant is likely positively
correlated with BMI.

b) Under what conditions will $\hat\beta_1$ be an unbiased estimator of $\beta_1$?

If all three OLS assumptions hold: 1) the data are i.i.d., 2) zero conditional mean:
$E(u_i \mid restaurant_i) = 0$, and 3) there are no large outliers.

c) Describe three other factors that affect a person's BMI. Give an example of one factor that
might not bias $\hat\beta_1$. Explain.

  1. Genetics: Genetic factors play a significant role in determining a person's BMI. People from

the same family often have similar body shapes and sizes due to shared genetics. For instance,

if someone comes from a family where many members have a naturally higher body weight,