



This problem set explores econometric concepts through practical applications. It analyzes the relationship between house prices and factors like square footage and number of bedrooms, demonstrating omitted variable bias and multicollinearity. It also examines the relationship between restaurant visits and obesity, highlighting the importance of considering potential confounding factors in statistical analysis.
Typology: Assignments
Due: October 24
data on house prices in dollars (price), the size of the house in square feet (sqrft), the number of
bedrooms (bdrms) and other variables.
a) Make a scatter plot of price against the square footage of the house and price against the # of
bedrooms. Describe the apparent relationships shown in each. Is there a positive or negative
correlation? Do the errors appear homoskedastic or possibly heteroskedastic?
Both plots show a positive correlation: price rises with square footage and with the number of
bedrooms.
The errors appear heteroskedastic in both plots. Against square footage, the variance of the error
is larger at small and large sizes than in the middle; against bedrooms, the variance of price is
largest in the mid-range of bedroom counts.
b) Write the population regression function for the bivariate model between price and square feet.
Run this regression in Stata, including robust standard errors, and report the results, including
interpreting the meaning of both estimated coefficients.
\[ \mathit{price}_i = \beta_0 + \beta_1\,\mathit{sqrft}_i + u_i \]
\(\hat{\beta}_0 = 11204.14\): a house with zero square feet would be predicted to be worth $11,204.
\(\hat{\beta}_1 = 140.2121\): each additional square foot is associated with an increase in the price
of the house of about $140.21. The effect of square footage on price is statistically significant
at the 5% level.
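As a rough illustration of what a bivariate regression with robust standard errors computes, here is a minimal pure-Python sketch. It is not the Stata run from the assignment: the five data points are made up for illustration, the function name `ols_hc1` is hypothetical, and the standard error uses the heteroskedasticity-robust (HC1) formula with the n/(n-2) degrees-of-freedom correction.

```python
import math

def ols_hc1(x, y):
    """Bivariate OLS of y on x with an HC1 robust standard error for the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Sandwich "meat": squared deviations of x weighted by squared residuals.
    meat = sum((xi - x_bar) ** 2 * ei ** 2 for xi, ei in zip(x, resid))
    se_slope = math.sqrt(n / (n - 2) * meat / sxx ** 2)
    return intercept, slope, se_slope

# Made-up toy data (NOT the house price data set from the problem).
toy_sqrft = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_price = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1, se1 = ols_hc1(toy_sqrft, toy_price)
print(f"intercept={b0:.3f} slope={b1:.3f} robust_se={se1:.3f} t={b1 / se1:.2f}")
```

The t-ratio printed at the end is the quantity compared against the critical value (about 1.96 in large samples) when judging significance at the 5% level.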
c) Could there be omitted variables bias in this regression because we did not include #
bedrooms? Does # bedrooms meet our criteria to be a likely omitted variable? If we do not
include the # bedrooms as a variable in our regression, will we over or underestimate the
coefficient on sqrft? Why?
Yes. The number of bedrooms is likely a determinant of house price, so corr(ov, y) ≠ 0. The
number of bedrooms is also likely correlated with the square footage of the house, so corr(ov, x) ≠ 0.
Finally, the number of bedrooms cannot be explained completely by the square footage of the house,
so corr(ov, x) ≠ 1, which means bdrms is not redundant. Therefore, the number of bedrooms is
likely an omitted variable.
The number of bedrooms is likely positively correlated with the price of the house (more
bedrooms, more expensive house) and positively correlated with square footage (larger houses have
more bedrooms). The sign of the bias is therefore (+)(+) = (+): omitting bdrms leads us to
overestimate the coefficient on sqrft.
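The sign argument above can be written out with the standard omitted-variable-bias formula. This is a sketch in textbook notation, not taken from the assignment: \(\tilde{\beta}_1\) denotes the slope from the regression that omits bdrms, \(\beta_2\) the coefficient on bdrms in the full model, and \(\delta_1\) the slope from regressing bdrms on sqrft.

```latex
\[
  \tilde{\beta}_1 \;\xrightarrow{\;p\;}\; \beta_1 + \beta_2\,\delta_1,
  \qquad
  \delta_1 = \frac{\operatorname{cov}(\mathit{sqrft}_i,\ \mathit{bdrms}_i)}{\operatorname{var}(\mathit{sqrft}_i)} .
\]
```

With \(\beta_2 > 0\) and \(\delta_1 > 0\), the bias term \(\beta_2\delta_1\) is positive, which matches the (+)(+) reasoning.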
d) Confirm your logic from c) by including # bedrooms in the population regression model and then
estimating it in Stata. Write the model and interpret the magnitude and statistical significance of the
coefficients. (Don't forget to use robust SE)
\[ \mathit{price}_i = \beta_0 + \beta_1\,\mathit{sqrft}_i + \beta_2\,\mathit{bdrms}_i + u_i \]
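One way to check the logic from c) numerically is to compare the sqrft coefficient with and without bdrms, using only estimates that appear in this solution set (140.2121 from b; 128.436 and 15198.19 from the calculations in f). The sketch below relies on the in-sample OLS identity short slope = long slope + (bdrms coefficient) x (slope of bdrms on sqrft); the variable names are hypothetical, and the implied slope is backed out from the identity, not estimated from data.

```python
# Estimates quoted elsewhere in this solution set.
short_sqrft = 140.2121   # coefficient on sqrft when bdrms is omitted
long_sqrft = 128.436     # coefficient on sqrft when bdrms is included
long_bdrms = 15198.19    # coefficient on bdrms

# Omitting bdrms raised the sqrft coefficient by this much.
observed_bias = short_sqrft - long_sqrft

# Implied slope from regressing bdrms on sqrft (bedrooms per square foot),
# backed out from: short = long + long_bdrms * delta.
implied_delta = observed_bias / long_bdrms

print(f"observed bias: {observed_bias:.4f} dollars per sq ft")
print(f"implied delta: {implied_delta:.6f} bedrooms per sq ft")
```

The bias is positive, so the bivariate regression overstated the coefficient on sqrft, as predicted in c).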
f) What is the expected increase in the price of a house with one more bedroom, given that the total square
footage of the house remains constant at 1000 sq ft? What is the expected increase in the price of a house
with one more bedroom, given that the total square footage of the house remains constant at 1140 sq ft?
Explain why these numbers are the same or different.
Price increase after adding one more bedroom for a house with 1000 sq ft (x is the initial number of bedrooms):
\[ \Delta\widehat{\mathit{price}} = [-19315 + 128.436(1000) + 15198.19(x+1)] - [-19315 + 128.436(1000) + 15198.19x] = 15198.19 = \hat{\beta}_2 \]
Price increase after adding one more bedroom for a house with 1140 sq ft:
\[ \Delta\widehat{\mathit{price}} = [-19315 + 128.436(1140) + 15198.19(x+1)] - [-19315 + 128.436(1140) + 15198.19x] = 15198.19 = \hat{\beta}_2 \]
The expected increase in house price is the same at both square footages and equals the coefficient on
the bedrooms variable (\(\hat{\beta}_2\)). The square footage terms cancel because the model is linear with no
interaction term, so the effect of one more bedroom does not depend on the value of sqrft or of any other variable in the model.
g) What is the expected increase in the price of a house with one more bedroom and with an additional 140
square feet of size? Compare your answer to the answer in f)
\[ \Delta\widehat{\mathit{price}} = [-19315 + 128.436(1140) + 15198.19(x+1)] - [-19315 + 128.436(1000) + 15198.19x] \]
\[ = 140(128.436) + 1(15198.19) = 17981.04 + 15198.19 = 33179.23 \]
Adding a bedroom while also adding 140 square feet increases the expected price of the house by $33,179.23,
which is larger than the $15,198.19 from adding one bedroom while keeping the size of the house the same.
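The arithmetic in f) and g) can be reproduced with a short Python sketch built from the coefficients used above. The function name `predicted_price` and the baseline of 3 bedrooms are assumptions for illustration; any baseline gives the same differences because the model is linear.

```python
def predicted_price(sqrft, bdrms):
    """Fitted price from the coefficients quoted in the calculations above."""
    return -19315 + 128.436 * sqrft + 15198.19 * bdrms

base_bdrms = 3  # hypothetical baseline; it cancels out of every difference

# f) one more bedroom, size held constant at 1000 and at 1140 sq ft
diff_1000 = predicted_price(1000, base_bdrms + 1) - predicted_price(1000, base_bdrms)
diff_1140 = predicted_price(1140, base_bdrms + 1) - predicted_price(1140, base_bdrms)

# g) one more bedroom and 140 more square feet
diff_both = predicted_price(1140, base_bdrms + 1) - predicted_price(1000, base_bdrms)

print(f"{diff_1000:.2f}")  # 15198.19
print(f"{diff_1140:.2f}")  # 15198.19
print(f"{diff_both:.2f}")  # 33179.23
```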
h) We included the # of bedrooms in our regression because we were worried about it as an omitted
variable. Should we also be worried that sqrft and bdrms might cause problems due to imperfect
multicollinearity? Report the correlation between sqrft and bdrms and explain whether you think we should
include both in the model?
As the correlation matrix shows, square footage and bedrooms are positively correlated, with a
correlation coefficient of 0.5315. This is imperfect multicollinearity: the two independent variables
are correlated, but not perfectly. Because the correlation is well below 1, OLS can still separate the
two effects, and imperfect multicollinearity is not a serious problem here. Including both variables
reduces omitted variable bias, at the cost of somewhat larger variance of the coefficient estimates. Still, I think we should
include both variables because a correlation of 0.5315 is not especially strong.
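To put a rough number on the variance trade-off, one common diagnostic is the variance inflation factor (VIF), which the assignment does not ask for and which is added here only as a sketch. With exactly two regressors, the R-squared from regressing one on the other equals the squared correlation, so VIF = 1 / (1 - r^2). The interpretation as variance inflation assumes homoskedastic errors, so treat it as an approximation alongside robust standard errors.

```python
import math

r = 0.5315  # correlation between sqrft and bdrms reported above

# With two regressors, R^2 of one regressor on the other is r squared.
vif = 1.0 / (1.0 - r ** 2)

# Approximate factor by which each standard error grows,
# relative to the case of uncorrelated regressors.
se_inflation = math.sqrt(vif)

print(f"VIF = {vif:.3f}, standard errors inflated by about {se_inflation:.3f}x")
```

A VIF of roughly 1.4 is far below the rule-of-thumb thresholds (often 5 or 10) at which multicollinearity is usually treated as a concern, which is consistent with keeping both variables.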
i) An economist suggests that a possible omitted variable in our regression (from d)) is the level of air
pollution near this house. Could the level of air pollution meet our criteria to be a likely omitted variable?
What is the likely sign of the bias on the coefficient on square footage? Why? (Note you have no data here
so you just need to think about it. Hint: assume that houses are smaller in cities.)
Yes. The level of pollution can be an omitted variable because 1) pollution is likely negatively correlated
with house price, 2) pollution is likely negatively correlated with the included independent variable,
square footage, and 3) pollution is not perfectly explained by square footage.
\[ \text{Bias} = \beta_2\,\delta_1 = (-)(-) = (+) \]
Here \(\beta_2\) is the effect of air pollution on price and \(\delta_1\) is the slope from regressing air pollution on square footage.
\(\beta_2\) is negative because higher air pollution is likely associated with lower house prices: people want to
avoid houses with higher air pollution.
\(\delta_1\) is likely negative because air pollution is worse in cities and houses in cities are smaller, which means
higher levels of air pollution are likely associated with smaller houses. Therefore, the
correlation between air pollution and square footage should be negative.
Since both terms are negative, their product is positive: there is a positive bias, so the regression
overestimates the coefficient on sqrft.
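Since there is no pollution data, the sign argument can only be illustrated by simulation. The sketch below generates entirely synthetic data in which pollution lowers price and is negatively related to house size, then regresses price on sqrft alone. Every number in it (sample size, the true coefficient of 100, the noise levels) is a made-up assumption chosen for illustration, not an estimate from the assignment.

```python
import random

def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx

random.seed(0)
n = 5000
true_sqrft_effect = 100.0  # hypothetical true effect of sqrft on price

# Synthetic data: more pollution -> smaller houses and lower prices.
pollution = [random.gauss(0, 1) for _ in range(n)]
sqrft = [2000 - 300 * p + random.gauss(0, 400) for p in pollution]
price = [true_sqrft_effect * s - 20000 * p + random.gauss(0, 10000)
         for s, p in zip(sqrft, pollution)]

short_slope = ols_slope(sqrft, price)  # pollution omitted
delta_1 = ols_slope(sqrft, pollution)  # slope of pollution on sqrft

print(f"true effect: {true_sqrft_effect}, short-regression slope: {short_slope:.2f}")
print(f"delta_1 (pollution on sqrft): {delta_1:.5f}")
```

In this setup the short-regression slope comes out above the true effect while delta_1 is negative, matching the (-)(-) = (+) reasoning.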
obesity among Americans. The researcher obtains access to individual data from a survey of
U.S. adults about their eating habits and obesity conducted by the Center for Disease Control.
Obesity is measured by the individual's BMI (Body Mass Index), which is the ratio
weight/height². Consider the model below:
\[ \mathit{bmi}_i = \beta_0 + \beta_1\,\mathit{restaurant}_i + u_i \]
where \(\mathit{bmi}_i\) is the measured Body Mass Index of individual \(i\) (in kg/m²) and \(\mathit{restaurant}_i\) is the
number of times the individual has eaten in a restaurant in the last month.
a) What sign do you expect for \(\beta_1\)?
I expect \(\beta_1\) to be positive: restaurants usually cook with a lot of sugar and oil, which raises the calorie content of the
food, so the number of times an individual has eaten in a restaurant is likely positively correlated with
BMI.
b) Under what conditions will \(\hat{\beta}_1\) be an unbiased estimator of \(\beta_1\)?
If all three OLS assumptions hold: 1) the data are i.i.d., 2) zero conditional mean, \(E(u_i \mid \mathit{restaurant}_i) = 0\), and 3) there are no large outliers.
c) Describe three other factors that affect a person's BMI. Give an example of one factor that
might not bias \(\hat{\beta}_1\). Explain.
the same family often have similar body shapes and sizes due to shared genetics. For instance,
if someone comes from a family where many members have a naturally higher body weight,