7 Problems for Final Exam - Introduction to Statistics | MATH 102, Exams of Statistics

Material Type: Exam; Class: Introduction to Statistics; Subject: Mathematics; University: Colgate University; Term: Fall 2008;


December 16, 2008

Final Exam — Math 102 / Core 143 CX

Points are in parentheses. An unsimplified answer like 12 3 .51 + 6/7 is usually worth more than 23.3, because it is easier to understand where it came from.

  1. (25 points) An educational researcher wants to test whether students using a new curriculum for teaching fractions learn the material better than those using an older curriculum. She teaches 100 students, chosen at random, by the new method, and then administers a standardized exam. Long experience has shown that using the old method, students score an average of 50 on this exam, but her students’ average score is 54, with an SD of 20.

(a) What kind of test (z or t or 2-sample z or χ^2) should be used to decide whether the new curriculum is better than the old?
(b) Find the value of the variable (z or t or χ^2) used in this test.
(c) Find the P-value, i.e., the probability that, if the new curriculum were no better than the old, her students would average 54 or more just by chance.
(d) Should we conclude that the new curriculum results in higher scores than the old?
(e) Suppose another researcher reaches a conclusion that is the opposite of ours. Should we conclude that his data is “tainted” (i.e., obtained by faulty methods or even perhaps falsified)?

  2. (20 points) The cathat plant has flowers that are either pink, blue or lavender. A genetic model holds that a single gene controls the color of the flowers, with pure forms (genotypes p/p or b/b) having pink or blue flowers respectively; but neither is dominant, so that hybrids (p/b or b/p) have lavender flowers. In an experiment, lavender-flowered cathats are crossed with each other, and, out of 40 offspring selected at random, 15 have pink flowers, 22 have lavender flowers and 3 have blue flowers.

(a) What kind of test would be used to decide whether these results are significant evidence against the genetic model?
(b) In computing the value of the variable (z or t or χ^2) used in this test, should we use the numbers of plants with flowers of each color, or the percentages of each color? Or doesn’t it matter?
(c) Compute the value of the variable.
(d) Is this data significant evidence against the model?

  3. (15 points) An ecologist projects the level N of a certain nutrient in a stream by using linear regression on the amount F of fertilizer used in a few fields near the stream. Suppose the averages of F and N are 3 (tons) and 3.5 (parts per thousand) respectively, with standard deviations of 2.5 and 2 respectively, and a correlation of 0.3.

(a) What is his regression equation for projecting N from F?
(b) Roughly how far off should the ecologist expect the projections to be as he makes them using his equation in (a)?
(c) Suppose his regression equation was N = 0.4F + 5 (which it isn’t). If the nutrient level turns out to be 7 when 8 tons of fertilizer were used, what is the corresponding residue (or residual) relative to his projection?

  4. (15 points) A market survey of 400 randomly chosen households in a large city finds that 280 of them have an internet connection and the average amount of money spent per meal is $8, with a standard deviation of $4.

(a) Find a 95% confidence interval for the percentage of households in the city that have internet connections.
(b) How many households must be surveyed so that the confidence interval requested in (a) will turn out only a third as large as the one you found in (a)?
(c) Find an 85% confidence interval for the average amount of money spent per meal in the city.

  5. (20 points) On the modified boxplot, name the values (a), (b) and (c) (the dots) relative to the distribution. For the scatterplot (d), estimate the correlation. For (e), three of the histograms were generated from a very large survey of individuals’ numbers of years of schooling; one is the actual data, another is the averages of samples of 50 taken from the data, a third is averages of samples of 500 taken from the data, and the fourth is not related. Arrange the histograms in the order just described.

  6. (20 points) A pinochle deck has 48 cards, in the usual four suits but only six ranks (9-10-J-Q-K-A), two of each card. (Thus, for example, there are 8 aces and 12 spades.) In each case below, a card is selected only after the deck is shuffled.

(a) If two cards are selected without replacement, what is the probability that both are kings?
(b) If one card is selected, what is the probability that it is either a king or a club, or both?
(c) If five cards are selected with replacement, what is the probability that at least one is a club?
(d) If five cards are selected with replacement, what is the probability that exactly three are clubs?

  7. (10 points) Relative to the article “Who’ll Stop the Rain?” by Sharon Begley: An article in the Irish Times shortly after the Olympics began reported that there was no rain on Beijing during the Friday opening ceremonies, but there was rain on Sunday. Does that mean Begley is wrong?

For a sample of size n from a population with average μ and standard deviation σ:
EV of sum of scores in sample = nμ
SE of sum = σ · √n
EV of average of sample = μ
SE of average = σ/√n

For significance tests (especially with small samples), approximate (bootstrap) population standard deviation σ with sample standard deviation

s = SD^+ = √( ∑(x − x̄)^2 / (n − 1) ) = (SD of sample) · √( n/(n − 1) ).

(The null hypothesis will give a value to use for μ.) For large samples (n ≥ 30), s is close to σ. For confidence intervals, also approximate population average μ with sample average x̄.
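
As a rough illustration of the formulas above, the following Python sketch computes the SE of a sum, the SE of an average, and SD^+. The function names and the sample scores are hypothetical and chosen only for illustration; they are not part of the exam.

```python
import math
from typing import Sequence


def se_of_sum(sigma: float, n: int) -> float:
    """SE of the sum of n draws: sigma * sqrt(n)."""
    return sigma * math.sqrt(n)


def se_of_average(sigma: float, n: int) -> float:
    """SE of the average of n draws: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def sd_of_sample(scores: Sequence[float]) -> float:
    """Ordinary SD of the sample (divides by n)."""
    n = len(scores)
    mean = sum(scores) / n
    return math.sqrt(sum((x - mean) ** 2 for x in scores) / n)


def sd_plus(scores: Sequence[float]) -> float:
    """SD+ (divides by n - 1), used to approximate sigma."""
    n = len(scores)
    mean = sum(scores) / n
    return math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))


# Hypothetical scores, for illustration only.
scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(scores)

print("SD of sample:", sd_of_sample(scores))
print("SD+:", sd_plus(scores))
# The two expressions for s on the formula sheet should agree.
print("SD * sqrt(n/(n-1)):", sd_of_sample(scores) * math.sqrt(n / (n - 1)))
print("SE of average using SD+:", se_of_average(sd_plus(scores), n))
```

The last two printed values of s should match, which is a quick check that dividing by n − 1 is the same as rescaling the ordinary SD by √(n/(n − 1)).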

Special case: Population is 0’s and 1’s (or yeses and nos, or ins and outs, or ...), fraction of 1’s is p, for a sample of size n:
EV of count = np
SE of count = √(p(1 − p)) · √n
EV of % (or proportion) = p
SE of % (or proportion) = √(p(1 − p)) / √n
For CIs, approximate (bootstrap) population proportion p with sample proportion p̂.
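
The 0-and-1 special case can be sketched in Python as below. This is only a minimal illustration: the function names and the values p = 0.5, n = 100 are assumptions made up for the example.

```python
import math


def ev_of_count(p: float, n: int) -> float:
    """Expected number of 1's in a sample of size n."""
    return n * p


def se_of_count(p: float, n: int) -> float:
    """SE of the count: sqrt(p(1 - p)) * sqrt(n)."""
    return math.sqrt(p * (1 - p)) * math.sqrt(n)


def se_of_proportion(p: float, n: int) -> float:
    """SE of the proportion: sqrt(p(1 - p)) / sqrt(n)."""
    return math.sqrt(p * (1 - p)) / math.sqrt(n)


# Hypothetical population fraction and sample size, for illustration only.
p, n = 0.5, 100
print("EV of count:", ev_of_count(p, n))
print("SE of count:", se_of_count(p, n))
print("SE of proportion:", se_of_proportion(p, n))
```

Note that the SE of the proportion is just the SE of the count divided by n, which is why the √n moves from the numerator to the denominator.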

For use with confidence interval or t-test for significance on small (n < 30) samples: degrees of freedom = n − 1

k% confidence interval for the average of a population: Let z_k denote the z-value for which k percent of the data is between −z_k and z_k. Then the CI is

x̄ ± z_k · (SE for average)

(Similar for “proportion” in place of “average”.)
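
One possible way to carry out this recipe in Python is sketched below for the large-sample case only (for n < 30 the sheet calls for the t-table with n − 1 degrees of freedom instead). It uses `statistics.NormalDist` from the standard library in place of the printed normal table, so z_k comes out slightly more precise than a table lookup. The function names and input numbers are hypothetical.

```python
import math
from statistics import NormalDist


def z_for_confidence(k_percent: float) -> float:
    """z_k such that k percent of the normal curve lies between -z_k and z_k."""
    # Area k/100 in the middle leaves (1 - k/100)/2 in each tail.
    return NormalDist().inv_cdf(0.5 + k_percent / 200.0)


def confidence_interval(sample_mean: float, s: float, n: int, k_percent: float):
    """Large-sample k% CI for a population average: mean +/- z_k * s/sqrt(n)."""
    z_k = z_for_confidence(k_percent)
    se_average = s / math.sqrt(n)  # s stands in for the unknown sigma
    return (sample_mean - z_k * se_average, sample_mean + z_k * se_average)


# Hypothetical sample summary, for illustration only.
low, high = confidence_interval(sample_mean=10.0, s=3.0, n=36, k_percent=95)
print("z_95:", z_for_confidence(95))
print("95% CI:", (low, high))
```

For a proportion, the same function could be reused by passing the sample proportion as `sample_mean` and √(p̂(1 − p̂)) as `s`.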

For significance test for difference of μ’s in two populations:
SE for difference of averages of 2 samples = √( (SE of first)^2 + (SE of second)^2 )
EV of difference = 0 by H_0.
(For more than 2 samples, use one-way ANOVA.)
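
A minimal sketch of the two-sample computation follows, assuming each sample's SE is estimated as s/√n. The function names and the summary statistics are hypothetical and serve only to show how the pieces combine.

```python
import math


def se_of_difference(se_first: float, se_second: float) -> float:
    """SE for the difference of two independent sample averages."""
    return math.sqrt(se_first ** 2 + se_second ** 2)


def two_sample_z(mean1: float, s1: float, n1: int,
                 mean2: float, s2: float, n2: int) -> float:
    """z = (observed difference - 0) / SE of difference; EV is 0 under H_0."""
    se1 = s1 / math.sqrt(n1)
    se2 = s2 / math.sqrt(n2)
    return (mean1 - mean2 - 0.0) / se_of_difference(se1, se2)


# Hypothetical summary statistics for two samples, for illustration only.
z = two_sample_z(mean1=52.0, s1=30.0, n1=100, mean2=47.0, s2=40.0, n2=100)
print("SE of difference:", se_of_difference(3.0, 4.0))
print("z:", z)
```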

For deciding significance of differences in frequency distributions among categories:
χ^2 = ∑ [(observed − expected)^2 / expected]
degrees of freedom: in “list” distributions, (# in list) − 1; in “table” distributions, (# of rows − 1) · (# of columns − 1)
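
The χ^2 sum and the two degrees-of-freedom rules can be sketched in Python as follows. The function names and the observed counts (60 rolls of a die, with 10 expected per face) are hypothetical and used only for illustration.

```python
from typing import Sequence


def chi_square(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def df_for_list(num_categories: int) -> int:
    """Degrees of freedom for a 'list' distribution."""
    return num_categories - 1


def df_for_table(num_rows: int, num_cols: int) -> int:
    """Degrees of freedom for a 'table' distribution."""
    return (num_rows - 1) * (num_cols - 1)


# Hypothetical counts, for illustration only.
observed = [8, 12, 10, 10, 9, 11]
expected = [10, 10, 10, 10, 10, 10]

stat = chi_square(observed, expected)
df = df_for_list(len(observed))
print("chi-square:", stat, "with df =", df)
```

With 5 degrees of freedom, the χ^2-table on this sheet lists 11.07 in the 5% column, so a statistic this small would not count as significant evidence against the expected frequencies.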

Normal table (Area between −z and z)

z     Area(%)   z     Area(%)   z     Area(%)   z     Area(%)   z     Area(%)
0.0   0.0       0.9   63.19     1.8   92.81     2.7   99.31     3.6   99.
0.05  3.99      0.95  65.79     1.85  93.57     2.75  99.4      3.65  99.
0.1   7.97      1     68.27     1.9   94.26     2.8   99.49     3.7   99.
0.15  11.92     1.05  70.63     1.95  94.88     2.85  99.56     3.75  99.
0.2   15.85     1.1   72.87     2     95.45     2.9   99.63     3.8   99.
0.25  19.74     1.15  74.99     2.05  95.96     2.95  99.68     3.85  99.
0.3   23.58     1.2   76.99     2.1   96.43     3     99.73     3.9   99.
0.35  27.37     1.25  78.87     2.15  96.84     3.05  99.771    3.95  99.
0.4   31.08     1.3   80.64     2.2   97.22     3.1   99.806    4     99.
0.45  34.73     1.35  82.3      2.25  97.56     3.15  99.837    4.05  99.
0.5   38.29     1.4   83.85     2.3   97.86     3.2   99.863    4.1   99.
0.55  41.77     1.45  85.29     2.35  98.12     3.25  99.885    4.15  99.
0.6   45.15     1.5   86.64     2.4   98.36     3.3   99.903    4.2   99.
0.65  48.43     1.55  87.89     2.45  98.57     3.35  99.919    4.25  99.
0.7   51.61     1.6   89.04     2.5   98.76     3.4   99.933    4.3   99.
0.75  54.67     1.65  90.11     2.55  98.92     3.45  99.944    4.35  99.
0.8   57.63     1.7   91.09     2.6   99.07     3.5   99.953    4.4   99.
0.85  60.47     1.75  91.99     2.65  99.2      3.55  99.961    4.45  99.

t-table: column head is P(t ≥ entry)

df    25%    10%    5%     2.5%    1%      0.5%
1     1.00   3.08   6.31   12.71   31.82   63.
2     0.82   1.89   2.92   4.30    6.96    9.
3     0.76   1.64   2.35   3.18    4.54    5.
4     0.74   1.53   2.13   2.78    3.75    4.
5     0.73   1.48   2.02   2.57    3.36    4.
6     0.72   1.44   1.94   2.45    3.14    3.
7     0.71   1.41   1.89   2.36    3.00    3.
8     0.71   1.40   1.86   2.31    2.90    3.
9     0.70   1.38   1.83   2.26    2.82    3.
10    0.70   1.37   1.81   2.23    2.76    3.
11    0.70   1.36   1.80   2.20    2.72    3.
12    0.70   1.36   1.78   2.18    2.68    3.
13    0.69   1.35   1.77   2.16    2.65    3.
14    0.69   1.35   1.76   2.14    2.62    2.
15    0.69   1.34   1.75   2.13    2.60    2.
16    0.69   1.34   1.75   2.12    2.58    2.
17    0.69   1.33   1.74   2.11    2.57    2.
18    0.69   1.33   1.73   2.10    2.55    2.
19    0.69   1.33   1.73   2.09    2.54    2.
20    0.69   1.33   1.72   2.09    2.53    2.
21    0.69   1.32   1.72   2.08    2.52    2.
22    0.69   1.32   1.72   2.07    2.51    2.
23    0.69   1.32   1.71   2.07    2.50    2.
24    0.68   1.32   1.71   2.06    2.49    2.
25    0.68   1.32   1.71   2.06    2.49    2.

χ^2-table: column head is P(χ^2 ≥ entry)

df    50%     30%     10%     5%      1%
1     0.46    1.07    2.71    3.84    6.
2     1.39    2.41    4.60    5.99    9.
3     2.37    3.67    6.25    7.82    11.
4     3.36    4.88    7.78    9.49    13.
5     4.35    6.06    9.24    11.07   15.
6     5.35    7.23    10.65   12.59   16.
7     6.35    8.38    12.02   14.07   18.
8     7.34    9.52    13.36   15.51   20.
9     8.34    10.66   14.68   16.92   21.
10    9.34    11.78   15.99   18.31   23.
11    10.34   12.90   17.28   19.68   24.
12    11.34   14.01   18.55   21.03   26.
13    12.34   15.12   19.81   22.36   27.
14    13.34   16.22   21.06   23.69   29.
15    14.34   17.32   22.31   25.00   30.
16    15.34   18.42   23.54   26.30   32.
17    16.34   19.51   24.77   27.59   33.
18    17.34   20.60   25.99   28.87   34.
19    18.34   21.69   27.20   30.14   36.
20    19.34   22.78   28.41   31.41   37.