



A lab guide for students on confidence intervals and hypothesis testing. It covers confidence intervals for the mean, hypothesis tests, and the concept of the p-value, and includes problem sets for practicing these concepts.
www.nmt.edu/~olegm/382/labs/Lab12.pdf
Note: the menus and other things you will read or type on the computer are in italics. Attach the printouts whenever needed.
Confidence intervals (C.I.’s) and hypothesis tests are cornerstones of statistical inference. In this Lab, we will discuss C.I.’s for one-sample problems. We will also look at hypothesis tests and discuss the p-value in general terms.
The confidence interval is usually of the format
point estimate ± margin of error
The actual calculation depends on the problem at hand, but Minitab can take care of it for you. In the menu Stat → Basic Statistics there is a battery of options for 1- and 2-sample tests.
The Central Limit Theorem tells us that the sample mean X̄ for a sample of size n has an approximately Normal distribution with mean μ and variance σ²/n.
For example, when σ is known and n is large, the following is a 100(1 − α)% C.I. for the mean μ:
$$\bar{X} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
where $z_{\alpha/2}$ is the (1 − α/2) quantile of the standard Normal (Z) distribution.
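The known-σ interval above can also be computed outside Minitab. The following is a minimal Python sketch, not part of the original lab; the function name `z_interval` and the ten data values are hypothetical and are not taken from sample1.txt.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_interval(sample, sigma, confidence=0.95):
    """Return (lower, upper) for the mean, assuming sigma is known."""
    alpha = 1.0 - confidence
    # z_{alpha/2}: the (1 - alpha/2) quantile of the standard Normal
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z * sigma / sqrt(len(sample))
    center = fmean(sample)
    return center - margin, center + margin


# Hypothetical measurements, used only to exercise the formula.
sample = [0.3, -1.1, 0.8, 1.5, -0.4, 0.9, 0.2, -0.7, 1.1, 0.4]
ci95 = z_interval(sample, sigma=1.0, confidence=0.95)
ci99 = z_interval(sample, sigma=1.0, confidence=0.99)
print("95% C.I.:", ci95)
print("99% C.I.:", ci99)
```

Both intervals share the same center; only the quantile, and therefore the margin of error, changes with the confidence level.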
The C.I. for the mean gives us a range of “plausible” values for μ. The interpretation of, say, a 95% C.I. is that in 95% of repeated samples the interval will contain the “true” (unknown) population mean μ. Of course we could opt for higher confidence, say 99%, but we’ll pay the price with a wider interval. Another way to increase the precision of a C.I. is to increase the sample size n of your study.
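The repeated-sampling interpretation can be checked by simulation. The sketch below is an illustration under assumed settings (the sample size 25, the seed, and the function name `coverage` are arbitrary choices, not values from the lab): it draws many samples from a Normal population with known σ and counts how often the z-interval captures the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(n_intervals=2000, n=25, mu=0.0, sigma=1.0,
             confidence=0.95, seed=1):
    """Fraction of simulated known-sigma intervals that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(n_intervals):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / n_intervals


observed = coverage()
print("observed coverage:", observed)
```

The observed fraction should be close to, but usually not exactly equal to, the nominal confidence level.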
Problem 1
(a) In the file sample1.txt, based on the sample in C1, compute 95% and 99% C.I.’s for the mean. (Assume σ = 1.) Which one is wider and why?
(b) Based on the sample in C2, compute 95% C.I. for the mean. Compare with the part (a) 95% C.I. Which one is wider and why?
(c) Can you compute a 100% C.I.? Why or why not?
(d) Do a simulation study of generating 100 90% C.I.’s from a Normal population, mean 0 and variance 1. First, obtain 100 samples (rows) of size
With σ unknown, we would use the t-distribution confidence interval; however, when n is small we need to also take care that the underlying distribution is close to Normal.
Problem 2
The data below are the survival times (in hours) of 72 guinea pigs after they were injected with a given dose of tubercule bacilli in a medical experiment. The data are from the article “Acquisition of resistance of guinea pigs injected with different doses of virulent tubercule bacilli,” by T. Bjerkedal in the American Journal of Hygiene, (1960), pp. 130-148.
43 45 53 56 56 57 58 66 67 73 74 79 80 80 81 81 81 82 83 83 84 88 89 91 91 92 .... (see file bacilli.txt)
(a) Obtain a 95% t-C.I. for the mean survival time.
(b) Check the normality of the data using Normal probability plot.
(c) Take the log of survival times and find a 95% t-C.I. for mean log survival time. Check the normality of log survival times.
(d) Translate your C.I. from part (c) back into unlogged time scale and compare with part (a). Which of the C.I.’s do you believe more?
(e) Nonparametric alternative. When normality is violated, we might use a nonparametric procedure, i.e. one that does not rely on a particular distributional assumption in order to work. Compute a 95% Wilcoxon C.I. for the median survival time using Stat → Nonparametric → 1-sample Wilcoxon. Compare it to the one from the log transformation, part (d).
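Parts (a), (c) and (d) can be sketched in Python as follows. This is only an illustration: it uses the 26 survival times quoted above, which are the smallest values of the 72 in bacilli.txt, so the numbers it prints do not answer the problem. The critical value 2.0595 is the 0.975 quantile of the t-distribution with 25 degrees of freedom, read from a t-table, and the name `t_interval` is hypothetical.

```python
from math import exp, log, sqrt
from statistics import fmean, stdev

# Only the 26 survival times quoted in the text (the full file has 72).
times = [43, 45, 53, 56, 56, 57, 58, 66, 67, 73, 74, 79, 80,
         80, 81, 81, 81, 82, 83, 83, 84, 88, 89, 91, 91, 92]

T_CRIT_DF25 = 2.0595  # t-table value: 0.975 quantile, 25 d.f. (n - 1)


def t_interval(values, t_crit):
    """Return (lower, upper) of the t-interval for the mean."""
    n = len(values)
    center = fmean(values)
    margin = t_crit * stdev(values) / sqrt(n)
    return center - margin, center + margin


raw_ci = t_interval(times, T_CRIT_DF25)          # original time scale
log_times = [log(t) for t in times]
log_ci = t_interval(log_times, T_CRIT_DF25)      # log scale
back_ci = (exp(log_ci[0]), exp(log_ci[1]))       # back to hours

print("t C.I. on raw times:       ", raw_ci)
print("t C.I. on log times:       ", log_ci)
print("back-transformed log C.I.: ", back_ci)
```

Note that exponentiating the endpoints gives an interval for the geometric mean of the survival times (which equals the median if the log times are Normal), not for the arithmetic mean, so the two intervals need not agree.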
data. That is, P(A | B) ≠ P(B | A), generally!
(2) an extremely small p-value does not, in itself, mean that the effect is practically important. Think about a petty crime (like stealing office supplies) that is very well documented, and a very serious crime (like billion-dollar tax evasion) without witnesses. For example, your distribution might differ from Normal ever so slightly, but if your sample size n is large, the difference will almost certainly be detected.
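A small numerical sketch of point (2), under assumed values: suppose the sample mean differs from the hypothesized mean by exactly 0.01 standard deviations (a made-up, practically negligible effect) and σ is known. The two-sided z-test p-value then depends only on n.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(effect, sigma, n):
    """Two-sided z-test p-value when the observed mean is off by `effect`."""
    z = effect * sqrt(n) / sigma
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# The same tiny observed difference, evaluated at growing sample sizes.
for n in (100, 10_000, 1_000_000):
    print(n, two_sided_p(effect=0.01, sigma=1.0, n=n))
```

The difference itself never changes; only the amount of evidence for it does, which is why a small p-value alone says nothing about the size of the effect.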
Problem 3. Fishing for evidence
We will conduct another simulation study to show how we can get “significant” results from some completely innocent data. Generate 15 columns and n = 40 rows of standard Normal variables. Note that the variables are generated to be statistically independent. Obtain the correlation table. The p-values for the test H₀: “variable i is not related to variable j” are shown below the correlation values. Pick α = 0.10 and record all the pairs of variables for which we “found” a significant relationship, that is, for which the p-values were below α. How many have you found? What was the highest absolute value of between-pair correlation?
Explanation: The false rejection rate for this test is equal to α. With 15(15 − 1)/2 = 105 pairwise comparisons, we expect to get an average of 0.10(105) = 10.5 “significant” results. In order to adjust for multiple comparisons, we should use the level α* = α/105 for each pairwise comparison. This is known as the Bonferroni correction.
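The experiment in Problem 3 can be imitated with the Python sketch below. It is an approximation, not a reproduction of Minitab's output: the p-values come from the Fisher z-transformation with a Normal reference distribution rather than from the t-based test, and the seed and function names are arbitrary.

```python
import random
from itertools import combinations
from math import atanh, sqrt
from statistics import NormalDist, fmean


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length lists."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def approx_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0 (Fisher z)."""
    z = atanh(r) * sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def fish(n_vars=15, n_rows=40, alpha=0.10, seed=382):
    """Count 'significant' pairs among independent Normal columns."""
    rng = random.Random(seed)
    cols = [[rng.gauss(0.0, 1.0) for _ in range(n_rows)]
            for _ in range(n_vars)]
    pairs = list(combinations(range(n_vars), 2))
    p_values = {(i, j): approx_p_value(pearson_r(cols[i], cols[j]), n_rows)
                for i, j in pairs}
    raw = [pair for pair, p in p_values.items() if p < alpha]
    # Bonferroni: compare each p-value with alpha / (number of tests).
    bonf = [pair for pair, p in p_values.items() if p < alpha / len(pairs)]
    return len(pairs), raw, bonf


n_pairs, raw, bonf = fish()
print("pairs tested:", n_pairs)
print("expected false positives at alpha = 0.10:", 0.10 * n_pairs)
print("'significant' pairs, unadjusted:", len(raw))
print("'significant' pairs, Bonferroni:", len(bonf))
```

Changing the seed changes which pairs look “significant”, which is exactly the point: with independent columns, every such pair is a false positive.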
A natural connection exists between a 100(1 − α)% C.I. and a two-sided hypothesis test at level α. The null hypothesis H₀: μ = μ₀ is not rejected exactly when μ₀ falls inside the C.I.
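For the known-σ case this connection can be verified directly. The sketch below uses a hypothetical sample and an assumed σ = 0.5; it computes the two-sided z-test p-value and the matching interval, then checks that each candidate μ₀ is inside the interval exactly when the test does not reject.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_test_and_interval(sample, mu0, sigma, alpha=0.05):
    """Return (p_value, (lower, upper)) for a two-sided z-test of mu0."""
    se = sigma / sqrt(len(sample))
    xbar = fmean(sample)
    z = (xbar - mu0) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return p_value, (xbar - z_crit * se, xbar + z_crit * se)


# Hypothetical data; sigma = 0.5 is assumed known for illustration.
sample = [9.8, 10.4, 10.1, 9.6, 10.9, 10.3, 9.9, 10.6]
alpha = 0.05
results = []
for mu0 in (9.5, 10.0, 10.5, 11.0):
    p, ci = z_test_and_interval(sample, mu0, sigma=0.5, alpha=alpha)
    inside = ci[0] <= mu0 <= ci[1]
    not_rejected = p >= alpha
    results.append((mu0, inside, not_rejected))
    print(mu0, "inside C.I.:", inside, "| not rejected:", not_rejected)
```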
Problem 4
In the file SAT2sample.txt, the 3 columns of data represent
C1: Total scores on the math and reading portions of the SAT of a random sample of 49 students from a large city after their first take of the 2007 SAT.
C2: Scores of a different random sample of 49 students from the same large city after their second take of the SAT.
C3: Scores of another random sample of 49 students from the same city who had taken a 1-week course from a local SAT instruction school before taking the test for the first time.
(a) Perform a 2-sample t-test to determine if there is a significant difference between the scores of first-time takers and the scores of the students taking the test for the first time after the 1-week course. Pick the level of significance you think is reasonable. Also, find and interpret the confidence interval.
(b) Perform a 2-sample t-test to determine if there is a significant difference in the scores of second-time takers and the scores of the students taking the test the first time after taking the 1-week course. Find and interpret the confidence interval.
(c) The company hiring you is especially interested in knowing if their one-week course gives a significant improvement in scores over students taking the test for the first time with no additional preparation. Write a brief report to the company describing your findings. Try to use the language that the company executives would understand.
Problem 5
The following determinations of the parallax of the Sun (the angle spanned by the Earth’s radius as if it were viewed and measured from the Sun’s surface) were made in 1761 by the noted astronomer James Short. The units are seconds of arc (1/3600 of a degree).
8.50 8.06 8.65 9.71 8.80 7.99 8.50 8.43 8. 8.50 8.40 8.58 7.33 8.44 8.71 8.28 8.82 8. ...
The complete data set is at parallax.txt.
With a careful determination of the radius of the Earth and a precise value of the parallax, the average distance of the Earth to the Sun can be obtained. The currently accepted value of the parallax is 8.798. Within this framework, define
μ = the population mean of all potential measurements of the parallax using Short’s device.
We are interested in whether Short’s data agree that μ could be equal to its currently accepted value, that is, we wish to test whether μ = 8.798.
(a) Describe the distribution of the parallax measurements. Be complete.
(b) Perform the standard z-test on these data, at the 5% level, and construct a 95% C.I. for μ. Interpret the results, given the question of interest.