






A statistics exam held at the National University of Ireland, Galway, during the spring semester of 2007-2008. It gives details of the exam, such as the subjects covered, the professors involved, and the time allowed, and includes the statistical problems that students were expected to solve: calculating means, medians, modes, and standard deviations, interpreting histograms, and performing hypothesis tests.
Professor E. M. Scott, Professor J. P. Hinde, Paul Wilson, M.A., H.D.E.
Time allowed: Three Hours. Answer any four questions. All questions, but not necessarily parts therein, carry equal marks. A list of Formulæ is attached. All required statistical distribution tables are to be found in the supplied mathematical (“log”) tables.
Question One is on the next page
117.0  118.5  112.7  119.9  111.7  129.9  114.6  107.7  110.4  112.
111.5  110.2  111.7  114.2  111.5  111.2  113.8  110.2  115.3  110.
101.2  110.6  118.4  127.9  112.3  120.7  113.1  113.9  116.4  114.
122.8  125.3  121.1  124.4  129.3  116.0  122.6  110.5  110.4  115.
112.8  119.4  113.1  114.9  119.1  113.4  112.6  112.5  110.4  125.
114.0  112.2  113.2  108.0  117.5  111.1  111.8  116.5  117.6  115.
120.8  110.7  112.5  112.0  110.9  113.6  113.4  115.1  110.4  120.
Illustrate the data with a histogram with intervals 100.0–110.0, 110.0–115.0, 115.0–120.0, 120.0–125.0 and 125.0–130.0. (It is not necessary to use graph paper, but you may do so if you wish.)

(b) Find the mean, median and mode (if any) and (sample) standard deviation of the following:

112 136 143 166 101 111 221 226 141 107

(You may use a calculator.)

(c) A survey of ten people was taken. In this survey the people were asked to state their sex (Male = 1, Female = 2), their preferred method of exercise (1 = Gym, 2 = Walking/Jogging, 3 = Swimming, 4 = Playing Sport, 5 = None of these), their mean annual expenditure on "keeping fit", and to indicate, on a scale from 0 – 4, the extent to which they were concerned about their health (0 = not worried, 4 = extremely worried). The results of the survey are given below.

i. State whether each variable (column heading) is a nominal, ordinal, discrete interval or continuous interval variable.
ii. Calculate an appropriate measure of central tendency (average) for each variable.
Sex  Pref  Expenditure  Concern
1    2     100          3
1    1     400          1
2    1     140          4
1    3     511          2
1    3     125          2
1    5     980          1
2    1     380          2
2    2     200          3
2    2     212          1
1    2     170          0
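As a rough cross-check on part (b), the summary statistics can be computed with Python's standard `statistics` module. This is only a sketch: the variable names are illustrative, and treating "every value tied for most frequent" as "no mode" is a convention assumed here rather than something the exam paper states.

```python
import statistics

# The ten values listed in part (b) of Question One.
values = [112, 136, 143, 166, 101, 111, 221, 226, 141, 107]

mean = statistics.mean(values)
median = statistics.median(values)

# multimode returns every value tied for the highest frequency.
# Assumed convention: if all distinct values are tied, report "no mode".
modes = statistics.multimode(values)
has_mode = len(modes) < len(set(values))

# stdev is the sample standard deviation (divides by n - 1).
sample_sd = statistics.stdev(values)

print(f"mean = {mean}, median = {median}, "
      f"mode = {modes if has_mode else 'none'}, s = {sample_sd:.2f}")
```

Using `statistics.stdev` rather than `statistics.pstdev` matches the "(sample)" wording of the question.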
(d) Events A and B are independent. P(A) = 0.3, P(B) = 0.2. Calculate:

i. the probability that neither A nor B occurs,
ii. the probability that B but not A occurs.
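For part (d), independence means the probability of a joint event is the product of the individual probabilities, and the same holds for the complements. A minimal sketch using exact fractions (the variable names are illustrative):

```python
from fractions import Fraction

# Probabilities given in part (d); A and B are stated to be independent.
p_a = Fraction(3, 10)
p_b = Fraction(2, 10)

# Independence carries over to complements, so probabilities multiply.
p_neither = (1 - p_a) * (1 - p_b)   # P(not A and not B)
p_b_not_a = p_b * (1 - p_a)         # P(B and not A)

print(float(p_neither), float(p_b_not_a))
```

Fractions are used instead of floats only to avoid rounding noise in the products.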
Question Two is on the next page
Age       16–20  21–35  36–50  51–60  61–
Quantity  6      28     30     32     4
At a level of significance of α = 0.05, do the results of the survey provide evidence that the director is correct? Your answer should include reference to appropriate hypothesis tests and assumptions underlying the test you use.

(b) 300 adults were classified according to their sex and views on genetically modified food. The results are summarised in the table below.

        In Favour  No Opinion  Against
Male    36         61          63
Female  26         45          69
Is there evidence, at α = 0.05, of a relationship between the sex of a person and his or her views on genetically modified food? Your answer should include reference to appropriate hypothesis tests and assumptions underlying the test you use.
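One way to approach part (b) is a χ² test of independence on the 2 × 3 table. The sketch below computes the expected counts and the test statistic in plain Python; the critical value 5.991 is the standard tabulated 5% point of χ² with 2 degrees of freedom, and the variable names are illustrative rather than taken from the paper.

```python
# Observed counts from the table in part (b): rows are Male, Female;
# columns are In Favour, No Opinion, Against.
observed = [[36, 61, 63],
            [26, 45, 69]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count under independence: row total * column total / n.
        expected = row_totals[i] * col_totals[j] / total
        chi_sq += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Tabulated upper 5% point of the chi-square distribution with 2 df.
CRITICAL_5PCT_DF2 = 5.991
reject_independence = chi_sq > CRITICAL_5PCT_DF2

print(f"chi-sq = {chi_sq:.3f}, df = {df}, reject H0: {reject_independence}")
```

The usual assumption that every expected count is at least 5 can be checked from the same `expected` values.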
This question is continued on the next page
(c) Researchers wish to investigate whether the heights of a certain population of males are normally distributed. A sample of the heights of 100 such males is taken, and the mean sample height is 70 inches, with a sample standard deviation of 2 inches. (Note that 10 people were observed with heights in the interval 66.5–67.5, etc. The "66" category may be taken to include all heights less than 66.5 inches, and the "74" category all heights greater than 73.5 inches.) The Minitab output from a χ^2 goodness of fit test is given below.
Chi-Square Goodness-of-Fit Test for Observed Counts in Variable: Count
Using category names in Height
                    Historical  Test                  Contribution
Category  Observed  Counts      Proportion  Expected  to Chi-Sq
66        8         0.040       0.040040    4.0040    3.
67        10        0.066       0.066066    6.6066    1.
68        7         0.121       0.121121    12.1121   2.
69        17        0.174       0.174174    17.4174   0.
70        19        0.197       0.197197    19.7197   0.
71        19        0.174       0.174174    17.4174   0.
72        10        0.121       0.121121    12.1121   0.
73        10        0.066       0.066066    6.6066    1.
74        0         0.040       0.040040    4.0040    4.

N    DF  Chi-Sq   P-Value
100  8   14.1840  0.
i. Based upon the p-value in the above output, may we, at α = 0.05, reject the null hypothesis that the population heights are normally distributed? Justify your answer.
ii. Here the "expected values" are based upon "historical counts" that have been calculated by the researchers, based upon the data following a N(70, 4) distribution. Given information in the question, why should we conclude that the number of degrees of freedom stated in the output is in fact incorrect?
iii. Based upon the correct number of degrees of freedom, may we, at α = 0.05, reject the null hypothesis that the population heights are normally distributed? Justify your answer.
iv. Is there anything else in the Minitab output that might raise (slight) concern about the validity of the analysis?
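The χ² statistic in the output can be reproduced from the observed and expected counts, and compared against critical values for different degrees of freedom. The sketch below assumes the usual textbook adjustment of subtracting one further degree of freedom per parameter estimated from the sample (here taken to be the mean and the standard deviation); that reading of parts ii and iii is an assumption, not something the paper states. The critical values are standard tabulated 5% points.

```python
# Observed and expected counts copied from the Minitab output above
# (categories 66 to 74).
observed = [8, 10, 7, 17, 19, 19, 10, 10, 0]
expected = [4.0040, 6.6066, 12.1121, 17.4174, 19.7197,
            17.4174, 12.1121, 6.6066, 4.0040]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom as reported by Minitab: categories - 1.
df_reported = len(observed) - 1
# Assumed adjustment: two parameters (mean, sd) estimated from the sample.
df_adjusted = len(observed) - 1 - 2

# Tabulated upper 5% points of the chi-square distribution.
CRITICAL_5PCT = {6: 12.592, 8: 15.507}

reject_reported = chi_sq > CRITICAL_5PCT[df_reported]
reject_adjusted = chi_sq > CRITICAL_5PCT[df_adjusted]

print(f"chi-sq = {chi_sq:.4f}")
print(f"df = {df_reported}: reject H0 = {reject_reported}")
print(f"df = {df_adjusted}: reject H0 = {reject_adjusted}")
```

Under these assumptions the conclusion depends on which degrees of freedom are used, which is the point the question appears to be probing.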
Question Four is on the next page
(c) Nine people had their blood sugar level recorded before (Glucose1) and after (Glucose2) undertaking a strict diet. It is wished to investigate whether this diet will affect blood glucose levels. The data were analysed using a paired t-test. The Minitab output is presented in Figure 2.
Figure 2: Blood Sugar Levels Before and After Diet
Results for: Bloodsugar2.MTW
Paired T-Test and CI: Glucose1, Glucose
Paired T for Glucose1 - Glucose
            N  Mean   StDev  SE Mean
Glucose1    9  90.67  26.72  8.
Glucose2    9  89.44  27.13  9.
Difference  9  1.22   6.08   2.

95% CI for mean difference: (-3.45, 5.89)
T-Test of mean difference = 0 (vs not = 0): T-Value = 0.60 P-Value = 0.
i. State the null and alternative hypotheses that are being tested above.
ii. Do we reject this null hypothesis? Justify and interpret your answer.
iii. It would be possible to analyse these data using an independent samples t-test. Why is the above method preferable in this case?
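The T-value and confidence interval in Figure 2 can be cross-checked from the "Difference" row alone. The sketch below uses only the summary figures printed in the output; 2.306 is the standard tabulated two-sided 5% critical value of the t distribution with 8 degrees of freedom, and the variable names are illustrative.

```python
import math

# Summary statistics for the paired differences, read from Figure 2.
n = 9
mean_diff = 1.22
sd_diff = 6.08

# Standard error of the mean difference and the paired t statistic.
se_diff = sd_diff / math.sqrt(n)
t_value = mean_diff / se_diff

# Tabulated two-sided 5% critical value of t with n - 1 = 8 df.
T_CRIT_8DF = 2.306
ci_low = mean_diff - T_CRIT_8DF * se_diff
ci_high = mean_diff + T_CRIT_8DF * se_diff

reject_null = abs(t_value) > T_CRIT_8DF

print(f"t = {t_value:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f}), "
      f"reject H0: {reject_null}")
```

The interval containing zero and the t statistic falling below the critical value are two views of the same conclusion.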
Question Five is on the next page
A. residual,
B. least squares regression line?

ii. What is meant by the term correlation? Your answer should be illustrated by diagrams and include brief explanations of what is meant by strong/weak and positive/negative correlation.
iii. Explain how it is possible that two variables may have a correlation coefficient close to one, indicating strong correlation, but the p-value associated with the correlation coefficient may also be large, indicating that it is unsafe to assume a non-zero correlation.

(b) The rate of flow of a stream at a given point, its depth at that point (Depth1) and its depth at another point (Depth2) were recorded. The results are shown below.

Flow    0.636  0.319  0.734  1.327  0.040  1.300  7.350  5.890  3.102  1.824
Depth1  0.34   0.29   0.28   0.42   0.34   0.45   0.76   0.73   0.51   0.40
Depth2  0.96   0.92   0.90   0.85   0.84   0.84   0.82   0.80   0.83   0.86

Minitab regression analyses of Flow vs. Depth1 and Flow vs. Depth2 are presented below. For the moment, assume that these analyses are suitable for the data.

i. Which of these models would you prefer? Justify your answer.
ii. Interpret the various p-values given in the output.
The regression equation is Flow = - 4.15 + 14.2 Depth
Predictor  Coef     SE Coef  T      P
Constant   -4.1546  0.5921   -7.02  0.
Depth1     14.174   1.234    11.49  0.
S = 0.629326 R-Sq = 94.3% R-Sq(adj) = 93.6%
The regression equation is Flow = 30.0 - 32.2 Depth
Predictor  Coef    SE Coef  T      P
Constant   30.00   11.68    2.57   0.
Depth2     -32.19  13.53    -2.38  0.
S = 2.01454 R-Sq = 41.4% R-Sq(adj) = 34.1%
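The coefficients and R-Sq values in the two regression outputs can be reproduced from the stream data with the ordinary least squares formulas. The sketch below is plain Python; `fit_line` is a hypothetical helper written for this illustration, not part of Minitab or any library.

```python
def fit_line(x, y):
    """Simple least squares fit of y on x; returns (intercept, slope, r_sq)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_sq = sxy ** 2 / (sxx * syy)  # squared correlation coefficient
    return intercept, slope, r_sq


# Stream data as given in the question.
flow = [0.636, 0.319, 0.734, 1.327, 0.040, 1.300, 7.350, 5.890, 3.102, 1.824]
depth1 = [0.34, 0.29, 0.28, 0.42, 0.34, 0.45, 0.76, 0.73, 0.51, 0.40]
depth2 = [0.96, 0.92, 0.90, 0.85, 0.84, 0.84, 0.82, 0.80, 0.83, 0.86]

b0_1, b1_1, r_sq_1 = fit_line(depth1, flow)
b0_2, b1_2, r_sq_2 = fit_line(depth2, flow)

print(f"Flow = {b0_1:.2f} + {b1_1:.1f} Depth1, R-Sq = {100 * r_sq_1:.1f}%")
print(f"Flow = {b0_2:.1f} + {b1_2:.1f} Depth2, R-Sq = {100 * r_sq_2:.1f}%")
```

Comparing the two R-Sq values is one simple basis for preferring a model, though the question also asks for the p-values to be taken into account.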
This question is continued on the next page
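$\bar{x} = \frac{\sum x_i}{n}$

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1}$

$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{N} = \frac{\sum x_i^2 - N\bar{x}^2}{N}$

$P(X = r) = \binom{n}{r} p^r q^{(n-r)}$, $\mu = np$, $\sigma^2 = npq$

$P(X = r) = \frac{e^{-\lambda}\lambda^r}{r!}$, $\mu = \lambda$, $\sigma^2 = \lambda$

$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

$s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ where: $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

$\chi^2_{\text{obs}} = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}}$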