Collecting Data in Reasonable Ways - Introduction to Statistical Reasoning | STA 1043, Study notes of Statistics

Statistics: Learning from Data Peck TestBank Material Type: Notes; Class: Intro to Statistical Reasoning; Subject: Statistics; University: University of Texas - San Antonio; Term: Fall 2013;

Typology: Study notes

2013/2014

Uploaded on 06/04/2014

jennifer-sonnen

© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.

Chapter 01 - Collecting Data In Reasonable Ways

TRUE/FALSE

  1. The entire collection of individuals or objects about which information is desired is called a sample.

ANS: F REF: Section 1.1 AP

  1. A study is an observational study if the investigator observes the behavior of a response variable when one or more factors are manipulated.

ANS: F REF: Section 1.1 AP

  1. By definition, a simple random sample of size n is any sample that is selected in a manner to guarantee every individual in the population has an equal chance of selection.

ANS: F REF: Section 1.2 AP

  1. Response bias can occur when responses are not actually obtained from all individuals selected for inclusion in the sample.

ANS: F REF: Section 1.2 AP

  1. Selection bias can occur if volunteers only are used in a study.

ANS: T REF: Section 1.2 AP

  1. Stratified sampling is a sampling method that in no way involves simple random sampling.

ANS: F REF: Section 1.2 AP

  1. Increasing sample size will generally eliminate bias in a sample.

ANS: F REF: Section 1.2 AP

  1. As long as the sample size is small relative to the population, there is little practical difference between sampling with replacement and sampling without replacement.

ANS: T REF: Section 1.2 AP
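
One way to see why the replacement question rarely matters for small samples is to compute the chance that a with-replacement sample happens to contain no repeated individual. When that chance is close to 1, the two schemes almost always produce the same kind of sample. The sketch below is only illustrative; the function name and the population and sample sizes are assumptions, not part of the test bank.

```python
def prob_all_distinct(population_size: int, sample_size: int) -> float:
    """Probability that a with-replacement sample has no repeated individual."""
    p = 1.0
    for k in range(sample_size):
        # The (k+1)-th draw must avoid the k individuals already drawn.
        p *= (population_size - k) / population_size
    return p


# Same sample size, increasingly large (hypothetical) populations.
for N, n in [(100, 50), (10_000, 50), (1_000_000, 50)]:
    print(N, n, round(prob_all_distinct(N, n), 4))
```

As the population grows relative to the sample, the probability of drawing anyone twice shrinks toward zero.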

  1. Clusters are non-overlapping subgroups of a population that have been identified as homogeneous.

ANS: F REF: Section 1.2 AP

  1. Random subpopulations of a population are called strata.

ANS: F REF: Section 1.2 AP

*AP and the Advanced Placement Program are registered trademarks of the College Entrance Examination Board, which was not involved in the production of, and does not endorse, this product.

  1. Blocking is a technique that can be used to filter out the effects of extraneous factors.

ANS: T REF: Section 1.3 AP

  1. A placebo is identical in appearance to the treatment of interest, but contains no active ingredients.

ANS: T REF: Section 1.3 AP

  1. In a well-designed experiment, the factors are confounded whenever possible.

ANS: F REF: Section 1.3 AP

  1. A treatment is any particular combination of values for the explanatory variables.

ANS: T REF: Section 1.3 AP

  1. Two factors are extraneous if their effects on the response variable cannot be distinguished from one another.

ANS: F REF: Section 1.3 AP

  1. Random assignment to treatments will guarantee groups that are exactly alike for experimental purposes.

ANS: F REF: Section 1.3 AP

  1. The method of control wherein an extraneous variable is held constant is called blocking.

ANS: F REF: Section 1.3 AP

  1. A control group provides a baseline for comparison to a treatment group.

ANS: T REF: Section 1.3 AP

  1. Random assignment of volunteers should result in comparable experimental groups.

ANS: T REF: Section 1.3 AP

  1. If the subjects as well as the person measuring the response are aware of the treatment assigned to the subject, only single-blinding is being used.

ANS: F REF: Section 1.3 AP

  1. Replicating in an experiment means that the number of subjects is greater than 1.

ANS: F REF: Section 1.3 AP

MULTIPLE CHOICE

  1. A study is commissioned to determine whether piglets gain body mass more rapidly when a certain hormone is introduced into their feed. In January, a random sample of 40 10-week-old piglets receives a diet that includes the hormone. Sixteen weeks later, the average weight increase is determined. A similar experiment is conducted the following June with a random sample of 36 piglets, except the hormone is removed from the diet.

Which of the following do you think represents the most serious flaw in this study?

a. the presence of a confounding variable
b. the presence of a sampling bias
c. an absence of experimental control
d. the presence of a measurement bias
e. inadequate information regarding dietary needs of piglets

ANS: A REF: Section 1.2 AP

  1. Which of the following statements is false?

a. The explanatory variables are those variables that have values that are controlled by the experimenter.
b. The response variable is the variable that the experimenter thinks may be affected by the explanatory variables.
c. An experimental unit is the smallest entity to which a treatment is applied.
d. Two variables are confounded if their effects on the response variable can be distinguished.
e. An experiment in which experimental units are randomly assigned to treatments is called a completely randomized experiment.

ANS: D REF: Section 1.3 AP

  1. Which of the following best summarizes “nonresponse bias”?

a. a tendency for samples to differ from the corresponding population as a result of systematic exclusion of some part of the population
b. a tendency for samples to differ from the corresponding population because data are not obtained from all individuals selected for inclusion in the sample
c. a tendency for samples to differ from the corresponding population because the method of observation tends to produce values that differ from the true value
d. a bias on the part of the researcher towards those who chose not to participate in a survey
e. None of these describes nonresponse bias.

ANS: B REF: Section 1.2 AP

  1. A researcher wishes to study the relationship between the level of background noise and mental concentration. The treatment (noise level) will have three levels: no noise, low-intensity noise, and high-intensity noise. The subjects are to be divided into three groups, and each group is to receive one of the treatments. He has available to him a set of 60 female volunteers and a set of 90 male volunteers. What experimental design strategy would help him avoid introducing gender as a confounding variable?

a. stratified sampling
b. replication
c. blocking
d. systematic sampling
e. double-blind trials

ANS: C REF: Section 1.3 AP
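
As a rough sketch of what blocking on gender could look like for the noise-level question above, the code below shuffles the volunteers within each gender block and deals them out evenly to the three treatments. The function name, subject labels, and seed are hypothetical; only the group sizes (60 and 90) and the treatment names come from the question.

```python
import random


def randomized_block_assignment(blocks, treatments, seed=None):
    """Randomly assign subjects to treatments separately within each block.

    blocks: dict mapping block name -> list of subject ids.
    Returns a dict mapping treatment -> list of (block, subject) pairs.
    """
    rng = random.Random(seed)
    assignment = {t: [] for t in treatments}
    for block_name, subjects in blocks.items():
        shuffled = subjects[:]
        rng.shuffle(shuffled)
        # Deal the shuffled subjects round-robin so every treatment
        # receives an equal share of this block.
        for i, subject in enumerate(shuffled):
            assignment[treatments[i % len(treatments)]].append((block_name, subject))
    return assignment


# Hypothetical labels for the 60 female and 90 male volunteers.
blocks = {
    "female": [f"F{i}" for i in range(1, 61)],
    "male": [f"M{i}" for i in range(1, 91)],
}
treatments = ["no noise", "low-intensity noise", "high-intensity noise"]
groups = randomized_block_assignment(blocks, treatments, seed=1)
for t, members in groups.items():
    n_female = sum(1 for b, _ in members if b == "female")
    print(t, "females:", n_female, "males:", len(members) - n_female)
```

Because the random assignment happens inside each block, every treatment group ends up with the same gender mix, so gender cannot be confounded with noise level.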

  1. For which of the following types of studies is it impossible to draw cause-and-effect conclusions?

I). Completely randomized experiments
II). Randomized block experiments
III). Observational studies

a. I only
b. II only
c. III only
d. It is never possible to draw cause-and-effect conclusions.
e. It is always possible to draw cause-and-effect conclusions.

ANS: C REF: Section 1.4 AP

  1. To estimate the proportion of students who plan to purchase tickets to an upcoming school fundraiser, a high school decides to sample 100 students as they register for the spring semester. There are 2000 students at the school. Which of the following sampling plans would result in a simple random sample?

a. Number the students from 1 to 2000 and then use random numbers to select 100 students.
b. Survey the first 100 students to register.
c. Randomly select 100 students from a list of the 950 female students at the school.
d. Divide the students into early registrants (the first 1000 to register) and late registrants (the last 1000 to register). Use random numbers to identify 50 of the early registrants and 50 of the late registrants to survey.
e. Select one of the first 20 students to register using a random number table and then select every 20th student to register thereafter.

ANS: A REF: Section 1.2 AP

  1. Briefly describe how populations and samples differ.

ANS: A population consists of an entire group about which some information is desired. A sample consists of only some part of this group that has been selected for study.

REF: Section 1.1 AP

  1. A friend of yours, who is not taking statistics, wonders why it is that anyone would choose to take a sample. "Obviously," she says, "you would get better information from a census." In a short paragraph, explain why it is that statisticians take samples rather than taking a census.

ANS: Although we may get better information from a census, it is usually far too costly and time consuming to contact every member of the population. A large random sample will be nearly as good for far less cost.

REF: Section 1.2 AP

  1. The most basic sampling method studied in statistics is the simple random sample (SRS). In your own words, what is the correct definition of a simple random sample of size n?

ANS: A simple random sample of size n is a sample that is selected from a population in a way that ensures that every different possible sample of the desired size has the same chance of being selected.

Note: It is important that students not only state that each person has the same chance of being chosen, but also each possible sample of size n has the same chance of being chosen.

REF: Section 1.2 AP
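
The distinction in the note above can be checked by brute force on a tiny population. The sketch below compares a simple random sample of size 2 from four individuals with a second plan that picks one individual from each half of the list. The population labels and the second plan are assumptions chosen purely for illustration.

```python
from fractions import Fraction
from itertools import combinations, product

population = ["A", "B", "C", "D"]
n = 2

# Plan 1: simple random sample. Every subset of size n is equally likely.
srs_samples = [frozenset(s) for s in combinations(population, n)]
srs_prob = {s: Fraction(1, len(srs_samples)) for s in srs_samples}

# Plan 2: pick one individual at random from each half of the list.
halves = [["A", "B"], ["C", "D"]]
plan2_samples = [frozenset(s) for s in product(*halves)]
plan2_prob = {s: Fraction(1, len(plan2_samples)) for s in plan2_samples}


def inclusion_probability(sample_probs, individual):
    """Chance that a given individual ends up in the sample."""
    return sum(p for s, p in sample_probs.items() if individual in s)


for person in population:
    print(person,
          inclusion_probability(srs_prob, person),
          inclusion_probability(plan2_prob, person))

# Samples the SRS can produce but plan 2 never can.
missing = sorted(sorted(s) for s in srs_samples if s not in plan2_prob)
print(missing)
```

Both plans give every individual the same chance of selection, yet the second plan can never produce a sample drawn entirely from one half, so it is not a simple random sample.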

  1. The ZZZ chain of motels has a standard method of constructing its rooms to maximize the ease of parking for its customers. The rooms are arranged in adjacent buildings so that each customer can park outside the rented room. The layout for one of the motels, with 48 rooms located along a famous highway, is diagrammed below:

[Diagram showing Route 66, Building A, and Building B]

The manager would like to survey customers in 12 of his rooms (one randomly selected customer for each room selected in the sample) to assess their satisfaction with the motel services. The surveys will be placed on the customers' beds before they check in to the motel. In order to make the directions easy to follow, he elects to use systematic sampling.

(a) Explain how you would use random numbers to set up the systematic sampling process.

(b) Write a short paragraph for the maids that helps them carry out your method in part (a).

ANS: a) Since there are 48 units in the population and we want a sample of size 12, we want to choose every fourth room after randomly choosing one of the first four rooms to start with. If we are using a random digit table, we would go through the table until we get a number from 1 to 4. Then, we would keep adding 4 to that number until we get to the end of the hotel rooms. For example, if we come upon the number 3 first, we would survey the 3rd room, the 7th room, the 11th room, etc.

b) Dear Maids, when you are placing the surveys in the rooms, please follow the following procedure. Starting at the northwest corner of building A and moving east, place a survey in the third room, the seventh room, and every fourth room thereafter, moving back and forth along the four rows of rooms.

REF: Section 1.2 AP
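
The procedure in part (a) can be sketched in a few lines of code. This is a minimal illustration that assumes the rooms are numbered 1 to 48 in walking order; the function name and the seed are hypothetical, and a pseudorandom generator stands in for the random digit table.

```python
import random


def systematic_sample(population_size, sample_size, rng=random):
    """1-in-k systematic sample: random start among the first k units, then every k-th unit."""
    k = population_size // sample_size   # sampling interval (48 // 12 = 4)
    start = rng.randint(1, k)            # random start from 1..k inclusive
    return list(range(start, population_size + 1, k))[:sample_size]


rooms = systematic_sample(48, 12, random.Random(66))
print(rooms)
```

Only the starting room is random; once it is chosen, the remaining eleven rooms are fully determined, which is what makes the instructions easy for the staff to follow.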

ANS:

a) This is an example of response bias, since the awareness of their diagnosis may have caused them to change their response. It isn't non-response bias since they were able to obtain responses from the nurses and it isn't selection bias since they did not attempt to generalize to a larger population.

b) This is an example of non-response bias, since some of the children selected for the study were not able to participate after they died. It is not selection bias since the children were not left out on purpose and it isn't response bias since the researchers were unable to obtain responses in the first place.

REF: Section 1.2 AP

  1. Three methods for random sampling are: (a) simple random sampling, (b) stratified random sampling, and (c) cluster sampling. In a few sentences, discuss the similarities and differences among these sampling methods. Specifically, what sampling circumstances would lead you to choose each of these methods?

ANS: In simple random sampling, every individual and every possible sample of size n have an equal chance of being selected for the study. In stratified random sampling, the population is divided into non-overlapping homogeneous groups (called strata) and a simple random sample is selected from each stratum. In cluster sampling, the population is divided into non-overlapping (preferably heterogeneous) groups called clusters; a random sample of clusters is then selected, and every member of the selected clusters is studied.

Cluster sampling works best when the population is already divided into easily identifiable groups that are heterogeneous (i.e., each cluster can reasonably be assumed to be representative of the entire population). Stratified random sampling works best when there are easily identified groups in the population that are anticipated to have very different responses to the question of interest. Simple random sampling is best when neither of the circumstances listed above is present.

REF: Section 1.2 AP
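
The mechanical difference between the stratified and cluster methods described above can be sketched as follows. The population, group names, and sample sizes are invented for illustration; the same grouping is reused for both methods only to make the contrast visible, not because one grouping would suit both in practice.

```python
import random


def stratified_sample(strata, n_per_stratum, rng):
    """Take a simple random sample of n_per_stratum from every stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep every member of the chosen ones."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}


rng = random.Random(2014)
# Hypothetical population: 4 groups of 10 labeled individuals each.
groups = {g: [f"{g}{i}" for i in range(10)] for g in "WXYZ"}

print(stratified_sample(groups, 3, rng))  # some members of every group
print(cluster_sample(groups, 2, rng))     # all members of some groups
```

Stratified sampling randomizes within every group, while cluster sampling randomizes over the groups themselves.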

  1. A pharmaceutical company wants to test its new drug that is designed to help balding men grow more hair. From their records of past customers, the company has data on about 5,000 men. The data contains information about the men's hair color, age, and percent of baldness. (A partial list is given below.) For their anticipated experiment, they want to take a sample that is representative of their customers.

Hair color   Age (yrs)   % Baldness
Light        67          83
Dark         62          73
Light        41          25
Dark         52          50
Dark         43          14
Light        69          96
Dark         56          57
…            …           …
Light        32          40

(a) Briefly describe how you would select a simple random sample of size n = 20 from this list of customers.

(b) Describe in a short paragraph why you might wish to use a stratified random sample.

ANS:

a) To select a simple random sample of size 20, we could number the subjects from 0001- and use a random digit table. On the table, we would look at sets of 4 digits until 20 numbers from 0001-5000 were selected (ignoring any repeats) and these would be the men selected.

b) If the researchers anticipate an association between any of the variables listed (hair color, age, or % baldness) and the response variable, they should stratify by that variable so that the sample they get will not over- or under-represent a subgroup which may respond differently than the population in general.

REF: Section 1.2 AP
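
The random digit table procedure in part (a) might be simulated as below. This is a sketch under assumptions: the function name is hypothetical, the customers are taken to be labeled 1 through 5000, and a string of pseudorandom digits stands in for a printed table.

```python
import random


def srs_from_digit_table(digits, population_size, sample_size):
    """Read fixed-width groups of digits; keep labels in 1..population_size, skipping repeats."""
    width = len(str(population_size))  # 4 digits for a population of 5000
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == sample_size:
                return chosen
    raise ValueError("ran out of digits before the sample was complete")


rng = random.Random(5000)
# Stand-in for a printed random digit table (an assumption; any source
# of random digits would do).
table = "".join(str(rng.randrange(10)) for _ in range(800))
print(srs_from_digit_table(table, 5000, 20))
```

Groups such as 0000 or 7312 fall outside the label range and are simply skipped, as are labels that have already been selected.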

  1. In evaluating an experiment, how would you determine if a variable is an explanatory variable or an extraneous variable?

ANS: In an experiment, the explanatory variable is the one that researchers manipulate in order to observe changes in the response variable. An extraneous variable is any other variable which is thought to affect the response variable, but is not of interest in the study.

REF: Section 1.3 AP

  1. In evaluating an experiment, how would you determine if a variable is an explanatory variable or a response variable?

ANS: In an experiment, an explanatory variable is one whose value is manipulated or determined by the experimenter, while a response variable is one whose value is measured at the end of the experiment.

REF: Section 1.3 AP

(c) During the course of the experiment the investigators were very careful with the wooden heron model not to come in contact with the glass of the aquaria or make noise in any other way. If they had been unsuccessful and their wooden heron made significant amounts of noise, how would that affect the interpretation of the results?

ANS:

(a) The explanatory variable is the presence/absence of the wooden heron model.

(b) The response variable is the antipredator behavior.

(c) The added noise would be a potential confounding variable. The tadpoles' response may be a startle response to a sudden change in their environment, and not specific to the detection of a predator.

PTS: 1 REF: Section 1.3 AP MSC: Section 1.

  1. A common practice of teachers is to have students exchange their quizzes and grade each other's. In addition to decreasing the teacher's workload, the reduced time between quiz and feedback is thought to be a plus for learning. Your U.S. History teacher, aware of your statistical prowess, has asked you to design an experiment to test this theory. You have decided to use the mid-term exam (not graded by students) as your response measure. Your history teacher has three classes, one early in the morning, one at noon, and one late in the afternoon. Each class contains 30 students.

(a) Describe the treatments you will use in your experiment.

(b) One possible confounding variable is the time of day, since students may be more alert at certain times of the day than at other times. Describe a method you would use to control this variable. (Unfortunately, you cannot change the student schedules!)

(c) Do you feel the results of your experiment could be generalized to your statistics class? Why or why not?

ANS: (a) Individual pairs of students would be randomly assigned to "trade papers" or "not trade papers" treatment groups. The non-trading students' work would be graded by the teacher each day and given back the next day.

(b) Each class would be considered a "block." Within each block both treatments would be randomly assigned as indicated in part (a).

(c) The results might be generalizable to other classes, but without doing the experiment in those classes there is no evidence to suggest one could generalize. Statistics and history seem different enough that, although both are classes with homework, the subject matter might be learned differently, and the immediate checking of quizzes might help more in one class than in the other.

REF: Section 1.3 AP

  1. In competitive sports, video recorders have been used more frequently in recent years. The idea behind the recorder is that coaches can replay training sessions for more effective feedback to the athlete. Some people believe video recording may make the athletes more nervous and actually decrease their performance. You have been asked to design an experiment to address this issue for competitive high school tennis players. You have decided to use the accuracy of tennis serves as your response variable, and the number of successful serves out of 100 as your performance measure. The subjects for your experiment are 60 high school male competitive tennis players of varying ability who have volunteered for the experiment.

(a) Describe the treatments in your experiment.

(b) One possible confounding variable is the experience level of the players. Explain how you would control this variable.

(c) Can the results of this experiment be generalized to all male tennis players? Why or why not?

ANS:

a) The two treatments will be: 1. The subject is recorded and 2. The subject is not recorded.

b) I would use blocking (pairing) to control for the experience level of the players. I would pair the two most experienced players as one block, the next two most experienced as the next block, and so on. Then the two members of each block would be randomly split between the two treatment groups. This way the treatment groups should be roughly equivalent with regard to experience level.

c) No, the results of this study should not be generalized to all male tennis players for at least two reasons. One, competitive tennis players are presumably more used to playing in front of crowds and would be less bothered by video recording than the typical player. Two, volunteers are not generally representative of any larger population.

Note: either reason should be sufficient to receive credit.

REF: Section 1.3 AP
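
The pairing scheme described in answer (b) could be carried out along these lines. This is only a sketch: the player names and experience values are invented, and "years of experience" is an assumed way of measuring experience level.

```python
import random


def matched_pairs_assignment(players, rng):
    """players: list of (name, experience) tuples, even length.

    Pair adjacent players after sorting by experience, then randomly
    send one member of each pair to each treatment.
    """
    ranked = sorted(players, key=lambda p: p[1], reverse=True)
    recorded, not_recorded = [], []
    for i in range(0, len(ranked), 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # coin flip within the block
        recorded.append(pair[0])
        not_recorded.append(pair[1])
    return recorded, not_recorded


rng = random.Random(60)
# Hypothetical data: 60 players with made-up years of experience.
players = [(f"player{i}", rng.randint(1, 10)) for i in range(1, 61)]
recorded, not_recorded = matched_pairs_assignment(players, rng)
print(len(recorded), len(not_recorded))
```

Since each pair contributes one player to each treatment, the two groups cannot differ much in overall experience, whatever the coin flips turn out to be.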

  1. Suppose that two experiments were conducted to assess the effect of a new insect repellant. In Experiment A, a simple random sample was taken from the population of River City. In Experiment B, a simple random sample from a group of volunteers from the population of River City was used. The results of the experiments were the same: fewer insects landed on the arms that had been treated with the insect repellant. The volunteers were randomly assigned to the two treatments in both experiments.

(a) For each experiment, A and B, discuss whether one can legitimately infer a cause-and-effect relation between the use of the repellant and fewer insects landing. Why or why not?

(b) For each experiment, A and B, discuss whether one can legitimately generalize the results to the population of River City. Why or why not?

Chapter 02 - Graphical Methods for Describing Data Distributions

TRUE/FALSE

  1. A data set is discrete if the possible values are isolated points on the number line.

ANS: T REF: Section 2.1 AP

  1. A data set consisting of many observations of a single characteristic is a categorical data set.

ANS: F REF: Section 2.1 AP

  1. A data set is multivariate if it consists only of numeric variables.

ANS: F REF: Section 2.1 AP

  1. Frequency distributions can only be used with categorical data.

ANS: F REF: Section 2.2 AP

  1. The relative frequency for a particular category is the number of times the category appears in the data.

ANS: F REF: Section 2.2 AP
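
The statement above describes the frequency; the relative frequency is that count divided by the total number of observations. A small sketch, using made-up categorical data and a hypothetical function name:

```python
from collections import Counter


def relative_frequencies(values):
    """Map each category to (frequency, relative frequency)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (c, c / total) for cat, c in counts.items()}


# Hypothetical categorical data.
hair = ["Light", "Dark", "Light", "Dark", "Dark", "Light", "Dark", "Light"]
for cat, (freq, rel) in relative_frequencies(hair).items():
    print(cat, freq, rel)
```

The relative frequencies across all categories always sum to 1, whereas the frequencies sum to the number of observations.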

  1. Bar charts should be used with categorical data.

ANS: T REF: Section 2.2 AP

  1. Dotplots work best for small and moderate sized numerical data sets.

ANS: T REF: Section 2.3 AP

  1. An outlier is an unusually small or large data value.

ANS: T REF: Section 2.3 AP

  1. The quantity √(number of observations) often gives a rough estimate of the appropriate number of intervals in a histogram.

ANS: T REF: Section 2.3 AP

  1. A curve with tails that decline more rapidly than the tails of a normal curve is called a heavy-tailed distribution.

ANS: F REF: Section 2.3 AP

  1. The density of a class can be calculated by multiplying the relative frequency of the class times the class width.

ANS: F REF: Section 2.3 AP
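
The statement above is false because density is the relative frequency divided by the class width, not multiplied by it. The sketch below works this out for a made-up frequency distribution with unequal class widths; the class boundaries, counts, and function name are assumptions for illustration.

```python
def class_densities(class_edges, frequencies):
    """Return (relative frequency, density) for each class interval.

    density = relative frequency / class width, so the total area of
    the histogram rectangles is 1.
    """
    total = sum(frequencies)
    result = []
    for (lo, hi), f in zip(zip(class_edges, class_edges[1:]), frequencies):
        rel = f / total
        result.append((rel, rel / (hi - lo)))
    return result


# Hypothetical unequal-width classes: 0-10, 10-20, 20-40, 40-80.
edges = [0, 10, 20, 40, 80]
freqs = [10, 20, 40, 30]
dens = class_densities(edges, freqs)
# Area of each rectangle is density times width; the areas should sum to 1.
area = sum(d * (hi - lo) for (_, d), lo, hi in zip(dens, edges, edges[1:]))
print(dens, area)
```

Using density on the vertical axis keeps rectangle areas proportional to relative frequencies even when the classes have different widths.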

  1. For stem and leaf plots with single-digit leaves, commas must be used to separate the leaves.

ANS: F REF: Section 2.3 AP

  1. One advantage of histograms is that they may be used for large data sets.

ANS: T REF: Section 2.3 AP

  1. If the upper tail of a distribution stretches out farther than the lower tail, the distribution is negatively skewed.

ANS: F REF: Section 2.3 AP

  1. In a scatter plot, both the horizontal and vertical axes must be set at zero.

ANS: F REF: Section 2.4 AP

  1. A pie chart is most useful for numeric data.

ANS: F REF: Section 2.5 AP

MULTIPLE CHOICE

  1. A survey form solicited the following responses:

I). age of respondent
II). gender of respondent
III). level of job satisfaction (completely dissatisfied/somewhat dissatisfied/somewhat satisfied/completely satisfied)
IV). annual income

Which of the responses represent categorical data?

a. I only
b. II only
c. III only
d. II and III only
e. All the responses are categorical.

ANS: D REF: Section 2.1 AP

  1. Which of the following variables are discrete?

I). the volume of liquid in a 16-ounce bottle of soda pop
II). the percentage of males 18-25 who actively view online pornography
III). the number of broken eggs in a package of a dozen eggs
IV). a count of the statistics majors at a certain university

a. II only
b. II and III only
c. III and IV only
d. I only
e. All of these variables are discrete.

ANS: C REF: Section 2.1 AP

  1. A survey asked adult respondents how dependent they were on various electronic devices. The accompanying table summarizes the responses.

Relative frequency of each response, by device (the table's values are missing from this copy):

Response                          Personal Computer   Cell Phone   DVD Player
Cannot imagine living without
Would miss but could do without
Could definitely live without

Select a comparative bar chart that shows the distributions of responses for the three different technologies.

[Answer choices a through e are bar charts that are missing from this copy.]

ANS: B REF: Section 2.2 AP

  1. According to the stem-and-leaf display below, how many times does the number 58 appear in the data set?

a. 0
b. 1
c. 2
d. 5
e. 58

ANS: C REF: Section 2.3 AP