




























































































A chapter from a report by the U.S. General Accounting Office (GAO) on quantitative analysis in data analysis. It covers the principles of data analysis, the importance of understanding various data analysis methods, and the choice of appropriate methods based on the level of measurement, unit of analysis, distribution of variables, and completeness of data. The document also discusses the concepts of central tendency, spread, and association among variables, and provides examples of measures of association for nominal and ordinal variables.
United States General Accounting Office
June 1992
Quantitative Data Analysis: An Introduction
Preface
We look forward to receiving comments from the readers of this paper. They should be addressed to Eleanor Chelimsky at 202-275-1854.
Werner Grosshans Assistant Comptroller General Office of Policy
Eleanor Chelimsky Assistant Comptroller General for Program Evaluation and Methodology
Page 2 GAO/PEMD-10.1.11 Quantitative Analysis
Contents
Histograms and Probability Distributions
Sampling Distributions
Population Parameters
Point Estimates of Population Parameters
Interval Estimates of Population Parameters
What Do We Mean by Causal Association?
Evidence for Causation
Limitations of Causal Analysis
When Plans Are Being Made for Data Collection
As the Data Analysis Begins
As the Results Are Produced and Interpreted
Table 1.2: Data Sheet for a Study of College Student Loan Balances
Table 1.3: Tabular Display of a Distribution
Table 2.1: Distribution of Staff Turnover Rates in Long-Term Care Facilities
Table 2.2: Three Common Measures of Central Tendency
Table 2.3: Illustrative Measures of Central Tendency
Table 3.1: Measures of Spread
Table 4.1: Data Sheet With Two Variables
Table 4.2: Cross-Tabulation of Two Ordinal Variables
Table 4.3: Percentaged Cross-Tabulation of Two Ordinal Variables
Table 4.4: Cross-Tabulation of Two Nominal Variables
Table 4.5: Two Ordinal Variables Showing No Association
Table 5.1: Data Sheet for 100 Samples of College Students
Table 5.2: Point and Interval Estimates for a Set of Samples
Figure 1.2: Two Distributions
Figure 1.3: Histogram for a Nominal Variable
Figure 3.1: Histogram of Hospital Mortality Rates
Figure 3.2: Spread of a Distribution
Figure 3.3: Spread in a Normal Distribution
Figure 4.1: Scatter Plots for Spending Level and Test Scores
Figure 4.2: Regression of Test Score on Spending Level
Figure 4.3: Regression of Spending Level on Test Score
Figure 4.4: Linear and Nonlinear Associations
Figure 5.1: Frequency Distribution of Loan Balances
Figure 5.2: Probability Distribution of Loan Balances
Figure 5.3: Sampling Distribution for Mean Student Loan Balances
Figure 6.1: Causal Network
Chapter 1
Introduction
Data analysis is more than number crunching. It is an activity that permeates all stages of a study. Concern with analysis should (1) begin during the design of a study, (2) continue as detailed plans are made to collect data in different forms, (3) become the focus of attention after data are collected, and (4) be completed only during the report writing and reviewing stages.1
The basic thesis of this paper is that successful data analysis, whether quantitative or qualitative, requires (1) understanding a variety of data analysis methods; (2) planning data analysis early in a project and making revisions in the plan as the work develops; (3) understanding which methods will best answer the study questions posed, given the data that have been collected; and (4) once the analysis is finished, recognizing how weaknesses in the data or the analysis affect the conclusions that can properly be drawn. The study questions govern the overall analysis, of course. But the form and quality of the data determine what analyses can be performed and what can be inferred from them. This implies that the evaluator should think about data analysis at four junctures:
• when the study is in the design phase,
• when detailed plans are being made for data collection,
1 Relative to GAO job phases, the first two checkpoints occur during the job design phase, the third occurs during data collection and analysis, and the fourth during product preparation. For detail on job phases see the General Policy Manual, chapter 6, and the Project Manual, chapters 6.2, 6.3, and 6.4.
evaluators should decide what data will be needed to answer the questions and how they will analyze the data. In other words, they need to develop a data analysis plan. Determining the type and scope of data analysis is an integral part of an overall design for the study. (See the transfer paper entitled Designing Evaluations, listed in "Papers in This Series.") Moreover, confronting data collection and analysis issues at this stage may lead to a reformulation of the questions to ones that can be answered within the time and resources available.
planning the details of data collection, analysis must be considered again. Observations can be made and, if they are qualitative (that is, text data), converted to numbers in a variety of ways that affect the kinds of analyses that can be performed and the interpretations that can be made of the results. Therefore, decisions about how to collect data should be influenced by the analysis options in mind.
whether their expectations regarding data characteristics and quality have been met. Choice among possible analyses should be based partly on the nature of the data (for example, whether many observed values are small and a few are large and whether the data are complete). If the data do not fit the assumptions of the methods they had planned to
effects of a program? Are the data “strong” enough to warrant a far-reaching recommendation? These questions and many others may occur to the evaluators and reviewers and good answers will come only if the analyst is “close” to the data but always with an eye on the overall study questions.
Most GAO statistical analyses address one or more of the four generic questions presented in table 1.1.
Each generic question is illustrated with several specific questions and examples of the kinds of statistics that might be computed to answer the questions. The specific questions are loosely based on past GAO studies of state bottle bills (U.S. General Accounting Office, 1977 and 1980).
Table 1.1: Generic Types of Quantitative Questions

Generic question: What is a typical value of the variable?
Specific question: At the state level, how many pounds of soft drink bottles (per unit of population) were typically returned annually?
Useful statistics: Measures of central tendency (ch. 2)

Generic question: How much spread is there among the cases?
Specific question: How similar are the individual states' return rates?
Useful statistics: Measures of spread (ch. 3)

Generic question: To what extent are two or more variables associated?
Specific question: What factors are most associated with high return rates: existence of state bottle bills? state economic conditions? state levels of environmental awareness?
Useful statistics: Measures of association (ch. 4)

Generic question: To what extent are there causal relationships among two or more variables?
Specific question: What factors cause high return rates: existence of state bottle bills? state economic conditions? state level of environmental awareness?
Useful statistics: Measures of association (ch. 4). Note that association is but one of three conditions necessary to establish causation (ch. 6)
Bottle bills have been adopted by about nine states and are intended to reduce solid waste disposal problems by recycling. Other benefits can also be sought, such as the reduction of environmental litter and savings of energy and natural resources. One of GAO’s studies was a prospective analysis, intended to inform discussion of a proposed national bottle bill. The quantitative analyses were not the only relevant factor. For example, the evaluators had to consider the interaction of the merchant-based bottle bill strategy with emerging state incentives for curbside pickups or with other recycling initiatives sponsored by local communities. The quantitative results were, however, relevant to the overall conclusions regarding the likely benefits of the proposed national bottle bill.
The first three generic questions in table 1.1 are standard fare for statistical analysis. GAO reports using quantitative analysis usually include answers in the form of descriptive statistics such as the mean, a measure of central tendency, and the standard deviation, a measure of spread. In chapters 2, 3, and 4 of this paper, we focus on descriptive statistics for answering the questions.
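As a small illustration of the two descriptive statistics just named, the following Python sketch computes a mean and a standard deviation with the standard library. The return rates are invented for illustration and are not taken from any GAO study; the choice of the population form of the standard deviation is likewise an assumption.

```python
import statistics

# Hypothetical annual bottle return rates (pounds per 1,000 residents)
# for eight states. These numbers are invented for illustration only.
return_rates = [52.0, 47.5, 61.0, 55.5, 49.0, 58.0, 50.5, 54.5]

# Central tendency: the arithmetic mean of the cases.
mean_rate = statistics.mean(return_rates)

# Spread: the standard deviation, treating the eight states as the
# whole population of interest (use statistics.stdev for a sample).
sd_rate = statistics.pstdev(return_rates)

print(f"mean return rate   = {mean_rate:.2f}")
print(f"standard deviation = {sd_rate:.2f}")
```

A small standard deviation relative to the mean would suggest that the states' return rates are similar; a large one would suggest that they differ considerably.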
To answer many questions, it is desirable to use probability samples to draw conclusions about populations. In chapter 5, we address the first three questions from the perspective of inferential statistics. The treatment there is necessarily brief, focused on point and interval estimation methods.
The fourth generic question, about causality, is more difficult to answer than the others. Providing a good answer to a causal question depends heavily upon the study design and somewhat advanced statistical methods; we treat the topic only lightly in chapter 6. Chapter 7 discusses some broad strategies for avoiding pitfalls in the analysis of quantitative data.
variable would be gender and would be composed of the attributes female and male.4 Age might be another variable composed of the integer values from 0 to
It is convenient to refer to the variables we are especially interested in as response variables. For example, in a study of the effects of a government retraining program for displaced workers, employment rate might be the response variable. In trying to determine the need for an acquired immune deficiency syndrome (AIDS) education program in different segments of the U.S. population, evaluators might use the incidence of AIDS as the response variable. We usually also collect information on other variables with which we hope to better understand the response variables. We occasionally refer to these other variables as supplementary variables.
The data that we want to analyze can be displayed in a rectangular or matrix form, often called a data sheet (see table 1.2). To simplify matters, the individual persons, things, or events that we get information about are referred to generically as cases. (The intensive study of one or a few cases, typically combining quantitative and qualitative data, is referred to as case study research. See the GAO transfer paper entitled Case Study Evaluations.) Traditionally, the rows in a data sheet correspond to the cases and the columns correspond to the variables
4 Instead of referring to the attributes of a variable, some prefer to say that the variable takes on a number of "values." For example, the variable gender can have two values, male and female. Also, some statisticians use the expression "attribute sampling" in reference to probability sampling procedures for estimating proportions. Although attribute sampling is related to attribute as used in data analysis, the terminology is not perfectly parallel. See the discussion of attribute sampling in the transfer paper entitled Using Statistical Sampling, listed in "Papers in This Series."
of interest. The numbers or words in the cells then correspond to the attributes of the cases.
Table 1.2: Data Sheet for a Study of College Student Loan Balances
Case: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Age: 23 19 21 [illegible] 21 22 21 20 22 21 19 20 20 23 24
Class: Sophomore, Freshman, [illegible], Graduate, Freshman, Sophomore, Sophomore, Freshman, Junior, Sophomore, Freshman, Sophomore, Sophomore, Senior, Senior
Type of institution: Private, Public, Public, Private, Private, Public, Public, Public, Private, Public, Private, Public, Public, Private, Private
Loan balance: $3, 1, 8, 1, 3, 2, 6, 2, 1, 2, 3, 2, 3, 5, [remaining digits illegible]
Table 1.2 shows 15 cases, college students, from a hypothetical study of student loan balances at higher education institutions. The first column shows an identification number for each case, and the rest of the columns indicate four variables: age of student, class, type of institution, and loan balance. Two of the variables, class and type of institution, are presently in text form. As will be seen shortly, they can be converted to numbers for purposes of quantitative analysis. Loan balance is the response variable and the others are supplementary.
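The data sheet idea can be sketched in Python as a list of cases, each holding one attribute per variable. The three cases and every value below are hypothetical stand-ins rather than the figures in table 1.2, and the numeric codes chosen for class and type of institution are assumptions made only for illustration.

```python
# Each dict is one case (a row); each key is a variable (a column).
# All values are hypothetical and are not the figures in table 1.2.
data_sheet = [
    {"case": 1, "age": 23, "class": "Sophomore", "institution": "Private", "loan_balance": 3000},
    {"case": 2, "age": 19, "class": "Freshman", "institution": "Public", "loan_balance": 1500},
    {"case": 3, "age": 24, "class": "Senior", "institution": "Private", "loan_balance": 5200},
]

# Assumed numeric codes for the two text variables.
CLASS_CODES = {"Freshman": 1, "Sophomore": 2, "Junior": 3, "Senior": 4, "Graduate": 5}
INSTITUTION_CODES = {"Public": 0, "Private": 1}


def encode(case):
    """Return a copy of the case with text attributes replaced by codes."""
    coded = dict(case)
    coded["class"] = CLASS_CODES[case["class"]]
    coded["institution"] = INSTITUTION_CODES[case["institution"]]
    return coded


coded_sheet = [encode(case) for case in data_sheet]

for row in coded_sheet:
    print(row)
```

Keeping the original text sheet alongside the coded one makes it easy to check that no attribute was miscoded.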
arrayed into five classifications, such as greatly dislike, moderately dislike, indifferent to, moderately like, greatly like. Participants in a government program might be asked to categorize their views of the program offerings in this way. Although the ordinal level of measurement yields a ranking of attributes, no assumptions are made about the "distance" between the classifications. In this example, we do not assume that the difference between persons who greatly like a program offering and ones who moderately like it is the same as the difference between persons who moderately like the offering and ones who are indifferent to it. For data analysis, numbers are assigned to the attributes (for example, greatly dislike = -2, moderately dislike = -1, indifferent to = 0, moderately like = +1, and greatly like = +2), but the numbers are understood to indicate rank order and the "distance" between the numbers has no meaning. Any other assignment of numbers that preserves the rank order of the attributes would serve as well. In the student loan study, class is an ordinal variable.
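The claim that any order-preserving assignment of numbers serves equally well can be checked with a short Python sketch. The second coding and the seven responses are hypothetical; the sketch shows that a rank-based summary (the median category) is the same under both codings.

```python
LABELS = ["greatly dislike", "moderately dislike", "indifferent to",
          "moderately like", "greatly like"]

# The coding given in the text, and a second, hypothetical coding that
# preserves the same rank order but uses unequal gaps.
coding_a = dict(zip(LABELS, [-2, -1, 0, 1, 2]))
coding_b = dict(zip(LABELS, [1, 2, 3, 10, 50]))

# Hypothetical responses from seven program participants.
responses = ["greatly like", "moderately like", "indifferent to",
             "moderately like", "greatly dislike", "moderately dislike",
             "greatly like"]


def median_label(answers, coding):
    """Return the middle category of an odd number of ordinal answers."""
    codes = sorted(coding[a] for a in answers)
    middle_code = codes[len(codes) // 2]
    label_for_code = {code: label for label, code in coding.items()}
    return label_for_code[middle_code]


median_a = median_label(responses, coding_a)
median_b = median_label(responses, coding_b)
print(median_a, "|", median_b)
```

By contrast, an arithmetic mean of the codes would change with the coding, which is one reason rank-based summaries suit ordinal variables.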
The attributes of an interval variable are assumed to be equally spaced. For example, temperature on the Fahrenheit scale is an interval variable. The difference between a temperature of 45 degrees and 46 degrees is taken to be the same as the difference between 90 degrees and 91 degrees. However, it is not assumed that a 90-degree object has twice the temperature of a 45-degree object (meaning that the ratio of temperatures is not necessarily 2 to 1). The condition that makes the ratio of two observations uninterpretable is the absence of a true zero for the variable. In general, with variables measured at the interval level, it makes no sense to try to interpret the ratio of two observations.
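One way to see why ratios are uninterpretable without a true zero is to express the same temperatures on a second interval scale. The Celsius comparison below is an addition for illustration and does not appear in the original text; the conversion formula is the standard one.

```python
def fahrenheit_to_celsius(degrees_f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (degrees_f - 32.0) * 5.0 / 9.0


hot_f, cool_f = 90.0, 45.0
hot_c = fahrenheit_to_celsius(hot_f)
cool_c = fahrenheit_to_celsius(cool_f)

# The ratio depends on where the scale happens to place its zero.
ratio_f = hot_f / cool_f
ratio_c = hot_c / cool_c
print(f"ratio in Fahrenheit: {ratio_f:.2f}")
print(f"ratio in Celsius:    {ratio_c:.2f}")

# Equal Fahrenheit differences remain equal after conversion.
step_low = fahrenheit_to_celsius(46.0) - fahrenheit_to_celsius(45.0)
step_high = fahrenheit_to_celsius(91.0) - fahrenheit_to_celsius(90.0)
print(f"one-degree steps in Celsius: {step_low:.4f} and {step_high:.4f}")
```

The two scales disagree about the ratio but agree that the two one-degree steps are the same size, which is the property an interval variable is assumed to have.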
The attributes of a ratio variable are assumed to have equal intervals and a true zero point. For example, age is a ratio variable because the negative age of a
person or object is not meaningful and, thus, the birth of the person or the creation of the object is a true zero point. With ratio variables, it makes sense to form ratios of observations and it is thus meaningful, for example, to say that a person of 90 years is twice as old as one of 45. In the study of student loans, age and loan balance are both ratio variables (the attributes are equally spaced and the variables have true zero points). For analysis purposes, it is seldom necessary to distinguish between interval and ratio variables so we usually lump them together and call them interval-ratio variables.
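A brief Python sketch, offered only as an illustration, shows what a true zero buys: the ratio of two ages is the same whether age is recorded in years or in months. The change of unit is an assumption added here and is not part of the original example.

```python
MONTHS_PER_YEAR = 12

older_years, younger_years = 90, 45

# Ratio of the two ages measured in years.
ratio_in_years = older_years / younger_years

# The same two ages measured in months. Zero months still means birth,
# so the rescaling multiplies both ages by the same factor.
older_months = older_years * MONTHS_PER_YEAR
younger_months = younger_years * MONTHS_PER_YEAR
ratio_in_months = older_months / younger_months

print(ratio_in_years, ratio_in_months)
```

Because both units share the same zero point, the statement "twice as old" does not depend on the unit chosen.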
under study, that is, the entities that we want to say something about. Frequently, the appropriate units of analysis are easy to select. They follow from the purpose of the study. For example, if we want to know how people feel about the offerings of a government program, individual people would be the logical unit of analysis. In the statistical analysis, the set of data to be manipulated would be variables defined at the level of the individual.
However, in some studies, variables can potentially be analyzed at two or more levels of aggregation. Suppose, for example, that evaluators wished to evaluate a compensatory reading program and had acquired reading test scores on a large number of children, some who participated in the program and some who did not. One way to analyze the data would be to treat each child as a case.
But another possibility would be to aggregate the scores of the individual children to the classroom level. For example, they could compute the average scores for the children in each classroom that participated in their study. They could then treat each classroom as a unit, and an average reading test score would be an attribute of a classroom. Other variables,