
Quantitative Analysis: Central Tendency, Spread, and Association, Schemes and Mind Maps of Quantitative Techniques

A chapter from a report by the U.S. General Accounting Office (GAO) on quantitative analysis in data analysis. It covers the principles of data analysis, the importance of understanding various data analysis methods, and the choice of appropriate methods based on the level of measurement, unit of analysis, distribution of variables, and completeness of data. The document also discusses the concepts of central tendency, spread, and association among variables, and provides examples of measures of association for nominal and ordinal variables.

Typology: Schemes and Mind Maps

2021/2022

Uploaded on 09/12/2022

United States General Accounting Office
GAO
Program Evaluation and Methodology Division
June 1992
Quantitative Data Analysis: An Introduction
GAO/PEMD-10.1.11


Preface

We look forward to receiving comments from the readers of this paper. They should be addressed to Eleanor Chelimsky at 202-275-1854.

Werner Grosshans Assistant Comptroller General Office of Policy

Eleanor Chelimsky Assistant Comptroller General for Program Evaluation and Methodology

Page 2 GAO/PEMD-10.1.11 Quantitative Analysis


Contents

Chapter 5: Estimating Population Parameters
  Histograms and Probability Distributions
  Sampling Distributions
  Population Parameters
  Point Estimates of Population Parameters
  Interval Estimates of Population Parameters

Chapter 6: Determining Causation
  What Do We Mean by Causal Association?
  Evidence for Causation
  Limitations of Causal Analysis

Chapter 7: Avoiding Pitfalls
  In the Early Planning Stages
  When Plans Are Being Made for Data Collection
  As the Data Analysis Begins
  As the Results Are Produced and Interpreted

Bibliography

Glossary

Contributors

Papers in This Series

Tables
  Table 1.1: Generic Types of Quantitative Questions
  Table 1.2: Data Sheet for a Study of College Student Loan Balances
  Table 1.3: Tabular Display of a Distribution
  Table 2.1: Distribution of Staff Turnover Rates in Long-Term Care Facilities
  Table 2.2: Three Common Measures of Central Tendency
  Table 2.3: Illustrative Measures of Central Tendency
  Table 3.1: Measures of Spread
  Table 4.1: Data Sheet With Two Variables
  Table 4.2: Cross-Tabulation of Two Ordinal Variables
  Table 4.3: Percentaged Cross-Tabulation of Two Ordinal Variables
  Table 4.4: Cross-Tabulation of Two Nominal Variables
  Table 4.5: Two Ordinal Variables Showing No Association
  Table 5.1: Data Sheet for 100 Samples of College Students
  Table 5.2: Point and Interval Estimates for a Set of Samples

Figures
  Figure 1.1: Histogram of Loan Balances
  Figure 1.2: Two Distributions
  Figure 1.3: Histogram for a Nominal Variable
  Figure 3.1: Histogram of Hospital Mortality Rates
  Figure 3.2: Spread of a Distribution
  Figure 3.3: Spread in a Normal Distribution
  Figure 4.1: Scatter Plots for Spending Level and Test Scores
  Figure 4.2: Regression of Test Score on Spending Level
  Figure 4.3: Regression of Spending Level on Test Score
  Figure 4.4: Linear and Nonlinear Associations
  Figure 5.1: Frequency Distribution of Loan Balances
  Figure 5.2: Probability Distribution of Loan Balances
  Figure 5.3: Sampling Distribution for Mean Student Loan Balances
  Figure 6.1: Causal Network

Chapter 1

Introduction

Guiding Principles

Data analysis is more than number crunching. It is an activity that permeates all stages of a study. Concern with analysis should (1) begin during the design of a study, (2) continue as detailed plans are made to collect data in different forms, (3) become the focus of attention after data are collected, and (4) be completed only during the report writing and reviewing stages.¹

The basic thesis of this paper is that successful data analysis, whether quantitative or qualitative, requires (1) understanding a variety of data analysis methods; (2) planning data analysis early in a project and making revisions in the plan as the work develops; (3) understanding which methods will best answer the study questions posed, given the data that have been collected; and (4) once the analysis is finished, recognizing how weaknesses in the data or the analysis affect the conclusions that can properly be drawn. The study questions govern the overall analysis, of course. But the form and quality of the data determine what analyses can be performed and what can be inferred from them. This implies that the evaluator should think about data analysis at four junctures:

  • when the study is in the design phase,
  • when detailed plans are being made for data collection,
  • after the data are collected, and
  • as the report is being written and reviewed.

¹Relative to GAO job phases, the first two checkpoints occur during the job design phase, the third occurs during data collection and analysis, and the fourth during product preparation. For detail on job phases see the General Policy Manual, chapter 6, and the Project Manual, chapters 6.2, 6.3, and 6.4.


Designing the Study

As policy-relevant questions are being formulated, evaluators should decide what data will be needed to answer the questions and how they will analyze the data. In other words, they need to develop a data analysis plan. Determining the type and scope of data analysis is an integral part of an overall design for the study. (See the transfer paper entitled Designing Evaluations, listed in “Papers in This Series.”) Moreover, confronting data collection and analysis issues at this stage may lead to a reformulation of the questions to ones that can be answered within the time and resources available.

Data Collection

When evaluators have advanced to the point of planning the details of data collection, analysis must be considered again. Observations can be made and, if they are qualitative (that is, text data), converted to numbers in a variety of ways that affect the kinds of analyses that can be performed and the interpretations that can be made of the results. Therefore, decisions about how to collect data should be influenced by the analysis options in mind.

Data Analysis

After the data are collected, evaluators need to see whether their expectations regarding data characteristics and quality have been met. Choice among possible analyses should be based partly on the nature of the data: for example, whether many observed values are small and a few are large and whether the data are complete. If the data do not fit the assumptions of the methods they had planned to


effects of a program? Are the data “strong” enough to warrant a far-reaching recommendation? These questions and many others may occur to the evaluators and reviewers and good answers will come only if the analyst is “close” to the data but always with an eye on the overall study questions.

Quantitative Questions Addressed in the Chapters of This Paper

Most GAO statistical analyses address one or more of the four generic questions presented in table 1.1.

Each generic question is illustrated with several specific questions and examples of the kinds of statistics that might be computed to answer the questions. The specific questions are loosely based on past GAO studies of state bottle bills (U.S. General Accounting Office, 1977 and 1980).

Table 1.1: Generic Types of Quantitative Questions

| Generic question | Specific question | Useful statistics |
|---|---|---|
| What is a typical value of the variable? | At the state level, how many pounds of soft drink bottles (per unit of population) were typically returned annually? | Measures of central tendency (ch. 2) |
| How much spread is there among the cases? | How similar are the individual states’ return rates? | Measures of spread (ch. 3) |
| To what extent are two or more variables associated? | What factors are most associated with high return rates: existence of state bottle bills? state economic conditions? state levels of environmental awareness? | Measures of association (ch. 4) |
| To what extent are there causal relationships among two or more variables? | What factors cause high return rates: existence of state bottle bills? state economic conditions? state level of environmental awareness? | Measures of association (ch. 4). Note that association is but one of three conditions necessary to establish causation (ch. 6) |


Bottle bills have been adopted by about nine states and are intended to reduce solid waste disposal problems by recycling. Other benefits can also be sought, such as the reduction of environmental litter and savings of energy and natural resources. One of GAO’s studies was a prospective analysis, intended to inform discussion of a proposed national bottle bill. The quantitative analyses were not the only relevant factor. For example, the evaluators had to consider the interaction of the merchant-based bottle bill strategy with emerging state incentives for curbside pickups or with other recycling initiatives sponsored by local communities. The quantitative results were, however, relevant to the overall conclusions regarding the likely benefits of the proposed national bottle bill.

The first three generic questions in table 1.1 are standard fare for statistical analysis. GAO reports using quantitative analysis usually include answers in the form of descriptive statistics such as the mean, a measure of central tendency, and the standard deviation, a measure of spread. In chapters 2, 3, and 4 of this paper, we focus on descriptive statistics for answering the questions.
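As a rough illustration of the descriptive statistics named above, the following Python sketch computes two measures of central tendency and two measures of spread. The return rates are invented for the example and are not taken from the GAO bottle bill studies; the names `return_rates` and `describe` are likewise hypothetical.

```python
import statistics

# Hypothetical annual return rates (pounds of bottles per unit of population)
# for nine states. These numbers are made up for illustration only.
return_rates = [12.0, 15.0, 15.0, 18.0, 20.0, 22.0, 25.0, 30.0, 41.0]


def describe(values):
    """Return common measures of central tendency and spread."""
    return {
        "mean": statistics.mean(values),      # central tendency
        "median": statistics.median(values),  # central tendency
        "stdev": statistics.stdev(values),    # spread (sample standard deviation)
        "range": max(values) - min(values),   # spread
    }


summary = describe(return_rates)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

In this made-up sample the mean lies above the median because one state has a much higher rate than the rest, which is one reason to look at more than one measure.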

To answer many questions, it is desirable to use probability samples to draw conclusions about populations. In chapter 5, we address the first three questions from the perspective of inferential statistics. The treatment there is necessarily brief, focused on point and interval estimation methods.

The fourth generic question, about causality, is more difficult to answer than the others. Providing a good answer to a causal question depends heavily upon the study design and somewhat advanced statistical methods; we treat the topic only lightly in chapter 6. Chapter 7 discusses some broad strategies for avoiding pitfalls in the analysis of quantitative data.


variable would be gender and would be composed of the attributes female and male.⁴ Age might be another variable composed of the integer values from 0 to

It is convenient to refer to the variables we are especially interested in as response variables. For example, in a study of the effects of a government retraining program for displaced workers, employment rate might be the response variable. In trying to determine the need for an acquired immune deficiency syndrome (AIDS) education program in different segments of the U.S. population, evaluators might use the incidence of AIDS as the response variable. We usually also collect information on other variables with which we hope to better understand the response variables. We occasionally refer to these other variables as supplementary variables.

The data that we want to analyze can be displayed in a rectangular or matrix form, often called a data sheet (see table 1.2). To simplify matters, the individual persons, things, or events that we get information about are referred to generically as cases. (The intensive study of one or a few cases, typically combining quantitative and qualitative data, is referred to as case study research. See the GAO transfer paper entitled Case Study Evaluations.) Traditionally, the rows in a data sheet correspond to the cases and the columns correspond to the variables of interest. The numbers or words in the cells then correspond to the attributes of the cases.

⁴Instead of referring to the attributes of a variable, some prefer to say that the variable takes on a number of “values.” For example, the variable gender can have two values, male and female. Also, some statisticians use the expression “attribute sampling” in reference to probability sampling procedures for estimating proportions. Although attribute sampling is related to attribute as used in data analysis, the terminology is not perfectly parallel. See the discussion of attribute sampling in the transfer paper entitled Using Statistical Sampling, listed in “Papers in This Series.”

Table 1.2: Data Sheet for a Study of College Student Loan Balances

| Case | Age | Class | Type of institution |
|---|---|---|---|
| 1 | 23 | Sophomore | Private |
| 2 | 19 | Freshman | Public |
| 3 | 21 | [illegible] | Public |
| 4 | [illegible] | Graduate | Private |
| 5 | 21 | Freshman | Private |
| 6 | 22 | Sophomore | Public |
| 7 | 21 | Sophomore | Public |
| 8 | 20 | Freshman | Public |
| 9 | 22 | Junior | Private |
| 10 | 21 | Sophomore | Public |
| 11 | 19 | Freshman | Private |
| 12 | 20 | Sophomore | Public |
| 13 | 20 | Sophomore | Public |
| 14 | 23 | Senior | Private |
| 15 | 24 | Senior | Private |

Loan balance: $3, 1, 8, 1, 3, 2, 6, 2, 1, 2, 3, 2, 3, 5,

Table 1.2 shows 15 cases, college students, from a hypothetical study of student loan balances at higher education institutions. The first column shows an identification number for each case, and the rest of the columns indicate four variables: age of student, class, type of institution, and loan balance. Two of the variables, class and type of institution, are presently in text form. As will be seen shortly, they can be converted to numbers for purposes of quantitative analysis. Loan balance is the response variable and the others are supplementary.
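One way to picture such a data sheet in code is sketched below in Python, with each row as a case and each key as a variable. Every value in the sketch is illustrative and is not a figure from table 1.2. The numeric codes chosen for class and type of institution (`CLASS_CODES`, `INSTITUTION_CODES`) are assumptions made for the example, not a coding scheme given in this paper.

```python
# A data sheet: each row (dict) is a case, each key is a variable.
# All values below are illustrative, not the figures printed in table 1.2.
data_sheet = [
    {"case": 1, "age": 23, "class": "Sophomore", "institution": "Private", "loan_balance": 3200},
    {"case": 2, "age": 19, "class": "Freshman", "institution": "Public", "loan_balance": 1500},
    {"case": 3, "age": 24, "class": "Senior", "institution": "Private", "loan_balance": 5400},
]

# Assumed coding for the class variable: the codes keep the order of the classes.
CLASS_CODES = {"Freshman": 1, "Sophomore": 2, "Junior": 3, "Senior": 4, "Graduate": 5}
# Assumed coding for type of institution: the codes are labels only.
INSTITUTION_CODES = {"Public": 0, "Private": 1}


def encode(row):
    """Return a copy of a case with text attributes replaced by numeric codes."""
    coded = dict(row)
    coded["class"] = CLASS_CODES[row["class"]]
    coded["institution"] = INSTITUTION_CODES[row["institution"]]
    return coded


coded_sheet = [encode(row) for row in data_sheet]

# A column is recovered by reading one variable across all cases.
loan_balances = [row["loan_balance"] for row in coded_sheet]
print(coded_sheet[0])
print(loan_balances)
```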


arrayed into five classifications, such as greatly dislike, moderately dislike, indifferent to, moderately like, greatly like. Participants in a government program might be asked to categorize their views of the program offerings in this way. Although the ordinal level of measurement yields a ranking of attributes, no assumptions are made about the “distance” between the classifications. In this example, we do not assume that the difference between persons who greatly like a program offering and ones who moderately like it is the same as the difference between persons who moderately like the offering and ones who are indifferent to it. For data analysis, numbers are assigned to the attributes (for example, greatly dislike = -2, moderately dislike = -1, indifferent to = 0, moderately like = +1, and greatly like = +2), but the numbers are understood to indicate rank order and the “distance” between the numbers has no meaning. Any other assignment of numbers that preserves the rank order of the attributes would serve as well. In the student loan study, class is an ordinal variable.
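The point that any order-preserving assignment serves equally well can be checked with a small Python sketch. The responses and the second coding (`coding_b`) are hypothetical, chosen only to show that a rank-based summary such as the median category is unchanged by recoding, while the mean of the codes depends on the arbitrary spacing.

```python
import statistics

LABELS = ["greatly dislike", "moderately dislike", "indifferent to",
          "moderately like", "greatly like"]

# The coding given in the text, and a hypothetical one with the same rank order.
coding_a = dict(zip(LABELS, [-2, -1, 0, 1, 2]))
coding_b = dict(zip(LABELS, [1, 2, 3, 10, 50]))

# Hypothetical responses from seven program participants.
responses = ["greatly like", "moderately like", "indifferent to",
             "moderately like", "greatly dislike", "greatly like",
             "moderately like"]


def median_label(answers, coding):
    """Median response, found on the numeric codes and mapped back to a label."""
    codes = [coding[answer] for answer in answers]
    decode = {code: label for label, code in coding.items()}
    return decode[statistics.median_low(codes)]


def mean_code(answers, coding):
    """Mean of the numeric codes; it changes with the spacing of the codes."""
    return statistics.mean([coding[answer] for answer in answers])


print(median_label(responses, coding_a), "|", median_label(responses, coding_b))
print(mean_code(responses, coding_a), "|", mean_code(responses, coding_b))
```

Both codings give the same median category, while the two means fall in different categories when decoded, which is why the distance between ordinal codes should not be interpreted.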

The attributes of an interval variable are assumed to be equally spaced. For example, temperature on the Fahrenheit scale is an interval variable. The difference between a temperature of 45 degrees and 46 degrees is taken to be the same as the difference between 90 degrees and 91 degrees. However, it is not assumed that a 90-degree object has twice the temperature of a 45-degree object (meaning that the ratio of temperatures is not necessarily 2 to 1). The condition that makes the ratio of two observations uninterpretable is the absence of a true zero for the variable. In general, with variables measured at the interval level, it makes no sense to try to interpret the ratio of two observations.
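A short Python sketch can make the temperature example concrete. The comparison with the Celsius scale is an addition for illustration and does not appear in the paper; it relies only on the standard Fahrenheit-to-Celsius conversion.

```python
def fahrenheit_to_celsius(deg_f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


warm_f, cool_f = 90.0, 45.0
warm_c = fahrenheit_to_celsius(warm_f)
cool_c = fahrenheit_to_celsius(cool_f)

# The ratio depends on where the scale puts its zero, so it is not meaningful.
ratio_f = warm_f / cool_f
ratio_c = warm_c / cool_c

# Equal differences on one scale remain equal differences on the other.
diff_small = fahrenheit_to_celsius(46.0) - fahrenheit_to_celsius(45.0)
diff_large = fahrenheit_to_celsius(91.0) - fahrenheit_to_celsius(90.0)

print(f"ratio in Fahrenheit: {ratio_f:.2f}, ratio in Celsius: {ratio_c:.2f}")
print(f"one-degree steps in Celsius: {diff_small:.4f} and {diff_large:.4f}")
```

The same two objects have a ratio of 2 on one scale and a quite different ratio on the other, while the one-degree differences stay equal to each other on both.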

The attributes of a ratio variable are assumed to have equal intervals and a true zero point. For example, age is a ratio variable because the negative age of a person or object is not meaningful and, thus, the birth of the person or the creation of the object is a true zero point. With ratio variables, it makes sense to form ratios of observations and it is thus meaningful, for example, to say that a person of 90 years is twice as old as one of 45. In the study of student loans, age and loan balance are both ratio variables (the attributes are equally spaced and the variables have true zero points). For analysis purposes, it is seldom necessary to distinguish between interval and ratio variables, so we usually lump them together and call them interval-ratio variables.

Unit of Analysis

Units of analysis are the persons, things, or events under study: the entities that we want to say something about. Frequently, the appropriate units of analysis are easy to select. They follow from the purpose of the study. For example, if we want to know how people feel about the offerings of a government program, individual people would be the logical unit of analysis. In the statistical analysis, the set of data to be manipulated would be variables defined at the level of the individual.

However, in some studies, variables can potentially be analyzed at two or more levels of aggregation. Suppose, for example, that evaluators wished to evaluate a compensatory reading program and had acquired reading test scores on a large number of children, some who participated in the program and some who did not. One way to analyze the data would be to treat each child as a case.

But another possibility would be to aggregate the scores of the individual children to the classroom level. For example, they could compute the average scores for the children in each classroom that participated in their study. They could then treat each classroom as a unit, and an average reading test score would be an attribute of a classroom. Other variables,
