Guido’s Guide to PROC FREQ – A Tutorial for Beginners Using
the SAS® System
Joseph J. Guido, University of Rochester Medical Center, Rochester, NY
ABSTRACT
PROC FREQ is an essential procedure within BASE SAS® used primarily for counting, displaying
and analyzing categorical type data. It is such a powerful procedure that you will find it
documented not only in BASE SAS but also in SAS®/STAT documentation. This Beginning
Tutorial will touch upon both the uses of PROC FREQ in BASE SAS and SAS/STAT. Don’t
worry, I promise that you do not need a statistical background to understand this procedure. This
tutorial will teach you the basics of PROC FREQ and give you a framework to build upon and
extend your knowledge of the SAS System.
INTRODUCTION
According to the paper by Carrie Mariner entitled “Answering the Right Question with the Right
PROC”, PROC FREQ answers the question How many?, PROC MEANS answers the question
How much? and PROC REPORT will answer Can you produce a report that looks like this?
We are going to answer the question “How many?” as we work through some basic PROC
FREQ examples in this paper.
The Version 9 SAS® Procedure Manual states, “The FREQ procedure produces one-way to
n-way frequency and cross tabulation (contingency) tables. For two-way tables, PROC FREQ
computes tests and measures of association. For n-way tables, PROC FREQ does stratified
analysis, computing statistics within, as well as across, strata. Frequencies and statistics can also
be output to SAS data sets.”
We will begin with the very basics and consider the one-way frequency tables. These are used
by SAS Programmers and Analysts all the time without giving the matter a second thought. What
I mean is that if one is counting, doing error checking of the data or categorizing data, then PROC
FREQ is the usual choice. The variables in the dataset can be either character or numeric.
The following statements are used in PROC FREQ according to the SAS® Procedure Manual:
PROC FREQ < options > ;
TABLES requests < / options > ;
BY variables ;
WEIGHT variable < / option > ;
TEST options ;
EXACT statistic-options < / computation-options > ;
OUTPUT < OUT=SAS-data-set > options ;
RUN;
This paper discusses four of these statements: PROC FREQ, TABLES, BY and WEIGHT. The
PROC FREQ statement is the only required statement for the FREQ procedure. If you specify the
following statements, PROC FREQ produces a one-way frequency table for each variable in the
most recently created data set.
PROC FREQ;
RUN;
Foundations & Fundamentals, NESUG 2007
DISCUSSION

To illustrate, we will use the dataset Trial.sas7bdat from Glenn Walker's book “Common Statistical Methods for Clinical Research with SAS® Examples”, which I use in the SAS courses that I teach. In this fictitious dataset there are 100 patients, and we want to know how many are males and how many are females. We also want to know how many males are over age 55.

Notice that both questions ask “How many?”, so we know that PROC FREQ is the procedure of choice. We begin by asking SAS for one-way frequencies on each of the variables of interest, SEX and AGE. On the TABLES statement we list both variables separated by a space.

Example 1

PROC FREQ Data=Trial;
   TABLES Sex Age;
RUN;

[See Example 1 Output in Appendix]

While these results are informative, they do not answer our question: “How many males are over the age of 55?” We have the number of males and females, and we have the age distribution, but the two are not yet connected.

We know there are 44 males and we know that there are 18 patients over the age of 55. So we decide to use a WHERE statement to select only those patients who are male.

Example 2

PROC FREQ Data=Trial;
   TABLES Age;
   WHERE Sex='M';
RUN;

[See Example 2 Output in Appendix]
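The same count can also be requested directly by combining both conditions in the WHERE statement. This is a minimal sketch rather than one of the paper's numbered examples; it assumes the Trial dataset stores sex as the character values 'M' and 'F' and age as a numeric variable, as the examples above suggest.

```sas
/* Sketch: count only the males older than 55.                 */
/* Assumes Sex is character ('M'/'F') and Age is numeric.      */
PROC FREQ Data=Trial;
   TABLES Sex;
   WHERE Sex='M' AND Age > 55;
RUN;
```

The resulting one-way table has a single row for M, and its frequency is the number of males over 55.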

So we can now answer the question of how many males are over the age of 55 (there are 8). However, we may want to reshape our data into meaningful groupings in case similar questions arise later (e.g. How many females are aged 46-55?). So we will create groupings of the ages using PROC FORMAT.

Example 3

PROC FORMAT;
   VALUE Age_Fmt
      Low-15  = 'Less than 16 years'
      16-25   = '16 - 25 years'
      26-35   = '26 - 35 years'
      36-45   = '36 - 45 years'
      46-55   = '46 - 55 years'
      56-High = 'Over 55 years';
RUN;
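Defining a format does not change the data; the format takes effect only when it is attached to a variable. As a hedged illustration (not one of the paper's numbered examples), a FORMAT statement inside PROC FREQ makes the procedure count by age group instead of by individual age. It assumes the Age_Fmt format from Example 3 has already been created in the current SAS session.

```sas
/* Sketch: one-way frequencies of Age, grouped by Age_Fmt.     */
/* Assumes PROC FORMAT from Example 3 has already been run.    */
PROC FREQ Data=Trial;
   TABLES Age;
   FORMAT Age Age_Fmt.;
RUN;
```

Note the trailing period in Age_Fmt., which tells SAS that the name refers to a format and not to a variable.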

Example 6

PROC SORT Data=Trial Out=TrialSorted;
   BY Center;
RUN;

Now that we have sorted the dataset Trial by the variable Center and created a new dataset called TrialSorted using the Out= option on PROC SORT, we are ready to proceed.

Example 7

PROC FREQ Data=TrialSorted;
   TABLES Age*Sex / nocol norow nopercent;
   FORMAT Age Age_Fmt.;
   LABEL Age='Age of Patient'
         Sex='Sex of Patient'
         Center='Study Center';
   BY Center;
RUN;

[See Example 7 Output in Appendix]

You will note that while Example 5 and Example 7 give the same numeric results, the appearance is slightly different. Example 7 puts each table on a separate page and begins with a dashed line followed by Study Center = N where N is 1, 2 or 3 and then the dashed line continues.
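For comparison, a three-way TABLES request also produces one Age-by-Sex table for each level of Center, and it does not require the input to be sorted. The following is a sketch under the same assumptions as Example 7 (variables Center, Age and Sex, and the Age_Fmt format already defined).

```sas
/* Sketch: Center acts as the stratifying variable, so PROC FREQ  */
/* prints a separate Age-by-Sex table for each study center.      */
/* No PROC SORT or BY statement is needed for this form.          */
PROC FREQ Data=Trial;
   TABLES Center*Age*Sex / nocol norow nopercent;
   FORMAT Age Age_Fmt.;
RUN;
```

In an n-way request the last two variables form the rows and columns of each table, and the variables before them define the strata.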

Now we are ready for some simple statistical computations using PROC FREQ. We will look at the calculation of the Chi-square statistic and McNemar’s statistic. The Chi-square statistic is used to compare independent groups in the data (here, males and females). We compare these two groups on a variable called RESP, which indicates response (0=No, 1=Yes). Even without knowing much about statistics, we can conclude that there is a statistically significant difference in response between men and women: the Chi-square statistic of 5.18 has an associated probability (p-value) of 0.0228. This means that, if response were truly unrelated to sex, a Chi-square statistic at least this large would be expected less than 1 time in 20.

Example 8

PROC FREQ Data=Trial;
   TABLES Sex*Resp / CHISQ;
   LABEL Sex='Sex of Patient'
         Resp='Response of Patient';
RUN;

[See Example 8 Output in Appendix]
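As the procedure manual quoted in the Introduction notes, statistics can also be written to a SAS data set. The sketch below adds an OUTPUT statement to the Example 8 code so that the Chi-square results are saved for later use. The dataset name ChiSqOut is a hypothetical name chosen for this illustration.

```sas
/* Sketch: save the Chi-square statistics to a dataset.        */
/* ChiSqOut is a hypothetical output dataset name.             */
PROC FREQ Data=Trial;
   TABLES Sex*Resp / CHISQ;
   OUTPUT OUT=ChiSqOut CHISQ;
RUN;

/* Display the saved statistics. */
PROC PRINT Data=ChiSqOut;
RUN;
```

The CHISQ keyword appears twice on purpose: on the TABLES statement it requests the computation, and on the OUTPUT statement it selects which statistics are saved.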

Now let’s say that we have some additional data for this fictitious group of 100 patients. They are asked their opinion about a certain experimental procedure. Then we introduce an intervention (an educational program). Finally, we survey the SAME 100 patients after the intervention (say, at 6 weeks). We want to know whether there has been a significant change or shift in the patients’ original opinions (yes or no).

We cannot use the CHISQ option with PROC FREQ because these are not independent groups. They are in fact the same 100 patients whom we surveyed initially and then again at 6 weeks. The SAS System has an option in PROC FREQ to handle this type of analysis: the AGREE option. There is one more twist. Because we collected these new data after the original Trial dataset was created, we need either to add the individual data points to Trial or to create a dataset containing the aggregate counts for our two new variables and then use the WEIGHT statement in PROC FREQ to analyze them.

So let’s create a new dataset called Trial_aggr based on the counts of the data. We will be creating a 2 x 2 table as follows: 30 patients answered Yes initially and at follow-up, and 10 patients answered Yes initially and then No at follow-up. Forty (40) patients answered No initially and then Yes at follow-up, and 20 patients answered No both initially and at follow-up.

Example 9

DATA Trial_aggr;
   INPUT Pre $ Post $ Count;
   DATALINES;
Yes Yes 30
Yes No 10
No Yes 40
No No 20
;
RUN;

Now we are ready to run PROC FREQ on our newly constructed dataset Trial_aggr. You will notice that the code for Example 8 and Example 10 looks similar, except that we have replaced the CHISQ option with the AGREE option, since we are working with data that are not independent (such data are called paired data). The WEIGHT statement is also needed; without it, SAS would treat the dataset as having only 4 observations instead of 100.

Example 10

PROC FREQ Data=Trial_aggr;
   TABLES Pre*Post / AGREE;
   LABEL Pre='Initial Response of Patient'
         Post='Response of Patient after Intervention';
   WEIGHT Count;
RUN;

[See Example 10 Output in Appendix]
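The McNemar statistic reported for Example 10 can be checked by hand, because it depends only on the two discordant cells of the 2 x 2 table (the patients whose answer changed). As a sketch, write b for the number who answered Yes and then No, and c for the number who answered No and then Yes; these symbols are introduced here for illustration and do not appear in the paper.

```latex
S = \frac{(b - c)^2}{b + c}
  = \frac{(10 - 40)^2}{10 + 40}
  = \frac{900}{50}
  = 18
```

With 1 degree of freedom, a value of 18 is far beyond the usual 5% cutoff of 3.84, which indicates a significant shift in opinion after the intervention. The 50 patients who gave the same answer both times do not enter the calculation.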

We have completed our Tutorial and now the rest is up to you. The best ways to improve your SAS skills are to practice, practice, and practice. The SAS Online Help facility and SAS manuals are excellent ways to do this. Both are available to you under the Help dropdown (Learning SAS Programming and SAS Help and Documentation).

APPENDIX

Example 1

                               Cumulative    Cumulative
SEX    Frequency    Percent     Frequency       Percent
F             56      56.00            56         56.00
M             44      44.00           100        100.00

                               Cumulative    Cumulative
AGE    Frequency    Percent     Frequency       Percent
 19            2       2.00             2          2.00
 21            1       1.00             3          3.00
 24            2       2.00             5          5.00
 26            2       2.00             7          7.00
 27            4       4.00            11         11.00
 28            3       3.00            14         14.00
 29            1       1.00            15         15.00
 30            1       1.00            16         16.00
 31            3       3.00            19         19.00
 32            5       5.00            24         24.00
 33            2       2.00            26         26.00
 34            2       2.00            28         28.00
 35            2       2.00            30         30.00
 36            4       4.00            34         34.00
 37            3       3.00            37         37.00
 38            3       3.00            40         40.00
 39            3       3.00            43         43.00
 40            2       2.00            45         45.00
 41            4       4.00            49         49.00
 42            6       6.00            55         55.00
 43            1       1.00            56         56.00
 44            3       3.00            59         59.00
 45            4       4.00            63         63.00
 46            1       1.00            64         64.00
 47            2       2.00            66         66.00
 48            3       3.00            69         69.00
 49            1       1.00            70         70.00
 50            2       2.00            72         72.00
 51            5       5.00            77         77.00
 52            2       2.00            79         79.00
 53            1       1.00            80         80.00
 54            1       1.00            81         81.00
 55            1       1.00            82         82.00
 56            2       2.00            84         84.00
 57            3       3.00            87         87.00
 58            1       1.00            88         88.00
 59            2       2.00            90         90.00
 61            2       2.00            92         92.00
 62            2       2.00            94         94.00
 63            1       1.00            95         95.00
 64            2       2.00            97         97.00
 65            1       1.00            98         98.00
 69            1       1.00            99         99.00
 70            1       1.00           100        100.00
Example 2

                               Cumulative    Cumulative
AGE    Frequency    Percent     Frequency       Percent
 19            1       2.27             1          2.27
 24            1       2.27             2          4.55
 26            1       2.27             3          6.82
 27            2       4.55             5         11.36
 29            1       2.27             6         13.64
 31            2       4.55             8         18.18
 32            2       4.55            10         22.73
 36            2       4.55            12         27.27
 37            3       6.82            15         34.09
 38            1       2.27            16         36.36
 39            2       4.55            18         40.91
 40            2       4.55            20         45.45
 41            1       2.27            21         47.73
 42            3       6.82            24         54.55
 44            1       2.27            25         56.82
 45            2       4.55            27         61.36
 46            1       2.27            28         63.64
 47            1       2.27            29         65.91
 48            1       2.27            30         68.18
 49            1       2.27            31         70.45
 51            2       4.55            33         75.00
 52            2       4.55            35         79.55
 55            1       2.27            36         81.82
 56            1       2.27            37         84.09
 58            1       2.27            38         86.36
 59            1       2.27            39         88.64
 61            2       4.55            41         93.18
 62            1       2.27            42         95.45
 65            1       2.27            43         97.73
 70            1       2.27            44        100.00

Example 4

Table of AGE by SEX

AGE(Age of Patient)     SEX(Sex of Patient)

Frequency           F      M    Total
16 - 25 years       3      2        5
26 - 35 years      17      8       25
36 - 45 years      16     17       33
46 - 55 years      10      9       19
Over 55 years      10      8       18
Total              56     44      100

Example 7

Study Center = 1

Table of AGE by SEX

AGE(Age of Patient)     SEX(Sex of Patient)

Frequency           F      M    Total
16 - 25 years       2      1        3
26 - 35 years       6      5       11
36 - 45 years       7      4       11
46 - 55 years       6      4       10
Over 55 years       3      2        5
Total              24     16       40

Study Center = 2

Table of AGE by SEX

AGE(Age of Patient)     SEX(Sex of Patient)

Frequency           F      M    Total
16 - 25 years       1      1        2
26 - 35 years       7      2        9
36 - 45 years       7      8       15
46 - 55 years       3      2        5
Over 55 years       2      2        4
Total              20     15       35

Study Center = 3

Table of AGE by SEX

AGE(Age of Patient)     SEX(Sex of Patient)

Frequency           F      M    Total
26 - 35 years       4      1        5
36 - 45 years       2      5        7
46 - 55 years       1      3        4
Over 55 years       5      4        9
Total              12     13       25

Example 10

Table of Pre by Post

Pre(Initial Response of Patient)     Post(Response of Patient after Intervention)

Frequency     Yes     No    Total
Yes            30     10       40
No             40     20       60
Total          70     30      100

Statistics for Table of Pre by Post

McNemar's Test
Statistic (S)     18.0
DF                1
Pr > S            <.0001

Simple Kappa Coefficient
Kappa                     0.
ASE                       0.
95% Lower Conf Limit     -0.
95% Upper Conf Limit      0.

Sample Size = 100