
Proceedings of the 1991 Winter Simulation Conference Barry L. Nelson, W. David Kelton, Gordon M. Clark (eds.)

CLASSROOM GOODNESS OF FIT TESTING

USING MICROSOFT WINDOWS

Donald L. Byrkett

Systems Analysis Department Miami University Oxford, Ohio 45056

ABSTRACT

The chi-square goodness of fit test and other statistical tests for comparing observed data with a theoretical distribution are not well supported by the popular statistical analysis packages such as SAS, SPSS, BMDP, and MINITAB. To overcome this limitation, the author has written a PC based program to support goodness of fit testing and has used it in his simulation class. The original version of this program was written as a typical, menu oriented, DOS application using the language True BASIC. Recently, under a grant from IBM, this program was rewritten in the language C to run under the Microsoft Windows 3.0 operating environment. This paper discusses the importance of goodness of fit testing in simulation and other stochastic modelling classes, describes the Windows 3.0 version of this program and its limitations, and discusses how this program can be used in a simulation class. This program is available without cost to Winter Simulation Conference participants who wish to use it in their simulation classes.

1 INTRODUCTION

When teaching courses in stochastic modelling, theoretical distributions are often used to model various system activities. For example, in studying queueing systems the Exponential distribution may be used to model service time, or in reliability systems the Weibull distribution may be used to model a component lifetime. Sometimes these distributions are chosen for mathematical tractability but, in principle, they are chosen to represent the activity being modelled. Unless students are asked to collect data on a particular activity and to use that data to select a theoretical distribution to model the activity, they have difficulty understanding how the distribution represents the activity. I believe that this is such an important point that it should be covered in every stochastic modelling course. This requires a discussion of data collection, coverage of goodness of fit testing, and software that allows the student to quickly try alternate distributions to model a particular activity. This paper concerns the software support.

The major statistical analysis packages (SAS, SPSS, BMDP, and MINITAB) provide very limited goodness of fit procedures. At most they provide support for two distributions, the Exponential and Normal. I am aware of only two software packages that provide the kind of capabilities needed. One is a FORTRAN program listed in Phillips (1972). Though I have used and modified this program, it contains a number of errors and is not well suited to an interactive environment. The other is UniFit (Simulation Modeling and Analysis Company, P. O. Box 40996, Tucson, AZ 85717), a commercially available PC based software package for distribution fitting. Though I have not used UniFit, I expect that it is an excellent package. Unfortunately, the cost of obtaining sufficient copies prohibits its use for classroom instruction.

MenuGof and Gof are programs that I created to provide a user friendly environment for students to try fitting alternate distributions to a set of data. MenuGof is a traditional, menu oriented, DOS based program described in Byrkett (1989). This paper describes the operation and use of Gof, a Microsoft Windows 3.0 version of the same program. This program was developed under a grant from IBM to develop Windows based software for educational use.

2 PROGRAM OVERVIEW

Gof is coded in the language C using the Microsoft C 6.0 Compiler and the Microsoft Windows 3.0 Software Development Kit. The layout of the screen is illustrated in Figure 1. The program is designed to follow the standard Windows Application Style Guide (1987). There is a single main window with a menu bar (line 1) indicating the data file being analyzed (PROB19.DAT), a set of six pull-down menus (line 2) plus the Help menu, and a display containing the current hypothesis and the results of any goodness of fit test performed. Other windows may be opened or closed to display a histogram, sample statistics, the observations, the cell frequencies, or the test details. The illustration displays the main window and four display windows. Windows may be opened, closed, resized, moved, and scrolled, and are accessible using either a keyboard or mouse.

Figure 1: Gof Screen Layout (menu bar: File, Modify, Distribution, Parameters, Perform Test, View)

The observations to be analyzed are stored in a standard text (ASCII) file which may be created outside Gof using any text editor or within the Gof program using a built in editor. The set of data is read from this file and analyzed by Gof.

Figure 2 illustrates the menu options provided under each of the pull-down menus. Gof is used by making selections from these menus. These menus are designed following standard Microsoft Windows practices. Selections are made by pointing with a mouse or by pressing the ALT key in combination with the underlined character in the menu. Certain menu options may be selected directly by pressing the indicated function keys (for example, F2 selects the Histogram option directly). Menu options followed by three periods (...) indicate that when this option is selected a dialog box will appear to obtain additional information.

The File menu is almost standard on all Windows based applications. It provides options for creating a new data file or for reading an existing data file (Open), for printing the test results (Print), ending the program (Exit), and learning about the Gof program (About Gof). The remaining pull-down menus are unique to the Gof application.

The Modify menu provides options for editing the observation data (Data) and/or the cell bounds (Cells) used for the chi-square test and the histogram. The Modify option Data allows the user to edit the current file of observations. To create a new data file of observations, the user selects the File option Open to name the file and is then automatically placed in the editor to enter the data. Once a data file is opened or created, Gof automatically forms cells and creates a frequency distribution. These automatic cell bounds may be modified as desired using the Modify option Cells.

The Distribution menu is used to select the desired theoretical distribution to be tested. Selecting an option from this menu places a check beside the desired


3 STATISTICAL DETAILS

Generally, the statistical background for this program is based on the material in Chapter 6 of Law and Kelton’s (1991) simulation text. Exceptions to this are noted.

3.1 Distribution Selection

Gof is capable of comparing a set of observations to seven theoretical distributions as listed in the Distribution menu options. For computational simplicity, the Gamma distribution is limited to the case in which the shape parameter, alpha, is an integer. The program is written in a modular fashion so that additional distributions can be added if desired, though I have found this selection to be adequate for simulation classes.

3.2 Parameter Estimation

When a distribution is selected, Gof automatically calculates parameter estimates of the selected distribution. The program uses the procedures specified in Law and Kelton (1991) to compute the parameter estimates for all distributions except the Gamma distribution and the Triangular distribution. Parameter estimates of the Triangular and Gamma distribution are calculated using the method of moments. Alpha, in the Gamma distribution, is rounded to the nearest integer.

Triangular: $\hat{a} = \min\{X_i\}$, $\hat{b} = \max\{X_i\}$, $\hat{c} = 3\bar{X}(n) - \hat{a} - \hat{b}$

Gamma: $\hat{\alpha} = \mathrm{Round}\left(\bar{X}(n)^2 / S^2(n)\right)$, $\hat{\beta} = \bar{X}(n) / \hat{\alpha}$
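The method-of-moments estimates described above can be sketched in a few lines of Python. This is an illustration only, not the Gof source (which is written in C). The function names are hypothetical, the sample variance is assumed to use the n - 1 divisor, and clamping the Gamma shape to at least 1 is an added assumption to avoid a zero shape parameter.

```python
from statistics import mean, variance


def gamma_moment_estimates(xs):
    """Method-of-moments estimates for a Gamma with integer shape alpha."""
    xbar = mean(xs)
    s2 = variance(xs)  # sample variance S^2(n), divisor n - 1 (assumed)
    # Round alpha to the nearest integer; the floor of 1 is an assumption
    # of this sketch. Python's round() sends exact .5 ties to the even
    # integer, which may differ from other rounding conventions.
    alpha = max(1, round(xbar * xbar / s2))
    beta = xbar / alpha
    return alpha, beta


def triangular_moment_estimates(xs):
    """Estimate (a, b, c): endpoints from the extremes, mode from the mean."""
    a = min(xs)
    b = max(xs)
    # The mean of a triangular distribution is (a + b + c) / 3.
    c = 3 * mean(xs) - a - b
    return a, b, c
```

Note that the moment estimate of the mode can fall outside the interval between the smallest and largest observation when the data are strongly skewed, so the result should be inspected before it is used.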

3.3 Chi-square Test

When the chi-square test is performed, the expected frequency in each cell of the defined histogram is calculated and compared with the observed frequency via the standard chi-square test statistic. No attempt is made to form cells of equal expected frequency. The user is warned if the expected frequency in any cell is less than 2.
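As a rough sketch of this calculation, the Python function below computes the statistic as the sum over cells of (observed - expected)^2 / expected and flags cells whose expected frequency is below 2. It is not the Gof implementation. The names are hypothetical, and the cell convention is an assumption: the cut points are interior boundaries, the first cell extends to minus infinity, and the last cell extends to plus infinity so that the cell probabilities sum to 1.

```python
import math
from bisect import bisect_left


def exponential_cdf(mean_value):
    """Return the CDF of an Exponential distribution with the given mean."""
    def cdf(x):
        return 1.0 - math.exp(-x / mean_value) if x > 0 else 0.0
    return cdf


def chi_square_statistic(xs, cut_points, cdf, warn_below=2.0):
    """Chi-square statistic for user-specified cells.

    cut_points must be sorted; cell i covers (cut_points[i-1], cut_points[i]],
    with open-ended first and last cells (an assumption of this sketch).
    Every cell must have a positive expected frequency.
    """
    n = len(xs)
    k = len(cut_points) + 1
    observed = [0] * k
    for x in xs:
        observed[bisect_left(cut_points, x)] += 1
    # Hypothesized cumulative probability at each cell edge.
    edges = [0.0] + [cdf(c) for c in cut_points] + [1.0]
    expected = [n * (edges[i + 1] - edges[i]) for i in range(k)]
    # Indices of cells that would trigger the "expected frequency < 2" warning.
    low_cells = [i for i, e in enumerate(expected) if e < warn_below]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, observed, expected, low_cells
```

For an Exponential hypothesis, a call might look like `chi_square_statistic(data, [1.0, 2.0, 4.0], exponential_cdf(sum(data) / len(data)))`, where the cut points are chosen by the user, as in Gof.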

3.4 Kolmogorov-Smirnov Test

When the Kolmogorov-Smirnov test is performed, the expected cumulative distribution value is calculated for each observation and compared with the observed cumulative distribution values on each side of each observation. The largest difference between the observed and expected values is computed as the test statistic.

This test has two limitations. First, it is only appropriate in situations where the parameters of the hypothesized distribution are known. Second, it is not applicable to discrete distributions such as the Poisson. However, the test is performed in all cases by the program. The critical value is approximated as 1.22/sqrt(n), 1.36/sqrt(n), and 1.63/sqrt(n) for the three levels of alpha.
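A minimal Python sketch of this statistic and the approximate critical values follows. It is an illustration, not the Gof code, and the function names are hypothetical. Pairing the three coefficients with alpha = 0.10, 0.05, and 0.01 is an assumption based on the usual large-sample Kolmogorov-Smirnov approximation; the text above does not name the three levels.

```python
import math


def ks_statistic(xs, cdf):
    """Largest gap between the empirical and hypothesized CDFs."""
    ordered = sorted(xs)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x,
        # so compare the hypothesized value with both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_critical_values(n):
    """Approximate critical values; the alpha labels are assumed."""
    root_n = math.sqrt(n)
    return {0.10: 1.22 / root_n, 0.05: 1.36 / root_n, 0.01: 1.63 / root_n}
```

The hypothesis would be rejected at a given level when the statistic exceeds the corresponding critical value, subject to the two limitations noted above.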

4 PROGRAM LIMITATIONS

Gof has a number of inherent limitations that make it only appropriate as a classroom instruction tool. First, there is a rather limited set of distributions available. There are plenty to choose from for classroom use, but a professional tool would provide a greater variety. Second, no computational efficiencies are built into the program. The goal was simply to get a working program available as quickly as possible. Third, the correct interpretation of the test statistics is up to the user. Critical values are provided but a conclusion is not stated. Fourth, the formation of cells for the chi-square test is up to the user. Cells are not formed so that the expected frequency in each cell is the same, as suggested by most statisticians. Though the user is warned if the expected frequency in any cell is less than 2, the test is performed using the cell boundaries specified.

Despite these limitations, Gof is invaluable in the classroom as a quick and dirty instructional tool. This is illustrated in the next section.

5 USE OF PROGRAM IN CLASSROOM

When teaching goodness of fit testing, I do not introduce MenuGof or Gof until I have covered how to model stochastic activities and asked the students to perform some goodness of fit tests by hand. Generally, I begin the discussion with the idea that a particular activity may be represented as a constant (deterministic model) or as a random variable (stochastic model). This decision is based on the inherent variability of the activity. To measure this variability, sample data is collected and the sample variance is calculated. If it is relatively large, a stochastic model is appropriate. Once it is determined that a stochastic model is appropriate, a distribution needs to be specified. Histograms are introduced as a means of generating an empirical distribution of the activity. The pros and cons of empirical modelling are discussed and the advantages of modelling activities with known theoretical distributions are introduced. This leads conveniently into the concept of parameter estimation of theoretical distributions and goodness of fit testing.


Students are introduced to three methods of parameter estimation: the maximum likelihood method, the method of moments, and common sense. All of these methods are illustrated using a variety of theoretical distributions. Maximum likelihood estimates are preferred when they are readily available. Students are also introduced to both the chi-square goodness of fit test and Kolmogorov-Smirnov test. Example data sets are analyzed, by hand, in class and students are assigned data sets to analyze, by hand, for homework. Finally, Gof is introduced as a means of quickly trying several alternate distributions on a data set.

Gof is a particularly useful program in a simulation class. Simulation models are quite flexible and allow the modeller to use almost any distribution to represent stochastic activities. Typically, I assign the students to develop a simulation model of a particular system, but instead of giving them the distributions for each activity, I give them a set of data observations for each activity. It is their responsibility to analyze the data and develop distribution models for each activity using Gof. Then they can employ these distribution models with the estimated parameters in their simulation model.

This type of assignment is only possible with the aid of a program such as Gof. It provides a valuable learning experience. Students learn a lot about modelling and a lot about data analysis from this type of activity. They learn to usefully apply the theoretical distributions they studied in their probability classes. They also learn about the indefiniteness of the chi-square test as they perform the test using a variety of cell intervals. With one set of cell intervals, a particular distribution is rejected and with another, it is not. In many cases, they prefer the definiteness of the Kolmogorov-Smirnov test.

6 PROGRAM AVAILABILITY

Copies of this software can be obtained on the Internet using anonymous ftp from apsrisc.aps.muohio.edu (134.53.3.230) in the directory /pub/gof. Copy the files gof.exe and gofhelp.hlp. If you are not able to use ftp, you can write me at the address given on the title page.

REFERENCES

Byrkett, D.L. 1989. Classroom goodness of fit testing. In Proceedings of the Midwest Decisions Sciences Institute, eds. R. Luebbe and B. Finch, 16-18. Decision Science Institute, Cincinnati, Ohio.

Law, A.M. & Kelton, W.D. 1991. Simulation modeling and analysis, Second Edition. New York: McGraw-Hill.

Microsoft Windows Software Development Kit Application Style Guide. 1987. Microsoft Corporation, One Microsoft Way, Redmond, WA.

Phillips, D.T. 1972. Applied goodness of fit testing. Norcross, GA: Institute of Industrial Engineering Monograph Series, Operations Research Division, Pub. No. 1.

AUTHOR BIOGRAPHY

DONALD L. BYRKETT is a professor in the Systems Analysis Department at Miami University. He teaches in the areas of stochastic modeling, simulation, manufacturing systems, and microcomputer systems. He is currently writing an introductory book on the Windows environment.