Statistical Analysis of Birth-Weight Data: Tables, Graphs, and Diagrams, Exercises of Construction

An insight into a statistical investigation aimed at improving the survival rate and care of British babies at birth. It covers the importance of collecting and analyzing large datasets, summarizing data through tables, graphs, and diagrams, and the use of MINITAB for data analysis. The investigation focuses on birth-weight data and its relation to whether the mother smokes or not.

Typology: Exercises

2021/2022

Uploaded on 08/01/2022

hal_s95
1 POPULATIONS AND VARIATES

OVERVIEW

An example of a statistical investigation, the Survey of British Births, is used in order to introduce the study of statistics. Tables, graphs and diagrams are shown to provide convenient initial summaries of the information in data.

1.1 A STATISTICAL INVESTIGATION

Statistics is the science that studies the collection and interpretation of numerical data.

An example of an investigation using statistics was the Survey of British Births, 1970. The aim of that investigation was to improve the survival rate and the care of British babies at and soon after birth, by collecting and analysing data on new-born babies.

Data collected on one or two babies would be of little use, because the items of information being measured would vary substantially from one baby to another. For example, their weights at birth would vary considerably. The aim was to gain information about a population, namely ‘all British births’.

Definition 1.1

A population is the collection of items under discussion. It may be finite, as in this example, or infinite; it may be real, as here, or hypothetical (as in some of the models we set up later in the book).

It was not practicable to collect full information on every birth in Britain, even for a single year. A smaller population was specified which was expected to be very similar to the whole population of British births. This smaller population consisted of all babies who were born (alive or dead) after the 24th week of gestation, between 0001 hours on Sunday 5 April and 2400 hours on Saturday 11 April 1970.

A large amount of information was collected about each baby, its mother, and the circumstances of the birth. Data recorded included the birth-weight and sex of the baby, the place of birth (home, hospital or elsewhere), and whether the mother smoked or not. Some of these data are measured (e.g. weight), whereas others are classified (e.g. sex must be male or female). We call these classified records attributes. Each of the quantities or attributes recorded is called a variate.

Definition 1.

A variate is any quantity or attribute whose value varies from one unit of investigation to another.

The units in this example were the individual babies in the population. Note that ‘home’, ‘hospital’ and ‘elsewhere’ are the three possible values of the variate ‘place of birth’. It is convenient to stretch the meaning of the word ‘value’ to include cases like this, rather than invent a new word for variates that are not numerical.

Definition 1.

An observation is the value taken by a variate for a particular unit of investigation.

The number of variates used exceeded 100, and the number of babies in the population investigated was over 17 000. There were therefore about 17 000 observations on each of 100-plus variates. However, as is usual in any large or complex statistical investigation, there were ‘missing data’: not every variate was recorded on every baby. Mishaps led to some of these measurements not being available, though in fact there were few such mishaps in this study. Some variates were simply not applicable to all babies, e.g. the cause of death applied only to the dead babies. Even so, the total number of observations – the mass of data – was enormous.

The original ‘raw’ data were collected by filling in questionnaire forms. These data were then transferred to punched cards so that a computer could be used to deal with them. The prime problem with such an amount of data is to summarize them so that they can be interpreted.

Variates differ in nature, and the methods of analysis appropriate to a variate will obviously depend on its nature. We can distinguish between quantitative variates (like the birth-weight of the baby, or the daily number of cigarettes smoked by the mother at a specified time during pregnancy) and qualitative variates, or attributes (such as the sex of the baby or the place of birth).

Definition 1.

A quantitative variate is a variate whose values are numerical.

Definition 1.

A qualitative variate or attribute is a variate whose values are not numerical.

Quantitative variates can also be divided into two types: they may be continuous, if they can take any value we care to specify within some range, or discrete if their values change by steps or jumps. Thus birth-weight is continuous, because there is no reason why a baby should not have a weight of 6.943762 lb – even if no scales could measure it this accurately! However, a variate like the number of


draw up frequency tables, one for each population. This time the variate being summarized is weight, a continuous measurement. Instead of looking at the frequency of each variate-value that occurs we first group the values into intervals, that is, subdivisions of the total range of possible values of the variate. In this example the variate is conveniently collected into the class-intervals 1–500, 501–1000, ..., 4501–5000, 5001–5500 grams.

Definition 1.

A class-interval is a subdivision of the total range of values which a (continuous) variate may take.

Definition 1.

The class-frequency is the number of observations of the variate which fall in a given interval.

Definition 1.

The frequency distribution of a (continuous) variate is the set of class-intervals for the variate, together with the associated class-frequencies.

Table 1.2 shows the frequency distribution of birth-weight in the two populations. We cannot make a direct comparison of these two sets of frequencies because the total frequencies in the two populations differ. In order to obtain two sets of figures expressed in the same units, which can be compared, we calculate the relative frequencies as shown.

Table 1.2 Frequency distribution of birth-weight of babies according to whether mother smokes or has never smoked

Interval (g)             1–      501–    1001–   1501–   2001–   2501–
Frequency S†             4       38      72      123     479     1661
Frequency N†             4       20      25      73      236     1089
Relative frequency S†    0.001   0.005   0.010   0.017   0.067   0.
Relative frequency N†    0.001   0.003   0.004   0.011   0.036   0.

Interval (g)             3001–   3501–   4001–   4501–   5001–
Frequency S†             2831    1560    343     36      2
Frequency N†             2530    1927    551     101     7
Relative frequency S†    0.396   0.218   0.048   0.005   0.
Relative frequency N†    0.385   0.294   0.084   0.015   0.

† S refers to the population of babies whose mothers smoke; N refers to the population of babies whose mothers have never smoked.

Comparison is helped by drawing a graph. The frequency polygon illustrating a set of frequencies, or, as here, a set of relative frequencies, is obtained by plotting class-frequencies or relative frequencies as y-values against the centre-points of class-intervals as x-values. Then the plotted points are joined by straight lines.

In Fig. 1.1, the two frequency polygons for the populations are shown on the same graph. One can see that although the two distributions have a similar shape, the distribution of birth-weight for babies whose mothers smoke is to the left of the corresponding distribution for those whose mothers have never smoked. Considered as a whole, the birth-weights for babies whose mothers smoke are rather smaller than those for babies whose mothers have never smoked. There is a great deal of overlapping of the two populations: many individual babies in the ‘smokers’ population have weights greater than individuals in the ‘non-smokers’ population. Nevertheless the difference between the two populations viewed in their entirety is quite distinctive.

The apparent evidence about smoking from this survey is not conclusive. The women chose for themselves whether or not to smoke. There is a possibility, which cannot be excluded and should not be overlooked, that the women who did smoke might be different in other ways (e.g. richer or more nervous) from those who did not. The difference in birth-weight of the babies might be a result of some other unknown factor, not of smoking. Before using results to make claims about possible causes, an investigator should always check up, so far as possible, whether other factors may be influencing the results.

This example has illustrated how simple tables and graphs help to summarize a great deal of information. In the examples which now follow we consider the practical processes of constructing tables and graphs in more detail.

Figure 1.1 Frequency polygons of birth-weights of babies whose mothers never smoked (◦ --- ◦), and whose mothers smoked at the time of the survey (• — •). [Axes: birth-weight in grams, 0 to 5000, against % frequency, 0 to 40.]
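The relative frequencies of Table 1.2 and the points plotted in Fig. 1.1 can be recomputed from the class-frequencies. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that an interval such as 1–500 is centred midway between its printed limits.

```python
# Class-frequencies copied from Table 1.2 (S = mother smokes, N = never smoked).
LOWER_LIMITS = [1, 501, 1001, 1501, 2001, 2501, 3001, 3501, 4001, 4501, 5001]
FREQ_S = [4, 38, 72, 123, 479, 1661, 2831, 1560, 343, 36, 2]
FREQ_N = [4, 20, 25, 73, 236, 1089, 2530, 1927, 551, 101, 7]
CLASS_WIDTH = 500  # grams


def relative_frequencies(freqs):
    """Divide each class-frequency by the total frequency of its population."""
    total = sum(freqs)
    return [f / total for f in freqs]


def polygon_points(freqs):
    """Return (class centre, relative frequency) pairs for a frequency polygon."""
    # Assumption: the interval 1-500 runs from 1 g to 500 g, so its centre is 250.5 g.
    centres = [lower + (CLASS_WIDTH - 1) / 2 for lower in LOWER_LIMITS]
    return list(zip(centres, relative_frequencies(freqs)))


rel_s = relative_frequencies(FREQ_S)
rel_n = relative_frequencies(FREQ_N)
points_s = polygon_points(FREQ_S)
points_n = polygon_points(FREQ_N)

# Print the two sets of relative frequencies side by side, to three decimals.
for lower, rs, rn in zip(LOWER_LIMITS, rel_s, rel_n):
    print(f"{lower:>4}-  S: {rs:.3f}  N: {rn:.3f}")
```

Joining the points in `points_s` and `points_n` by straight lines gives the two polygons; dividing by the population totals is what makes the two sets of figures comparable.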


The height of the bars is proportional to the frequency of each variate-value. The bars may be thickened, though still centred on the variate-values, to make the bars more obvious. The thickness has no significance. The bars must be kept distinct (cf. the histogram in the next example), to show that the variate-values are distinct.

Example 1.2 Frequency table and histogram of a continuous variate

Measurement of the lengths of 100 eggs of cuckoos gave the following results (all readings in millimetres):

22.5, 20.1, 23.3, 22.9, 23.1, 22.0, 22.3, 23.6, 24.7, 23.7,
24.0, 20.4, 21.3, 22.0, 24.2, 21.7, 21.0, 20.1, 21.9, 21.9,
21.7, 22.6, 20.9, 21.6, 22.2, 22.5, 22.2, 24.3, 22.3, 22.6,
20.1, 22.0, 22.8, 22.0, 22.4, 22.3, 20.6, 22.1, 21.9, 23.0,
22.0, 22.0, 22.1, 22.0, 19.6, 22.8, 22.0, 23.4, 23.8, 23.3,
22.5, 22.3, 21.9, 22.0, 21.7, 23.3, 22.2, 22.3, 22.8, 22.9,
23.7, 22.0, 21.9, 22.2, 24.4, 22.7, 23.3, 24.0, 23.6, 22.1,
21.8, 21.1, 23.4, 23.8, 23.3, 24.0, 23.5, 23.2, 24.0, 22.4,
23.9, 22.0, 23.9, 20.9, 23.8, 25.0, 24.0, 21.7, 23.8, 22.8,
23.1, 23.1, 23.5, 23.0, 23.0, 21.8, 23.0, 23.3, 22.4, 22.4.

Construct a frequency table. Illustrate the data graphically by means of: (a) a stem-and-leaf diagram; (b) a histogram; (c) a cumulative frequency diagram.

With data such as these the practice is to record to the nearest value for the number of decimal places chosen. Thus a recorded value of 22.5 represents a number in the interval 22.45–22.55 mm.

A suitable class-interval must be chosen. The maximum and minimum observations are 25.0 and 19.6. The range of the observations is therefore 25.0 − 19.6 = 5.4. (More precisely it is 25.05 − 19.55 = 5.50, but this is an unnecessary refinement here.) About 10 intervals is usually suitable; division of the range by 10 gives a value of 0.54, which suggests a class-interval of 0.5 mm. The end-points of the interval should be chosen so that no observation can fall on them. They will thus be expressed to one place more of decimals than the actual observations. Suitable intervals are 19.45–19.95, 19.95–20.45, .... The class-centres are 19.7, 20.2, .... An alternative form of specification of the intervals is 19.5–19.9, 20.0–20.4, .... A tally count for the cuckoo-egg measurements is given in the table on p. 7.

Interval (mm)    Centre (mm)    Frequency
19.5–19.9        19.7           1
20.0–20.4        20.2           4
20.5–20.9        20.7           3
21.0–21.4        21.2           3
21.5–21.9        21.7           12
22.0–22.4        22.2           27
22.5–22.9        22.7           12
23.0–23.4        23.2           16
23.5–23.9        23.7           12
24.0–24.4        24.2           8
24.5–24.9        24.7           1
25.0–25.4        25.2           1
Total                           100

[Tally count column not reproduced.]

(a) An alternative to a tally count table is a stem-and-leaf diagram (or stem plot) invented by J. W. Tukey (American, 1915–2000). It has the virtue of retaining all the information on individual values; we may think of it as a concise way of writing down a set of numbers. A stem-and-leaf diagram for the cuckoo-egg data takes the form:

Length of cuckoo eggs (in tenths of mm)                     Depth

19 |                                                        0
19 | 6                                                      1
20 | 1 4 1 1                                                5
20 | 9 6 9                                                  8
21 | 3 0 1                                                  11
21 | 7 9 9 7 6 9 9 7 9 8 7 8                                23
22 | 0 3 0 2 2 3 0 0 4 3 1 0 0 1 0 0 3 0 2 3 0 2 1 4 0 4 4  50
22 | 5 9 6 5 6 8 8 5 8 9 7 8                                50
23 | 3 1 0 4 3 3 3 4 3 2 1 1 0 0 0 3                        38
23 | 6 7 8 7 6 8 5 9 9 8 8 5                                22
24 | 0 2 3 4 0 0 0 0                                        10
24 | 7                                                      2
25 | 0                                                      1

Stem unit = 1 mm; Leaf unit = 0.1 mm

Tukey calls each row of the diagram a stem; the stem labels such as 19 or 20 are put to the left of the ruled line while the digits to the right of the line are the leaves on the stems. A measurement 20.1 is represented on the third row, i.e. the stem with label ‘20’, by the digit, or leaf, ‘1’. The remaining three leaves on the same stem in this example represent the measurements 20.4, 20.1 and 20.1. The intervals used are identical with those in the tally count. The diagram above was constructed by reading along the rows of the data and adding the leaves one by one. Intervals should be of 5 units, as for the cuckoo eggs, or of 2 or 10 units.

The column on the right might be used for a frequency count, as in the tally count above. Tukey recommends using it, as we have done, for depth, which is a measure of how deep any observation is in the distribution. The column gives cumulative depths counting from each end of the distribution. Thus, on the ‘21’ stem the cumulative depth is 11 = 1 + 4 + 3 + 3, the sum of the frequency counts of the earlier stems. For the cuckoo-egg data the median (see p. 24) happens to fall on an interval boundary, because it has depth ½(100 + 1) = 50.5. If this does not happen, there will be a central interval that cannot sensibly be cumulated with the totals from each end. The convention is to give the frequency count for that interval in brackets (see the example on p. 90).
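The construction of the stem-and-leaf diagram and its depth column can be sketched in Python as follows. This is a minimal illustration rather than a standard routine: the function names are hypothetical, empty stems are not printed, and the depth is simply taken as the smaller of the two cumulative counts, which is adequate here only because the median falls on an interval boundary (the bracket convention for a central interval is not handled).

```python
# The 100 cuckoo-egg lengths (mm), in the order given in Example 1.2.
LENGTHS = [
    22.5, 20.1, 23.3, 22.9, 23.1, 22.0, 22.3, 23.6, 24.7, 23.7,
    24.0, 20.4, 21.3, 22.0, 24.2, 21.7, 21.0, 20.1, 21.9, 21.9,
    21.7, 22.6, 20.9, 21.6, 22.2, 22.5, 22.2, 24.3, 22.3, 22.6,
    20.1, 22.0, 22.8, 22.0, 22.4, 22.3, 20.6, 22.1, 21.9, 23.0,
    22.0, 22.0, 22.1, 22.0, 19.6, 22.8, 22.0, 23.4, 23.8, 23.3,
    22.5, 22.3, 21.9, 22.0, 21.7, 23.3, 22.2, 22.3, 22.8, 22.9,
    23.7, 22.0, 21.9, 22.2, 24.4, 22.7, 23.3, 24.0, 23.6, 22.1,
    21.8, 21.1, 23.4, 23.8, 23.3, 24.0, 23.5, 23.2, 24.0, 22.4,
    23.9, 22.0, 23.9, 20.9, 23.8, 25.0, 24.0, 21.7, 23.8, 22.8,
    23.1, 23.1, 23.5, 23.0, 23.0, 21.8, 23.0, 23.3, 22.4, 22.4,
]


def stem_and_leaf(values):
    """Group values recorded to 0.1 into half-stems (leaves 0-4 and 5-9)."""
    rows = {}
    for value in values:
        tenths = round(value * 10)        # 22.5 -> 225; integers avoid float binning errors
        stem, leaf = divmod(tenths, 10)   # 225 -> stem 22, leaf 5
        key = (stem, leaf // 5)           # 0 for leaves 0-4, 1 for leaves 5-9
        rows.setdefault(key, []).append(leaf)
    return dict(sorted(rows.items()))


def depths(counts):
    """Cumulate the counts from each end and keep the smaller total for each row."""
    from_top, running = [], 0
    for count in counts:
        running += count
        from_top.append(running)
    from_bottom, running = [], 0
    for count in reversed(counts):
        running += count
        from_bottom.append(running)
    from_bottom.reverse()
    return [min(top, bottom) for top, bottom in zip(from_top, from_bottom)]


rows = stem_and_leaf(LENGTHS)
frequencies = [len(leaves) for leaves in rows.values()]
row_depths = depths(frequencies)

for ((stem, _), leaves), depth in zip(rows.items(), row_depths):
    print(f"{stem} | {' '.join(map(str, leaves)):<53}  {depth}")
```

Because the leaves are appended in the order the data are read, each row reproduces the order of the diagram above, and the row lengths give the class-frequencies of the tally count.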


observations that fall below the upper end-point of the class-interval. Hence it is the upper end-point that we put in the table and use in the graph.

Upper end-point         19.45   19.95   20.45   20.95   21.45   21.95   22.45   22.95
Cumulative frequency    0       1       5       8       11      23      50      62

Upper end-point         23.45   23.95   24.45   24.95   25.45
Cumulative frequency    78      90      98      99      100

Figure 1.4 shows the cumulative frequencies plotted against the corresponding end-points. We must always make the first cumulative frequency in our table 0 to give a starting point for our diagram. This 0 will correspond to the upper end-point of the interval below the one that contains the first observation. The diagram that we have drawn in Fig. 1.4 may be called a cumulative frequency polygon, since it consists entirely of straight lines. When a statistician wants to make inferences about a population on the basis of a sample from it, the first step is very often to smooth this polygon into a curve. This will be illustrated in Fig. 4.2 (p. 77).
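As a check on the table of cumulative frequencies, the running totals can be computed from the class-frequencies of the cuckoo-egg frequency table. The short Python sketch below uses variable names of our own choosing and assumes class-intervals of width 0.5 mm starting at 19.45 mm, as chosen earlier in this example.

```python
from itertools import accumulate

# Class-frequencies for the intervals 19.45-19.95, 19.95-20.45, ..., 24.95-25.45 mm.
CLASS_FREQUENCIES = [1, 4, 3, 3, 12, 27, 12, 16, 12, 8, 1, 1]

# Work in hundredths of a millimetre so that the 0.5 mm steps are exact integers.
upper_end_points = [(1945 + 50 * i) / 100 for i in range(len(CLASS_FREQUENCIES) + 1)]

# The leading 0 belongs to the upper end-point of the interval below the first observation.
cumulative = [0] + list(accumulate(CLASS_FREQUENCIES))

# Points of the cumulative frequency polygon: (upper end-point, cumulative frequency).
points = list(zip(upper_end_points, cumulative))

for end_point, total in points:
    print(f"{end_point:5.2f}  {total:3d}")
```

Plotting `points` and joining them by straight lines gives the cumulative frequency polygon.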

Figure 1.4 Cumulative frequency diagram of distribution of length of cuckoo eggs. [Axes: length of eggs (mm), 19 to 26, against cumulative frequency, 0 to 100.]


Example 1.3 Histogram of a continuous variate, with unequal grouping intervals

A certain disease affects children in their early years, and sometimes kills them. The frequency table of the age at death, in years, of 95 children dying from this disease is:

Age at death (years)    0–    1–    2–    3–    4–    5–10    Total
Frequency               10    40    20    10    5     10      95

Draw a histogram to represent the data.

Note that the convention with age is to record not to the nearest number of years but to the number of completed years. A child said to be aged 3 years has an age in the interval 3 to 4 years, including the lower end-point but excluding the upper end-point; in symbols, if the age is x, 3 ≤ x < 4. This interval is most conveniently denoted 3–, and this convention has been used in the preceding table. The class-centre of the class-interval 3– is 3.5 years.

The histogram (Fig. 1.5) is drawn in a similar manner to that in Example 1. except that we must take care with the final class-interval 5–10, since it is longer than the other class-intervals. We do not draw a rectangle with height equal to the frequency, 10, over the final interval, because that does not represent the data fairly. The incidence of the disease wanes with age after age 1; we see that there were 10 cases per year of age at age 3, 5 cases per year of age at age 4 and 2

Figure 1.5 Histogram of distribution of age at death of diseased children, demonstrating the use of unequal class-intervals. [Axes: age at death (years), 0 to 10, against frequency density (number per year of age), 0 to 40.]
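The heights of the rectangles in Fig. 1.5 are frequency densities, that is, frequencies divided by the widths of their class-intervals. A small Python sketch of that calculation follows; the names are hypothetical, and the intervals 0–, 1–, ..., 4– are each taken to be one year wide, following the age convention described above.

```python
# (lower end-point, upper end-point, frequency) for each class of age at death.
AGE_CLASSES = [(0, 1, 10), (1, 2, 40), (2, 3, 20), (3, 4, 10), (4, 5, 5), (5, 10, 10)]


def frequency_densities(classes):
    """Height of each rectangle: number of cases per year of age."""
    return [freq / (upper - lower) for lower, upper, freq in classes]


densities = frequency_densities(AGE_CLASSES)

# The area of each rectangle (height times width) recovers the class-frequency.
areas = [height * (upper - lower)
         for height, (lower, upper, _) in zip(densities, AGE_CLASSES)]

for (lower, upper, freq), height in zip(AGE_CLASSES, densities):
    print(f"{lower}-{upper}: frequency {freq}, density {height} per year")
```

The final class 5–10 therefore has height 2 rather than 10, so that area, not height, represents frequency.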


Calculation tables

  1. When possible carry out calculations on the record sheet. (Transfer of data from one sheet to another introduces errors.)
  2. When data are transferred from the record sheet, double-check the numbers transferred. Check, for example, that the total of columns of transferred numbers equals the total of the original numbers on the record sheet, as well as reading the numbers over again.
  3. Give the calculation a logical pattern on the sheet.
  4. Choose units that will simplify the calculation. Consider a suitable coding (see p. 33); for example, decimal numbers may be multiplied by a power of 10 to convert them to integers.

Summary tables

  1. Make the table clear and simple, with the main numbers for comparison close to each other. It is usually easier to compare numbers in the same column than those in the same row.
  2. If the table is becoming complicated break it down into smaller tables.
  3. Arrange rows and columns in some natural order or in size order. Where possible, put big numbers at the top of the table.
  4. Give row and column averages or totals, where they are meaningful.
  5. Choose units that the reader will understand and that will keep the table simple, e.g. a unit of millions of £ may be convenient. State the unit of measurement.
  6. Give a clear title and a verbal summary of the main points of the table.
  7. Round all numbers to two effective digits. Effective digits are those which vary within a set of data. Each of the following sets of data is quoted to two effective digits: (a) 129, 136, 119, 151; (b) 6.2, 7.3, 4.2, 3.8; (c) 290, 530, 640, 310. Note that this does not contradict (4) above; that remains an appropriate rule when collecting and processing data (as opposed to presenting a summary). To fix the final digit, round to the nearest number. Thus 1.32 becomes 1. and 27.9 becomes 28. If the value before rounding is exactly in the middle of the two possible rounded numbers, then round to the even digit. Thus 2. becomes 2.4 and 2.75 becomes 2.8. It is desirable not to consistently round in one direction, say up, because that would yield misleading averages.
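The rule of rounding a value that lies exactly midway to the even digit can be tried out in Python. The sketch below is only an illustration: `round_half_even` is a hypothetical helper, the value 2.25 is our own example rather than one from the text, and values are passed as strings so that they are held as exact decimals rather than binary approximations.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_half_even(value: str, places: int) -> Decimal:
    """Round a decimal string to `places` decimals, sending exact ties to the even digit."""
    quantum = Decimal(1).scaleb(-places)  # places=1 gives Decimal('0.1'), places=0 gives Decimal('1')
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN)


examples = [
    ("1.32", 1),  # not a tie: rounds to the nearest number
    ("27.9", 0),  # not a tie: rounds to the nearest number
    ("2.75", 1),  # tie: the rounded digit becomes the even digit 8
    ("2.25", 1),  # tie (illustrative value): the rounded digit stays at the even digit 2
]

for text, places in examples:
    print(text, "->", round_half_even(text, places))
```

Ties go up in some cases and down in others, which is the reason given above for not rounding consistently in one direction.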

Diagrams

In a table use is made of the geometrical ordering of rows and columns to exhibit relations between the numbers in the table. In diagrams (which we take to include graphs), the magnitude of numbers is represented geometrically in order to aid comparisons.

In this chapter, frequency diagrams have been considered. A simple plot of the values of the observations on a line is often useful when assessing the nature of data and detecting wild observations (see Fig. 5.1, p. 90). Plots of pairs of variates (see scatter diagrams, p. 93) are also useful.

These diagrams are adequate for the projects suggested in this book but the reader will find elaborations of these diagrams, and other types, in the presentation of statistical data in newspapers and magazines. The reader should inspect examples critically and decide whether they are helpful or whether they mislead. As with tables, we give a check-list of points to heed when constructing graphs.

1.3.2 Points to note when constructing graphs

  1. Make the graph self-explanatory: provide a title and a brief description of the source, label the axes, state the units, mark in the scales and give a key if it is needed.
  2. Choose a scale that: (a) is convenient; (b) ensures that most of the graph-paper is used.
  3. Beware of misleading the reader if the origin is not included.

1.4 THE USE OF COMPUTERS

Over the past fifty years the greatest change in the study and practice of statistics, as in many human activities, has been the increasing use of computers. Computation has always been important in statistics. In the early 1950s statisticians carried out their analyses by literally turning the handles of calculating machines. Large investigations used punched card machinery, as was done with the 1970 Survey of British Births, which is discussed earlier in this chapter. Over time, desk calculating machines became electric, then electronic. Simultaneously, the immediate postwar experimental computers were the inspiration for huge mainframe computers occupying a whole room. These decreased in size as they used, in turn, valves, transistors and silicon chips. The two streams came together in the personal computer that we use today.

We see three main uses for computers in the study of statistics: (1) to facilitate the analysis of data sets which are too large to be investigated by hand; (2) to aid the drawing of graphs and diagrams, which allow critical study of data before analysis, and improve presentation of results; (3) to simulate probability models in order to illustrate probability and statistical theory.

As students become more familiar with computers the size of a set of data that can be analysed more conveniently by computer, rather than by hand, becomes smaller and smaller. Yet there is still a place for pencil and paper analysis. The teacher, having just marked thirty scripts, can gain an immediate impression of the overall performance of his pupils by setting out the marks in a stem-and-leaf diagram such as that we describe on p. 8. Scratching down the numbers, to use John Tukey’s phrase, gives an acquaintance with every item of information. Manipulating the data on a computer is more impersonal; odd, and therefore important, items can be missed.


Instructions may be given to MINITAB in two ways. As with Windows, items on a menu bar may be clicked and dialogue boxes completed. We shall call this the menu method. Alternatively, commands may be typed in the Session window. We shall call this the session command method.

Suppose data are entered into Column 1. The column of data is automatically labelled C1. To produce a histogram from the data click Graph on the menu bar. A menu appears; from this click on Histogram. In the dialogue box that appears on the screen select Simple (it is the default setting and probably already selected) and click OK. In the next dialogue box enter C1 under Graph variables, ignoring other choices, and click OK. (MINITAB uses ‘variable’ as an alternative term to ‘column’.) The graph will be displayed. We might express the procedure more concisely as

Graph > Histogram / Select Simple / Enter Graph variable /

The symbol ‘>’ represents movement of the cursor from one word to another and the symbol ‘/’ represents a click on a menu item or on an OK box. Occasionally two successive clicks will be necessary, and this is denoted by ‘//’. The relevant words on the computer screen are shown here in bold type.

The alternative method is to use session commands. Click anywhere in the Session window and then click Editor on the menu bar. Click on Enable Session Commands. A prompt ‘MTB>’ appears on the left-hand side of the window. To produce a histogram type

HISTOGRAM C

after the prompt and press ‘Return’. Again a graph appears, in its own window. For clarity, we shall write the command words in capitals, but this is not necessary when using MINITAB. Either upper or lower case letters may be used for commands and arguments; only the first four letters of a command word are necessary.

We recommend that readers enable the session commands even if they intend using only the menu approach. If this is done, the session commands equivalent to the menu commands appear in the Session window and provide a record of the analysis.

To enter data into a worksheet, choose say the first column and click the cell below ‘C1’. The name of the variate may be entered. This is usually desirable but is not necessary. Use the arrow down key (↓) to move the cursor to the cell below and enter the first value. Repeat this procedure to enter the whole set.

Data may also be entered through the Session window. For data that are to go into column 1 use

SET C

A prompt ‘DATA>’ appears. Type in the data, separated by spaces or commas, over one or more lines and signal completion by typing END. For data in several columns use READ, which also provokes the DATA prompt.


Of the forms of data that MINITAB offers we shall make use of columns and stored constants. A column may contain numbers or text and is referred to by C plus a number, as in C1 or C23, or by name. A name may be typed at the head of a column or assigned by, for example, the command

NAME C1 ‘Length’

Individual entries in a column may be referred to as C1(1), C3(2), etc. A stored constant is a single number or a text string. It is referred to by K plus a number, as in K15. It can be given a name by the NAME command. Its value may be displayed by use of commands such as

PRINT K

or

PRINT ‘density’

To discover the names of columns and constants in use, type the session command

INFO

To carry out arithmetic, MINITAB offers a calculator obtained by choosing

Calc > Calculator /

Typical operations are adding two columns and putting the result in a third column, or transforming data by, for example, constructing a new column of values which are the logarithms of an original column of data. Equivalent operations can be carried out by using LET in session commands such as

LET C3 = C1 + C

LET C2 = LOGE(C1)

1.6 MINITAB: GRAPHS, DIAGRAMS AND TABLES

1.6.1 Graphs and diagrams

MINITAB can be used to produce the bar charts, histograms and stem-and-leaf diagrams that we have described in this chapter. With data in a column of the worksheet, a bar chart is produced by choosing

Graph > Bar Chart / Select Simple / Enter Categorical variable, i.e. column name /

To obtain a histogram choose

Graph > Histogram / Select Simple / Enter Graph variable /


1.7 EXERCISES ON CHAPTER 1

  1. The variates below were recorded for individual people. State whether each variate is qualitative or quantitative and, if the latter, whether discrete or continuous: (a) age, (b) year of birth, (c) sex, (d) height, (e) colour of hair, (f) number of pairs of shoes possessed, (g) favourite vegetable, (h) age rank in household (i.e. position in order of age among people in the same household), (i) surname.
  2. Choose a population of people (e.g. teachers in your school or college, Members of Parliament, inhabitants of your town or village) and guess appropriate relative frequency distributions for the variates listed in Question 1. Sketch diagrams to represent the frequency distributions.
  3. (a) People’s weights are recorded to the nearest kilogram, and presented in a table. The classes in this table are labelled 60–64, 65–69 and so on. What are the exact upper and lower limits covered by each class, and what is the mid-point of each class?
     (b) An insurance company asks new customers to fill in on a form ‘age last birthday’. When processing its records, it lists the number of new customers and their ages in a table, typical classes being 30–34, 35–39 and so on. What are the upper and lower limits and the mid-point of each class?
     (c) A quiz contains 30 questions. One hundred people write down their answers, and these are marked right or wrong. The number of correct answers is recorded and summarized in a table whose class-intervals are 0–8, 9–12, 13–16, 17–20, 21–24, 25–30. What are the mid-points of these classes?
     (d) People attending a doctor’s surgery by appointment may have to wait some time before being seen. This waiting time is recorded on a particular day in Dr Smith’s surgery, to the nearest minute. The records of waiting time are summarized in a table whose classes are ‘less than 5’, ‘5 to 9’, ‘10 to 14’, ‘15 or more’ minutes. What are the upper and lower limits of each class, and the mid-point of each class?
  4. In 3(c) above, the numbers in the classes were 10, 8, 22, 29, 20, 11 respectively. Suggest how these could be shown in a diagram.
  5. In 3(d) above, the numbers in the classes were 6, 9, 9, 4 respectively. Show these in a suitable diagram.
  6. The numbers of people travelling from a particular village on the daily bus to town were recorded for a period of 30 days, as follows: 6, 3, 2, 7, 4, 0, 5, 1, 3, 2, 6, 2, 4, 4, 3, 0, 5, 2, 1, 2, 4, 3, 5, 2, 4, 6, 3, 0, 7, 1. Make a frequency table to summarize these records, and show them in a suitable diagram.
  7. The times, measured to the nearest second, taken by 30 students to complete an algebraic problem are given below:

47, 61, 53, 43, 46, 46, 68, 48, 72, 57, 48, 54, 41, 63, 49, 42, 58, 65, 45, 44, 43, 51, 45, 38, 48, 46, 44, 52, 43, 47.

Group these times into a frequency table using eight equal class-intervals, the first of which contains measured times in the range 35–39 seconds. Draw a histogram of the grouped frequency distribution. Which is the modal class? [Note. See Definition 2.8.] (Welsh)

  8. The following figures show the attendance (in thousands, to the nearest thousand) at 32 matches of a national football team:

(a) Construct a stem-and-leaf diagram for these data. (b) What are the advantages of a stem-and-leaf display, when compared with a histogram, for illustrating these data? (IOS)

  9. The number of males, in litters of rats which contain 5 rats altogether, was observed with the following results over 100 different litters. Summarize these results and illustrate them:

  10. A standard pack of playing cards contains 52 cards in all, 13 of each of the four suits hearts, diamonds, spades and clubs. A hand of cards consists of 13 out of the 52, and before dealing out a hand the pack must be well shuffled. One hundred hands are obtained in this way from different packs, and the number of spades in each hand is recorded. Two of these hands contain no spades, there are 6 hands with 1 spade, 20 hands with 2 spades, 28 hands with 3, 23 with 4, 12 with 5, 4 with 6, 2 with 7, and one each with 8, 10, and 11 spades respectively. Summarize this information in a suitable table and in a suitable diagram.
  11. The heights of 187 plants of a species were recorded to the nearest millimetre, as shown in the table below. Summarize them in class-intervals of width 5 mm, and show the results in a suitable diagram. Also summarize the height records in class-intervals of other widths (e.g. 4 mm, 2 mm, 10 mm), and show the results in a diagram. Comment on the similarities and/or differences in these summaries:

Height       26  27  28  29  30  31  32  33  34  35  36  37  38
Frequency     1   0   2   1   0   1   3   0   1   0   4   1   2

Height       39  40  41  42  43  44  45  46  47  48  49  50  51
Frequency     6  10  12   8   6  15  17  20  13   9  12   7   8
