Statistical tests
Steps
1. Make an initial appraisal of your data (Data types and initial appraisal).
2. Select the type of test you require based on the question you are asking (see Categories).
3. Select the actual test you need to use from the appropriate key.
4. Determine any preliminary tests you need to carry out prior to performing the statistical test.
5. If your data are suitable for the test chosen based on the results from 4, proceed to the test.
6. If your data do not meet the demands of the chosen test, go back to 3 and choose the non-parametric equivalent.
7. It may be that your data are still not suitable, in which case you need to search wider than this web site, get more data, or discard them (one of the problems you may face if you have not planned properly).
Chi-Square Test
- All chi-squared tests are concerned with counts of things (frequencies) that you can put into categories. For example, you might be investigating flower colour and have frequencies of red flowers and white flowers. Or you might be investigating human health and have frequencies of smokers and non-smokers.
- The test looks at the frequencies you obtained and compares them with the frequencies you might expect given your null hypothesis. The null hypothesis is: there is no significant difference between the observed and expected frequencies.
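As a concrete illustration, here is a minimal sketch of such a test in Python (the flower counts and the 3:1 expected ratio are hypothetical, and scipy is assumed to be available):

```python
# A minimal sketch of a goodness-of-fit test on hypothetical flower counts.
from scipy.stats import chisquare

observed = [78, 22]   # hypothetical counts of red and white flowers (n = 100)
expected = [75, 25]   # expected counts under a 3:1 red:white null hypothesis

stat, p = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {stat:.3f}, p = {p:.3f}")
# Here p is well above 0.05, so the observed counts are consistent with the 3:1 expectation.
```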
The Chi-square Distribution
- Before discussing the unfortunately named "chi-square" test, it's necessary to talk about the actual chi-square distribution. The chi-square distribution itself is based on a complicated mathematical formula. There are many other distributions used by statisticians (for example, F and t) that are also based on complicated mathematical formulas. Fortunately, this is not our problem. Plenty of people have already done the relevant calculations, and computers can do them very quickly today.

When we perform a statistical test using a test statistic, we make the assumption that the test statistic follows a known probability distribution. We somehow compare our observed and expected results, summarize these comparisons in a single test statistic, and compare the value of the test statistic to its supposed underlying distribution. Good test statistics are easy to calculate and closely follow a known distribution. The various chi-square tests (and the related G-tests) assume that the test statistic follows the chi-square distribution.
The Chi-square Distribution
- When we perform a statistical test, we refer to this probability of "mistakenly rejecting our hypothesis" as "alpha." Usually, we equate alpha with a p-value. Thus, using the numbers from before, we would say p = 0.0863 for a chi-square value of 4.901 and 2 d.f. We would not reject our hypothesis, since p is greater than 0.05 (that is, p > 0.05). You should note that many statistical packages for computers can calculate exact p-values for chi-square distributed test statistics. However, it is common for people to simply refer to chi-square tables. Consider the table below:
- The first column lists degrees of freedom. The top row shows the p-value in question. The cells of the table give the critical value of chi-square for a given p-value and a given number of degrees of freedom. Thus, the critical value of chi-square for p = 0.05 with 2 d.f. is 5.991. Earlier, remember, we considered a value of 4.901. Notice that this is less than 5.991, and that critical values of chi-square increase as p-values decrease. Even without a computer, then, we could safely say that for a chi-square value of 4.901 with 2 d.f., 0.05 < p < 0.10. That's because, for the row corresponding to 2 d.f., 4.901 falls between 4.605 and 5.991 (the critical values for p = 0.10 and p = 0.05, respectively).
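The same critical values and exact p-values can be computed directly from the chi-square distribution. A minimal sketch, assuming Python with scipy:

```python
from scipy.stats import chi2

# Exact p-value for a chi-square statistic of 4.901 with 2 degrees of freedom
print(chi2.sf(4.901, df=2))          # ~0.0863, so p > 0.05: do not reject

# Critical values of chi-square for given upper-tail p-values and degrees of freedom
for df in (1, 2, 3):
    for alpha in (0.10, 0.05, 0.01):
        print(df, alpha, round(chi2.ppf(1 - alpha, df), 3))
# For 2 d.f. this reproduces the values quoted above: 4.605 (p = 0.10) and 5.991 (p = 0.05).
```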
A Simple Goodness-of-fit Chi-square Test
- Consider the following coin-toss experiment. We flip a coin 20 times, getting 12 "heads" and 8 "tails." Using the binomial distribution, we can calculate the exact probability of getting 12H/8T and any of the other possible outcomes. Remember, for the binomial distribution, we must define k (the number of successes), N (the number of Bernoulli trials) and p (the probability of success). Here, N is 20 and p is 0.5 (if our hypothesis is that the coin is "fair"). The following table shows the exact probability p(k | N, p) for all possible outcomes of the experiment. The probability of 12 heads/8 tails is highlighted.
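The highlighted probability, and the rest of the table, can be computed directly from the binomial distribution. A minimal sketch, assuming scipy:

```python
from scipy.stats import binom

# Exact probability of exactly 12 heads in 20 tosses of a fair coin (N = 20, p = 0.5)
print(binom.pmf(12, 20, 0.5))        # ~0.1201

# The full table: p(k | N, p) for every possible outcome k = 0..20
for k in range(21):
    print(k, binom.pmf(k, 20, 0.5))
```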
A Simple Goodness-of-fit Chi-square Test
- Using the Sum Rule, we get a p-value of 0.50344467. Following the convention of failing to reject a hypothesis if p > 0.05, we fail to reject the hypothesis that the coin is fair.
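That p-value is the sum of the probabilities of every outcome at least as extreme as 12 heads/8 tails (that is, k ≤ 8 or k ≥ 12). A sketch of the same sum, assuming scipy:

```python
from scipy.stats import binom

# Two-tailed exact p-value: all outcomes as far or farther from 10 heads than 12/8
p_value = sum(binom.pmf(k, 20, 0.5) for k in range(21) if k <= 8 or k >= 12)
print(p_value)                                    # ~0.5034

# Equivalent, using the cumulative distribution and the symmetry of p = 0.5
print(binom.cdf(8, 20, 0.5) + binom.sf(11, 20, 0.5))
```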
It happens that doing this type of calculation, while tedious, can be accomplished pretty easily -- especially if we know how to use a spreadsheet program. However, we run into practical problems once the numbers start to get large. We may find ourselves having to calculate hundreds or thousands of individual binomial probabilities. Consider testing the same hypothesis by flipping the coin 10,000 times. What is the exact probability, based on the binomial distribution, of getting 4,865 heads/5,135 tails or any outcome as far or farther from 5,000 heads/5,000 tails? You should recognize that you'll be adding 9,732 individual probabilities to get the p-value. You will also find that getting those probabilities in the first place is often impossible. Try calculating 10,000! (1 x 2 x 3 x ... x 9,998 x 9,999 x 10,000).
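In practice, statistical software typically sidesteps 10,000! by working with logarithms, so even the 10,000-toss exact test is feasible. A sketch, assuming scipy:

```python
from scipy.stats import binom

# Exact two-tailed binomial p-value for 4,865 heads in 10,000 tosses of a fair coin:
# every outcome as far or farther from 5,000 than 4,865/5,135.
p_value = binom.cdf(4865, 10000, 0.5) + binom.sf(5134, 10000, 0.5)
print(p_value)   # evaluated without ever forming 10,000! explicitly
```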
- As sample size gets large, we can substitute a simple test statistic that follows the chi-square distribution. Even with small sample sizes (like the 20 coin flips we used to test the hypothesis that the coin was fair), the chi-square goodness-of-fit test works pretty well. The test statistic usually referred to as "chi-square" (unfortunately, in my opinion) is calculated by comparing observed results to expected results. The calculation is straightforward. For each possible outcome, we first subtract the expected number from the observed number. Note: we do not subtract percentages, but the actual numbers! This is very important. After we do this, we square the result (that is, multiply it by itself). Then we divide this result by the expected number. We sum these values across all possible outcome classes to calculate the chi-square test statistic.
The formula for the test statistic is basically this:
χ² = Σ (observed − expected)² / expected, summed over all outcome classes.
- Here's the earlier table, with two columns added so we can calculate the chi-square test statistic. One is for our observed data, the other for the calculation.
- Notice that the totals for observed and expected numbers are the same (both are 20). If you ever do this test and the columns do not add up to the same total, you have done something wrong!
In this case, the sum of the last column is 0.8. For this type of test, the number of degrees of freedom is simply the number of outcome classes minus one. Since we have two outcome classes ("heads" and "tails"), we have 1 degree of freedom. Going to the chi-square table, we look in the row for 1 d.f. to see where the value 0.8 lies. It lies between 0.455 and 2.706. Therefore, we would say that 0.10 < p < 0.50. If we were to calculate the p-value exactly, using a computer, we would say p = 0.371. So the chi-square test doesn't give us exactly the right answer. However, as sample sizes increase, it does a better and better job. Also, p-values of 0.371 and 0.503 aren't qualitatively very different. In neither case would we be inclined to reject our hypothesis.
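Putting the worked example together, here is a minimal sketch (assuming scipy) that reproduces the statistic of 0.8 and the p-value of about 0.371:

```python
from scipy.stats import chi2

observed = [12, 8]    # heads, tails
expected = [10, 10]   # 20 tosses of a fair coin

# chi-square = sum of (observed - expected)^2 / expected over all outcome classes
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1          # number of outcome classes minus one
p = chi2.sf(stat, df)

print(stat, df, p)              # 0.8, 1 d.f., p ~0.371
```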
TEST YOUR UNDERSTANDING
- There are 110 houses in a particular neighborhood. Liberals live in 25 of them, moderates in 55 of them, and conservatives in the remaining 30. An airplane carrying 65 lb. sacks of flour passes over the neighborhood. For some reason, 20 sacks fall from the plane, each miraculously slamming through the roof of a different house. None hit the yards or the street, or land in trees, or anything like that. Each one slams through a roof. Anyway, 2 slam through a liberal roof, 15 slam through a moderate roof, and 3 slam through a conservative roof. Should we reject the hypothesis that the sacks of flour hit houses at random?
SOLUTION
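One way to set the problem up, sketched here in Python with scipy, is to make the expected counts proportional to the number of houses of each type:

```python
from scipy.stats import chisquare

observed = [2, 15, 3]        # sacks through liberal, moderate, conservative roofs
houses = [25, 55, 30]        # houses of each type (110 in total)
sacks = sum(observed)        # 20 sacks fell

# Under the "hits are random" hypothesis, expected counts are proportional to house counts
expected = [sacks * h / sum(houses) for h in houses]    # ~[4.55, 10.0, 5.45]

stat, p = chisquare(f_obs=observed, f_exp=expected)
print(stat, p)   # chi-square ~5.03 with 2 d.f., p ~0.08: we would not reject at the 0.05 level
```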
Independent Assortment of Genes
- The standard approach to testing for independent assortment of genes involves crossing individuals heterozygous for each gene with individuals homozygous recessive for both genes (i.e., a two-point testcross). Consider an individual with the AaBb genotype. Regardless of linkage, we expect half of the gametes to have the A allele and half the a allele. Similarly, we expect half to have the B allele and half the b allele. These expectations are drawn from Mendel's First Law: that alleles in heterozygotes segregate equally into gametes. If the alleles are independently assorting (and equally segregating), we expect 25% of the offspring to have each of the gametic types: AB, Ab, aB and ab. Therefore, since only recessive alleles are provided in the gametes from the homozygous recessive parent, we expect 25% of the offspring to have each of the four possible phenotypes.
If the genes are not independently assorting, we expect the parental allele combinations to stay together more than 50% of the time. Thus, if the heterozygote has the AB/ab genotype, we expect more than 50% of the gametes to be AB or ab (parental), and we expect fewer than 50% to be Ab or aB (recombinant). Alternatively, if the heterozygote has the Ab/aB genotype, we expect the opposite: more than 50% Ab or aB and less than 50% AB or ab.
The old-fashioned way to test for independent assortment by the two-point testcross involves two steps. First, one determines that there are more parental offspring than recombinant offspring. While it's possible to see the opposite (more recombinant than parental), this can not be explained by linkage; the simplest explanation would be selection favoring the recombinants. The second step is to determine if there are significantly more parental than recombinant offspring, since some deviation from expectations is always expected. If the testcross produced N offspring, one would expect 25% x N of each phenotype. The chi-square test would be performed as before.
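For example, the second step of this test is an ordinary goodness-of-fit test against 25% of N per class. A sketch with hypothetical offspring counts, assuming scipy:

```python
from scipy.stats import chisquare

# Hypothetical two-point testcross offspring counts, N = 200 (parental AB/ab classes in excess)
observed = {"AB": 72, "ab": 68, "Ab": 31, "aB": 29}

n = sum(observed.values())
expected = [0.25 * n] * 4    # 25% of N per class under independent assortment

stat, p = chisquare(f_obs=list(observed.values()), f_exp=expected)
print(stat, p)   # a very small p argues against independent assortment (i.e., suggests linkage)
```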
Independent Assortment of Genes
- However, there is a minor flaw with this statistical test. It assumes equal segregation of alleles. That is, it assumes that the A allele is found in exactly 50% of the offspring, and it assumes that the B allele is found in exactly 50% of the offspring. However, deviations from 25% of each phenotype could arise because the alleles are not represented equally. As an extreme example, consider 100 testcross offspring, where 1/5 have the lower-case allele of each gene. If the genes are independently assorting, we would actually expect the phenotypes in the following frequencies: 1/25 ab, 4/25 aB, 4/25 Ab and 16/25 AB. Let's say that we observed exactly 25 of each phenotype. If we did the chi-square test assuming equal segregation, we would set up the following table:
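Using the numbers from this example, a sketch (assuming scipy) contrasts the naive equal-segregation expectations with expectations adjusted for the actual allele frequencies:

```python
from scipy.stats import chisquare

observed = [25, 25, 25, 25]   # ab, aB, Ab, AB: exactly 25 of each phenotype among 100 offspring

# Naive test assuming equal segregation: expect 25% of 100 in each class
naive_expected = [25, 25, 25, 25]
print(chisquare(observed, naive_expected))       # statistic = 0: no hint of any problem

# Expectations adjusted for the actual allele frequencies (1/5 lower-case for each gene):
# 1/25 ab, 4/25 aB, 4/25 Ab and 16/25 AB
adjusted_expected = [100 * f for f in (1/25, 4/25, 4/25, 16/25)]   # [4, 16, 16, 64]
print(chisquare(observed, adjusted_expected))    # a huge statistic: independence is clearly rejected
```

The naive test sees a perfect fit, while the frequency-adjusted test makes it obvious that the genes are not assorting independently.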