Statistics different chapters including various tests., Study notes of Statistics

The LNM Institute of Information Technology Statistics

Statistics and various chapters for tests.

Typology: Study notes

2019/2020

Uploaded on 09/22/2020

himanshi-swaroop 🇮🇳

5 documents

1. One Sample z Test

The following lines are executed automatically when you click File > Import Dataset > From Text (readr).

library(readr)

Perfume_Volumes <- read_csv("D:/_R Getting Started/0 R Markdowns/Section 9/Perfume Volumes.csv")

## Parsed with column specification:

## cols(

## `Machine 1` = col_integer()

## )

View(Perfume_Volumes)

Show first 10 rows of the dataset

Perfume_Volumes

## # A tibble: 100 x 1

## `Machine 1`

## <int>

## 1 148

## 2 148

## 3 149

## 4 154

## 5 156

## 6 155

## 7 151

## 8 154

## 9 152

## 10 156

## # ... with 90 more rows

Using the $ sign, you can view the column Machine 1 as a vector and then find the mean value of this vector.

Perfume_Volumes$`Machine 1`

## [1] 148 148 149 154 156 155 151 154 152 156 156 152 156 154 151 155 151

## [18] 152 154 155 150 152 153 150 152 152 152 151 154 152 156 151 152 155

## [35] 148 150 155 151 153 152 149 151 151 149 149 151 150 148 152 149 152

## [52] 155 153 152 155 152 157 150 151 150 152 149 153 154 152 154 149 148

## [69] 151 149 149 155 156 151 155 152 150 153 156 156 155 153 148 149 148

## [86] 153 152 152 150 148 149 156 153 150 155 153 149 151 154 152

mean(Perfume_Volumes$`Machine 1`)

## [1] 152

Example: Bottles are produced with a mean volume of 150 cc and a standard deviation of 2 cc. A sample of 100 bottles shows a mean of 152 cc. Has the mean volume increased? Check at the 95% confidence level.

Bad news: Base R has no z.test function (unlike t.test). Hence we will first perform the z test step by step, and then use the z.test function from an external package.

Find the calculated value of z.

zvalue <- (152 - 150) / (2 / sqrt(100))

zvalue

[1] 10
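To finish the step-by-step test, the calculated z can be compared with a critical value or converted to a p-value. The following is a minimal sketch using base R's qnorm and pnorm; the variable names (sample_mean, std_error, z_crit and so on) are illustrative and not taken from the lecture.

```r
# One-sample, upper-tailed z test done by hand (values from the example)
sample_mean <- 152   # mean of the 100 sampled bottles
mu0 <- 150           # hypothesized population mean
sigma <- 2           # known population standard deviation
n <- 100             # sample size

std_error <- sigma / sqrt(n)
zvalue <- (sample_mean - mu0) / std_error

# Critical value for a one-sided test at the 5% significance level
z_crit <- qnorm(0.95)

# Upper-tail p-value: probability of a z at least this large under H0
p_value <- pnorm(zvalue, lower.tail = FALSE)

# One-sided 95% lower confidence bound for the mean
lower_bound <- sample_mean - z_crit * std_error

zvalue > z_crit   # TRUE means reject the null hypothesis
p_value
lower_bound
```

Because the calculated z of 10 is far beyond the critical value (about 1.645), the null hypothesis is rejected. The lower bound should agree with the 151.671 reported in the z.test output.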

Using package BSDA - Basic Statistics and Data Analysis

install.packages("BSDA")

library(BSDA)

Loading required package: lattice

Attaching package: 'BSDA'

The following object is masked from 'package:datasets':

Orange

z.test(x = Perfume_Volumes$`Machine 1`, alternative = "greater", sigma.x = 2, mu = 150)

One-sample z-Test

data: Perfume_Volumes$`Machine 1`

z = 10, p-value < 2.2e-16

alternative hypothesis: true mean is greater than 150

95 percent confidence interval:

151.671 NA

sample estimates:

mean of x

152

2. One Sample t Test

Draw t distributions for different df

q <- seq(-4.0, 4.0, by = 0.1)

q
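The preview does not show the plotting code that followed this step. One possible way to draw the curves, offered as a sketch rather than the lecture's own code, is to evaluate dt on the grid q for a few degrees of freedom and overlay the results on the standard normal curve. The df values and colours are arbitrary choices.

```r
# Grid of t values, as in the lecture
q <- seq(-4.0, 4.0, by = 0.1)

# Degrees of freedom to compare (illustrative choice)
dfs <- c(1, 3, 10, 30)
line_cols <- c("red", "blue", "darkgreen", "purple")

# Start with the standard normal curve as a reference (dashed)
plot(q, dnorm(q), type = "l", lwd = 2, lty = 2,
     xlab = "t", ylab = "Density",
     main = "t distributions for different df")

# Overlay one t density per df value
for (i in seq_along(dfs)) {
  lines(q, dt(q, df = dfs[i]), col = line_cols[i])
}

legend("topright",
       legend = c("normal", paste("df =", dfs)),
       col = c("black", line_cols),
       lty = c(2, rep(1, length(dfs))))
```

As df grows, the t curve gets taller at the centre and its tails get thinner, approaching the normal curve.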

Example: Bottles are produced with a mean volume of 150 cc, and the population standard deviation is unknown. A sample of 4 bottles shows volumes of (151, 153, 152, 152). Has the mean volume changed? Check at the 95% confidence level.

vol <- c(151, 153, 152, 152)

t.test(x = vol, mu = 150, conf.level = 0.95)

One Sample t-test

data: vol

t = 4.899, df = 3, p-value = 0.

alternative hypothesis: true mean is not equal to 150

95 percent confidence interval:

150.7008 153.

sample estimates:

mean of x

152

p is less than 0.05, hence we reject the null hypothesis.

We conclude that the volume has changed from 150 cc.
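To see where the reported t value and confidence interval come from, the same test can be worked by hand. This is a minimal sketch in base R; the names std_error, t_calc, t_crit and ci are illustrative.

```r
vol <- c(151, 153, 152, 152)
mu0 <- 150                            # hypothesized mean

n <- length(vol)
std_error <- sd(vol) / sqrt(n)        # estimated standard error of the mean
t_calc <- (mean(vol) - mu0) / std_error

# Two-sided critical value at the 5% significance level
t_crit <- qt(0.975, df = n - 1)

# 95% confidence interval for the mean
ci <- mean(vol) + c(-1, 1) * t_crit * std_error

t_calc
t_crit
ci
```

The calculated t exceeds the critical value and the interval does not contain 150, which matches the conclusion drawn from the p-value.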

Let's visualize this.

library(visualize)

visualize.t(stat = c(-4.899, 4.899), df = 3, section = "tails")

Four t functions - rt, pt, qt, dt

pt(q = -4.899, df = 3)

[1] 0.

pt(q = -4.899, df = 3) + pt(q = 4.899, df = 3, lower.tail = F)

[1] 0.
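The four functions share the same naming pattern as R's other distribution families: d for density, p for cumulative probability, q for quantile and r for random draws. A short sketch of how they relate, using the t value and df from the example (the names df_t, t_stat, left_tail and the seed are illustrative):

```r
df_t <- 3
t_stat <- 4.899

# dt: density (height of the curve) at a given t value
dt(0, df = df_t)

# pt: cumulative probability to the left of a t value
left_tail <- pt(-t_stat, df = df_t)

# Two-tailed p-value; the t distribution is symmetric, so both tails are equal
p_two_tailed <- 2 * left_tail

# qt: inverse of pt, returns the t value for a given cumulative probability
qt(left_tail, df = df_t)    # recovers -4.899

# rt: random draws from the t distribution
set.seed(1)                 # arbitrary seed, only for reproducibility
rt(5, df = df_t)
```

The two-tailed value computed here is the same quantity as the sum of the lower and upper tail probabilities in the lecture code.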

3. One Sample Variance Test - Chi Square

Example: 25 bottles were selected and their variance was 5. Has the variance increased from the historical variance of 4? (95% confidence level)

Bad news: chisq.test is for contingency tables and is not suitable for a one-sample variance test.

Let's install an external package named EnvStats for this.

install.packages("EnvStats")

library (EnvStats)

Attaching package: 'EnvStats'

95% Confidence Interval: LCL = 3.

UCL = Inf

In the results, the p-value is greater than 0.05, hence we fail to reject the null hypothesis. There is not enough evidence that the variance has increased.

Performing the one-sample variance test in the conventional way

Calculated Chi Square

Calc <- (25 - 1) * var(VolumeVar$Volumes) / 4

Calc

[1] 30

Remember the 4 R functions for distributions - rchisq, pchisq, qchisq, dchisq

Use qchisq with 5% area on the right to find the critical value of chi square.

crit <- qchisq(p = 0.05, df = 24, lower.tail = F)

crit

[1] 36.
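The comparison against the critical value can also be expressed as a p-value with pchisq. The sketch below works from the summary numbers in the example (n = 25, sample variance 5, historical variance 4) rather than the VolumeVar data, which is not shown in this preview; the variable names are illustrative.

```r
n <- 25              # sample size
sample_var <- 5      # sample variance from the example
sigma0_sq <- 4       # historical (hypothesized) variance

# Test statistic: (n - 1) * s^2 / sigma0^2
chi_calc <- (n - 1) * sample_var / sigma0_sq

# Critical value with 5% area in the right tail
crit <- qchisq(p = 0.05, df = n - 1, lower.tail = FALSE)

# Upper-tail p-value for the calculated statistic
p_value <- pchisq(chi_calc, df = n - 1, lower.tail = FALSE)

chi_calc > crit   # FALSE means we fail to reject the null hypothesis
p_value
```

Both views give the same decision: a statistic below the critical value corresponds to a p-value above 0.05.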

Plot Chi Square distribution, and show critical value (36.4) and calculated (30) using dchisq

x <- seq(1, 50, by = 1)

y <- dchisq(x, 24)

plot(y, type = "l", xlab = "Chi Sq", ylab = "f(chi sq)")

abline(v = 30)

text(30, 0.05, "Calculated")

abline(v = crit)

text(crit, 0.04, "Critical - 0.95")

The calculated value falls in the acceptance zone, hence we fail to reject the null hypothesis. There is not enough evidence that the variance has increased.

The part below was not included in the lecture video.

Here is a bonus section, which you might want to try yourself.

Use visualize to draw the chi square distribution.

library(visualize)

visualize.chisq(stat = 30, df = 24, section = "upper")

`Machine 2` = col_integer()

)

View (Perfume_Volumes_2_Sample)

4. Two Sample z Test

z.test(x = Perfume_Volumes_2_Sample$`Machine 1`, y = Perfume_Volumes_2_Sample$`Machine 2`, sigma.x = sd(Perfume_Volumes_2_Sample$`Machine 1`), sigma.y = sd(Perfume_Volumes_2_Sample$`Machine 2`))

Two-sample z-Test

data: Perfume_Volumes_2_Sample$`Machine 1` and Perfume_Volumes_2_Sample$`Machine 2`

z = -3.5954, p-value = 0.

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-1.5142221 -0.

sample estimates:

mean of x mean of y

150.19 151.

Since p is less than 0.05 we reject the null hypothesis.
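The z statistic reported by z.test can be reproduced by hand from the two sample means and standard deviations. The two-sample data file is not available in this preview, so the sketch below uses simulated vectors as a stand-in; the vectors machine1 and machine2, the seed, and the simulation parameters are all assumptions, and the resulting numbers will not match the lecture output.

```r
# Hypothetical data standing in for the two machine columns (not the course data)
set.seed(42)
machine1 <- round(rnorm(100, mean = 150, sd = 2))
machine2 <- round(rnorm(100, mean = 151, sd = 2))

# Standard error of the difference in means,
# using the sample variances in place of the population variances
se_diff <- sqrt(var(machine1) / length(machine1) +
                var(machine2) / length(machine2))

z_calc <- (mean(machine1) - mean(machine2)) / se_diff

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(z_calc))

z_calc
p_value
```

Plugging in sample standard deviations for sigma.x and sigma.y, as the lecture does, is a large-sample approximation; with small samples a t test is the usual choice.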

Box and Whisker Plot

boxplot (x = Perfume_Volumes_2_Sample)

Overlapping Histograms

hist(x = Perfume_Volumes_2_Sample$`Machine 1`, col = rgb(1, 0, 0, 0.5), main = "Volumes by Machine 1 and 2", xlim = c(140, 160), xlab = "Volume", ylab = "Frequency")

hist(x = Perfume_Volumes_2_Sample$`Machine 2`, col = rgb(0, 0, 1, 0.5), add = T)

box()

5 Two Sample t Test

5.1 When variance is equal

Use pooled variance

mc1 <- c(150, 152, 154, 152, 151)

mc2 <- c(156, 155, 158, 155, 154)

How to check equality of variance? - F Test will be covered later in this course

var.test (x = mc1, y = mc2)

F test to compare two variances

data: mc1 and mc

F = 0.95652, num df = 4, denom df = 4, p-value = 0.

alternative hypothesis: true ratio of variances is not equal to 1

95 percent confidence interval:

0.09959069 9.

sample estimates:

ratio of variances

0.

Since the value of p is greater than 0.05, we fail to reject the null hypothesis. The variances of mc1 and mc2 can be considered equal.

Conduct t test with equal variance

t.test (x = mc1, y = mc2, var.equal = T)

Two Sample t-test

data: mc1 and mc

t = -4.0056, df = 8, p-value = 0.

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-5.987668 -1.

sample estimates:

mean of x mean of y

151.8 155.

In this two-sample t test the value of p is low, hence the null hypothesis is rejected.

There is a difference in the volumes of machine 1 and machine 2.
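The "pooled variance" used by the equal-variance t test is a weighted average of the two sample variances. A sketch of the calculation by hand, with illustrative names (pooled_var, std_error, t_calc), compared against t.test:

```r
mc1 <- c(150, 152, 154, 152, 151)
mc2 <- c(156, 155, 158, 155, 154)
n1 <- length(mc1)
n2 <- length(mc2)

# Pooled variance: sample variances weighted by their degrees of freedom
pooled_var <- ((n1 - 1) * var(mc1) + (n2 - 1) * var(mc2)) / (n1 + n2 - 2)

# Standard error of the difference in means
std_error <- sqrt(pooled_var * (1 / n1 + 1 / n2))

t_calc <- (mean(mc1) - mean(mc2)) / std_error

# Two-sided p-value with n1 + n2 - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_calc), df = n1 + n2 - 2)

t_calc
p_value

# Built-in equivalent for comparison
t.test(x = mc1, y = mc2, var.equal = TRUE)$statistic
```

The hand calculation should reproduce the t = -4.0056 with df = 8 shown in the t.test output.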

Visual Representation of data

boxplot (mc1, mc2)

5.2 When variance is NOT equal

mc1 <- c(150, 152, 154, 152, 151)

mc3 <- c(144, 162, 177, 150, 140)

How to check equality of variance? - F Test will be covered later in this course

var.test (x = mc1, y = mc3)

F test to compare two variances

data: mc1 and mc

F = 0.0097431, num df = 4, denom df = 4, p-value = 0.

alternative hypothesis: true ratio of variances is not equal to 1

95 percent confidence interval:

0.001014431 0.

sample estimates:

ratio of variances

0.

Since the value of p is less than 0.05, we reject the null hypothesis. The variances of mc1 and mc3 cannot be considered equal.

Conduct the t test assuming unequal variances.

t.test(x = mc1, y = mc3, var.equal = F)
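With var.equal = FALSE, t.test performs Welch's t test: the standard error is built from each sample's own variance, and the degrees of freedom are approximated with the Welch-Satterthwaite formula instead of n1 + n2 - 2. The output of this call is not shown in the preview, so the sketch below computes the pieces by hand and checks them against t.test; the names v1, v3, df_welch and welch are illustrative.

```r
mc1 <- c(150, 152, 154, 152, 151)
mc3 <- c(144, 162, 177, 150, 140)

# Variance of each sample mean
v1 <- var(mc1) / length(mc1)
v3 <- var(mc3) / length(mc3)

# Test statistic with unpooled standard error
t_calc <- (mean(mc1) - mean(mc3)) / sqrt(v1 + v3)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v1 + v3)^2 /
  (v1^2 / (length(mc1) - 1) + v3^2 / (length(mc3) - 1))

# Two-sided p-value
p_value <- 2 * pt(-abs(t_calc), df = df_welch)

# Built-in Welch test for comparison
welch <- t.test(x = mc1, y = mc3, var.equal = FALSE)

c(t = t_calc, df = df_welch, p = p_value)
welch
```

Because mc3 is so much more variable than mc1, the approximate degrees of freedom land close to those of the noisier sample alone, not the 8 used in the pooled test.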

5.3 Paired t Test

bp.before <- c(120, 122, 143, 100, 109)

bp.after <- c(122, 120, 141, 109, 109)

t.test(x = bp.before, y = bp.after, paired = T)

Paired t-test

data: bp.before and bp.after

t = -0.68641, df = 4, p-value = 0.

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-7.062859 4.

sample estimates:

mean of the differences

-1.

Since the value of p is greater than 0.05, we fail to reject the null hypothesis. There is not enough evidence to conclude that this medicine has any effect on blood pressure.
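A paired t test is equivalent to a one-sample t test on the per-subject differences, tested against a mean of zero. The following sketch checks that equivalence on the blood pressure data; the names paired_result and diff_result are illustrative.

```r
bp.before <- c(120, 122, 143, 100, 109)
bp.after  <- c(122, 120, 141, 109, 109)

# Paired test, as in the lecture
paired_result <- t.test(x = bp.before, y = bp.after, paired = TRUE)

# One-sample test on the differences (before minus after)
diff_result <- t.test(x = bp.before - bp.after, mu = 0)

paired_result$statistic
diff_result$statistic
paired_result$p.value
diff_result$p.value
```

Both calls return the same t statistic and p-value, which is why pairing matters: the test looks only at the within-subject changes and ignores the large differences between subjects.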

Visualize

bp.diff <- bp.after - bp.before

bp.diff

[1] 2 -2 -2 9 0

boxplot(bp.diff, main = "Effect of Medicine on BP", ylab = "Post Medicine - BP Difference")

6 Two Sample Variance Test - F Test

Example: We took 8 samples from machine A and the standard deviation was 1.1. From machine B we randomly picked 5 samples and the variance was 11. Is there a difference in the variance between machines A and B? Check at the 90% confidence level.

mca <- c(150, 150, 151, 149, 151, 151, 148, 151)

sd(mca)

[1] 1.

mean (mca)

[1] 150.

mcb <- c(152, 146, 152, 150, 155)

var(mcb)

[1] 11

mean (mcb)

[1] 151

x <- seq(0, 10)

df(x, df1 = 4, df2 = 7)

[1] 0.000000000 0.428138135 0.155514809 0.063565304 0.

[6] 0.015336031 0.008608067 0.005151901 0.003246970 0.

[11] 0.

plot ( df (x, df1 = 4 , df2 = 7 ), type="l", xlab="F Value", ylab="Density")
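The preview shows the F density but not the test itself. One way to complete the example, offered as a sketch under the assumption that the larger variance (machine B) goes in the numerator so that df1 = 4 and df2 = 7 as in the density plot, is to compare the variance ratio with qf critical values and then confirm with var.test. The names f_calc, lower_crit, upper_crit and f_test are illustrative.

```r
mca <- c(150, 150, 151, 149, 151, 151, 148, 151)
mcb <- c(152, 146, 152, 150, 155)

# Ratio of sample variances: machine B (df1 = 4) over machine A (df2 = 7)
f_calc <- var(mcb) / var(mca)

# Two-sided test at the 90% confidence level: 5% in each tail
lower_crit <- qf(0.05, df1 = 4, df2 = 7)
upper_crit <- qf(0.95, df1 = 4, df2 = 7)

f_calc
c(lower_crit, upper_crit)
f_calc < lower_crit || f_calc > upper_crit   # TRUE means reject equal variances

# The same test with the built-in function
f_test <- var.test(x = mcb, y = mca, conf.level = 0.90)
f_test$p.value
```

The calculated ratio falls above the upper critical value, so at the 90% confidence level the variances of the two machines would be judged different.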

Visualization

boxplot (mca, mcb)

7 ANOVA

mc1 <- c(150, 151, 152, 152, 151, 150)

mc2 <- c(153, 152, 148, 151, 149, 152)

mc3 <- c(156, 154, 155, 156, 157, 155)

Data preparation - create data frame

volume <- c(mc1, mc2, mc3)

machine <- rep("machine1", times = 6)

machine

[1] "machine1" "machine1" "machine1" "machine1" "machine1" "machine1"

machine <- rep("machine1", times = length(mc1))

machine

[1] "machine1" "machine1" "machine1" "machine1" "machine1" "machine1"

machine <- rep(c("machine1", "machine2", "machine3"), times = c(length(mc1), length(mc2), length(mc3)))

machine
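The preview ends before the data frame is built and the ANOVA is run. A plausible continuation, given as a sketch and not as the lecture's own code, combines the two vectors into a long-format data frame and fits a one-way ANOVA with aov. The data frame name vol.mc and the object names fit and anova_table are hypothetical.

```r
mc1 <- c(150, 151, 152, 152, 151, 150)
mc2 <- c(153, 152, 148, 151, 149, 152)
mc3 <- c(156, 154, 155, 156, 157, 155)

volume <- c(mc1, mc2, mc3)
machine <- rep(c("machine1", "machine2", "machine3"),
               times = c(length(mc1), length(mc2), length(mc3)))

# Long-format data frame: one row per measurement
vol.mc <- data.frame(volume = volume, machine = factor(machine))

# One-way ANOVA: does the mean volume differ between machines?
fit <- aov(volume ~ machine, data = vol.mc)
summary(fit)

# Extract the F statistic and p-value from the ANOVA table
anova_table <- summary(fit)[[1]]
f_value <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]

# Visual check of the three groups
boxplot(volume ~ machine, data = vol.mc)
```

A p-value below 0.05 indicates that at least one machine's mean volume differs from the others; it does not say which one, so a follow-up comparison such as TukeyHSD(fit) is commonly used for that.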
