Docsity

Statistics different chapters including various tests., Study notes of Statistics

Statistics and various chapters for tests.

Typology: Study notes

2019/2020

Uploaded on 09/22/2020

himanshi-swaroop
1. One Sample z Test
The following lines are executed automatically when you click File > Import Dataset > From Text (readr):
library(readr)
Perfume_Volumes <- read_csv("D:/_R Getting Started/0 R Markdowns/Section 9/Perfume Volumes.csv")
## Parsed with column specification:
## cols(
## `Machine 1` = col_integer()
## )
View(Perfume_Volumes)
Print the dataset (a tibble prints only its first 10 rows):
Perfume_Volumes
## # A tibble: 100 x 1
## `Machine 1`
## <int>
## 1 148
## 2 148
## 3 149
## 4 154
## 5 156
## 6 155
## 7 151
## 8 154
## 9 152
## 10 156
## # ... with 90 more rows
Using the $ operator you can extract the column `Machine 1` as a vector and then find its mean. Because the column name contains a space, it has to be wrapped in backticks.
Perfume_Volumes$`Machine 1`
## [1] 148 148 149 154 156 155 151 154 152 156 156 152 156 154 151 155 151
## [18] 152 154 155 150 152 153 150 152 152 152 151 154 152 156 151 152 155
## [35] 148 150 155 151 153 152 149 151 151 149 149 151 150 148 152 149 152
## [52] 155 153 152 155 152 157 150 151 150 152 149 153 154 152 154 149 148
## [69] 151 149 149 155 156 151 155 152 150 153 156 156 155 153 148 149 148
## [86] 153 152 152 150 148 149 156 153 150 155 153 149 151 154 152
mean(Perfume_Volumes$`Machine 1`)
## [1] 152

Example: Bottles are produced with a mean volume of 150 cc and a standard deviation of 2 cc. A sample of 100 bottles shows a mean of 152 cc. Has the mean volume increased? Check at the 95% confidence level.

Bad news: base R has no z.test function (unlike t.test). Hence we will first perform the z test step by step, and then use an external package that provides a z.test function.

Find the calculated value of z.

zvalue <- (152 - 150) / (2 / sqrt(100))
zvalue

[1] 10
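As a cross-check on the step-by-step calculation, the decision can be finished in base R with the normal-distribution functions. This is a minimal sketch that uses only the summary figures from the example (n = 100, sample mean 152, hypothesised mean 150, sigma = 2); the variable names are illustrative, not from the original notes.

```r
# One-sample, right-tailed z test from summary statistics (base R only)
x_bar <- 152   # sample mean
mu0   <- 150   # hypothesised mean
sigma <- 2     # known population standard deviation
n     <- 100   # sample size
alpha <- 0.05  # 95% confidence level

z_calc  <- (x_bar - mu0) / (sigma / sqrt(n))    # test statistic
z_crit  <- qnorm(1 - alpha)                     # right-tail critical value
p_value <- pnorm(z_calc, lower.tail = FALSE)    # area to the right of z_calc

cat("z =", z_calc, " critical z =", round(z_crit, 3), " p =", p_value, "\n")
if (z_calc > z_crit) cat("Reject H0: the mean volume has increased\n") else cat("Fail to reject H0\n")
```

Since z = 10 lies far beyond the one-sided critical value of about 1.645, the null hypothesis is rejected, in line with the BSDA z.test output for the same data.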

Using package BSDA (Basic Statistics and Data Analysis)

install.packages("BSDA")

library(BSDA)

Loading required package: lattice

Attaching package: 'BSDA'

The following object is masked from 'package:datasets':

Orange

z.test(x = Perfume_Volumes$`Machine 1`, alternative = "greater", sigma.x = 2, mu = 150)

One-sample z-Test

data: Perfume_Volumes$`Machine 1`

z = 10, p-value < 2.2e-16

alternative hypothesis: true mean is greater than 150

95 percent confidence interval:

151.671 NA

sample estimates:

mean of x

152

2. One Sample t Test

Draw t distributions for different df

q <- seq(-4.0, 4.0, by = 0.1)
q
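The plotting code for this step does not appear in these notes. A minimal sketch with base graphics and dt() follows; the degrees of freedom 1, 3 and 30 and the colours are an arbitrary illustrative choice. As df grows, the t curve approaches the standard normal and its heavier tails shrink.

```r
# Compare t densities for several degrees of freedom with the standard normal
q    <- seq(-4.0, 4.0, by = 0.1)
dfs  <- c(1, 3, 30)                      # illustrative degrees of freedom
cols <- c("red", "blue", "darkgreen")

plot(q, dnorm(q), type = "l", lwd = 2, ylim = c(0, 0.45),
     xlab = "t", ylab = "Density", main = "t distributions vs. standard normal")
for (i in seq_along(dfs)) {
  lines(q, dt(q, df = dfs[i]), col = cols[i], lty = 2)   # one dashed curve per df
}
legend("topright", legend = c("Normal", paste("df =", dfs)),
       col = c("black", cols), lty = c(1, 2, 2, 2))
```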

Example: Bottles are produced with a mean volume of 150 cc, and the population standard deviation is unknown. A sample of 4 bottles shows the volumes (151, 153, 152, 152). Has the mean volume changed? Check at the 95% confidence level.

vol <- c(151, 153, 152, 152)
t.test(x = vol, mu = 150, conf.level = 0.95)

One Sample t-test

data: vol

t = 4.899, df = 3, p-value = 0.

alternative hypothesis: true mean is not equal to 150

95 percent confidence interval:

150.7008 153.

sample estimates:

mean of x

152

The p-value is less than 0.05, so we reject the null hypothesis.

Hence we conclude that the mean volume has changed from 150 cc.
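The t.test output can be reproduced by hand with t = (mean - mu) / (s / sqrt(n)) and n - 1 degrees of freedom. A short sketch using the same four volumes; the variable names are illustrative.

```r
# One-sample t statistic computed by hand, then compared with t.test()
vol <- c(151, 153, 152, 152)
mu0 <- 150
n   <- length(vol)

t_calc  <- (mean(vol) - mu0) / (sd(vol) / sqrt(n))   # t statistic
df_t    <- n - 1                                     # degrees of freedom
p_value <- 2 * pt(-abs(t_calc), df = df_t)           # two-sided p-value
t_crit  <- qt(0.975, df = df_t)                      # two-sided 95% critical value

cat("t =", round(t_calc, 4), " df =", df_t, " p =", round(p_value, 5),
    " critical t =", round(t_crit, 4), "\n")
```

Because |t| exceeds the critical value, the manual route reaches the same decision as t.test.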

Let's visualize this:

library(visualize)
visualize.t(stat = c(-4.899, 4.899), df = 3, section = "tails")

R has four functions for the t distribution: rt, pt, qt and dt.

pt(q = -4.899, df = 3)

[1] 0.

Adding the upper-tail area gives the two-sided p-value:

pt(q = -4.899, df = 3) + pt(q = 4.899, df = 3, lower.tail = FALSE)

[1] 0.
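The other t functions complete the set. A short sketch for the same df = 3 case: qt() turns a tail probability into a critical value, dt() gives the height of the density curve, and rt() draws random values (the seed and the 1000 draws are arbitrary choices).

```r
df_t   <- 3
t_stat <- 4.899

# pt: the t distribution is symmetric, so both tails equal 2 * pt(-|t|)
p_two_sided <- 2 * pt(-abs(t_stat), df = df_t)

# qt: critical values cutting off 2.5% in each tail (95% two-sided)
t_crit <- qt(c(0.025, 0.975), df = df_t)

# dt: height of the density curve at t = 0
height_at_zero <- dt(0, df = df_t)

# rt: random draws from the same distribution
set.seed(1)
sims <- rt(1000, df = df_t)

print(p_two_sided)
print(t_crit)
print(height_at_zero)
print(mean(abs(sims) > abs(t_stat)))   # simulated two-sided tail share
```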

9.3 One Sample Variance Test - Chi Square

Example: 25 bottles were selected and their sample variance was 5. Has the variance increased from the historical variance of 4? (95% confidence level)

Bad news: chisq.test is meant for contingency tables and is not suitable for a one-sample variance test.

Let's install an external package named EnvStats for this.

install.packages("EnvStats")

library (EnvStats)

Attaching package: 'EnvStats'

95% Confidence Interval: LCL = 3.

UCL = Inf

In the results, the p-value is greater than 0.05, so we fail to reject the null hypothesis. Hence there is not enough evidence that the variance has increased.

Performing the one-sample variance test the conventional way

Calculated chi-square

Calc <- (25 - 1) * var(VolumeVar$Volumes) / 4
Calc

[1] 30

Remember the four R functions for the chi-square distribution: rchisq, pchisq, qchisq and dchisq.

Use qchisq with 5% of the area in the right tail to find the critical value of chi-square.

crit <- qchisq(p = 0.05, df = 24, lower.tail = FALSE)
crit

[1] 36.

Plot the chi-square distribution with dchisq, and mark the critical value (36.4) and the calculated value (30):

x <- seq(1, 50, by = 1)
y <- dchisq(x, df = 24)
plot(x, y, type = "l", xlab = "Chi Sq", ylab = "f(chi sq)")
abline(v = 30)
text(30, 0.05, "Calculated")
abline(v = crit)
text(crit, 0.04, "Critical - 0.95")

The calculated value falls in the acceptance region, so we fail to reject the null hypothesis. Hence there is not enough evidence that the variance has increased.
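The conventional approach compares the statistic with a critical value; pchisq() gives the matching right-tail p-value. A minimal sketch that needs only the summary figures from the example (n = 25, sample variance 5, historical variance 4), so it does not depend on the VolumeVar data:

```r
n      <- 25
s2     <- 5   # sample variance
sigma2 <- 4   # historical (null) variance

chi_calc <- (n - 1) * s2 / sigma2                             # test statistic
chi_crit <- qchisq(0.05, df = n - 1, lower.tail = FALSE)      # right-tail 5% cut-off
p_value  <- pchisq(chi_calc, df = n - 1, lower.tail = FALSE)  # P(X >= chi_calc)

cat("chi-sq =", chi_calc, " critical =", round(chi_crit, 2),
    " p =", round(p_value, 4), "\n")
```

A p-value above 0.05 and a statistic below the critical value are two views of the same decision.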

The part below was not included in the lecture video.

It is a bonus section you might want to try yourself.

Use visualize to draw the chi-square distribution.

library(visualize)
visualize.chisq(stat = 30, df = 24, section = "upper")

Machine 2 = col_integer()

)

View (Perfume_Volumes_2_Sample)

Two Sample z Test

z.test(x = Perfume_Volumes_2_Sample$`Machine 1`,
       y = Perfume_Volumes_2_Sample$`Machine 2`,
       sigma.x = sd(Perfume_Volumes_2_Sample$`Machine 1`),
       sigma.y = sd(Perfume_Volumes_2_Sample$`Machine 2`))

Two-sample z-Test

data: Perfume_Volumes_2_Sample$`Machine 1` and

Perfume_Volumes_2_Sample$`Machine 2`

z = -3.5954, p-value = 0.

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-1.5142221 -0.

sample estimates:

mean of x mean of y

150.19 151.

Since p is less than 0.05, we reject the null hypothesis: the mean volumes of the two machines differ.
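The two-sample z statistic is the difference in means divided by the standard error sqrt(sigma.x^2 / n.x + sigma.y^2 / n.y). The helper below is a hypothetical name, and the test vectors are toy values, not the perfume data. Passing sample standard deviations as sigma, as the z.test call above does, is an approximation that is reasonable for large samples.

```r
# Hypothetical helper: two-sided two-sample z test with known sigmas
two_sample_z <- function(x, y, sigma.x, sigma.y) {
  se <- sqrt(sigma.x^2 / length(x) + sigma.y^2 / length(y))  # standard error of the difference
  z  <- (mean(x) - mean(y)) / se
  p  <- 2 * pnorm(-abs(z))                                   # two-sided p-value
  list(z = z, p.value = p)
}

# Toy example with made-up data
res <- two_sample_z(x = c(1, 2, 3), y = c(2, 3, 4), sigma.x = 1, sigma.y = 1)
print(res)
```

With the perfume data, the same helper would be called with the two machine columns and their sd() values.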

Box and Whisker Plot

boxplot (x = Perfume_Volumes_2_Sample)

Overlapping Histograms

hist(x = Perfume_Volumes_2_Sample$`Machine 1`, col = rgb(1, 0, 0, 0.5),
     main = "Volumes by Machine 1 and 2", xlim = c(140, 160),
     xlab = "Volume", ylab = "Frequency")
hist(x = Perfume_Volumes_2_Sample$`Machine 2`, col = rgb(0, 0, 1, 0.5), add = TRUE)
box()

5 Two Sample t Test

5.1 When variance is equal

Use the pooled variance.

mc1 <- c(150, 152, 154, 152, 151)
mc2 <- c(156, 155, 158, 155, 154)

How do we check the equality of variances? With an F test, which is covered later in this course:

var.test (x = mc1, y = mc2)

F test to compare two variances

data: mc1 and mc

F = 0.95652, num df = 4, denom df = 4, p-value = 0.

alternative hypothesis: true ratio of variances is not equal to 1

95 percent confidence interval:

0.09959069 9.

sample estimates:

ratio of variances

0.

Since the value of p is greater than 0.05, we fail to reject the null hypothesis. Variances of

mc1 and mc2 can be considered to be equal.

Conduct t test with equal variance

t.test (x = mc1, y = mc2, var.equal = T)

Two Sample t-test

data: mc1 and mc

t = -4.0056, df = 8, p-value = 0.

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-5.987668 -1.

sample estimates:

mean of x mean of y

151.8 155.

In this two-sample t test the p-value is low, so the null hypothesis is rejected.

There is a difference in the volumes of machine 1 and machine 2.
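The pooled-variance t statistic can be reproduced by hand: the pooled variance is a weighted average of the two sample variances, with n1 + n2 - 2 degrees of freedom. A sketch with the same mc1 and mc2 vectors:

```r
mc1 <- c(150, 152, 154, 152, 151)
mc2 <- c(156, 155, 158, 155, 154)
n1 <- length(mc1)
n2 <- length(mc2)

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(mc1) + (n2 - 1) * var(mc2)) / (n1 + n2 - 2)
t_pooled  <- (mean(mc1) - mean(mc2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
df_pooled <- n1 + n2 - 2
p_value   <- 2 * pt(-abs(t_pooled), df = df_pooled)

cat("pooled variance =", sp2, " t =", round(t_pooled, 4),
    " df =", df_pooled, " p =", signif(p_value, 4), "\n")
```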

Visual Representation of data

boxplot (mc1, mc2)

5.2 When variance is NOT equal

mc1 <- c(150, 152, 154, 152, 151)
mc3 <- c(144, 162, 177, 150, 140)

How do we check the equality of variances? With an F test, which is covered later in this course:

var.test (x = mc1, y = mc3)

F test to compare two variances

data: mc1 and mc

F = 0.0097431, num df = 4, denom df = 4, p-value = 0.

alternative hypothesis: true ratio of variances is not equal to 1

95 percent confidence interval:

0.001014431 0.

sample estimates:

ratio of variances

0.

Since the value of p is less than 0.05, we reject the null hypothesis. Variances of mc1 and

mc3 can be considered to be NOT equal.

Conduct the t test assuming unequal variances (Welch's t test):

t.test (x = mc1, y = mc3, var.equal = F)
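With var.equal = F, t.test uses separate variances and the Welch-Satterthwaite approximation for the degrees of freedom, so df is usually not a whole number. A sketch that computes the same quantities by hand:

```r
mc1 <- c(150, 152, 154, 152, 151)
mc3 <- c(144, 162, 177, 150, 140)

v1 <- var(mc1) / length(mc1)   # squared standard error, sample 1
v3 <- var(mc3) / length(mc3)   # squared standard error, sample 2

t_welch  <- (mean(mc1) - mean(mc3)) / sqrt(v1 + v3)
# Welch-Satterthwaite approximation for the degrees of freedom
df_welch <- (v1 + v3)^2 / (v1^2 / (length(mc1) - 1) + v3^2 / (length(mc3) - 1))
p_value  <- 2 * pt(-abs(t_welch), df = df_welch)

cat("t =", round(t_welch, 4), " df =", round(df_welch, 3), " p =", round(p_value, 4), "\n")
```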

5.3 Paired t Test

bp.before <- c(120, 122, 143, 100, 109)
bp.after <- c(122, 120, 141, 109, 109)
t.test(x = bp.before, y = bp.after, paired = T)

Paired t-test

data: bp.before and bp.after

t = -0.68641, df = 4, p-value = 0.

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-7.062859 4.

sample estimates:

mean of the differences

-1.

Since the p-value is greater than 0.05, we fail to reject the null hypothesis. There is not enough evidence that this medicine has any effect on blood pressure.
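A paired t test is a one-sample t test on the within-patient differences, tested against 0 with n - 1 degrees of freedom. A sketch reproducing the result above:

```r
bp.before <- c(120, 122, 143, 100, 109)
bp.after  <- c(122, 120, 141, 109, 109)

d <- bp.before - bp.after   # same order as t.test(x = bp.before, y = bp.after)
t_paired <- mean(d) / (sd(d) / sqrt(length(d)))
p_value  <- 2 * pt(-abs(t_paired), df = length(d) - 1)

cat("mean difference =", mean(d), " t =", round(t_paired, 5), " p =", round(p_value, 4), "\n")
```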

Visualize

bp.diff <- bp.after - bp.before
bp.diff

[1]  2 -2 -2  9  0

boxplot (bp.diff, main= "Effect of Medicine on BP", ylab = "Post Medicine - BP Difference")

6 Two Sample Variance Test - F Test

Example: We took 8 samples from machine A, and their standard deviation was 1.1. From machine B we randomly picked 5 samples, and their variance was 11. Is there a difference between the variances of machines A and B? Check at the 90% confidence level.

mca <- c(150, 150, 151, 149, 151, 151, 148, 151)
sd(mca)

[1] 1.

mean (mca)

[1] 150.

mcb <- c(152, 146, 152, 150, 155)
var(mcb)

[1] 11

mean (mcb)

[1] 151

x <- seq(0, 10)
df(x, df1 = 4, df2 = 7)

[1] 0.000000000 0.428138135 0.155514809 0.063565304 0.

[6] 0.015336031 0.008608067 0.005151901 0.003246970 0.

[11] 0.

plot(x, df(x, df1 = 4, df2 = 7), type = "l", xlab = "F Value", ylab = "Density")
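The F statistic is a ratio of the two sample variances. The sketch below puts the larger variance (machine B) in the numerator, so the statistic lines up with the df1 = 4, df2 = 7 curve, and checks the hand calculation against base R's var.test() at the 90% level asked for in the example.

```r
mca <- c(150, 150, 151, 149, 151, 151, 148, 151)
mcb <- c(152, 146, 152, 150, 155)

# Larger variance on top: numerator df = 5 - 1 = 4, denominator df = 8 - 1 = 7
F_calc <- var(mcb) / var(mca)
df1 <- length(mcb) - 1
df2 <- length(mca) - 1

# Two-sided p-value: double the smaller tail area
p_value <- 2 * min(pf(F_calc, df1, df2), pf(F_calc, df1, df2, lower.tail = FALSE))

# Critical values for a two-sided test at 90% confidence (5% in each tail)
F_crit <- qf(c(0.05, 0.95), df1, df2)

vt <- var.test(x = mcb, y = mca, conf.level = 0.90)
print(vt)
```

If F_calc lies outside the interval given by F_crit, the variances are judged different at the 90% level.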

Visualization

boxplot (mca, mcb)

7 ANOVA

mc1 <- c(150, 151, 152, 152, 151, 150)
mc2 <- c(153, 152, 148, 151, 149, 152)
mc3 <- c(156, 154, 155, 156, 157, 155)

Data preparation - create data frame

volume <- c(mc1, mc2, mc3)
machine <- rep("machine1", times = 6)
machine

[1] "machine1" "machine1" "machine1" "machine1" "machine1" "machine1"

machine <- rep("machine1", times = length(mc1))
machine

[1] "machine1" "machine1" "machine1" "machine1" "machine1" "machine1"

machine <- rep(c("machine1", "machine2", "machine3"),
               times = c(length(mc1), length(mc2), length(mc3)))
machine
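The notes stop partway through the data preparation. One plausible continuation, sketched here as an assumption rather than the author's code, combines the two vectors into a long-format data frame (the name vol_data is hypothetical) and fits a one-way ANOVA with base R's aov().

```r
mc1 <- c(150, 151, 152, 152, 151, 150)
mc2 <- c(153, 152, 148, 151, 149, 152)
mc3 <- c(156, 154, 155, 156, 157, 155)

volume  <- c(mc1, mc2, mc3)
machine <- rep(c("machine1", "machine2", "machine3"),
               times = c(length(mc1), length(mc2), length(mc3)))

# Long-format data frame: one row per bottle, machine stored as a factor
vol_data <- data.frame(volume = volume, machine = factor(machine))

fit <- aov(volume ~ machine, data = vol_data)   # one-way ANOVA
print(summary(fit))
```

A small p-value in the machine row would indicate that at least one machine's mean volume differs from the others.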