1. One Sample z Test
The following lines are executed automatically when you click File > Import Dataset > From Text (readr).
library(readr)
Perfume_Volumes <- read_csv("D:/_R Getting Started/0 R Markdowns/Section 9/Perfume Volumes.csv")
Parsed with column specification:
cols(
  `Machine 1` = col_integer()
)
View (Perfume_Volumes)
Show first 10 rows of the dataset
Perfume_Volumes
# A tibble: 100 x 1
Machine 1
1 148
2 148
3 149
4 154
5 156
6 155
7 151
8 154
9 152
10 156
# ... with 90 more rows
Using the $ operator you can extract the column Machine 1 as a vector and then find the mean of that vector. Because the column name contains a space, it must be wrapped in backticks.
Perfume_Volumes$`Machine 1`
[1] 148 148 149 154 156 155 151 154 152 156 156 152 156 154 151 155 151
[18] 152 154 155 150 152 153 150 152 152 152 151 154 152 156 151 152 155
[35] 148 150 155 151 153 152 149 151 151 149 149 151 150 148 152 149 152
[52] 155 153 152 155 152 157 150 151 150 152 149 153 154 152 154 149 148
[69] 151 149 149 155 156 151 155 152 150 153 156 156 155 153 148 149 148
[86] 153 152 152 150 148 149 156 153 150 155 153 149 151 154 152
mean(Perfume_Volumes$`Machine 1`)
[1] 152
Example: Bottles are produced with a mean volume of 150 cc and a standard deviation of 2 cc. A sample of 100 bottles shows a mean of 152 cc. Has the mean volume increased? Check at the 95% confidence level.
Bad news: base R has no z.test function (unlike t.test). Hence we will first perform the z test step by step, and then use an external package that provides a z.test function.
Find the calculated value of z.
zvalue <- (152 - 150) / (2 / sqrt(100))
zvalue
[1] 10
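The calculated z can be turned into a decision by comparing it with the critical value, or by computing the p-value. The lines below are a minimal sketch using base R's qnorm and pnorm with the summary statistics from the example; the names x_bar, mu0, z_crit and p_value are illustrative and do not come from the original notes.

```r
# Summary statistics taken from the example above
x_bar <- 152; mu0 <- 150; sigma <- 2; n <- 100
zvalue <- (x_bar - mu0) / (sigma / sqrt(n))

# One-sided (upper-tail) test at the 5% significance level
z_crit  <- qnorm(0.95)                        # critical value of z
p_value <- pnorm(zvalue, lower.tail = FALSE)  # area to the right of zvalue

zvalue > z_crit   # TRUE means the null hypothesis is rejected
p_value
```

Since the calculated z is far larger than the critical value, the null hypothesis is rejected in favour of an increased mean.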
Using the package BSDA - Basic Statistics and Data Analysis
install.packages("BSDA")
library (BSDA)
Loading required package: lattice
Attaching package: 'BSDA'
The following object is masked from 'package:datasets':
Orange
z.test(x = Perfume_Volumes$`Machine 1`, alternative = "greater", sigma.x = 2, mu = 150)
One-sample z-Test
data: Perfume_Volumes$`Machine 1`
z = 10, p-value < 2.2e-16
alternative hypothesis: true mean is greater than 150
95 percent confidence interval:
151.671 NA
sample estimates:
mean of x
152
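The one-sided lower confidence bound in this output can be reproduced by hand. This is a small sketch that assumes the sample mean is exactly 152, as printed above; lower_bound is an illustrative name.

```r
# One-sided 95% lower confidence bound for the mean, sigma known
x_bar <- 152; sigma <- 2; n <- 100
lower_bound <- x_bar - qnorm(0.95) * sigma / sqrt(n)
round(lower_bound, 3)
```

The upper limit is reported as NA because the alternative is "greater", so the interval is open on the right.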
2. One Sample t Test
Draw t distributions for different df
q <- seq(-4.0, 4.0, by = 0.1)
q
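The plotting code for this step is not shown in these notes. One possible way to draw the curves, as a sketch only, is to evaluate dt over q for a few degrees of freedom and overlay the standard normal density for comparison. The chosen df values (1, 3, 30) and the names dfs and dens are assumptions made for illustration.

```r
q <- seq(-4.0, 4.0, by = 0.1)
dfs <- c(1, 3, 30)   # illustrative degrees of freedom

# One column of densities per df value
dens <- sapply(dfs, function(d) dt(q, df = d))

# Draw the t curves, then the standard normal as a dashed reference line
matplot(q, dens, type = "l", lty = 1, col = 1:3,
        xlab = "t", ylab = "Density")
lines(q, dnorm(q), lty = 2)
legend("topright", legend = c(paste("df =", dfs), "normal"),
       lty = c(1, 1, 1, 2), col = c(1:3, 1))
```

As df grows, the t curve gets closer to the normal curve; with small df the tails are heavier.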
Example: Bottles are produced with a mean volume of 150 cc and the population standard deviation is unknown. A sample of 4 bottles shows volumes of (151, 153, 152, 152). Has the mean volume changed? Check at the 95% confidence level.
vol <- c(151, 153, 152, 152)
t.test(x = vol, mu = 150, conf.level = 0.95)
One Sample t-test
data: vol
t = 4.899, df = 3, p-value = 0.
alternative hypothesis: true mean is not equal to 150
95 percent confidence interval:
150.7008 153.
sample estimates:
mean of x
152
The p-value is less than 0.05, hence we reject the null hypothesis and conclude that the mean volume has changed from 150 cc.
Let's visualize this.
library(visualize)
visualize.t(stat = c(-4.899, 4.899), df = 3, section = "tails")
Four t functions - rt, pt, qt, dt
pt(q = -4.899, df = 3)
[1] 0.
pt(q = -4.899, df = 3) + pt(q = 4.899, df = 3, lower.tail = FALSE)
[1] 0.
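The t statistic reported by t.test can also be computed step by step, in the same way as the z test earlier. The sketch below uses only base R; t_calc, t_crit and p_value are illustrative names.

```r
vol <- c(151, 153, 152, 152)
mu0 <- 150

# Calculated t: (sample mean - hypothesized mean) / standard error
t_calc <- (mean(vol) - mu0) / (sd(vol) / sqrt(length(vol)))

# Two-sided critical value and p-value with n - 1 degrees of freedom
deg_free <- length(vol) - 1
t_crit   <- qt(0.975, df = deg_free)
p_value  <- 2 * pt(-abs(t_calc), df = deg_free)

t_calc
t_crit
p_value
```

The calculated t exceeds the critical value, which agrees with the rejection of the null hypothesis above.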
3. One Sample Variance Test - Chi Square
Example: 25 bottles were selected and their sample variance was 5. Has the variance increased from the historical value of 4? (95% confidence level)
Bad news: chisq.test is meant for contingency tables and is not suitable for a one-sample variance test. Let's install an external package named EnvStats for this.
install.packages("EnvStats")
library (EnvStats)
Attaching package: 'EnvStats'
95% Confidence Interval: LCL = 3.
UCL = Inf
In the results, the p-value is greater than 0.05, hence we fail to reject the null hypothesis. There is not enough evidence that the variance has increased.
Performing the one-sample variance test in the conventional way
Calculated Chi Square
Calc <- (25 - 1) * var(VolumeVar$Volumes) / 4
Calc
[1] 30
Remember 4 R functions for distributions - rchisq, pchisq, qchisq, dchisq
Use qchisq with 5% area on the right to find the critical value of chi-square.
crit <- qchisq(p = 0.05, df = 24, lower.tail = FALSE)
crit
[1] 36.
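The VolumeVar data set is not reproduced in these notes, so the same calculation can be sketched from the summary statistics given in the example (n = 25, sample variance 5, historical variance 4). The names n, s2, sigma2_0, calc, crit and p_value are illustrative.

```r
n <- 25         # sample size
s2 <- 5         # sample variance
sigma2_0 <- 4   # historical (hypothesized) variance

# Calculated chi-square statistic with n - 1 degrees of freedom
calc <- (n - 1) * s2 / sigma2_0

# Upper-tail critical value and p-value for the "increased" alternative
crit    <- qchisq(p = 0.05, df = n - 1, lower.tail = FALSE)
p_value <- pchisq(calc, df = n - 1, lower.tail = FALSE)

calc
crit
p_value
```

Because calc is smaller than crit (equivalently, p_value is above 0.05), the null hypothesis is not rejected.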
Plot Chi Square distribution, and show critical value (36.4) and calculated (30) using dchisq
x <- seq(1, 50, by = 1)
y <- dchisq(x, 24)
plot(x, y, type = "l", xlab = "Chi Sq", ylab = "f(chi sq)")
abline(v = 30)
text(30, 0.05, "Calculated")
abline(v = crit)
text(crit, 0.04, "Critical - 0.95")
The calculated value (30) is below the critical value (36.4), so it falls in the acceptance region and we fail to reject the null hypothesis. There is not enough evidence that the variance has increased.
The part below was not included in the lecture video.
Here is a bonus section, which you might want to try yourself.
Use visualize to draw the chi-square distribution.
library(visualize)
visualize.chisq(stat = 30, df = 24, section = "upper")
  `Machine 2` = col_integer()
)
View (Perfume_Volumes_2_Sample)
4. Two Sample z Test
z.test(x = Perfume_Volumes_2_Sample$`Machine 1`,
       y = Perfume_Volumes_2_Sample$`Machine 2`,
       sigma.x = sd(Perfume_Volumes_2_Sample$`Machine 1`),
       sigma.y = sd(Perfume_Volumes_2_Sample$`Machine 2`))
Two-sample z-Test
data: Perfume_Volumes_2_Sample$`Machine 1` and Perfume_Volumes_2_Sample$`Machine 2`
z = -3.5954, p-value = 0.
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.5142221 -0.
sample estimates:
mean of x mean of y
150.19 151.
Since p is less than 0.05, we reject the null hypothesis: the mean volumes of the two machines differ.
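The two-sample z statistic divides the difference in sample means by the standard error built from the two known standard deviations. The helper below is a hypothetical sketch written for illustration (it is not part of BSDA), and the two small vectors are made-up data, not the perfume data set. In the call above the sample standard deviations stand in for the population values, which is an approximation that is usually considered reasonable only for large samples.

```r
# Hypothetical helper: two-sided two-sample z test with known sigmas
two_sample_z <- function(x, y, sigma.x, sigma.y) {
  se <- sqrt(sigma.x^2 / length(x) + sigma.y^2 / length(y))
  z  <- (mean(x) - mean(y)) / se
  p  <- 2 * pnorm(-abs(z))
  c(z = z, p.value = p)
}

# Made-up example data, assuming sigma = 2 for both machines
a <- c(150, 152, 154, 152, 151)
b <- c(156, 155, 158, 155, 154)
res <- two_sample_z(a, b, sigma.x = 2, sigma.y = 2)
res
```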
Box and Whisker Plot
boxplot (x = Perfume_Volumes_2_Sample)
Overlapping Histograms
hist(x = Perfume_Volumes_2_Sample$`Machine 1`, col = rgb(1, 0, 0, 0.5),
     main = "Volumes by Machine 1 and 2", xlim = c(140, 160),
     xlab = "Volume", ylab = "Frequency")
hist(x = Perfume_Volumes_2_Sample$`Machine 2`, col = rgb(0, 0, 1, 0.5), add = TRUE)
box()
5 Two Sample t Test
5.1 When variance is equal
Use pooled variance
mc1 <- c(150, 152, 154, 152, 151)
mc2 <- c(156, 155, 158, 155, 154)
How to check equality of variance? - F Test will be covered later in this course
var.test (x = mc1, y = mc2)
F test to compare two variances
data: mc1 and mc2
F = 0.95652, num df = 4, denom df = 4, p-value = 0.
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.09959069 9.
sample estimates:
ratio of variances
0.
Since the value of p is greater than 0.05, we fail to reject the null hypothesis. Variances of
mc1 and mc2 can be considered to be equal.
Conduct t test with equal variance
t.test (x = mc1, y = mc2, var.equal = T)
Two Sample t-test
data: mc1 and mc2
t = -4.0056, df = 8, p-value = 0.
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-5.987668 -1.
sample estimates:
mean of x mean of y
151.8 155.
In this two-sample t test the p-value is below 0.05, hence the null hypothesis is rejected.
There is a difference in the mean volumes of machine 1 and machine 2.
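With var.equal = TRUE, t.test pools the two sample variances. The following sketch reproduces the statistic by hand; sp2, se and t_calc are illustrative names.

```r
mc1 <- c(150, 152, 154, 152, 151)
mc2 <- c(156, 155, 158, 155, 154)
n1 <- length(mc1); n2 <- length(mc2)

# Pooled variance: the two sample variances weighted by their degrees of freedom
sp2 <- ((n1 - 1) * var(mc1) + (n2 - 1) * var(mc2)) / (n1 + n2 - 2)

# Standard error of the difference in means, then the t statistic
se     <- sqrt(sp2 * (1 / n1 + 1 / n2))
t_calc <- (mean(mc1) - mean(mc2)) / se
t_calc

# Two-sided p-value with n1 + n2 - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_calc), df = n1 + n2 - 2)
p_value
```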
Visual Representation of data
boxplot (mc1, mc2)
5.2 When variance is NOT equal
mc1 <- c(150, 152, 154, 152, 151)
mc3 <- c(144, 162, 177, 150, 140)
How to check equality of variance? - F Test will be covered later in this course
var.test (x = mc1, y = mc3)
F test to compare two variances
data: mc1 and mc3
F = 0.0097431, num df = 4, denom df = 4, p-value = 0.
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.001014431 0.
sample estimates:
ratio of variances
0.
Since the value of p is less than 0.05, we reject the null hypothesis. Variances of mc1 and
mc3 can be considered to be NOT equal.
Conduct t test considering un-equal variance.
t.test (x = mc1, y = mc3, var.equal = F)
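With var.equal = FALSE, t.test runs Welch's t test, which keeps the two variances separate and adjusts the degrees of freedom. The output of that call is not shown in these notes, so the sketch below computes the statistic and the Welch-Satterthwaite degrees of freedom by hand; v1, v3, t_calc and df_welch are illustrative names.

```r
mc1 <- c(150, 152, 154, 152, 151)
mc3 <- c(144, 162, 177, 150, 140)

# Variance of each sample mean
v1 <- var(mc1) / length(mc1)
v3 <- var(mc3) / length(mc3)

# Welch t statistic: no pooling of variances
t_calc <- (mean(mc1) - mean(mc3)) / sqrt(v1 + v3)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v1 + v3)^2 /
  (v1^2 / (length(mc1) - 1) + v3^2 / (length(mc3) - 1))

t_calc
df_welch
```

The adjusted degrees of freedom fall between the smaller sample's n - 1 and the pooled n1 + n2 - 2, and are typically not a whole number.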
5.3 Paired t Test
bp.before <- c(120, 122, 143, 100, 109)
bp.after <- c(122, 120, 141, 109, 109)
t.test(x = bp.before, y = bp.after, paired = TRUE)
Paired t-test
data: bp.before and bp.after
t = -0.68641, df = 4, p-value = 0.
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-7.062859 4.
sample estimates:
mean of the differences
-1.
Since the value of p is greater than 0.05, we fail to reject the null hypothesis. There is not enough evidence that this medicine has any effect on blood pressure.
Visualize
bp.diff <- bp.after - bp.before
bp.diff
[1]  2 -2 -2  9  0
boxplot (bp.diff, main= "Effect of Medicine on BP", ylab = "Post Medicine - BP Difference")
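A paired t test is equivalent to a one-sample t test on the pairwise differences with a hypothesized mean of 0. The short check below illustrates this; d, paired_res and one_res are illustrative names. The differences are taken as before minus after so that the sign matches the paired output above.

```r
bp.before <- c(120, 122, 143, 100, 109)
bp.after  <- c(122, 120, 141, 109, 109)

# Paired test on the two vectors
paired_res <- t.test(x = bp.before, y = bp.after, paired = TRUE)

# One-sample test on the differences (before - after)
d <- bp.before - bp.after
one_res <- t.test(x = d, mu = 0)

paired_res$statistic
one_res$statistic
```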
6 Two Sample Variance Test - F Test
Example: We took a sample of 8 bottles from machine A and the standard deviation was 1.1. From machine B we randomly picked 5 bottles and the variance was 11. Is there a difference in the variances of machines A and B? Check at the 90% confidence level.
mca <- c(150, 150, 151, 149, 151, 151, 148, 151)
sd(mca)
[1] 1.
mean (mca)
[1] 150.
mcb <- c(152, 146, 152, 150, 155)
var(mcb)
[1] 11
mean (mcb)
[1] 151
x <- seq(0, 10)
df(x, df1 = 4, df2 = 7)
[1] 0.000000000 0.428138135 0.155514809 0.063565304 0.
[6] 0.015336031 0.008608067 0.005151901 0.003246970 0.
[11] 0.
plot(x, df(x, df1 = 4, df2 = 7), type = "l", xlab = "F Value", ylab = "Density")
Visualization
boxplot (mca, mcb)
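The test itself does not appear in this part of the notes. A minimal sketch of how it could be carried out is given below: the larger variance (machine B) goes in the numerator, which matches df1 = 4 and df2 = 7 used above, and the result is compared with the upper 5% critical value for a two-sided test at the 90% level. The names F_calc and F_crit are illustrative.

```r
mca <- c(150, 150, 151, 149, 151, 151, 148, 151)
mcb <- c(152, 146, 152, 150, 155)

# Calculated F: ratio of the two sample variances
F_calc <- var(mcb) / var(mca)

# Upper critical value for a two-sided test at 90% confidence
F_crit <- qf(0.95, df1 = length(mcb) - 1, df2 = length(mca) - 1)

F_calc
F_crit

# The same test with var.test
var.test(x = mcb, y = mca, conf.level = 0.90)
```

Here the calculated F is larger than the critical value, which points to a difference in the variances at the 90% confidence level.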
7 ANOVA
mc1 <- c(150, 151, 152, 152, 151, 150)
mc2 <- c(153, 152, 148, 151, 149, 152)
mc3 <- c(156, 154, 155, 156, 157, 155)
Data preparation - create data frame
volume <- c(mc1, mc2, mc3)
machine <- rep("machine1", times = 6)
machine
[1] "machine1" "machine1" "machine1" "machine1" "machine1" "machine1"
machine <- rep("machine1", times = length(mc1))
machine
[1] "machine1" "machine1" "machine1" "machine1" "machine1" "machine1"
machine <- rep(c("machine1", "machine2", "machine3"), times = c(length(mc1), length(mc2), length(mc3)))
machine
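The notes break off at this point. One plausible way to finish the data preparation and run the analysis, offered as an assumption rather than the original author's code, is to combine volume and machine into a data frame and fit a one-way ANOVA with aov. The names vol_df, fit and tab are illustrative.

```r
mc1 <- c(150, 151, 152, 152, 151, 150)
mc2 <- c(153, 152, 148, 151, 149, 152)
mc3 <- c(156, 154, 155, 156, 157, 155)

volume  <- c(mc1, mc2, mc3)
machine <- rep(c("machine1", "machine2", "machine3"),
               times = c(length(mc1), length(mc2), length(mc3)))

# Long-format data frame: one row per bottle
vol_df <- data.frame(volume = volume, machine = factor(machine))

# One-way ANOVA: does mean volume differ between machines?
fit <- aov(volume ~ machine, data = vol_df)
tab <- summary(fit)[[1]]
tab
```

A p-value below 0.05 in the machine row would indicate that at least one machine's mean volume differs from the others.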