Statistical analysis of cap torque data and family energy cost in R, Statistics notes

Statistical analysis of cap torque data and family energy cost data using statistical analysis tools in R. The document includes the R code for generating the plots and hypothesis tests, as well as the results obtained. The cap torque analysis is carried out for two different machines and evaluates the distribution, dispersion, and central values. The family energy cost analysis is carried out for 25 families and evaluates whether the cost has changed with respect to the previous year, when the mean monthly cost was $200.

Type: Notes

2023/2024

Uploaded on 18/03/2024

ubaldo-moran-paniagua 🇲🇽

Activity 2. Basic Statistics in R
Selected Topics 1
Spring 2022
The due date is February 25th at 11:59 pm.
Instructions: This activity can be done in pairs or thirds. Only one will submit the activity with the name of
both members. You must send a screenshot of the teams meeting with the video camera turned on—this
requirement is evidence of teamwork.
Names: Rebeca Paola Aguilar Jaimes- 159337
Ubaldo de Jesús Morán Paniagua 159580
For each problem, analyze the information using different statistical analysis tools in R to answer the
questions. For each used tool, you must provide specific conclusions. If you do not write your
conclusions according to each analysis, the solution does not count for your score.
Problem 1. Cap removal torque data
A quality control engineer needs to ensure that the caps on shampoo bottles are fastened
correctly. If the caps are fastened too loosely, they may fall off during shipping. If they
are fastened too tightly, they may be too difficult to remove. The target torque value for
fastening the caps is 18. The engineer collects a random sample of 68 bottles and tests the
amount of torque that is needed to remove the caps.
Column    Description
Torque    The torque that is needed to remove the cap
Machine   The machine that tightened the cap: 1 or 2

a) Use a boxplot grouping the torque in two categories, one for machine 1 and the other for machine 2. Use the results to make your conclusions about dispersion, distribution, central values, etc.

INPUT
#First
CapTorque <- read.csv("~/R/DataSets/CapTorque.csv")
View(CapTorque)
# Split the torque readings by machine
machine_type1 = CapTorque$Torque[CapTorque$Machine == 1]
machine_type2 = CapTorque$Torque[CapTorque$Machine == 2]
boxplot(machine_type1,machine_type2,col=c("green","yellow"))
summary(machine_type1)
summary(machine_type2)
library(pastecs)
stat.desc(machine_type1)
stat.desc(machine_type2)
# Hypothesis test, machine 1
xbarra1=mean(machine_type1)
mu1=18 # value lost in extraction; 18 is the target torque stated in the problem
sd1=sd(machine_type1)
n1=length(machine_type1)
z1=(xbarra1-mu1)/(sd1/sqrt(n1))
z1
alpha1=(0.05)/2 # divisor lost in extraction; 2 assumed for the two-tailed test
alpha1
z.alpha1=qnorm(1-alpha1)
z.alpha1
pval1=(1-pnorm(z1))*2 # multiplier lost in extraction; 2 assumed for the two-tailed test
pval1
# We conclude that machine 1 is achieving the target torque value for the caps
# Hypothesis test, machine 2
xbarra2=mean(machine_type2)
mu2=18 # value lost in extraction; same target torque assumed
sd2=sd(machine_type2)
n2=length(machine_type2)
z2=(xbarra2-mu2)/(sd2/sqrt(n2))
z2
alpha2=(0.05)/2 # divisor lost in extraction; 2 assumed for the two-tailed test
alpha2

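[1] 1.
> pval1=(1-pnorm(z1))*2 # multiplier lost in extraction; 2 assumed for the two-tailed test
> pval1
[1] 0.
> # We conclude that machine 1 is achieving the target torque value for the caps
> # Hypothesis test, machine 2
> xbarra2=mean(machine_type2)
> mu2=18 # value lost in extraction; 18 is the target torque stated in the problem
> sd2=sd(machine_type2)
> n2=length(machine_type2)
> z2=(xbarra2-mu2)/(sd2/sqrt(n2))
> z2
[1] 4.
> alpha2=(0.05)/2 # divisor lost in extraction; 2 assumed for the two-tailed test
> alpha2
[1] 0.
> z.alpha2=qnorm(1-alpha2)
> z.alpha2
[1] 1.
> pval2=(1-pnorm(z2))*2 # multiplier lost in extraction; 2 assumed for the two-tailed test
> pval2
[1] 8.788042e-07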
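The manual steps of the test above (test statistic, critical value, p-value) can be collected in one small helper. This is only a sketch: the function name `z_test_two_sided` and the toy readings are hypothetical and are not part of the original activity, and `abs()` is added so the two-tailed p-value stays valid when the statistic is negative.

```r
# Two-sided one-sample z-test that uses the sample standard deviation,
# following the same formulas as the manual calculation.
z_test_two_sided <- function(x, mu, alpha = 0.05) {
  n <- length(x)
  z <- (mean(x) - mu) / (sd(x) / sqrt(n))   # test statistic
  z_crit <- qnorm(1 - alpha / 2)            # two-tailed critical value
  p_value <- 2 * (1 - pnorm(abs(z)))        # two-tailed p-value
  list(z = z, z_crit = z_crit, p_value = p_value, reject = abs(z) > z_crit)
}

# Made-up torque readings for illustration only (NOT the CapTorque data)
toy_torque <- c(17, 19, 18, 21, 16, 20, 18, 17, 22, 19)
res <- z_test_two_sided(toy_torque, mu = 18)
print(res)
```

With the real data, the call would be `z_test_two_sided(machine_type1, mu = 18)`, assuming the target of 18 stated in the problem.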

We can see that the torque values for Machine 1 are less spread out than the values for Machine 2. The data for Machine 1 look approximately normal with a skew to the left, while the data for Machine 2 look closer to a uniform distribution.

For the hypothesis tests we assume that the data for both machines follow a normal distribution with constant variance, and we run a two-tailed test of the mean against the target value of 18 at a significance level of 0.05. Machine 1 gives a p-value of 0.36, which is above the significance level, so we do not reject the hypothesis that its mean torque matches the target. Machine 2 gives a very small p-value (8.788042e-07), which is below the significance level (alpha), so its mean torque does not meet the target value.

b) Finally, determine the mean and the standard deviation of each machine's torque and make your final conclusions about the problem. You can use the predefined functions in R to make this analysis.

Machine 1 has the least variability, an almost normal distribution (skewed to the right), and the mean closest to the target of 18: its mean removal torque is 18.67 with a standard deviation of 4.394, compared with a mean of 24.19 and a standard deviation of 7.11 for Machine 2. We conclude that the two means are different, as can be seen in the graph and in the results.

Problem 2. Family energy cost data

An economist wants to determine whether the monthly energy cost for families has changed from the previous year, when the mean cost per month was $200. The economist randomly samples 25 families and records their energy costs for the current year.

Data file: FamilyEnergyCost.csv

Worksheet column   Description
Family ID          The family identification number
Energy Cost        The mean cost of energy per month

a) Use a histogram to evaluate if the energy cost follows a normal distribution function.

INPUT
#Second
FamilyEnergyCost <- read.csv("~/R/DataSets/FamilyEnergyCost.csv")
View(FamilyEnergyCost)
hist(FamilyEnergyCost$Energy.Cost)
#install.packages('nortest')
library(nortest)
ad.test(FamilyEnergyCost$Energy.Cost) # Anderson-Darling normality test
str(FamilyEnergyCost)
Energy_Cost=FamilyEnergyCost$Energy.Cost
hist(Energy_Cost) # Histogram
Mean_Family_Energy=mean(Energy_Cost) # Mean of the energy costs
sd_energy_cost=sd(Energy_Cost) # Standard deviation of the energy costs
summary(Energy_Cost) # Summary

OUTPUT
> #Second

> FamilyEnergyCost <- read.csv("~/R/DataSets/FamilyEnergyCost.csv")
> View(FamilyEnergyCost)
> hist(FamilyEnergyCost$Energy.Cost)
> #install.packages('nortest')
> library(nortest)
> ad.test(FamilyEnergyCost$Energy.Cost) # Anderson-Darling normality test

    Anderson-Darling normality test

data:  FamilyEnergyCost$Energy.Cost
A = 0.22709, p-value = 0.

> str(FamilyEnergyCost)
'data.frame': 25 obs. of 2 variables:
 $ Family.ID  : int 1 2 3 4 5 6 7 8 9 10 ...
 $ Energy.Cost: int 211 572 558 250 478 307 184 435 460 308 ...
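The question posed in Problem 2 is whether the mean monthly cost differs from last year's $200. The visible part of the document does not show that test, so the following is only a sketch of one common approach, a two-sided one-sample t-test. It uses just the first ten Energy.Cost values printed by str() above (the full sample has 25 families), so its result is illustrative and is not the answer to the problem. The vector name `energy_cost_partial` is hypothetical.

```r
# First ten Energy.Cost values copied from the str() output;
# the remaining 15 observations are not visible in the document.
energy_cost_partial <- c(211, 572, 558, 250, 478, 307, 184, 435, 460, 308)

# H0: mean monthly cost = 200   vs.   H1: mean monthly cost != 200
tt <- t.test(energy_cost_partial, mu = 200, alternative = "two.sided")
print(tt)
```

With the full data frame loaded, the same call would take `FamilyEnergyCost$Energy.Cost` as its first argument.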

After        The resting heart rate of the person after the running program
Difference   The difference between the person's resting heart rate before and after the running program

a) Evaluate with a 95% confidence interval mean plot whether there is a difference between the resting heart rate of people before and after the running program. (15 points)

INPUT
#Third
library(gplots) # assumed: plotCI() with the barcol argument comes from gplots; the library() call is not visible in the source
RestingHeartRate <- read.csv("~/R/DataSets/RestingHeartRate.csv")
View(RestingHeartRate)
# Keep only the Before and After columns
RestingHeartRate=subset(RestingHeartRate,select=c(Before,After))
RestingHeartRate
str(RestingHeartRate)
means_RestingHeartRate = sapply(RestingHeartRate,mean)
stdev_RestingHeartRate = sapply(RestingHeartRate,sd)
n = sapply(RestingHeartRate,length)
qt(0.975,n-1)
# Half-width of the 95% confidence interval for each mean
interval = qt(0.975,n-1)*stdev_RestingHeartRate/sqrt(n)
plotCI(x=means_RestingHeartRate, uiw =interval, barcol="blue",
       main=expression(paste("Confidence Interval Plot of difference between the heart rate of people before and after ",alpha,"=5%")))
t.test(RestingHeartRate$Before,RestingHeartRate$After,var.equal = FALSE)
t.test(RestingHeartRate$Before,RestingHeartRate$After,var.equal = TRUE)

OUTPUT
> #Second

> RestingHeartRate <- read.csv("~/R/DataSets/RestingHeartRate.csv")
> View(RestingHeartRate)
> RestingHeartRate=subset(RestingHeartRate,select=c(Before,After))
> RestingHeartRate
   Before After
1      68    67
2      76    77
3      74    74
4      71    74
5      71    69
6      72    70
7      75    71
8      83    77
9      75    71
10     74    74
11     76    73
12     77    68
13     78    71
14     75    72
15     75    77
16     84    80
17     77    74
18     69    73
19     75    72
20     65    62
> str(RestingHeartRate)
'data.frame': 20 obs. of 2 variables:
 $ Before: int 68 76 74 71 71 72 75 83 75 74 ...
 $ After : int 67 77 74 74 69 70 71 77 71 74 ...
> means_RestingHeartRate = sapply(RestingHeartRate,mean)
> stdev_RestingHeartRate = sapply(RestingHeartRate,sd)
> n = sapply(RestingHeartRate,length)
> qt(0.975,n-1)
  Before    After
2.093024       2.
> interval = qt(0.975,n-1)*stdev_RestingHeartRate/sqrt(n)
> plotCI(x=means_RestingHeartRate, uiw =interval, barcol="blue",
+ main=expression(paste("Confidence Interval Plot of difference between the heart rate of people before and after ",alpha,"=5%")))
> t.test(RestingHeartRate$Before,RestingHeartRate$After,var.equal = FALSE)

    Welch Two Sample t-test

data:  RestingHeartRate$Before and RestingHeartRate$After
t = 1.6219, df = 37.57, p-value = 0.
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.5470543  4.
sample estimates:
mean of x mean of y
     74.5       72.

> t.test(RestingHeartRate$Before,RestingHeartRate$After,var.equal = TRUE)

    Two Sample t-test

data:  RestingHeartRate$Before and RestingHeartRate$After
t = 1.6219, df = 38, p-value = 0.
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.5460218  4.
sample estimates:
mean of x mean of y
     74.5       72.

Using the 95% confidence intervals of the before and after means, we conclude that there is no significant difference between the resting heart rates before and after the running program, because the intervals overlap.
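Because the Before and After readings come from the same 20 people, a paired t-test on the per-person differences is a common cross-check on the two-sample tests. The sketch below is not part of the original activity: it copies the 20 rows from the printed data frame, and the vector names `before`, `after`, and `paired` are hypothetical.

```r
# The 20 Before/After pairs copied from the printed data frame
before <- c(68, 76, 74, 71, 71, 72, 75, 83, 75, 74,
            76, 77, 78, 75, 75, 84, 77, 69, 75, 65)
after  <- c(67, 77, 74, 74, 69, 70, 71, 77, 71, 74,
            73, 68, 71, 72, 77, 80, 74, 73, 72, 62)

# Per-person change in resting heart rate
differences <- before - after
print(mean(differences))

# Paired t-test: H0 is that the mean difference equals 0
paired <- t.test(before, after, paired = TRUE)
print(paired)
```

A paired test removes the person-to-person variation that the two-sample tests treat as noise, so its conclusion can differ from theirs and is worth comparing before settling on a final answer.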

> View(AcademicSalaries)
> AcademicSalaries=na.omit(AcademicSalaries)
> AcademicSalaries=as.data.frame(AcademicSalaries)
> str(AcademicSalaries)
'data.frame': 45 obs. of 3 variables:
 $ Subject: int 1 1 1 1 1 1 1 1 1 1 ...
 $ Degree : int 1 1 2 2 3 3 3 3 3 3 ...
 $ Salary : num 1.7 1.9 1.8 2.1 2.5 2.7 2.9 2.5 2.6 2. ...
 - attr(*, "na.action")= 'omit' Named int [1:97] 46 47 48 49 50 51 52 53 54 55 ...
  ..- attr(*, "names")= chr [1:97] "46" "47" "48" "49" ...
> AcademicSalaries$Subject <- as.factor(AcademicSalaries$Subject)
> AcademicSalaries$Degree <- as.factor(AcademicSalaries$Degree)
> str(AcademicSalaries)
'data.frame': 45 obs. of 3 variables:
 $ Subject: Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
 $ Degree : Factor w/ 3 levels "1","2","3": 1 1 2 2 3 3 3 3 3 3 ...
 $ Salary : num 1.7 1.9 1.8 2.1 2.5 2.7 2.9 2.5 2.6 2. ...
 - attr(*, "na.action")= 'omit' Named int [1:97] 46 47 48 49 50 51 52 53 54 55 ...
  ..- attr(*, "names")= chr [1:97] "46" "47" "48" "49" ...
> summary(AcademicSalaries)
 Subject Degree     Salary
 1:12    1:10   Min.   :1.
 2:13    2:13   1st Qu.:2.
 3:11    3:22   Median :2.
 4: 9           Mean   :2.
                3rd Qu.:3.
                Max.   :3.
> Subject_Salary=split(AcademicSalaries$Salary,AcademicSalaries$Subject)
> str(Subject_Salary)
List of 4
 $ 1: num [1:12] 1.7 1.9 1.8 2.1 2.5 2.7 2.9 2.5 2.6 2. ...
 $ 2: num [1:13] 2.5 2.3 2.6 2.4 2.7 2.4 2.6 2.4 2.5 3. ...
 $ 3: num [1:11] 2.7 2.8 2.9 3 2.8 2.7 3.7 3.6 3.7 3. ...
 $ 4: num [1:9] 2.5 2.6 2.3 2.8 3.3 3.4 3.3 3.5 3.
> means_Subject_Salary = sapply(Subject_Salary,mean)
> stdev_Subject_Salary = sapply(Subject_Salary,sd)
> ggplot(AcademicSalaries,aes(x=Subject,y=Salary,fill=Subject))+geom_boxplot()
> summary(AcademicSalaries)
 Subject Degree     Salary
 1:12    1:10   Min.   :1.
 2:13    2:13   1st Qu.:2.
 3:11    3:22   Median :2.
 4: 9           Mean   :2.
                3rd Qu.:3.
                Max.   :3.
> my_anova2=aov(Salary~Subject+Degree+Subject*Degree, data=AcademicSalaries)
> summary(my_anova2)
               Df Sum Sq Mean Sq F value  Pr(>F)
Subject         3  4.168   1.389   63.85 7.9e-14 ***
Degree          2  8.382   4.191  192.63 < 2e-16 ***
Subject:Degree  6  0.044   0.007    0.34 0.
Residuals      33  0.718   0.
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> Tukey=TukeyHSD(my_anova2)
> Tukey
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Salary ~ Subject + Degree + Subject * Degree, data = AcademicSalaries)

$Subject

$Degree

$Subject:Degree
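The model formula used in the aov() call, Salary ~ Subject + Degree + Subject * Degree, names the main effects twice: in R's formula syntax Subject * Degree already expands to both main effects plus their interaction. The short sketch below checks this with terms(); it needs no data, and the names `f_source` and `f_compact` are hypothetical.

```r
# Formula as written in the activity, and the shorter equivalent form
f_source  <- Salary ~ Subject + Degree + Subject * Degree
f_compact <- Salary ~ Subject * Degree

# terms() expands each formula into its model terms
labels_source  <- attr(terms(f_source), "term.labels")
labels_compact <- attr(terms(f_compact), "term.labels")

print(labels_source)
print(identical(labels_source, labels_compact))
```

Both forms fit the same two-way model with interaction, which matches the three rows (Subject, Degree, Subject:Degree) in the ANOVA table.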