









Cheat sheet of useful R commands for M143 Introduction to Probability and Statistics at Calvin University: a table giving the purpose of each command, followed by examples of how the commands are used.
Command                     Purpose
help()                      Obtain documentation for a given command
example()                   View some examples of the use of a command
c(), scan()                 Enter data manually into a vector
seq()                       Make an arithmetic-progression vector
rep()                       Make a vector of repeated values
data()                      Load a built-in dataset (often into a data.frame)
View()                      View a dataset in a spreadsheet-type format
str()                       Display the internal structure of an R object
read.csv(), read.table()    Load an existing data file into a data.frame
library(), require()        Make an add-on package available
dim()                       See dimensions (# of rows/cols) of a data.frame
length()                    Give the length of a vector
ls()                        List memory contents
rm()                        Remove an item from memory
names()                     List names of variables in a data.frame
hist()                      Produce a histogram
histogram()                 Lattice command for producing a histogram
stem()                      Make a stem plot
table()                     List all values of a variable with frequencies
xtabs()                     Cross-tabulation tables using formulas
mosaicplot()                Make a mosaic plot
cut()                       Group values of a variable into larger bins
mean(), median()            Identify the "center" of a distribution
by()                        Apply a function to a column split by factors
summary()                   Display 5-number summary and mean
var(), sd()                 Find variance, sd of values in a vector
sum()                       Add up all values in a vector
quantile()                  Find the position of a quantile in a dataset
barplot()                   Produce a bar graph
barchart()                  Lattice command for producing bar graphs
boxplot()                   Produce a boxplot
bwplot()                    Lattice command for producing boxplots
plot()                      Produce a scatterplot
xyplot()                    Lattice command for producing a scatterplot
lm()                        Determine the least-squares regression line
anova()                     Analysis of variance (can use on results of lm())
predict()                   Obtain predicted values from a linear model
nls()                       Estimate parameters of a nonlinear model
residuals()                 Give (observed - predicted) for a model fit to data
sample()                    Take a sample from a vector of data
replicate()                 Repeat some process a set number of times
cumsum()                    Produce a running total of values in a vector
ecdf()                      Build an empirical cumulative distribution function
dbinom(), etc.              Tools for binomial distributions
dpois(), etc.               Tools for Poisson distributions
pnorm(), etc.               Tools for normal distributions
qt(), etc.                  Tools for Student t distributions
pchisq(), etc.              Tools for chi-square distributions
binom.test()                Hypothesis test and confidence interval for 1 proportion
prop.test()                 Inference for 1 proportion using normal approximation
chisq.test()                Carry out a chi-square test
fisher.test()               Fisher test for a contingency table
t.test()                    Student t test for inference on a population mean
qqnorm(), qqline()          Tools for checking normality
addmargins()                Add marginal sums to an existing table
prop.table()                Compute proportions from a contingency table
par()                       Query and edit graphical settings
power.t.test()              Power calculations for 1- and 2-sample t
anova()                     Compute analysis of variance table for a fitted model
help(mean)
require(lattice)
example(histogram)
x = c(8, 6, 7, 5, 3, 0, 9)
x
[1] 8 6 7 5 3 0 9
names = c("Owen", "Luke", "Anakin", "Leia", "Jacen", "Jaina")
names
[1] "Owen"   "Luke"   "Anakin" "Leia"   "Jacen"  "Jaina"
heartDeck = c(rep(1, 13), rep(0, 39))
heartDeck
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[49] 0 0 0 0
y = seq(7, 41, 1.5)
y
 [1]  7.0  8.5 10.0 11.5 13.0 14.5 16.0 17.5 19.0 20.5 22.0 23.5 25.0 26.5 28.0 29.5 31.0 32.5 34.0
[20] 35.5 37.0 38.5 40.0
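The table lists cumsum(), but it is not demonstrated elsewhere in this sheet. A minimal sketch applying it (along with length()) to the vector x entered above; the values in the comments are simple arithmetic.

length(x)   # number of elements in x: 7
cumsum(x)   # running totals of x: 8 14 21 26 29 29 38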
data(iris)
names(iris)
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"
dim(iris)
[1] 150   5
str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
View(iris)
library(abd)
require(lattice)
require(lattice)
data(iris)
histogram(iris$Sepal.Length, breaks=seq(4, 8, .25))
histogram(~ Sepal.Length, data=iris, main="Iris Sepals", xlab="Length")
histogram(~ Sepal.Length | Species, data=iris, col="red")
histogram(~ Sepal.Length | Species, data=iris, n=15, layout=c(1,3))
As.in.H2O = read.csv("http://www.calvin.edu/~scofield/data/comma/arsenicInWater.csv")
senate = read.table("http://www.calvin.edu/~scofield/data/tab/rc/senate99.dat", sep="\t", header=T)
counties = read.csv("http://www.calvin.edu/~stob/data/counties.csv")
names(counties)
[1] "County"         "State"          "Population"     "HousingUnits"   "TotalArea"
[6] "WaterArea"      "LandArea"       "DensityPop"     "DensityHousing"
x = counties$LandArea
mean(x, na.rm = T)
[1] 1126.
median(x, na.rm = T)
[1] 616.5
summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1.99  431.70  616.50 1126.00  923.20 145900.
sd(x, na.rm = T)
[1] 3622.
var(x, na.rm = T)
[1] 13122165
quantile(x, probs=seq(0, 1, .2), na.rm=T)
     0%     20%     40%     60%     80%    100%
   1.99  403.29  554.36  717.94 1043.82 145899.
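by() and cut() from the command table are not demonstrated elsewhere in this sheet. A minimal sketch using the counties data loaded above; the bin boundaries passed to cut() are arbitrary, chosen only for illustration.

by(counties$LandArea, counties$State, median, na.rm=TRUE)    # median county land area, split by state
areaBins = cut(counties$LandArea, breaks=c(0, 500, 1000, 5000, 150000))   # group land areas into 4 bins
table(areaBins)                                              # frequency of counties in each bin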
firstTwentyIntegers = 1:20
sum(firstTwentyIntegers)
[1] 210
die = 1:6
manyRolls = sample(die, 100, replace=T)
sixFreq = sum(manyRolls == 6)
sixFreq / 100
[1] 0.
monarchs = read.csv("http://www.calvin.edu/~scofield/data/comma/monarchReigns.csv")
stem(monarchs$years)

  The decimal point is 1 digit(s) to the right of the |

  0 | 0123566799
  1 | 0023333579
  2 | 012224455
  3 | 355589
  4 | 4
  5 | 069
  6 | 3
pol = read.csv("http://www.calvin.edu/~stob/data/csbv.csv")
table(pol$sex)

Female   Male
   133     88

table(pol$sex, pol$Political04)

         Conservative Far Right Liberal Middle-of-the-road
  Female           67         0      14                 48
  Male             47         7       6                 28

xtabs(~sex, data=pol)
sex
Female   Male
   133     88

xtabs(~Political04 + Political07, data=pol)
                    Political07
Political04          Conservative Far Left Far Right Liberal Middle-of-the-road
  Conservative                 58        0         2      13                 39
  Far Right                     4        0         3       0                  0
  Liberal                       0        1         1      14                  4
  Middle-of-the-road           20        0         0      22                 32

mosaicplot(~Political04 + sex, data=pol)
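prop.table() from the command table is not shown elsewhere in this sheet. A minimal sketch applying it to a two-way table of sex by Political04 built from the pol data loaded above; t04 is just an illustrative name.

t04 = xtabs(~sex + Political04, data=pol)
prop.table(t04)      # each cell as a proportion of the grand total
prop.table(t04, 1)   # row proportions: political leaning within each sex
prop.table(t04, 2)   # column proportions: sex within each political leaning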
pol = read.csv("http://www.calvin.edu/~stob/data/csbv.csv")
barplot(table(pol$Political04), main="Political Leanings, Calvin Freshman 2004")
barplot(table(pol$Political04), horiz=T)
barplot(table(pol$Political04), col=c("red","green","blue","orange"))
barplot(table(pol$Political04), col=c("red","green","blue","orange"),
        names=c("Conservative","Far Right","Liberal","Centrist"))
barplot(xtabs(~sex + Political04, data=pol), legend=c("Female","Male"), beside=T)
data(iris)
boxplot(iris$Sepal.Length)
boxplot(iris$Sepal.Length, col="yellow")
boxplot(Sepal.Length ~ Species, data=iris)
boxplot(Sepal.Length ~ Species, data=iris, col="yellow",
        ylab="Sepal length", main="Iris Sepal Length by Species")
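bwplot(), the lattice analogue of boxplot() listed in the table, is not demonstrated elsewhere; a minimal sketch on the same iris data.

require(lattice)
bwplot(Sepal.Length ~ Species, data=iris, ylab="Sepal length")   # lattice boxplots by species
bwplot(~ Sepal.Length | Species, data=iris, layout=c(1,3))       # one panel per species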
data(faithful)
plot(waiting ~ eruptions, data=faithful)
plot(waiting ~ eruptions, data=faithful, cex=.5)
plot(waiting ~ eruptions, data=faithful, pch=6)
plot(waiting ~ eruptions, data=faithful, pch=19)
plot(waiting ~ eruptions, data=faithful, cex=.5, pch=19, col="blue")
plot(waiting ~ eruptions, data=faithful, cex=.5, pch=19, col="blue",
     main="Old Faithful Eruptions", ylab="Wait time between eruptions",
     xlab="Duration of eruption")
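lm(), predict(), and residuals() appear in the command table but not in these examples. A minimal sketch continuing the Old Faithful scatterplot above; fit is just an illustrative name, and the eruption durations passed to predict() are arbitrary.

fit = lm(waiting ~ eruptions, data=faithful)    # least-squares regression line
summary(fit)                                    # coefficients, R-squared, etc.
abline(fit, col="red")                          # overlay the fitted line on the scatterplot
predict(fit, newdata=data.frame(eruptions=c(2, 3.5, 5)))   # predicted wait times
head(residuals(fit))                            # observed - predicted wait times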
dbinom(0, 5, .5)    # probability of 0 heads in 5 flips
[1] 0.03125
dbinom(0:5, 5, .5)    # full probability dist. for 5 flips
[1] 0.03125 0.15625 0.31250 0.31250 0.15625 0.03125
sum(dbinom(0:2, 5, .5))    # probability of 2 or fewer heads in 5 flips
[1] 0.5
pbinom(2, 5, .5)    # same as last line
[1] 0.5
flip5 = replicate(10000, sum(sample(c("H","T"), 5, rep=T)=="H"))
table(flip5) / 10000    # distribution (simulated) of count of heads in 5 flips
flip5
     0      1      2      3      4      5
0.0310 0.1545 0.3117 0.3166 0.1566 0.0296
table(rbinom(10000, 5, .5)) / 10000    # shorter version of previous 2 lines

     0      1      2      3      4      5
0.0304 0.1587 0.3087 0.3075 0.1634 0.0313
qbinom(seq(0, 1, .2), 50, .2)    # approx. 0/.2/.4/.6/.8/1-quantiles in Binom(50,.2) distribution
[1]  0  8  9 11 12 50
binom.test(29, 200, .21)    # inference on sample with 29 successes in 200 trials

        Exact binomial test

data:  29 and 200
number of successes = 29, number of trials = 200, p-value = 0.
alternative hypothesis: true probability of success is not equal to 0.21
95 percent confidence interval:
 0.09930862 0.
sample estimates:
probability of success
                 0.145
prop.test(29, 200, .21)    # inference on same sample, using normal approx. to binomial

        1-sample proportions test with continuity correction

data:  29 out of 200, null probability 0.21
X-squared = 4.7092, df = 1, p-value = 0.
alternative hypothesis: true p is not equal to 0.21
95 percent confidence interval:
 0.1007793 0.
sample estimates:
    p
0.145
1 - pchisq(3.1309, 5)    # gives P-value associated with X-squared stat 3.1309 when df=5
[1] 0.
pchisq(3.1309, df=5, lower.tail=F)    # same as above
[1] 0.
qchisq(c(.001,.005,.01,.025,.05,.95,.975,.99,.995,.999), 2)    # gives critical values like Table A
[1]  0.002001001  0.010025084  0.020100672  0.050635616  0.102586589  5.991464547  7.377758908
[8]  9.210340372 10.596634733 13.815510558
qchisq(c(.999,.995,.99,.975,.95,.05,.025,.01,.005,.001), 2, lower.tail=F)    # same as above
[1]  0.002001001  0.010025084  0.020100672  0.050635616  0.102586589  5.991464547  7.377758908
[8]  9.210340372 10.596634733 13.815510558
observedCounts = c(35, 27, 33, 40, 47, 51)
claimedProbabilities = c(.13, .13, .14, .16, .24, .20)
chisq.test(observedCounts, p=claimedProbabilities)    # goodness-of-fit test, assumes df = n-1

        Chi-squared test for given probabilities

data:  observedCounts
X-squared = 3.1309, df = 5, p-value = 0.
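The X-squared statistic reported above can be reproduced by hand from its definition, sum((observed - expected)^2 / expected), using the vectors already defined; expectedCounts is just an illustrative name, and the values in the comments are simple arithmetic.

expectedCounts = sum(observedCounts) * claimedProbabilities   # 233 * each claimed probability
expectedCounts                                                # 30.29 30.29 32.62 37.28 55.92 46.60
sum((observedCounts - expectedCounts)^2 / expectedCounts)     # reproduces X-squared = 3.1309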
blood = read.csv("http://www.calvin.edu/~scofield/data/comma/blood.csv")
t = table(blood$Rh, blood$type)
addmargins(t)       # to add both row/column totals
       A  AB   B   O Sum
  Neg  6   1   2   7  16
  Pos 34   3   9  38  84
  Sum 40   4  11  45 100
addmargins(t, 1)    # to add only column totals
       A  AB   B   O
  Neg  6   1   2   7
  Pos 34   3   9  38
  Sum 40   4  11  45
addmargins(t, 2)    # to add only row totals
       A  AB   B   O Sum
  Neg  6   1   2   7  16
  Pos 34   3   9  38  84
par(mfrow = c(1,2))      # set figure so next two plots appear side-by-side
poisSamp = rpois(50, 3)  # draw sample of size 50 from Pois(3)
maxX = max(poisSamp)     # will help in setting horizontal plotting region
hist(poisSamp, freq=F, breaks=-.5:(maxX+.5), col="green", xlab="Sampled values")
plot(0:maxX, dpois(0:maxX, 3), type="h", ylim=c(0,.25), col="blue",
     main="Probabilities for Pois(3)")
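ecdf() from the command table is not demonstrated elsewhere in this sheet. A minimal sketch comparing the empirical CDF of the Poisson sample drawn above with the theoretical CDF from ppois().

par(mfrow = c(1,1))                          # back to a single plotting panel
plot(ecdf(poisSamp), main="Empirical CDF of 50 draws from Pois(3)")
lines(0:maxX, ppois(0:maxX, 3), type="s", col="blue")   # theoretical CDF, for comparison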
blood = read.csv("http://www.calvin.edu/~scofield/data/comma/blood.csv")
tblood = xtabs(~Rh + type, data=blood)
tblood              # contingency table for blood type and Rh factor
     type
Rh     A AB  B  O
  Neg  6  1  2  7
  Pos 34  3  9 38
chisq.test(tblood)

        Pearson's Chi-squared test

data:  tblood
X-squared = 0.3164, df = 3, p-value = 0.

fisher.test(tblood)

        Fisher's Exact Test for Count Data

data:  tblood
p-value = 0.
alternative hypothesis: two.sided
dpois(2:7, 4.2)    # probabilities of 2, 3, 4, 5, 6 or 7 successes in Pois(4.2)
[1] 0.13226099 0.18516538 0.19442365 0.16331587 0.11432111 0.06859266
ppois(1, 4.2)      # probability of 1 or fewer successes in Pois(4.2); same as sum(dpois(0:1, 4.2))
[1] 0.077977
1 - ppois(7, 4.2)  # probability of 8 or more successes in Pois(4.2)
[1] 0.06394334
pnorm(17, 19, 3)    # gives Prob[X < 17], when X ~ Norm(19, 3)
[1] 0.2524925
qnorm(c(.95, .975, .995))    # obtain z* critical values for 90, 95, 99% CIs
[1] 1.644854 1.959964 2.575829
nSamp = rnorm(10000, 7, 1.5)    # draw random sample from Norm(7, 1.5)
hist(nSamp, freq=F, col="green", main="Sampled values and population density curve")
xs = seq(2, 12, .05)
lines(xs, dnorm(xs, 7, 1.5), lwd=2, col="blue")
data(sleep)
t.test(extra ~ group, data=sleep)    # 2-sample t with group id column

        Welch Two Sample t-test

data:  extra by group
t = -1.8608, df = 17.776, p-value = 0.
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.3654832  0.2054832
sample estimates:
mean in group 1 mean in group 2
           0.75            2.33
sleepGrp1 = sleep$extra[sleep$group==1]
sleepGrp2 = sleep$extra[sleep$group==2]
t.test(sleepGrp1, sleepGrp2, conf.level=.99)    # 2-sample t, data in separate vectors

        Welch Two Sample t-test

data:  sleepGrp1 and sleepGrp2
t = -1.8608, df = 17.776, p-value = 0.
alternative hypothesis: true difference in means is not equal to 0
99 percent confidence interval:
 -4.027633  0.867633
sample estimates:
mean of x mean of y
     0.75      2.33
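qt() appears in the table ("tools for Student t distributions") but only t.test() is used above. A minimal sketch building a one-sample 95% confidence interval by hand for sleepGrp1, using the t* critical value from qt(); n and tstar are just illustrative names.

n = length(sleepGrp1)          # 10 observations in group 1
tstar = qt(.975, df = n - 1)   # t* critical value for a 95% CI: 2.262157
mean(sleepGrp1) + c(-1, 1) * tstar * sd(sleepGrp1) / sqrt(n)   # 95% CI for the group 1 mean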
qqnorm(precip, ylab = "Precipitation [in/yr] for 70 US cities", pch=19, cex=.6)
qqline(precip)    # Is this line helpful? Is it the one you would eyeball?
power.t.test(n=20, delta=.1, sd=.4, sig.level=.05)    # tells how much power at these settings

     Two-sample t test power calculation

              n = 20
          delta = 0.1
             sd = 0.4
      sig.level = 0.05
          power = 0.
    alternative = two.sided

NOTE: n is number in each group
power.t.test(delta=.1, sd=.4, sig.level=.05, power=.8)    # tells sample size needed for desired power

     Two-sample t test power calculation

              n = 252.
          delta = 0.1
             sd = 0.4
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in each group
require(lattice)
require(abd)
data(JetLagKnees)
xyplot(shift ~ treatment, JetLagKnees, type=c('p','a'), col="navy", pch=19, cex=.5)
anova( lm( shift ~ treatment, JetLagKnees ) )

Analysis of Variance Table

Response: shift
          Df Sum Sq Mean Sq F value   Pr(>F)
treatment  2 7.2245  3.6122  7.2894 0.004472 **
Residuals 19 9.4153  0.4955
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[Figure: lattice xyplot of shift vs. treatment (control, eyes, knee) for the JetLagKnees data]