Table of Useful R Commands Cheat Sheet, Cheat Sheet of Probability and Statistics

A table of useful R commands with the purpose of each function, followed by usage examples for many of them. Course material for M143 Introduction to Probability and Statistics at Calvin University.

Typology: Cheat Sheet

2020/2021
Uploaded on 04/26/2021

amodini
Table of Useful R Commands

Command                    Purpose
help()                     Obtain documentation for a given R command
example()                  View some examples on the use of a command
c(), scan()                Enter data manually to a vector in R
seq()                      Make arithmetic progression vector
rep()                      Make vector of repeated values
data()                     Load (often into a data.frame) built-in dataset
View()                     View dataset in a spreadsheet-type format
str()                      Display internal structure of an R object
read.csv(), read.table()   Load into a data.frame an existing data file
library(), require()       Make available an R add-on package
dim()                      See dimensions (# of rows/cols) of data.frame
length()                   Give length of a vector
ls()                       List memory contents
rm()                       Remove an item from memory
names()                    List names of variables in a data.frame
hist()                     Command for producing a histogram
histogram()                Lattice command for producing a histogram
stem()                     Make a stem plot
table()                    List all values of a variable with frequencies
xtabs()                    Cross-tabulation tables using formulas
mosaicplot()               Make a mosaic plot
cut()                      Group values of a variable into larger bins
mean(), median()           Identify "center" of distribution
by()                       Apply function to a column split by factors
summary()                  Display 5-number summary and mean
var(), sd()                Find variance, sd of values in vector
sum()                      Add up all values in a vector
quantile()                 Find the value of a given quantile in a dataset
barplot()                  Produce a bar graph
barchart()                 Lattice command for producing bar graphs
boxplot()                  Produce a boxplot
bwplot()                   Lattice command for producing boxplots
plot()                     Produce a scatterplot
xyplot()                   Lattice command for producing a scatterplot
lm()                       Determine the least-squares regression line
anova()                    Analysis of variance (can use on results of lm())
predict()                  Obtain predicted values from linear model
nls()                      Estimate parameters of a nonlinear model
residuals()                Give (observed - predicted) for a model fit to data
sample()                   Take a sample from a vector of data
replicate()                Repeat some process a set number of times
cumsum()                   Produce running total of values for input vector
ecdf()                     Build empirical cumulative distribution function
dbinom(), etc.             Tools for binomial distributions
dpois(), etc.              Tools for Poisson distributions
pnorm(), etc.              Tools for normal distributions
qt(), etc.                 Tools for Student's t distributions
pchisq(), etc.             Tools for chi-square distributions
binom.test()               Hypothesis test and confidence interval for 1 proportion
prop.test()                Inference for 1 proportion using normal approx.
chisq.test()               Carry out a chi-square test
fisher.test()              Fisher test for contingency table
t.test()                   Student's t test for inference on population mean
qqnorm(), qqline()         Tools for checking normality
addmargins()               Add marginal sums to an existing table
prop.table()               Compute proportions from a contingency table
par()                      Query and edit graphical settings
power.t.test()             Power calculations for 1- and 2-sample t
anova()                    Compute analysis of variance table for fitted model
Examples of usage

help()

> help(mean)

example()

> require(lattice)
> example(histogram)

c(), rep(), seq()

> x = c(8, 6, 7, 5, 3, 0, 9)
> x
[1] 8 6 7 5 3 0 9
> names = c("Owen", "Luke", "Anakin", "Leia", "Jacen", "Jaina")
> names
[1] "Owen"   "Luke"   "Anakin" "Leia"   "Jacen"  "Jaina"
> heartDeck = c(rep(1, 13), rep(0, 39))
> heartDeck
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[49] 0 0 0 0
> y = seq(7, 41, 1.5)
> y
 [1]  7.0  8.5 10.0 11.5 13.0 14.5 16.0 17.5 19.0 20.5 22.0 23.5 25.0 26.5 28.0 29.5 31.0 32.5 34.0
[20] 35.5 37.0 38.5 40.0

data(), dim(), names(), View(), str()

> data(iris)
> names(iris)
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"
> dim(iris)
[1] 150   5
> str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
> View(iris)

library(), require()

> library(abd)
> require(lattice)

histogram()

> require(lattice)
> data(iris)
> histogram(iris$Sepal.Length, breaks=seq(4,8,.25))
> histogram(~ Sepal.Length, data=iris, main="Iris Sepals", xlab="Length")
> histogram(~ Sepal.Length | Species, data=iris, col="red")
> histogram(~ Sepal.Length | Species, data=iris, n=15, layout=c(1,3))

read.csv()

> As.in.H2O = read.csv("http://www.calvin.edu/~scofield/data/comma/arsenicInWater.csv")

read.table()

> senate = read.table("http://www.calvin.edu/~scofield/data/tab/rc/senate99.dat", sep="\t", header=T)

mean(), median(), summary(), var(), sd(), quantile()

> counties = read.csv("http://www.calvin.edu/~stob/data/counties.csv")
> names(counties)
[1] "County"         "State"          "Population"     "HousingUnits"   "TotalArea"
[6] "WaterArea"      "LandArea"       "DensityPop"     "DensityHousing"
> x = counties$LandArea
> mean(x, na.rm = T)
[1] 1126.
> median(x, na.rm = T)
[1] 616.
> summary(x)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
     1.99    431.70    616.50   1126.00    923.20 145900.00
> sd(x, na.rm = T)
[1] 3622.
> var(x, na.rm = T)
[1] 13122165
> quantile(x, probs=seq(0, 1, .2), na.rm=T)
       0%       20%       40%       60%       80%      100%
     1.99    403.29    554.36    717.94   1043.82 145899.
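
The command table lists by() for applying a function to a column split by a factor, but no example of it appears here. A minimal sketch, assuming the built-in iris data rather than the counties file (the dataset choice and the name meansBySpecies are illustrative, not from the original handout):

```r
data(iris)
# mean sepal length computed separately within each species
meansBySpecies = by(iris$Sepal.Length, iris$Species, mean)
meansBySpecies
# the same split-and-apply pattern works with other summary functions
by(iris$Sepal.Length, iris$Species, median)
```

The first argument is the column to summarize, the second is the grouping factor, and the third is the function applied to each group.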

sum()

> firstTwentyIntegers = 1:20
> sum(firstTwentyIntegers)
[1] 210
> die = 1:6
> manyRolls = sample(die, 100, replace=T)
> sixFreq = sum(manyRolls == 6)
> sixFreq / 100
[1] 0.
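
The command table also lists cumsum() and ecdf(), which fit naturally with simulated die rolls. A minimal sketch, assuming a fixed seed for repeatability (the seed and the names runningSixes, runningProp and Fhat are illustrative, not from the original handout):

```r
set.seed(1)                                   # assumption: fixed seed so the run is repeatable
die = 1:6
manyRolls = sample(die, 100, replace=TRUE)    # 100 simulated rolls
runningSixes = cumsum(manyRolls == 6)         # running count of sixes after each roll
runningProp = runningSixes / seq_along(manyRolls)   # running proportion of sixes
Fhat = ecdf(manyRolls)                        # empirical cumulative distribution function
Fhat(3)                                       # proportion of rolls that were 3 or less
plot(Fhat, main="ECDF of 100 die rolls")
```

Because ecdf() returns a function, Fhat can be evaluated at any value, not only at values that occurred in the sample.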

stem()

> monarchs = read.csv("http://www.calvin.edu/~scofield/data/comma/monarchReigns.csv")
> stem(monarchs$years)

  The decimal point is 1 digit(s) to the right of the |

  0 | 0123566799
  1 | 0023333579
  2 | 012224455
  3 | 355589
  4 | 4
  5 | 069
  6 | 3

table(), xtabs(), mosaicplot(), cut()

> pol = read.csv("http://www.calvin.edu/~stob/data/csbv.csv")
> table(pol$sex)

Female   Male
   133     88
> table(pol$sex, pol$Political04)

         Conservative Far Right Liberal Middle-of-the-road
  Female           67         0      14                 48
  Male             47         7       6                 28
> xtabs(~sex, data=pol)
sex
Female   Male
   133     88
> xtabs(~Political04 + Political07, data=pol)
                    Political07
Political04          Conservative Far Left Far Right Liberal Middle-of-the-road
  Conservative                 58        0         2      13                 39
  Far Right                     4        0         3       0                  0
  Liberal                       0        1         1      14                  4
  Middle-of-the-road           20        0         0      22                 32
> mosaicplot(~Political04 + sex, data=pol)
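
The heading for this group mentions cut(), but none of the examples use it. A minimal sketch, assuming the built-in faithful data so that no download is needed (the break points, labels and the name durationGroup are illustrative choices, not from the original handout):

```r
data(faithful)
# group eruption durations (in minutes) into three bins: (1,3], (3,4], (4,6]
durationGroup = cut(faithful$eruptions, breaks=c(1, 3, 4, 6),
                    labels=c("short", "medium", "long"))
table(durationGroup)              # frequencies of the new binned variable
```

The result of cut() is a factor, so it can be passed to table(), xtabs() or barplot(table(...)) like any other categorical variable.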

barplot()

> pol = read.csv("http://www.calvin.edu/~stob/data/csbv.csv")
> barplot(table(pol$Political04), main="Political Leanings, Calvin Freshman 2004")
> barplot(table(pol$Political04), horiz=T)
> barplot(table(pol$Political04), col=c("red","green","blue","orange"))
> barplot(table(pol$Political04), col=c("red","green","blue","orange"),
          names=c("Conservative","Far Right","Liberal","Centrist"))

[Figure: bar graph of Political04 counts with bars labeled Conservative, Far Right, Liberal, Centrist]

> barplot(xtabs(~sex + Political04, data=pol), legend=c("Female","Male"), beside=T)

[Figure: side-by-side bar graph of Political04 (Conservative, Far Right, Liberal, Middle-of-the-road) by sex, with legend Female/Male]

boxplot()

> data(iris)
> boxplot(iris$Sepal.Length)
> boxplot(iris$Sepal.Length, col="yellow")
> boxplot(Sepal.Length ~ Species, data=iris)
> boxplot(Sepal.Length ~ Species, data=iris, col="yellow", ylab="Sepal length",
          main="Iris Sepal Length by Species")

[Figure: boxplots titled "Iris Sepal Length by Species" for setosa, versicolor, virginica; y-axis: Sepal length]

plot()

> data(faithful)
> plot(waiting~eruptions, data=faithful)
> plot(waiting~eruptions, data=faithful, cex=.5)
> plot(waiting~eruptions, data=faithful, pch=6)
> plot(waiting~eruptions, data=faithful, pch=19)
> plot(waiting~eruptions, data=faithful, cex=.5, pch=19, col="blue")
> plot(waiting~eruptions, data=faithful, cex=.5, pch=19, col="blue",
       main="Old Faithful Eruptions", ylab="Wait time between eruptions",
       xlab="Duration of eruption")

[Figure: scatterplot titled "Old Faithful Eruptions"; x-axis: Duration of eruption; y-axis: Time between eruptions]
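
The command table lists lm(), predict() and residuals() as the tools for fitting and using a least-squares line. A minimal sketch on the same faithful data (the names fit, newDurations and res, and the three durations being predicted, are illustrative assumptions rather than part of the original handout):

```r
data(faithful)
fit = lm(waiting ~ eruptions, data=faithful)       # least-squares regression line
coef(fit)                                          # intercept and slope
newDurations = data.frame(eruptions=c(2, 3.5, 5))  # hypothetical durations to predict for
predict(fit, newdata=newDurations)                 # predicted waiting times
res = residuals(fit)                               # observed - predicted
plot(res ~ eruptions, data=faithful, pch=19, cex=.5)   # residual plot
abline(h=0, lty=2)                                 # reference line at zero
```

The column name in newdata has to match the explanatory variable used in the model formula, otherwise predict() cannot match them up.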

dbinom(), pbinom(), qbinom(), rbinom(), binom.test(), prop.test()

> dbinom(0, 5, .5)           # probability of 0 heads in 5 flips
[1] 0.03125
> dbinom(0:5, 5, .5)         # full probability dist. for 5 flips
[1] 0.03125 0.15625 0.31250 0.31250 0.15625 0.03125
> sum(dbinom(0:2, 5, .5))    # probability of 2 or fewer heads in 5 flips
[1] 0.5
> pbinom(2, 5, .5)           # same as last line
[1] 0.5
> flip5 = replicate(10000, sum(sample(c("H","T"), 5, rep=T)=="H"))
> table(flip5) / 10000       # distribution (simulated) of count of heads in 5 flips
flip5
     0      1      2      3      4      5
0.0310 0.1545 0.3117 0.3166 0.1566 0.0296
> table(rbinom(10000, 5, .5)) / 10000    # shorter version of previous 2 lines

     0      1      2      3      4      5
0.0304 0.1587 0.3087 0.3075 0.1634 0.0313
> qbinom(seq(0,1,.2), 50, .2)    # approx. 0/.2/.4/.6/.8/1-quantiles in Binom(50,.2) distribution
[1]  0  8  9 11 12 50
> binom.test(29, 200, .21)   # inference on sample with 29 successes in 200 trials

        Exact binomial test

data:  29 and 200
number of successes = 29, number of trials = 200, p-value = 0.
alternative hypothesis: true probability of success is not equal to 0.21
95 percent confidence interval:
 0.09930862 0.
sample estimates:
probability of success
                 0.145

> prop.test(29, 200, .21)    # inference on same sample, using normal approx. to binomial

        1-sample proportions test with continuity correction

data:  29 out of 200, null probability 0.21
X-squared = 4.7092, df = 1, p-value = 0.
alternative hypothesis: true p is not equal to 0.21
95 percent confidence interval:
 0.1007793 0.
sample estimates:
    p
0.145

pchisq(), qchisq(), chisq.test()

> 1 - pchisq(3.1309, 5)      # gives P-value associated with X-squared stat 3.1309 when df=5
[1] 0.
> pchisq(3.1309, df=5, lower.tail=F)    # same as above
[1] 0.
> qchisq(c(.001,.005,.01,.025,.05,.95,.975,.99,.995,.999), 2)    # gives critical values like Table A
[1]  0.002001001  0.010025084  0.020100672  0.050635616  0.102586589  5.991464547  7.377758908
[8]  9.210340372 10.596634733 13.815510558
> qchisq(c(.999,.995,.99,.975,.95,.05,.025,.01,.005,.001), 2, lower.tail=F)    # same as above
[1]  0.002001001  0.010025084  0.020100672  0.050635616  0.102586589  5.991464547  7.377758908
[8]  9.210340372 10.596634733 13.815510558
> observedCounts = c(35, 27, 33, 40, 47, 51)
> claimedProbabilities = c(.13, .13, .14, .16, .24, .20)
> chisq.test(observedCounts, p=claimedProbabilities)    # goodness-of-fit test, assumes df = n-1

        Chi-squared test for given probabilities

data:  observedCounts
X-squared = 3.1309, df = 5, p-value = 0.

addmargins()

> blood = read.csv("http://www.calvin.edu/~scofield/data/comma/blood.csv")
> t = table(blood$Rh, blood$type)
> addmargins(t)       # to add both row/column totals

        A  AB   B   O Sum
  Neg   6   1   2   7  16
  Pos  34   3   9  38  84
  Sum  40   4  11  45 100
> addmargins(t, 1)    # to add only column totals

        A  AB   B   O
  Neg   6   1   2   7
  Pos  34   3   9  38
  Sum  40   4  11  45
> addmargins(t, 2)    # to add only row totals

        A  AB   B   O Sum
  Neg   6   1   2   7  16
  Pos  34   3   9  38  84
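
The command table lists prop.table() alongside addmargins() for turning counts into proportions. A minimal sketch, assuming the Rh-by-type counts printed above are typed in by hand so that no download is needed (the name bloodTab and the hand-entered matrix are illustrative, not from the original handout):

```r
# counts copied from the Rh-by-type table printed above (filled column by column)
bloodTab = as.table(matrix(c(6, 34, 1, 3, 2, 9, 7, 38), nrow=2,
                           dimnames=list(Rh=c("Neg", "Pos"),
                                         type=c("A", "AB", "B", "O"))))
prop.table(bloodTab)        # each cell as a proportion of the grand total
prop.table(bloodTab, 1)     # proportions within each row (each Rh group sums to 1)
prop.table(bloodTab, 2)     # proportions within each column (each blood type sums to 1)
```

The second argument follows the same convention as in addmargins(): 1 refers to rows and 2 to columns.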

par()

> par(mfrow = c(1,2))          # set figure so next two plots appear side-by-side
> poisSamp = rpois(50, 3)      # Draw sample of size 50 from Pois(3)
> maxX = max(poisSamp)         # will help in setting horizontal plotting region
> hist(poisSamp, freq=F, breaks=-.5:(maxX+.5), col="green", xlab="Sampled values")
> plot(0:maxX, dpois(0:maxX, 3), type="h", ylim=c(0,.25), col="blue",
       main="Probabilities for Pois(3)")

[Figure: two side-by-side panels. Left: "Histogram of poisSamp" (x-axis: Sampled values; y-axis: Density). Right: "Probabilities for Pois(3)" (x-axis: 0:maxX; y-axis: dpois(0:maxX, 3))]

fisher.test()

> blood = read.csv("http://www.calvin.edu/~scofield/data/comma/blood.csv")
> tblood = xtabs(~Rh + type, data=blood)
> tblood      # contingency table for blood type and Rh factor
     type
Rh     A AB  B  O
  Neg  6  1  2  7
  Pos 34  3  9 38
> chisq.test(tblood)

        Pearson's Chi-squared test

data:  tblood
X-squared = 0.3164, df = 3, p-value = 0.

> fisher.test(tblood)

        Fisher's Exact Test for Count Data

data:  tblood
p-value = 0.
alternative hypothesis: two.sided

dpois(), ppois()

> dpois(2:7, 4.2)      # probabilities of 2, 3, 4, 5, 6 or 7 successes in Pois(4.2)
[1] 0.13226099 0.18516538 0.19442365 0.16331587 0.11432111 0.
> ppois(1, 4.2)        # probability of 1 or fewer successes in Pois(4.2); same as sum(dpois(0:1, 4.2))
[1] 0.
> 1 - ppois(7, 4.2)    # probability of 8 or more successes in Pois(4.2)
[1] 0.

pnorm(), qnorm(), rnorm(), dnorm()

> pnorm(17, 19, 3)               # gives Prob[X < 17], when X ~ Norm(19, 3)
[1] 0.
> qnorm(c(.95, .975, .995))      # obtain z* critical values for 90, 95, 99% CIs
[1] 1.644854 1.959964 2.575829
> nSamp = rnorm(10000, 7, 1.5)   # draw random sample from Norm(7, 1.5)
> hist(nSamp, freq=F, col="green", main="Sampled values and population density curve")
> xs = seq(2, 12, .05)
> lines(xs, dnorm(xs, 7, 1.5), lwd=2, col="blue")

[Figure: histogram titled "Sampled values and population density curve" with the normal density curve overlaid; x-axis: nSamp; y-axis: Density]

t.test()

> data(sleep)
> t.test(extra ~ group, data=sleep)    # 2-sample t with group id column

        Welch Two Sample t-test

data:  extra by group
t = -1.8608, df = 17.776, p-value = 0.07939
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.3654832  0.2054832
sample estimates:
mean in group 1 mean in group 2
           0.75            2.33

> sleepGrp1 = sleep$extra[sleep$group==1]
> sleepGrp2 = sleep$extra[sleep$group==2]
> t.test(sleepGrp1, sleepGrp2, conf.level=.99)    # 2-sample t, data in separate vectors

        Welch Two Sample t-test

data:  sleepGrp1 and sleepGrp2
t = -1.8608, df = 17.776, p-value = 0.07939
alternative hypothesis: true difference in means is not equal to 0
99 percent confidence interval:
 -4.027633  0.867633
sample estimates:
mean of x mean of y
     0.75      2.33
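
The command table lists qt() among the tools for t distributions. One way to see what t.test() does with it is to rebuild the 95% interval by hand. A minimal sketch, assuming the Welch degrees of freedom are simply read off the t.test() result rather than computed from the formula (the names g1, g2, diffMeans, se, welchDf and tStar are illustrative, not from the original handout):

```r
data(sleep)
g1 = sleep$extra[sleep$group == 1]
g2 = sleep$extra[sleep$group == 2]
diffMeans = mean(g1) - mean(g2)                        # point estimate of the difference
se = sqrt(var(g1)/length(g1) + var(g2)/length(g2))     # standard error of the difference
welchDf = t.test(g1, g2)$parameter                     # Welch df, taken from t.test()
tStar = qt(.975, welchDf)                              # t* critical value for a 95% CI
diffMeans + c(-1, 1) * tStar * se                      # confidence interval by hand
2 * pt(-abs(diffMeans / se), welchDf)                  # two-sided P-value by hand
```

Replacing .975 with .995 gives the critical value for the 99% interval requested with conf.level=.99.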

qqnorm(), qqline()

> qqnorm(precip, ylab = "Precipitation [in/yr] for 70 US cities", pch=19, cex=.6)
> qqline(precip)    # Is this line helpful? Is it the one you would eyeball?

[Figure: "Normal Q-Q Plot"; x-axis: Theoretical Quantiles; y-axis: Precipitation [in/yr] for 70 US cities]

power.t.test()

> power.t.test(n=20, delta=.1, sd=.4, sig.level=.05)    # tells how much power at these settings

     Two-sample t test power calculation

              n = 20
          delta = 0.1
             sd = 0.4
      sig.level = 0.05
          power = 0.
    alternative = two.sided

NOTE: n is number in each group

> power.t.test(delta=.1, sd=.4, sig.level=.05, power=.8)    # tells sample size needed for desired power

     Two-sample t test power calculation

              n = 252.
          delta = 0.1
             sd = 0.4
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in each group

anova()

> require(lattice)
> require(abd)
> data(JetLagKnees)
> xyplot(shift ~ treatment, JetLagKnees, type=c('p','a'), col="navy", pch=19, cex=.5)
> anova( lm( shift ~ treatment, JetLagKnees ) )
Analysis of Variance Table

Response: shift
          Df Sum Sq Mean Sq F value   Pr(>F)
treatment  2 7.2245  3.6122  7.2894 0.004472 **
Residuals 19 9.4153  0.4955
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

[Figure: lattice xyplot of shift versus treatment (control, eyes, knee), with points and a line joining the group averages]