Docsity
Docsity

Prepare for your exams
Prepare for your exams

Study with the several resources on Docsity


Earn points to download
Earn points to download

Earn points by helping other students or get them with a premium plan


Guidelines and tips
Guidelines and tips

Data Visualization with ggplot2 Cheat Sheet, Cheat Sheet of Programming Languages

Complete cheat sheet on Data Visualization with ggplot2, the data visualization package for the statistical programming language R.

Typology: Cheat Sheet

2019/2020

Uploaded on 10/09/2020

nicoline
nicoline 🇺🇸

4.6

(12)

277 documents

1 / 2

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Graphical Primitives
Data Visualization
with ggplot2
Cheat Sheet
RStudio® is a trademark of RStudio, Inc. • CC BY RStudio • info@rstudio.com • 844-448-1212 • rstudio.com Learn more at docs.ggplot2.org • ggplot2 0.9.3.1 • Updated: 3/15
Geoms - Use a geom to represent data points, use the geom’s aesthetic properties to represent variables. Each function returns a layer.
One Variable
a + geom_area(stat = "bin")
x, y, alpha, color, fill, linetype, size
b + geom_area(aes(y = ..density..), stat = "bin")
a + geom_density(kernel = "gaussian")
x, y, alpha, color, fill, linetype, size, weight
b + geom_density(aes(y = ..county..))
a + geom_dotplot()
x, y, alpha, color, fill
a + geom_freqpoly()
x, y, alpha, color, linetype, size
b + geom_freqpoly(aes(y = ..density..))
a + geom_histogram(binwidth = 5)
x, y, alpha, color, fill, linetype, size, weight
b + geom_histogram(aes(y = ..density..))
Discrete
b <- ggplot(mpg, aes(fl))
b + geom_bar()
x, alpha, color, fill, linetype, size, weight
Continuous
a <- ggplot(mpg, aes(hwy))
Two Variables
Continuous Function
Discrete X, Discrete Y
h <- ggplot(diamonds, aes(cut, color))
h + geom_jitter()
x, y, alpha, color, fill, shape, size
Discrete X, Continuous Y
g <- ggplot(mpg, aes(class, hwy))
g + geom_bar(stat = "identity")
x, y, alpha, color, fill, linetype, size, weight
g + geom_boxplot()
lower, middle, upper, x, ymax, ymin, alpha,
color, fill, linetype, shape, size, weight
g + geom_dotplot(binaxis = "y",
stackdir = "center")
x, y, alpha, color, fill
g + geom_violin(scale = "area")
x, y, alpha, color, fill, linetype, size, weight
Continuous X, Continuous Y
f <- ggplot(mpg, aes(cty, hwy))
f + geom_blank()
f + geom_jitter()
x, y, alpha, color, fill, shape, size
f + geom_point()
x, y, alpha, color, fill, shape, size
f + geom_quantile()
x, y, alpha, color, linetype, size, weight
f + geom_rug(sides = "bl")
alpha, color, linetype, size
f + geom_smooth(model = lm)
x, y, alpha, color, fill, linetype, size, weight
f + geom_text(aes(label = cty))
x, y, label, alpha, angle, color, family, fontface,
hjust, lineheight, size, vjust
Three Variables
m + geom_contour(aes(z = z))
x, y, z, alpha, colour, linetype, size, weight
seals$z <- with(seals, sqrt(delta_long^2 + delta_lat^2))
m <- ggplot(seals, aes(long, lat))
j <- ggplot(economics, aes(date, unemploy))
j + geom_area()
x, y, alpha, color, fill, linetype, size
j + geom_line()
x, y, alpha, color, linetype, size
j + geom_step(direction = "hv")
x, y, alpha, color, linetype, size
Continuous Bivariate Distribution
i <- ggplot(movies, aes(year, rating))
i + geom_bin2d(binwidth = c(5, 0.5))
xmax, xmin, ymax, ymin, alpha, color, fill,
linetype, size, weight
i + geom_density2d()
x, y, alpha, colour, linetype, size
i + geom_hex()
x, y, alpha, colour, fill size
e + geom_segment(aes(
xend = long + delta_long,
yend = lat + delta_lat))
x, xend, y, yend, alpha, color, linetype, size
e + geom_rect(aes(xmin = long, ymin = lat,
xmax= long + delta_long,
ymax = lat + delta_lat))
xmax, xmin, ymax, ymin, alpha, color, fill,
linetype, size
c + geom_polygon(aes(group = group))
x, y, alpha, color, fill, linetype, size
e <- ggplot(seals, aes(x = long, y = lat))
m + geom_raster(aes(fill = z), hjust=0.5,
vjust=0.5, interpolate=FALSE)
x, y, alpha, fill
m + geom_tile(aes(fill = z))
x, y, alpha, color, fill, linetype, size
k + geom_crossbar(fatten = 2)
x, y, ymax, ymin, alpha, color, fill, linetype,
size
k + geom_errorbar()
x, ymax, ymin, alpha, color, linetype, size,
width (also geom_errorbarh())
k + geom_linerange()
x, ymin, ymax, alpha, color, linetype, size
k + geom_pointrange()
x, y, ymin, ymax, alpha, color, fill, linetype,
shape, size
Visualizing error
df <- data.frame(grp = c("A", "B"), fit = 4:5, se = 1:2)
k <- ggplot(df, aes(grp, fit, ymin = fit-se, ymax = fit+se))
d + geom_path(lineend="butt",
linejoin="round’, linemitre=1)
x, y, alpha, color, linetype, size
d + geom_ribbon(aes(ymin=unemploy - 900,
ymax=unemploy + 900))
x, ymax, ymin, alpha, color, fill, linetype, size
d <- ggplot(economics, aes(date, unemploy))
c <- ggplot(map, aes(long, lat))
data <- data.frame(murder = USArrests$Murder,
state = tolower(rownames(USArrests)))
map <- map_data("state")
l <- ggplot(data, aes(fill = murder))
l + geom_map(aes(map_id = state), map = map) +
expand_limits(x = map$long, y = map$lat)
map_id, alpha, color, fill, linetype, size
Maps
AB
C
Basics
Build a graph with qplot() or ggplot()
To display data values, map variables in the data set
to aesthetic properties of the geom like size, color,
and x and y locations.
Graphical Primitives
Data Visualization
with ggplot2
Cheat Sheet
RStudio® is a trademark of RStudio, Inc. • CC BY RStudio • info@rstudio.com • 844-448-1212 • rstudio.com Learn more at docs.ggplot2.org • ggplot2 0.9.3.1 • Updated: 3/15
Geoms - Use a geom to represent data points, use the geom’s aesthetic properties to represent variables
Basics
One Variable
a + geom_area(stat = "bin")
x, y, alpha, color, fill, linetype, size
b + geom_area(aes(y = ..density..), stat = "bin")
a + geom_density(kernal = "gaussian")
x, y, alpha, color, fill, linetype, size, weight
b + geom_density(aes(y = ..county..))
a+ geom_dotplot()
x, y, alpha, color, fill
a + geom_freqpoly()
x, y, alpha, color, linetype, size
b + geom_freqpoly(aes(y = ..density..))
a + geom_histogram(binwidth = 5)
x, y, alpha, color, fill, linetype, size, weight
b + geom_histogram(aes(y = ..density..))
Discrete
a <- ggplot(mpg, aes(fl))
b + geom_bar()
x, alpha, color, fill, linetype, size, weight
Continuous
a <- ggplot(mpg, aes(hwy))
Two Variables
Discrete X, Discrete Y
h <- ggplot(diamonds, aes(cut, color))
h + geom_jitter()
x, y, alpha, color, fill, shape, size
Discrete X, Continuous Y
g <- ggplot(mpg, aes(class, hwy))
g + geom_bar(stat = "identity")
x, y, alpha, color, fill, linetype, size, weight
g + geom_boxplot()
lower, middle, upper, x, ymax, ymin, alpha,
color, fill, linetype, shape, size, weight
g + geom_dotplot(binaxis = "y",
stackdir = "center")
x, y, alpha, color, fill
g + geom_violin(scale = "area")
x, y, alpha, color, fill, linetype, size, weight
Continuous X, Continuous Y
f <- ggplot(mpg, aes(cty, hwy))
f + geom_blank()
f + geom_jitter()
x, y, alpha, color, fill, shape, size
f + geom_point()
x, y, alpha, color, fill, shape, size
f + geom_quantile()
x, y, alpha, color, linetype, size, weight
f + geom_rug(sides = "bl")
alpha, color, linetype, size
f + geom_smooth(model = lm)
x, y, alpha, color, fill, linetype, size, weight
f + geom_text(aes(label = cty))
x, y, label, alpha, angle, color, family, fontface,
hjust, lineheight, size, vjust
Three Variables
i + geom_contour(aes(z = z))
x, y, z, alpha, colour, linetype, size, weight
seals$z <- with(seals, sqrt(delta_long^2 + delta_lat^2))
i <- ggplot(seals, aes(long, lat))
g <- ggplot(economics, aes(date, unemploy))
Continuous Function
g + geom_area()
x, y, alpha, color, fill, linetype, size
g + geom_line()
x, y, alpha, color, linetype, size
g + geom_step(direction = "hv")
x, y, alpha, color, linetype, size
Continuous Bivariate Distribution
h <- ggplot(movies, aes(year, rating))
h + geom_bin2d(binwidth = c(5, 0.5))
xmax, xmin, ymax, ymin, alpha, color, fill,
linetype, size, weight
h + geom_density2d()
x, y, alpha, colour, linetype, size
h + geom_hex()
x, y, alpha, colour, fill size
d + geom_segment(aes(
xend = long + delta_long,
yend = lat + delta_lat))
x, xend, y, yend, alpha, color, linetype, size
d + geom_rect(aes(xmin = long, ymin = lat,
xmax= long + delta_long,
ymax = lat + delta_lat))
xmax, xmin, ymax, ymin, alpha, color, fill,
linetype, size
c + geom_polygon(aes(group = group))
x, y, alpha, color, fill, linetype, size
d<- ggplot(seals, aes(x = long, y = lat))
i + geom_raster(aes(fill = z), hjust=0.5,
vjust=0.5, interpolate=FALSE)
x, y, alpha, fill
i + geom_tile(aes(fill = z))
x, y, alpha, color, fill, linetype, size
e + geom_crossbar(fatten = 2)
x, y, ymax, ymin, alpha, color, fill, linetype,
size
e + geom_errorbar()
x, ymax, ymin, alpha, color, linetype, size,
width (also geom_errorbarh())
e + geom_linerange()
x, ymin, ymax, alpha, color, linetype, size
e + geom_pointrange()
x, y, ymin, ymax, alpha, color, fill, linetype,
shape, size
Visualizing error
df <- data.frame(grp = c("A", "B"), fit = 4:5, se = 1:2)
e <- ggplot(df, aes(grp, fit, ymin = fit-se, ymax = fit+se))
g + geom_path(lineend="butt",
linejoin="round’, linemitre=1)
x, y, alpha, color, linetype, size
g + geom_ribbon(aes(ymin=unemploy - 900,
ymax=unemploy + 900))
x, ymax, ymin, alpha, color, fill, linetype, size
g <- ggplot(economics, aes(date, unemploy))
c <- ggplot(map, aes(long, lat))
data <- data.frame(murder = USArrests$Murder,
state = tolower(rownames(USArrests)))
map <- map_data("state")
e <- ggplot(data, aes(fill = murder))
e + geom_map(aes(map_id = state), map = map) +
expand_limits(x = map$long, y = map$lat)
map_id, alpha, color, fill, linetype, size
Maps
F
M
A
=
1
2
3
0
0
1
2
3
4
4
1
2
3
0
0
1
2
3
4
4
+
data
geom
coordinate
system
plot
+
F M A
=
1
2
3
0
0123 4
4
1
2
3
0
0123 4
4
data geom coordinate
system plot
x = F
y = A
color = F
size = A
1
2
3
0
0123 4
4
plot
+
F M A
=
1
2
3
0
0123 4
4
data geom coordinate
system
x = F
y = A
x = F
y = A
Graphical Primitives
Data Visualization
with ggplot2
Cheat Sheet
RStudio® is a trademark of RStudio, Inc. • CC BY RStudio • info@rstudio.com • 844-448-1212 • rstudio.com Learn more at docs.ggplot2.org • ggplot2 0.9.3.1 • Updated: 3/15
Geoms - Use a geom to represent data points, use the geom’s aesthetic properties to represent variables
Basics
One Variable
a + geom_area(stat = "bin")
x, y, alpha, color, fill, linetype, size
b + geom_area(aes(y = ..density..), stat = "bin")
a + geom_density(kernal = "gaussian")
x, y, alpha, color, fill, linetype, size, weight
b + geom_density(aes(y = ..county..))
a+ geom_dotplot()
x, y, alpha, color, fill
a + geom_freqpoly()
x, y, alpha, color, linetype, size
b + geom_freqpoly(aes(y = ..density..))
a + geom_histogram(binwidth = 5)
x, y, alpha, color, fill, linetype, size, weight
b + geom_histogram(aes(y = ..density..))
Discrete
a <- ggplot(mpg, aes(fl))
b + geom_bar()
x, alpha, color, fill, linetype, size, weight
Continuous
a <- ggplot(mpg, aes(hwy))
Two Variables
Discrete X, Discrete Y
h <- ggplot(diamonds, aes(cut, color))
h + geom_jitter()
x, y, alpha, color, fill, shape, size
Discrete X, Continuous Y
g <- ggplot(mpg, aes(class, hwy))
g + geom_bar(stat = "identity")
x, y, alpha, color, fill, linetype, size, weight
g + geom_boxplot()
lower, middle, upper, x, ymax, ymin, alpha,
color, fill, linetype, shape, size, weight
g + geom_dotplot(binaxis = "y",
stackdir = "center")
x, y, alpha, color, fill
g + geom_violin(scale = "area")
x, y, alpha, color, fill, linetype, size, weight
Continuous X, Continuous Y
f <- ggplot(mpg, aes(cty, hwy))
f + geom_blank()
f + geom_jitter()
x, y, alpha, color, fill, shape, size
f + geom_point()
x, y, alpha, color, fill, shape, size
f + geom_quantile()
x, y, alpha, color, linetype, size, weight
f + geom_rug(sides = "bl")
alpha, color, linetype, size
f + geom_smooth(model = lm)
x, y, alpha, color, fill, linetype, size, weight
f + geom_text(aes(label = cty))
x, y, label, alpha, angle, color, family, fontface,
hjust, lineheight, size, vjust
Three Variables
i + geom_contour(aes(z = z))
x, y, z, alpha, colour, linetype, size, weight
seals$z <- with(seals, sqrt(delta_long^2 + delta_lat^2))
i <- ggplot(seals, aes(long, lat))
g <- ggplot(economics, aes(date, unemploy))
Continuous Function
g + geom_area()
x, y, alpha, color, fill, linetype, size
g + geom_line()
x, y, alpha, color, linetype, size
g + geom_step(direction = "hv")
x, y, alpha, color, linetype, size
Continuous Bivariate Distribution
h <- ggplot(movies, aes(year, rating))
h + geom_bin2d(binwidth = c(5, 0.5))
xmax, xmin, ymax, ymin, alpha, color, fill,
linetype, size, weight
h + geom_density2d()
x, y, alpha, colour, linetype, size
h + geom_hex()
x, y, alpha, colour, fill size
d + geom_segment(aes(
xend = long + delta_long,
yend = lat + delta_lat))
x, xend, y, yend, alpha, color, linetype, size
d + geom_rect(aes(xmin = long, ymin = lat,
xmax= long + delta_long,
ymax = lat + delta_lat))
xmax, xmin, ymax, ymin, alpha, color, fill,
linetype, size
c + geom_polygon(aes(group = group))
x, y, alpha, color, fill, linetype, size
d<- ggplot(seals, aes(x = long, y = lat))
i + geom_raster(aes(fill = z), hjust=0.5,
vjust=0.5, interpolate=FALSE)
x, y, alpha, fill
i + geom_tile(aes(fill = z))
x, y, alpha, color, fill, linetype, size
e + geom_crossbar(fatten = 2)
x, y, ymax, ymin, alpha, color, fill, linetype,
size
e + geom_errorbar()
x, ymax, ymin, alpha, color, linetype, size,
width (also geom_errorbarh())
e + geom_linerange()
x, ymin, ymax, alpha, color, linetype, size
e + geom_pointrange()
x, y, ymin, ymax, alpha, color, fill, linetype,
shape, size
Visualizing error
df <- data.frame(grp = c("A", "B"), fit = 4:5, se = 1:2)
e <- ggplot(df, aes(grp, fit, ymin = fit-se, ymax = fit+se))
g + geom_path(lineend="butt",
linejoin="round’, linemitre=1)
x, y, alpha, color, linetype, size
g + geom_ribbon(aes(ymin=unemploy - 900,
ymax=unemploy + 900))
x, ymax, ymin, alpha, color, fill, linetype, size
g <- ggplot(economics, aes(date, unemploy))
c <- ggplot(map, aes(long, lat))
data <- data.frame(murder = USArrests$Murder,
state = tolower(rownames(USArrests)))
map <- map_data("state")
e <- ggplot(data, aes(fill = murder))
e + geom_map(aes(map_id = state), map = map) +
expand_limits(x = map$long, y = map$lat)
map_id, alpha, color, fill, linetype, size
Maps
F M A
=
1
2
3
0
0123 4
4
1
2
3
0
0123 4
4
+
data geom coordinate
system plot
+
F
M
A
=
1
2
3
0
0
1
2
3
4
4
1
2
3
0
0
1
2
3
4
4
data
geom
coordinate
system
plot
x = F
y = A
color = F
size = A
1
2
3
0
0123 4
4
plot
+
F M A
=
1
2
3
0
0123 4
4
data geom coordinate
system
x = F
y = A
x = F
y = A
ggsave("plot.png", width = 5, height = 5)
Saves last plot as 5’ x 5’ file named "plot.png" in
working directory. Matches file type to file extension.
qplot(x = cty, y = hwy, color = cyl, data = mpg, geom = "point")
Creates a complete plot with given data, geom, and
mappings. Supplies many useful defaults.
ggplot(data = mpg, aes(x = cty, y = hwy))
Begins a plot that you finish by adding layers to. No
defaults, but provides more control than qplot().
ggplot(mpg, aes(hwy, cty)) +
geom_point(aes(color = cyl)) +
geom_smooth(method ="lm") +
coord_cartesian() +
scale_color_gradient() +
theme_bw()
data
aesthetic mappings
add layers,
elements with +
layer = geom +
default stat +
layer specific
mappings
additional
elements
data
geom
Add a new layer to a plot with a geom_*()
or stat_*() function. Each provides a geom, a
set of aesthetic mappings, and a default stat
and position adjustment.
last_plot()
Returns the last plot
pf2

Partial preview of the text

Download Data Visualization with ggplot2 Cheat Sheet and more Cheat Sheet Programming Languages in PDF only on Docsity!

Graphical Primitives

Data Visualization

with ggplot

Cheat Sheet

RStudio® is a trademark of RStudio, Inc. • CC BY RStudio • info@rstudio.com • 844-448-1212 • rstudio.com Learn more at docs.ggplot2.org • ggplot2 0.9.3.1 • Updated: 3/1 5

Geoms - Use a geom to represent data points, use the geom’s aesthetic properties to represent variables. Each function returns a layer.

One Variable

a + geom_area(stat = "bin") x, y, alpha, color, fill, linetype, size b + geom_area(aes(y = ..density..), stat = "bin") a + geom_density(kernel = "gaussian") x, y, alpha, color, fill, linetype, size, weight b + geom_density(aes(y = ..county..)) a + geom_dotplot() x, y, alpha, color, fill a + geom_freqpoly() x, y, alpha, color, linetype, size b + geom_freqpoly(aes(y = ..density..)) a + geom_histogram(binwidth = 5) x, y, alpha, color, fill, linetype, size, weight b + geom_histogram(aes(y = ..density..)) Discrete b <- ggplot(mpg, aes(fl)) b + geom_bar() x, alpha, color, fill, linetype, size, weight Continuous a <- ggplot(mpg, aes(hwy))

Two Variables

Continuous Function Discrete X, Discrete Y h <- ggplot(diamonds, aes(cut, color)) h + geom_jitter() x, y, alpha, color, fill, shape, size Discrete X, Continuous Y g <- ggplot(mpg, aes(class, hwy)) g + geom_bar(stat = "identity") x, y, alpha, color, fill, linetype, size, weight g + geom_boxplot() lower, middle, upper, x, ymax, ymin, alpha, color, fill, linetype, shape, size, weight g + geom_dotplot(binaxis = "y", stackdir = "center") x, y, alpha, color, fill g + geom_violin(scale = "area") x, y, alpha, color, fill, linetype, size, weight Continuous X, Continuous Y f <- ggplot(mpg, aes(cty, hwy)) f + geom_blank() f + geom_jitter() x, y, alpha, color, fill, shape, size f + geom_point() x, y, alpha, color, fill, shape, size f + geom_quantile() x, y, alpha, color, linetype, size, weight f + geom_rug(sides = "bl") alpha, color, linetype, size f + geom_smooth(model = lm) x, y, alpha, color, fill, linetype, size, weight f + geom_text(aes(label = cty)) x, y, label, alpha, angle, color, family, fontface, hjust, lineheight, size, vjust

Three Variables

m + geom_contour(aes(z = z)) x, y, z, alpha, colour, linetype, size, weight seals$z <- with(seals, sqrt(delta_long^2 + delta_lat^2)) m <- ggplot(seals, aes(long, lat)) j <- ggplot(economics, aes(date, unemploy)) j + geom_area() x, y, alpha, color, fill, linetype, size j + geom_line() x, y, alpha, color, linetype, size j + geom_step(direction = "hv") x, y, alpha, color, linetype, size Continuous Bivariate Distribution i <- ggplot(movies, aes(year, rating)) i + geom_bin2d(binwidth = c(5, 0.5)) xmax, xmin, ymax, ymin, alpha, color, fill, linetype, size, weight i + geom_density2d() x, y, alpha, colour, linetype, size i + geom_hex() x, y, alpha, colour, fill size e + geom_segment(aes( xend = long + delta_long, yend = lat + delta_lat)) x, xend, y, yend, alpha, color, linetype, size e + geom_rect(aes(xmin = long, ymin = lat, xmax= long + delta_long, ymax = lat + delta_lat)) xmax, xmin, ymax, ymin, alpha, color, fill, linetype, size c + geom_polygon(aes(group = group)) x, y, alpha, color, fill, linetype, size e <- ggplot(seals, aes(x = long, y = lat)) m + geom_raster(aes(fill = z), hjust=0.5, vjust=0.5, interpolate=FALSE) x, y, alpha, fill m + geom_tile(aes(fill = z)) x, y, alpha, color, fill, linetype, size k + geom_crossbar(fatten = 2) x, y, ymax, ymin, alpha, color, fill, linetype, size k + geom_errorbar() x, ymax, ymin, alpha, color, linetype, size, width (also geom_errorbarh()) k + geom_linerange() x, ymin, ymax, alpha, color, linetype, size k + geom_pointrange() x, y, ymin, ymax, alpha, color, fill, linetype, shape, size Visualizing error df <- data.frame(grp = c("A", "B"), fit = 4:5, se = 1:2) k <- ggplot(df, aes(grp, fit, ymin = fit-se, ymax = fit+se)) d + geom_path(lineend="butt", linejoin="round’, linemitre=1) x, y, alpha, color, linetype, size d + geom_ribbon(aes(ymin=unemploy - 900, ymax=unemploy + 900)) x, ymax, ymin, alpha, color, fill, linetype, size d <- ggplot(economics, aes(date, unemploy)) c <- ggplot(map, aes(long, lat)) data <- data.frame(murder = USArrests$Murder, state = tolower(rownames(USArrests))) map <- map_data("state") l <- ggplot(data, aes(fill = murder)) l + geom_map(aes(map_id = state), map = map) + expand_limits(x = map$long, y = map$lat) map_id, alpha, color, fill, linetype, size Maps

AB

C

Basics

Build a graph with qplot() or ggplot() ggplot2 is based on the grammar of graphics, the idea that you can build every graph from the same few components: a data set, a set of geoms—visual marks that represent data points, and a coordinate system. To display data values, map variables in the data set to aesthetic properties of the geom like size, color, and x and y locations.

Graphical Primitives

Data Visualization

with ggplot

Cheat Sheet RStudio® is a trademark of RStudio, Inc. • CC BY RStudio • info@rstudio.com • 844-448-1212 • rstudio.com Learn more at docs.ggplot2.org • ggplot2 0.9.3.1 • Updated: 3/1 5 Geoms - Use a geom to represent data points, use the geom’s aesthetic properties to represent variables Basics

One Variable

a + geom_area(stat = "bin") x, y, alpha, color, fill, linetype, size b + geom_area(aes(y = ..density..), stat = "bin") a + geom_density( kernal = "gaussian" ) x, y, alpha, color, fill, linetype, size, weight b + geom_density(aes(y = ..county..)) a+ geom_dotplot() x, y, alpha, color, fill a + geom_freqpoly() x, y, alpha, color, linetype, size b + geom_freqpoly(aes(y = ..density..)) a + geom_histogram( binwidth = 5 ) x, y, alpha, color, fill, linetype, size, weight b + geom_histogram(aes(y = ..density..)) Discrete a <- ggplot(mpg, aes(fl)) b + geom_bar() x, alpha, color, fill, linetype, size, weight Continuous a <- ggplot(mpg, aes(hwy))

Two Variables

Discrete X, Discrete Y h <- ggplot(diamonds, aes(cut, color)) h + geom_jitter() x, y, alpha, color, fill, shape, size Discrete X, Continuous Y g <- ggplot(mpg, aes(class, hwy)) g + geom_bar(stat = "identity") x, y, alpha, color, fill, linetype, size, weight g + geom_boxplot() lower, middle, upper, x, ymax, ymin, alpha, color, fill, linetype, shape, size, weight g + geom_dotplot( binaxis = "y", stackdir = "center" ) x, y, alpha, color, fill g + geom_violin( scale = "area" ) x, y, alpha, color, fill, linetype, size, weight Continuous X, Continuous Y f <- ggplot(mpg, aes(cty, hwy)) f + geom_blank() f + geom_jitter() x, y, alpha, color, fill, shape, size f + geom_point() x, y, alpha, color, fill, shape, size f + geom_quantile() x, y, alpha, color, linetype, size, weight f + geom_rug( sides = "bl" ) alpha, color, linetype, size f + geom_smooth( model = lm ) x, y, alpha, color, fill, linetype, size, weight f + geom_text( aes(label = cty) ) x, y, label, alpha, angle, color, family, fontface, hjust, lineheight, size, vjust

Three Variables

i + geom_contour( aes(z = z) ) x, y, z, alpha, colour, linetype, size, weight seals$z <- with(seals, sqrt(delta_long^2 + delta_lat^2)) i <- ggplot(seals, aes(long, lat)) g <- ggplot(economics, aes(date, unemploy)) Continuous Function g + geom_area() x, y, alpha, color, fill, linetype, size g + geom_line() x, y, alpha, color, linetype, size g + geom_step( direction = "hv" ) x, y, alpha, color, linetype, size Continuous Bivariate Distribution h <- ggplot(movies, aes(year, rating)) h + geom_bin2d( binwidth = c(5, 0.5) ) xmax, xmin, ymax, ymin, alpha, color, fill, linetype, size, weight h + geom_density2d() x, y, alpha, colour, linetype, size h + geom_hex() x, y, alpha, colour, fill size d + geom_segment( aes( xend = long + delta_long, yend = lat + delta_lat) ) x, xend, y, yend, alpha, color, linetype, size d + geom_rect( aes(xmin = long, ymin = lat, xmax= long + delta_long, ymax = lat + delta_lat) ) xmax, xmin, ymax, ymin, alpha, color, fill, linetype, size c + geom_polygon( aes(group = group) ) x, y, alpha, color, fill, linetype, size d<- ggplot(seals, aes(x = long, y = lat)) i + geom_raster( aes(fill = z), hjust=0.5, vjust=0.5, interpolate=FALSE ) x, y, alpha, fill i + geom_tile( aes(fill = z) ) x, y, alpha, color, fill, linetype, size e + geom_crossbar( fatten = 2 ) x, y, ymax, ymin, alpha, color, fill, linetype, size e + geom_errorbar() x, ymax, ymin, alpha, color, linetype, size, width (also geom_errorbarh() ) e + geom_linerange() x, ymin, ymax, alpha, color, linetype, size e + geom_pointrange() x, y, ymin, ymax, alpha, color, fill, linetype, shape, size Visualizing error df <- data.frame(grp = c("A", "B"), fit = 4:5, se = 1:2) e <- ggplot(df, aes(grp, fit, ymin = fit-se, ymax = fit+se)) g + geom_path( lineend="butt", linejoin="round’, linemitre=1 ) x, y, alpha, color, linetype, size g + geom_ribbon( aes(ymin=unemploy - 900, ymax=unemploy + 900) ) x, ymax, ymin, alpha, color, fill, linetype, size g <- ggplot(economics, aes(date, unemploy)) c <- ggplot(map, aes(long, lat)) data <- data.frame(murder = USArrests$Murder, state = tolower(rownames(USArrests))) map <- map_data("state") e <- ggplot(data, aes(fill = murder)) e + geom_map( aes(map_id = state), map = map ) + expand_limits( x = map$long, y = map$lat ) map_id, alpha, color, fill, linetype, size Maps

F M A

= 12 3 (^00 1 2 3 ) 4 1 2 3 (^00 1 2 3 ) 4

data geom coordinate system plot

F M A

= 12 3 (^00 1 2 3 ) 4 1 2 3 (^00 1 2 3 ) 4 data geom coordinate system plot x = F y = A color = F size = A 1 2 3 (^00 1 2 3 ) 4 plot

F M A

= 1 2 3 (^00 1 2 3 ) 4 data geom coordinate x = F y = A system x = F y = A

Graphical Primitives

Data Visualization

with ggplot

Cheat Sheet RStudio® is a trademark of RStudio, Inc. • CC BY RStudio • info@rstudio.com • 844-448-1212 • rstudio.com Learn more at docs.ggplot2.org • ggplot2 0.9.3.1 • Updated: 3/1 5 Geoms - Use a geom to represent data points, use the geom’s aesthetic properties to represent variables Basics

One Variable

a + geom_area(stat = "bin") x, y, alpha, color, fill, linetype, size b + geom_area(aes(y = ..density..), stat = "bin") a + geom_density( kernal = "gaussian" ) x, y, alpha, color, fill, linetype, size, weight b + geom_density(aes(y = ..county..)) a+ geom_dotplot() x, y, alpha, color, fill a + geom_freqpoly() x, y, alpha, color, linetype, size b + geom_freqpoly(aes(y = ..density..)) a + geom_histogram( binwidth = 5 ) x, y, alpha, color, fill, linetype, size, weight b + geom_histogram(aes(y = ..density..)) Discrete a <- ggplot(mpg, aes(fl)) b + geom_bar() x, alpha, color, fill, linetype, size, weight Continuous a <- ggplot(mpg, aes(hwy))

Two Variables

Discrete X, Discrete Y h <- ggplot(diamonds, aes(cut, color)) h + geom_jitter() x, y, alpha, color, fill, shape, size Discrete X, Continuous Y g <- ggplot(mpg, aes(class, hwy)) g + geom_bar(stat = "identity") x, y, alpha, color, fill, linetype, size, weight g + geom_boxplot() lower, middle, upper, x, ymax, ymin, alpha, color, fill, linetype, shape, size, weight g + geom_dotplot( binaxis = "y", stackdir = "center" ) x, y, alpha, color, fill g + geom_violin( scale = "area" ) x, y, alpha, color, fill, linetype, size, weight Continuous X, Continuous Y f <- ggplot(mpg, aes(cty, hwy)) f + geom_blank() f + geom_jitter() x, y, alpha, color, fill, shape, size f + geom_point() x, y, alpha, color, fill, shape, size f + geom_quantile() x, y, alpha, color, linetype, size, weight f + geom_rug( sides = "bl" ) alpha, color, linetype, size f + geom_smooth( model = lm ) x, y, alpha, color, fill, linetype, size, weight f + geom_text( aes(label = cty) ) x, y, label, alpha, angle, color, family, fontface, hjust, lineheight, size, vjust

Three Variables

i + geom_contour( aes(z = z) ) x, y, z, alpha, colour, linetype, size, weight seals$z <- with(seals, sqrt(delta_long^2 + delta_lat^2)) i <- ggplot(seals, aes(long, lat)) g <- ggplot(economics, aes(date, unemploy)) Continuous Function g + geom_area() x, y, alpha, color, fill, linetype, size g + geom_line() x, y, alpha, color, linetype, size g + geom_step( direction = "hv" ) x, y, alpha, color, linetype, size Continuous Bivariate Distribution h <- ggplot(movies, aes(year, rating)) h + geom_bin2d( binwidth = c(5, 0.5) ) xmax, xmin, ymax, ymin, alpha, color, fill, linetype, size, weight h + geom_density2d() x, y, alpha, colour, linetype, size h + geom_hex() x, y, alpha, colour, fill size d + geom_segment( aes( xend = long + delta_long, yend = lat + delta_lat) ) x, xend, y, yend, alpha, color, linetype, size d + geom_rect( aes(xmin = long, ymin = lat, xmax= long + delta_long, ymax = lat + delta_lat) ) xmax, xmin, ymax, ymin, alpha, color, fill, linetype, size c + geom_polygon( aes(group = group) ) x, y, alpha, color, fill, linetype, size d<- ggplot(seals, aes(x = long, y = lat)) i + geom_raster( aes(fill = z), hjust=0.5, vjust=0.5, interpolate=FALSE ) x, y, alpha, fill i + geom_tile( aes(fill = z) ) x, y, alpha, color, fill, linetype, size e + geom_crossbar( fatten = 2 ) x, y, ymax, ymin, alpha, color, fill, linetype, size e + geom_errorbar() x, ymax, ymin, alpha, color, linetype, size, width (also geom_errorbarh() ) e + geom_linerange() x, ymin, ymax, alpha, color, linetype, size e + geom_pointrange() x, y, ymin, ymax, alpha, color, fill, linetype, shape, size Visualizing error df <- data.frame(grp = c("A", "B"), fit = 4:5, se = 1:2) e <- ggplot(df, aes(grp, fit, ymin = fit-se, ymax = fit+se)) g + geom_path( lineend="butt", linejoin="round’, linemitre=1 ) x, y, alpha, color, linetype, size g + geom_ribbon( aes(ymin=unemploy - 900, ymax=unemploy + 900) ) x, ymax, ymin, alpha, color, fill, linetype, size g <- ggplot(economics, aes(date, unemploy)) c <- ggplot(map, aes(long, lat)) data <- data.frame(murder = USArrests$Murder, state = tolower(rownames(USArrests))) map <- map_data("state") e <- ggplot(data, aes(fill = murder)) e + geom_map( aes(map_id = state), map = map ) + expand_limits( x = map$long, y = map$lat ) map_id, alpha, color, fill, linetype, size Maps

F M A

= 12 3 (^00 1 2 3 ) 4 1 2 3 (^00 1 2 3 ) 4

data geom coordinate system plot

F M A

= 12 3 (^00 1 2 3 ) 4 1 2 3 (^00 1 2 3 ) 4 data geom coordinate system plot x = F y = A color = F size = A 1 2 3 (^00 1 2 3 ) 4 plot

F M A

12 = 3 (^00 1 2 3 ) 4 data geom coordinate x = F y = A system x = F y = A ggsave("plot.png", width = 5, height = 5) Saves last plot as 5’ x 5’ file named "plot.png" in working directory. Matches file type to file extension. qplot(x = cty, y = hwy, color = cyl, data = mpg, geom = "point") Creates a complete plot with given data, geom, and mappings. Supplies many useful defaults. ggplot(data = mpg, aes(x = cty, y = hwy)) Begins a plot that you finish by adding layers to. No defaults, but provides more control than qplot().

ggplot(mpg, aes(hwy, cty)) +

geom_point(aes(color = cyl)) +

geom_smooth(method ="lm") +

coord_cartesian() +

scale_color_gradient() +

theme_bw()

data

aesthetic mappings

add layers,

elements with +

layer = geom +

default stat +

layer specific

mappings

additional

elements

data geom

Add a new layer to a plot with a geom_() or stat_() function. Each provides a geom, a set of aesthetic mappings, and a default stat and position adjustment. last_plot() Returns the last plot

RStudio® is a trademark of RStudio, Inc. • CC BY RStudio • info@rstudio.com • 844-448-1212 • rstudio.com Learn more at docs.ggplot2.org • ggplot2 0.9.3.1 • Updated: 3/1 5

Stats - An alternative way to build a layer (^) Coordinate Systems r + coord_cartesian(xlim = c(0, 5)) xlim, ylim The default cartesian coordinate system r + coord_fixed(ratio = 1/2) ratio, xlim, ylim Cartesian coordinates with fixed aspect ratio between x and y units r + coord_flip() xlim, ylim Flipped Cartesian coordinates r + coord_polar(theta = "x", direction=1 ) theta, start, direction Polar coordinates r + coord_trans(ytrans = "sqrt") xtrans, ytrans, limx, limy Transformed cartesian coordinates. Set extras and strains to the name of a window function. r <- b + geom_bar() Scales Faceting t <- ggplot(mpg, aes(cty, hwy)) + geom_point() Position Adjustments s + geom_bar(position = "dodge") Arrange elements side by side s + geom_bar(position = "fill") Stack elements on top of one another, normalize height s + geom_bar(position = "stack") Stack elements on top of one another f + geom_point(position = "jitter") Add random noise to X and Y position of each element to avoid overplotting s <- ggplot(mpg, aes(fl, fill = drv)) Labels t + ggtitle("New Plot Title") Add a main title above the plot t + xlab("New X label") Change the label on the X axis t + ylab("New Y label") Change the label on the Y axis t + labs(title =" New title", x = "New x", y = "New y") All of the above Legends Themes Zooming Facets divide a plot into subplots based on the values of one or more discrete variables. t + facet_grid(. ~ fl) facet into columns based on fl t + facet_grid(year ~ .) facet into rows based on year t + facet_grid(year ~ fl) facet into both rows and columns t + facet_wrap(~ fl) wrap facets into a rectangular layout Set scales to let axis limits vary across facets t + facet_grid(y ~ x, scales = "free") x and y axis limits adjust to individual facets

  • "free_x" - x axis limits adjust
  • "free_y" - y axis limits adjust Set labeller to adjust facet labels t + facet_grid(. ~ fl, labeller = label_both) t + facet_grid(. ~ fl, labeller = label_bquote(alpha ^ .(x))) t + facet_grid(. ~ fl, labeller = label_parsed) Position adjustments determine how to arrange geoms that would otherwise occupy the same space. Each position adjustment can be recast as a function with manual width and height arguments s + geom_bar(position = position_dodge(width = 1)) r + theme_classic() White background no gridlines r + theme_minimal() Minimal theme t + coord_cartesian( xlim = c(0, 100), ylim = c(10, 20)) With clipping (removes unseen data points) t + xlim(0, 100) + ylim(10, 20) t + scale_x_continuous(limits = c(0, 100)) + scale_y_continuous(limits = c(0, 100)) t + theme(legend.position = "bottom") Place legend at "bottom", "top", "left", or "right" t + guides(color = "none") Set legend type for each aesthetic: colorbar, legend, or none (no legend) t + scale_fill_discrete(name = "Title", labels = c("A", "B", "C")) Set legend title and labels with a scale function. Each stat creates additional variables to map aesthetics to. These variables use a common ..name.. syntax. stat functions and geom functions both combine a stat with a geom to make a layer, i.e. stat_bin(geom="bar") does the same as geom_bar(stat="bin")

x (^) ..count.. = 12 3 (^00 1 2 3 ) 4 1 2 3 (^00 1 2 3 ) 4 data geom^ coordinate system x = x plot y = ..count.. fl cty cyl stat

ggplot() + stat_function(aes(x = -3:3),

fun = dnorm, n = 101, args = list(sd=0.5))

x | ..y..

f + stat_identity()

ggplot() + stat_qq(aes(sample=1:100), distribution = qt,

dparams = list(df=5))

sample, x, y | ..x.., ..y..

f + stat_sum()

x, y, size | ..size..

f + stat_summary(fun.data = "mean_cl_boot")

f + stat_unique()

i + stat_density2d(aes(fill = ..level..), geom = "polygon", n = 100)

stat function

layer specific

mappings

variable created

by transformation

geom for layer parameters for stat

a + stat_bin(binwidth = 1, origin = 10)

x, y | ..count.., ..ncount.., ..density.., ..ndensity..

a + stat_bindot(binwidth = 1, binaxis = "x")

x, y, | ..count.., ..ncount..

a + stat_density(adjust = 1, kernel = "gaussian")

x, y, | ..count.., ..density.., ..scaled..

f + stat_bin2d(bins = 30, drop = TRUE)

x, y, fill | ..count.., ..density..

f + stat_binhex(bins = 30)

x, y, fill | ..count.., ..density..

f + stat_density2d(contour = TRUE, n = 100)

x, y, color, size | ..level..

m + stat_contour(aes(z = z))

x, y, z, order | ..level..

m+ stat_spoke(aes(radius= z, angle = z))

angle, radius, x, xend, y, yend | ..x.., ..xend.., ..y.., ..yend..

m + stat_summary_hex(aes(z = z), bins = 30, fun = mean)

x, y, z, fill | ..value..

m + stat_summary2d(aes(z = z), bins = 30, fun = mean)

x, y, z, fill | ..value..

g + stat_boxplot(coef = 1.5)

x, y | ..lower.., ..middle.., ..upper.., ..outliers..

g + stat_ydensity(adjust = 1, kernel = "gaussian", scale = "area")

x, y | ..density.., ..scaled.., ..count.., ..n.., ..violinwidth.., ..width..

f + stat_ecdf(n = 40)

x, y | ..x.., ..y..

f + stat_quantile(quantiles = c(0.25, 0.5, 0.75), formula = y ~ log(x),

method = "rq")

x, y | ..quantile.., ..x.., ..y..

f + stat_smooth(method = "auto", formula = y ~ x, se = TRUE, n = 80,

fullrange = FALSE, level = 0.95)

x, y | ..se.., ..x.., ..y.., ..ymin.., ..ymax..

1D distributions 2D distributions 3 Variables Comparisons Functions General Purpose Scales control how a plot maps data values to the visual values of an aesthetic. To change the mapping, add a custom scale. n <- b + geom_bar(aes(fill = fl)) n n + scale_fill_manual( values = c("skyblue", "royalblue", "blue", "navy"), limits = c("d", "e", "p", "r"), breaks =c("d", "e", "p", "r"), name = "fuel", labels = c("D", "E", "P", "R"))

scale_ aesthetic to adjust^ prepackaged scale to use^ scale specific arguments

range of values to include in mapping title to use in legend/axis labels to use in legend/axis breaks to use in legend/axis General Purpose scales Use with any aesthetic: alpha, color, fill, linetype, shape, size scale_continuous() - map cont’ values to visual values scalediscrete() - map discrete values to visual values scaleidentity() - use data values as visual values scale_manual(values = c()) - map discrete values to manually chosen visual values X and Y location scales Color and fill scales Shape scales Size scales Use with x or y aesthetics (x shown here) scale_x_date(labels = date_format("%m/%d"), breaks = date_breaks("2 weeks")) - treat x values as dates. See ?strptime for label formats. scale_x_datetime() - treat x values as date times. Use same arguments as scale_x_date(). scale_x_log10() - Plot x on log10 scale scale_x_reverse() - Reverse direction of x axis scale_x_sqrt() - Plot x on square root scale Discrete Continuous n <- b + geom_bar( aes(fill = fl)) o <- a + geom_dotplot( aes(fill = ..x..))

n + scale_fill_brewer(

palette = "Blues")

For palette choices:

library(RcolorBrewer)

display.brewer.all()

n + scale_fill_grey(

start = 0.2, end = 0.8,

na.value = "red")

o + scale_fill_gradient(

low = "red",

high = "yellow")

o + scale_fill_gradient2( low = "red", hight = "blue", mid = "white", midpoint = 25)

o + scale_fill_gradientn(

colours = terrain.colors(6))

Also: rainbow(), heat.colors(),

topo.colors(), cm.colors(),

RColorBrewer::brewer.pal()

p <- f + geom_point( aes(shape = fl))

p + scale_shape(

solid = FALSE)

p + scale_shape_manual(

values = c(3:7))

Shape values shown in

chart on right

Manual Shape values 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 *. * o o O O (^00)

| | % %

Manual shape values q <- f + geom_point( aes(size = cyl))

q + scale_size_area(max = 6)

Value mapped to area of circle (not radius)

ggthemes - Package with additional ggplot2 themes

60 long lat z + coord_map(projection = "ortho", orientation=c(41, -74, 0)) projection, orientation, xlim, ylim Map projections from the mapproj package (mercator (default), azequalarea, lagrange, etc.) fl: c fl: d fl: e fl: p fl: r c d e p r ↵c^ ↵d^ ↵e^ ↵p^ ↵r

Use scale functions

to update legend

labels

Without clipping (preferred) 0 50 100 150 c d e fl p r count 0 50 100 150 c d e fl p r count 0 50 100 150 c d e fl p r count r + theme_bw() White background with grid lines r + theme_grey() Grey background (default theme) 0 50 100 150 c d e fl p r count Some plots visualize a transformation of the original data set. Use a stat to choose a common transformation to visualize, e.g. a + geom_bar(stat = "bin")