

Complete cheat sheet on Data Visualization with ggplot2, the data visualization package for the statistical programming language R.
Continuous
a <- ggplot(mpg, aes(hwy))

a + geom_area(stat = "bin")
    x, y, alpha, color, fill, linetype, size
a + geom_area(aes(y = ..density..), stat = "bin")
a + geom_density(kernel = "gaussian")
    x, y, alpha, color, fill, linetype, size, weight
a + geom_density(aes(y = ..count..))
a + geom_dotplot()
    x, y, alpha, color, fill
a + geom_freqpoly()
    x, y, alpha, color, linetype, size
a + geom_freqpoly(aes(y = ..density..))
a + geom_histogram(binwidth = 5)
    x, y, alpha, color, fill, linetype, size, weight
a + geom_histogram(aes(y = ..density..))

Discrete
b <- ggplot(mpg, aes(fl))

b + geom_bar()
    x, alpha, color, fill, linetype, size, weight
Continuous X, Continuous Y
f <- ggplot(mpg, aes(cty, hwy))

f + geom_blank()
f + geom_jitter()
    x, y, alpha, color, fill, shape, size
f + geom_point()
    x, y, alpha, color, fill, shape, size
f + geom_quantile()
    x, y, alpha, color, linetype, size, weight
f + geom_rug(sides = "bl")
    alpha, color, linetype, size
f + geom_smooth(method = lm)
    x, y, alpha, color, fill, linetype, size, weight
f + geom_text(aes(label = cty))
    x, y, label, alpha, angle, color, family, fontface, hjust, lineheight, size, vjust

Discrete X, Continuous Y
g <- ggplot(mpg, aes(class, hwy))

g + geom_bar(stat = "identity")
    x, y, alpha, color, fill, linetype, size, weight
g + geom_boxplot()
    lower, middle, upper, x, ymax, ymin, alpha, color, fill, linetype, shape, size, weight
g + geom_dotplot(binaxis = "y", stackdir = "center")
    x, y, alpha, color, fill
g + geom_violin(scale = "area")
    x, y, alpha, color, fill, linetype, size, weight

Discrete X, Discrete Y
h <- ggplot(diamonds, aes(cut, color))

h + geom_jitter()
    x, y, alpha, color, fill, shape, size

Continuous Bivariate Distribution
i <- ggplot(movies, aes(year, rating))

i + geom_bin2d(binwidth = c(5, 0.5))
    xmax, xmin, ymax, ymin, alpha, color, fill, linetype, size, weight
i + geom_density2d()
    x, y, alpha, colour, linetype, size
i + geom_hex()
    x, y, alpha, colour, fill, size

Continuous Function
j <- ggplot(economics, aes(date, unemploy))

j + geom_area()
    x, y, alpha, color, fill, linetype, size
j + geom_line()
    x, y, alpha, color, linetype, size
j + geom_step(direction = "hv")
    x, y, alpha, color, linetype, size

Visualizing error
df <- data.frame(grp = c("A", "B"), fit = 4:5, se = 1:2)
k <- ggplot(df, aes(grp, fit, ymin = fit-se, ymax = fit+se))

k + geom_crossbar(fatten = 2)
    x, y, ymax, ymin, alpha, color, fill, linetype, size
k + geom_errorbar()
    x, ymax, ymin, alpha, color, linetype, size, width (also geom_errorbarh())
k + geom_linerange()
    x, ymin, ymax, alpha, color, linetype, size
k + geom_pointrange()
    x, y, ymin, ymax, alpha, color, fill, linetype, shape, size

Maps
data <- data.frame(murder = USArrests$Murder, state = tolower(rownames(USArrests)))
map <- map_data("state")
l <- ggplot(data, aes(fill = murder))

l + geom_map(aes(map_id = state), map = map) + expand_limits(x = map$long, y = map$lat)
    map_id, alpha, color, fill, linetype, size

seals$z <- with(seals, sqrt(delta_long^2 + delta_lat^2))
m <- ggplot(seals, aes(long, lat))

m + geom_contour(aes(z = z))
    x, y, z, alpha, colour, linetype, size, weight
m + geom_raster(aes(fill = z), hjust = 0.5, vjust = 0.5, interpolate = FALSE)
    x, y, alpha, fill
m + geom_tile(aes(fill = z))
    x, y, alpha, color, fill, linetype, size

c <- ggplot(map, aes(long, lat))

c + geom_polygon(aes(group = group))
    x, y, alpha, color, fill, linetype, size

d <- ggplot(economics, aes(date, unemploy))

d + geom_path(lineend = "butt", linejoin = "round", linemitre = 1)
    x, y, alpha, color, linetype, size
d + geom_ribbon(aes(ymin = unemploy - 900, ymax = unemploy + 900))
    x, ymax, ymin, alpha, color, fill, linetype, size

e <- ggplot(seals, aes(x = long, y = lat))

e + geom_segment(aes(xend = long + delta_long, yend = lat + delta_lat))
    x, xend, y, yend, alpha, color, linetype, size
e + geom_rect(aes(xmin = long, ymin = lat, xmax = long + delta_long, ymax = lat + delta_lat))
    xmax, xmin, ymax, ymin, alpha, color, fill, linetype, size
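The "Visualizing error" geoms can be combined on one plot. The sketch below is a minimal illustration, assuming a current ggplot2 installation; it reuses the sheet's df and k objects, while the name err and the width value are only illustrative.

```r
library(ggplot2)

# Same toy data as the sheet: a fitted value and a standard error per group
df <- data.frame(grp = c("A", "B"), fit = 4:5, se = 1:2)
k <- ggplot(df, aes(grp, fit, ymin = fit - se, ymax = fit + se))

# Layer 1 draws the estimate with a vertical range,
# layer 2 adds capped error bars over the same ymin/ymax
err <- k + geom_pointrange() + geom_errorbar(width = 0.2)

# The limits each error bar will span, as computed for layer 2
layer_data(err, 2)[, c("ymin", "ymax")]
```

Because ymin and ymax are mapped once in ggplot(), every range-style geom added to k inherits them.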
Build a graph with qplot() or ggplot()

ggplot2 is based on the grammar of graphics, the idea that you can build every graph from the same few components: a data set, a set of geoms (visual marks that represent data points), and a coordinate system. To display data values, map variables in the data set to aesthetic properties of the geom like size, color, and x and y locations.
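As a minimal sketch of that idea, assuming a current ggplot2 installation and its bundled mpg data set, the snippet below combines the three components and maps variables to x, y, and color. The object name p is only illustrative.

```r
library(ggplot2)

# Data set: mpg. Aesthetic mappings: cty -> x, hwy -> y, class -> color.
p <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy, color = class)) +
  geom_point() +        # geom: one point per row of mpg
  coord_cartesian()     # coordinate system (the default, written out here)

# One row of plotting data is produced for each row of the data set
nrow(layer_data(p))
```

Changing the geom or the mappings changes the picture, while the data set and the overall recipe stay the same.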
Cheat Sheet
RStudio® is a trademark of RStudio, Inc. • CC BY RStudio • info@rstudio.com • 844-448-1212 • rstudio.com
Learn more at docs.ggplot2.org • ggplot2 0.9.3.1 • Updated: 3/15

Geoms - Use a geom to represent data points, use the geom's aesthetic properties to represent variables

Basics
[Figure: data + geom + coordinate system = plot. In the first example the geom maps x = F and y = A; a second example adds color = F and size = A.]
qplot(x = cty, y = hwy, color = cyl, data = mpg, geom = "point")
    Creates a complete plot with given data, geom, and mappings. Supplies many useful defaults.

ggplot(data = mpg, aes(x = cty, y = hwy))
    Begins a plot that you finish by adding layers to. No defaults, but provides more control than qplot().

Add a new layer to a plot with a geom_*() or stat_*() function. Each provides a geom, a set of aesthetic mappings, and a default stat and position adjustment.

last_plot()
    Returns the last plot

ggsave("plot.png", width = 5, height = 5)
    Saves the last plot as a 5 in x 5 in file named "plot.png" in the working directory. Matches file type to file extension.
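The layering workflow described above can be sketched as follows, assuming a current ggplot2 installation. The file is written to a temporary directory as a PDF rather than to "plot.png" in the working directory; that choice, the name out, and the particular layers are assumptions made for illustration.

```r
library(ggplot2)

# ggplot() starts the plot; each geom_*() call adds one layer
p <- ggplot(data = mpg, aes(x = cty, y = hwy)) +
  geom_point(aes(color = factor(cyl))) +   # layer 1: points colored by cylinder count
  geom_smooth(method = lm)                 # layer 2: linear fit with a confidence band

# Passing the plot explicitly avoids relying on last_plot();
# the .pdf extension selects the output format
out <- file.path(tempdir(), "plot.pdf")
ggsave(out, plot = p, width = 5, height = 5)   # width and height are in inches by default
```

Omitting the plot argument makes ggsave() fall back to last_plot(), which is the behavior the sheet describes.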
Stats - An alternative way to build a layer

[Figure: a stat derives new variables such as ..count.. from the data (fl, cty, cyl); the layer maps x = x and y = ..count.. to the geom, which is drawn in the coordinate system to give the plot.]

i + stat_density2d(aes(fill = ..level..), geom = "polygon", n = 100)

Coordinate Systems
r <- b + geom_bar()

r + coord_cartesian(xlim = c(0, 5))
    xlim, ylim
    The default Cartesian coordinate system
r + coord_fixed(ratio = 1/2)
    ratio, xlim, ylim
    Cartesian coordinates with fixed aspect ratio between x and y units
r + coord_flip()
    xlim, ylim
    Flipped Cartesian coordinates
r + coord_polar(theta = "x", direction = 1)
    theta, start, direction
    Polar coordinates
r + coord_trans(ytrans = "sqrt")
    xtrans, ytrans, limx, limy
    Transformed Cartesian coordinates. Set xtrans and ytrans to the name of a window function.

Position Adjustments
s <- ggplot(mpg, aes(fl, fill = drv))

s + geom_bar(position = "dodge")
    Arrange elements side by side
s + geom_bar(position = "fill")
    Stack elements on top of one another, normalize height
s + geom_bar(position = "stack")
    Stack elements on top of one another
f + geom_point(position = "jitter")
    Add random noise to X and Y position of each element to avoid overplotting

Labels
t + ggtitle("New Plot Title")
    Add a main title above the plot
t + xlab("New X label")
    Change the label on the X axis
t + ylab("New Y label")
    Change the label on the Y axis
t + labs(title = "New title", x = "New x", y = "New y")
    All of the above

Faceting
t <- ggplot(mpg, aes(cty, hwy)) + geom_point()

Facets divide a plot into subplots based on the values of one or more discrete variables.

t + facet_grid(. ~ fl)
    facet into columns based on fl
t + facet_grid(year ~ .)
    facet into rows based on year
t + facet_grid(year ~ fl)
    facet into both rows and columns
t + facet_wrap(~ fl)
    wrap facets into a rectangular layout

Set scales to let axis limits vary across facets
t + facet_grid(y ~ x, scales = "free")
    x and y axis limits adjust to individual facets
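Position adjustments, coordinate systems, facets, and labels are all added to a plot in the same way. The sketch below is one possible combination, assuming a current ggplot2 installation; the object name bars and the label text are made up for illustration.

```r
library(ggplot2)

s <- ggplot(mpg, aes(fl, fill = drv))

bars <- s +
  geom_bar(position = "dodge") +   # bars for each drv side by side within a fuel type
  coord_flip() +                   # swap the x and y axes
  facet_wrap(~ year) +             # one panel per model year in mpg
  labs(title = "Fuel type by drive train",
       x = "Fuel type", y = "Number of models")

# Each panel gets its own id in the computed layer data
unique(layer_data(bars)$PANEL)
```

Swapping position = "dodge" for "fill" or "stack" changes only how the bars within one fuel type are arranged; the facets and labels are unaffected.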
Stat groups on the sheet: 1D distributions, 2D distributions, 3 Variables, Comparisons, Functions, General Purpose

Scales

Scales control how a plot maps data values to the visual values of an aesthetic. To change the mapping, add a custom scale.

n <- b + geom_bar(aes(fill = fl))
n

n + scale_fill_manual(
    values = c("skyblue", "royalblue", "blue", "navy"),
    limits = c("d", "e", "p", "r"),
    breaks = c("d", "e", "p", "r"),
    name = "fuel",
    labels = c("D", "E", "P", "R"))
Scale arguments
    limits - range of values to include in mapping
    name - title to use in legend/axis
    labels - labels to use in legend/axis
    breaks - breaks to use in legend/axis

General Purpose scales
Use with any aesthetic: alpha, color, fill, linetype, shape, size
    scale_*_continuous() - map continuous values to visual values
    scale_*_discrete() - map discrete values to visual values
    scale_*_identity() - use data values as visual values
    scale_*_manual(values = c()) - map discrete values to manually chosen visual values

X and Y location scales
Use with x or y aesthetics (x shown here)
    scale_x_date(labels = date_format("%m/%d"), breaks = date_breaks("2 weeks")) - treat x values as dates. See ?strptime for label formats.
    scale_x_datetime() - treat x values as date times. Use same arguments as scale_x_date().
    scale_x_log10() - plot x on log10 scale
    scale_x_reverse() - reverse direction of x axis
    scale_x_sqrt() - plot x on square root scale

Color and fill scales
Discrete
n <- b + geom_bar(aes(fill = fl))
Continuous
o <- a + geom_dotplot(aes(fill = ..x..))
o + scale_fill_gradient2(low = "red", high = "blue", mid = "white", midpoint = 25)
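To see what a custom scale does to the data, the manual fill scale from the sheet can be built and inspected. This is a sketch assuming a current ggplot2 installation; n2 is an illustrative name. Note that mpg also contains the fuel type "c", which falls outside limits and so does not receive one of the four listed colors.

```r
library(ggplot2)

n <- ggplot(mpg, aes(fl)) + geom_bar(aes(fill = fl))

n2 <- n + scale_fill_manual(
  values = c("skyblue", "royalblue", "blue", "navy"),  # visual values, in the order of limits
  limits = c("d", "e", "p", "r"),   # data values covered by the scale
  breaks = c("d", "e", "p", "r"),   # data values shown in the legend
  name   = "fuel",                  # legend title
  labels = c("D", "E", "P", "R")    # legend text for each break
)

# Fill values actually assigned to the bars
unique(layer_data(n2)$fill)
```

Adding a scale replaces only the mapping from data values to visual values; the geom and the data stay untouched.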
Shape scales
p <- f + geom_point(aes(shape = fl))

Manual shape values
[Figure: chart of the plotting symbols for shape values 0 to 25, plus character shapes such as ".", "*", "o", "O", "|", and "%".]

Size scales
q <- f + geom_point(aes(size = cyl))

Value mapped to area of circle (not radius)
z + coord_map(projection = "ortho", orientation = c(41, -74, 0))
    projection, orientation, xlim, ylim
    Map projections from the mapproj package (mercator (default), azequalarea, lagrange, etc.)

[Figure: facet strip label styles for fl, shown as "fl: c", "fl: d", "fl: e", "fl: p", "fl: r", as plain c, d, e, p, r, and as symbol-prefixed variants.]
Zooming
Without clipping (preferred)

[Figure: bar charts of count (0 to 150) by fl (c, d, e, p, r).]

Themes
r + theme_bw()
    White background with grid lines
r + theme_grey()
    Grey background (default theme)

Some plots visualize a transformation of the original data set. Use a stat to choose a common transformation to visualize, e.g. a + geom_bar(stat = "bin")
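The relationship between a geom and its stat can be checked directly. The sketch below assumes ggplot2 3.3.0 or later, where the counting stat for a discrete x is stat_count() and after_stat(count) is the current spelling of the sheet's ..count.. notation; the names counts and counts2 are illustrative.

```r
library(ggplot2)

b <- ggplot(mpg, aes(fl))

# geom_bar() applies a counting stat by default: it tallies rows per fl value
counts <- b + geom_bar()

# The same layer built from the stat side, with the computed variable mapped explicitly
counts2 <- b + stat_count(aes(y = after_stat(count)), geom = "bar")

# The transformed data that the bars are drawn from
layer_data(counts)[, c("x", "count")]
```

Both objects describe the same layer, which is why the sheet calls stats an alternative way to build one.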