Docsity

Data Transformation with dplyr Cheat Sheet, Cheat Sheet of Advanced Computer Programming

dplyr is a core package in R programming.

Typology: Cheat Sheet

2020/2021

Uploaded on 04/23/2021

ananya
Summarise Cases
These apply summary functions to columns to create a new table. Summary functions take vectors as input and return one value (see back).

Use group_by() to create a "grouped" copy of a table. dplyr functions will manipulate each "group" separately and then combine the results.
mtcars %>%
  group_by(cyl) %>%
  summarise(avg = mean(mpg))

group_by(.data, ..., add = FALSE)
Returns a copy of the table grouped by …
g_iris <- group_by(iris, Species)

ungroup(x, …)
Returns an ungrouped copy of the table.
ungroup(g_iris)
VARIATIONS
summarise_all() - Apply funs to every column.
summarise_at() - Apply funs to specific columns.
summarise_if() - Apply funs to all cols of one type.
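Here is a minimal sketch of the three scoped variants. It assumes a recent dplyr release, where a plain function such as mean can be passed directly. In dplyr 1.0 and later these variants still work but are marked as superseded by across().

```r
library(dplyr)

# summarise_all(): apply a function to every column (faithful is all numeric)
means_all <- summarise_all(faithful, mean)

# summarise_at(): apply a function to the columns picked with vars()
means_at <- summarise_at(iris, vars(Sepal.Length, Petal.Length), mean)

# summarise_if(): apply a function to every column that passes a predicate
means_if <- summarise_if(iris, is.numeric, mean)
```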
summarise(.data, …)
Compute a table of summaries. Also summarise_().
summarise(mtcars, avg = mean(mpg))
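A single summarise() call can compute several summaries at once, and each expression must reduce a column to one value. The sketch below assumes a recent dplyr release and the built-in mtcars data. The column names avg, sd and n are illustrative.

```r
library(dplyr)

# Several summaries in one call; each collapses the mpg column to one value
stats <- summarise(mtcars,
                   avg = mean(mpg),
                   sd  = sd(mpg),
                   n   = n())   # n() counts the rows in the current group
```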
count(x, ..., wt = NULL, sort = FALSE)
Count the number of rows in each group defined by the variables in …. Also tally().
count(iris, Species)
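count() is roughly shorthand for group_by() followed by summarise(n = n()). The sketch below, assuming a recent dplyr release, compares the two forms. It also shows the sort and wt arguments from the signature above.

```r
library(dplyr)

# count() versus the longer group_by() + summarise() form
counts <- count(iris, Species)
manual <- iris %>% group_by(Species) %>% summarise(n = n())

# sort = TRUE puts the largest groups first
by_cyl <- count(mtcars, cyl, sort = TRUE)

# wt = hp sums the hp column per group instead of counting rows
hp_sum <- count(mtcars, cyl, wt = hp)
```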
RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more with browseVignettes(package = c("dplyr", "tibble")) • dplyr 0.5.0 • tibble 1.2.0 • Updated: 2017-01
dplyr functions work with pipes and expect tidy data. In tidy data, each variable is in its own column and each observation, or case, is in its own row.
Pipes: x %>% f(y) becomes f(x, y)
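This sketch shows the pipe rule in practice: the left-hand side becomes the first argument of the call on the right. It assumes dplyr is attached, since dplyr re-exports the %>% pipe from magrittr.

```r
library(dplyr)

# x %>% f(y) is evaluated as f(x, y)
piped  <- iris %>% head(3)
nested <- head(iris, 3)

# Pipes read left to right, which helps when chaining several verbs
chain <- mtcars %>% filter(cyl == 4) %>% select(mpg, hp)
```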
filter(.data, …)
Extract rows that meet logical criteria. Also filter_().
filter(iris, Sepal.Length > 7)
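filter() accepts several conditions, and conditions separated by commas must all hold. The sketch below assumes a recent dplyr release and the built-in iris data. Rows where a condition evaluates to NA are dropped.

```r
library(dplyr)

# Comma-separated conditions are combined with AND
big_setosa <- filter(iris, Species == "setosa", Sepal.Length > 5.5)

# Use | for OR inside a single condition
long_or_wide <- filter(iris, Sepal.Length > 7 | Sepal.Width > 4)
```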
distinct(.data, ..., .keep_all = FALSE)
Remove rows with duplicate values. Also distinct_().
distinct(iris, Species)
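The .keep_all argument in the signature above controls whether the other columns survive. The sketch below assumes a recent dplyr release. With .keep_all = TRUE, the first row of each distinct value is kept.

```r
library(dplyr)

# Unique values of one column; the other columns are dropped
species <- distinct(iris, Species)

# .keep_all = TRUE keeps every column, taking the first row per distinct value
first_rows <- distinct(iris, Species, .keep_all = TRUE)

# With no columns named, distinct() removes fully duplicated rows
no_dupes <- distinct(iris)
```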
sample_frac(tbl, size = 1, replace = FALSE, weight = NULL, .env = parent.frame())
Randomly select a fraction of rows.
sample_frac(iris, 0.5, replace = TRUE)

sample_n(tbl, size, replace = FALSE, weight = NULL, .env = parent.frame())
Randomly select size rows.
sample_n(iris, 10, replace = TRUE)
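Sampling is random, so the sketch below fixes the seed to make runs reproducible. It assumes a recent dplyr release; dplyr 1.0 and later mark both functions as superseded by slice_sample(). On grouped data, sampling happens within each group.

```r
library(dplyr)

set.seed(42)   # fix the random number generator so results are reproducible

half <- sample_frac(iris, 0.5)               # 75 of 150 rows, no replacement
ten  <- sample_n(iris, 10, replace = TRUE)   # 10 rows, duplicates allowed

# Grouped data are sampled within each group: 2 rows per species
per_species <- iris %>% group_by(Species) %>% sample_n(2)
```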
slice(.data, …)
Select rows by position. Also slice_().
slice(iris, 10:15)
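Here is a short sketch of positional selection, assuming a recent dplyr release. Negative positions drop rows, and on grouped data positions are counted within each group.

```r
library(dplyr)

rows_10_15 <- slice(iris, 10:15)   # rows 10 to 15 by position
no_first   <- slice(iris, -1)      # a negative position drops that row

# On grouped data, slice(1) takes the first row of every group
first_each <- iris %>% group_by(Species) %>% slice(1)
```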
top_n(x, n, wt)
Select the top n entries ranked by wt (by group if grouped data). The selected rows keep their original order, and ties can yield more than n rows.
top_n(iris, 5, Sepal.Width)
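top_n() filters rows but does not sort them, so an ordered result needs arrange() as well. The sketch below assumes a recent dplyr release; dplyr 1.0 and later mark top_n() as superseded by slice_max() and slice_min(). A negative n selects the bottom entries.

```r
library(dplyr)

# Rows with the 5 largest Sepal.Width values (ties may add extra rows)
widest <- top_n(iris, 5, Sepal.Width)

# Sort explicitly if the order matters
widest_sorted <- widest %>% arrange(desc(Sepal.Width))

# A negative n keeps the bottom entries instead
narrowest <- top_n(iris, -3, Sepal.Width)
```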
Row functions return a subset of rows as a new table. Use the variant that ends in _ (standard evaluation) for programming-friendly code.
See ?base::Logic and ?Comparison for help.
<   <=   >   >=   is.na()   !is.na()   %in%   !   &   |   xor()
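The operators listed above combine inside filter() as ordinary R logical expressions. The sketch below assumes a recent dplyr release and the built-in iris, airquality and mtcars data; airquality has missing values in Ozone.

```r
library(dplyr)

# %in% tests membership in a set of values
two_species <- filter(iris, Species %in% c("setosa", "virginica"))

# ! negates; !is.na() keeps rows where Ozone is not missing
has_ozone <- filter(airquality, !is.na(Ozone))

# xor() keeps rows where exactly one of the two conditions is TRUE
one_only <- filter(mtcars, xor(cyl == 4, am == 1))
```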
arrange(.data, …)
Order rows by the values of a column or columns (low to high); use with desc() to order from high to low.
arrange(mtcars, mpg)
arrange(mtcars, desc(mpg))
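When several columns are given, later columns break ties in earlier ones. The sketch below, assuming a recent dplyr release, sorts by cylinder count and then by descending mpg within each cylinder group.

```r
library(dplyr)

# Sort by cyl (low to high), then by mpg (high to low) within each cyl value
ordered <- arrange(mtcars, cyl, desc(mpg))
```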
add_row(.data, ..., .before = NULL, .after = NULL)
Add one or more rows to a table.
add_row(faithful, eruptions = 1, waiting = 1)
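add_row() comes from the tibble package, which is why the sketch below attaches tibble directly. It assumes a recent tibble release. The small table df is hypothetical. Columns that are not supplied are filled with NA.

```r
library(tibble)

df <- tibble(x = 1:3, y = c("a", "b", "c"))   # hypothetical example table

# Append a row at the end; the missing y value becomes NA
appended <- add_row(df, x = 4L)

# .before (or .after) inserts the new row at a given position
prepended <- add_row(df, x = 0L, y = "z", .before = 1)
```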
Group Cases
Manipulate Cases
EXTRACT VARIABLES
ADD CASES
ARRANGE CASES
Logical and boolean operators to use with filter()
Manipulate Variables
Column functions return a set of columns as a new table. Use the variant that ends in _ (standard evaluation) for programming-friendly code.
select(.data, …)
Extract columns by name. Also select_if().
select(iris, Sepal.Length, Species)
Use these helpers with select(), e.g. select(iris, starts_with("Sepal")):
contains(match)
ends_with(match)
matches(match)
:, e.g. mpg:cyl
-, e.g. -Species
num_range(prefix, range)
one_of()
starts_with(match)
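Several of the listed helpers are shown below on the built-in iris and mtcars data, assuming a recent dplyr release. Each comment names the columns the call keeps.

```r
library(dplyr)

sepal     <- select(iris, starts_with("Sepal"))   # Sepal.Length, Sepal.Width
widths    <- select(iris, ends_with("Width"))     # Sepal.Width, Petal.Width
no_spec   <- select(iris, -Species)               # everything except Species
first_two <- select(mtcars, mpg:cyl)              # the contiguous range mpg to cyl
petal     <- select(iris, matches("^Petal\\."))   # names matching a regular expression
```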
These apply vectorized functions to columns. Vectorized funs take
vectors as input and return vectors of the same length as output
(see back).
mutate(.data, …)
Compute new column(s).
mutate(mtcars, gpm = 1/mpg)

transmute(.data, …)
Compute new column(s), drop others.
transmute(mtcars, gpm = 1/mpg)
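The difference between mutate() and transmute() is which columns survive. The sketch below assumes a recent dplyr release, and the column name gp100m is illustrative. New columns can also refer to columns created earlier in the same call.

```r
library(dplyr)

# mutate() keeps all 11 mtcars columns and appends gpm
with_gpm <- mutate(mtcars, gpm = 1 / mpg)

# transmute() keeps only the columns it computes
only_gpm <- transmute(mtcars, gpm = 1 / mpg)

# gp100m uses gpm, which is created earlier in the same call
two_step <- mutate(mtcars, gpm = 1 / mpg, gp100m = gpm * 100)
```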
mutate_all(.tbl, .funs, …)
Apply funs to every column. Use with funs().
mutate_all(faithful, funs(log(.), log2(.)))

mutate_at(.tbl, .cols, .funs, …)
Apply funs to specific columns. Use with funs(), vars() and the helper functions for select().
mutate_at(iris, vars(-Species), funs(log(.)))

mutate_if(.tbl, .predicate, .funs, …)
Apply funs to all columns of one type. Use with funs().
mutate_if(iris, is.numeric, funs(log(.)))
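Newer dplyr releases deprecate funs(), and a bare function can be passed as .funs instead. The sketch below assumes such a release. It applies log to every column, to all columns except Species, and to numeric columns only. In dplyr 1.0 and later, across() inside mutate() is the recommended replacement for these scoped variants.

```r
library(dplyr)

# A bare function is passed as .funs; no funs() wrapper
logged_all <- mutate_all(faithful, log)              # every column
logged_at  <- mutate_at(iris, vars(-Species), log)   # all columns except Species
logged_if  <- mutate_if(iris, is.numeric, log)       # numeric columns only
```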
add_column(.data, ..., .before = NULL, .after = NULL)
Add new column(s).
add_column(mtcars, new = 1:32)
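Like add_row(), add_column() comes from the tibble package. The sketch below assumes a recent tibble release and uses a hypothetical three-row table. New values must have one element per row or a single element, which is recycled. .before and .after accept a column name or position.

```r
library(tibble)

df <- tibble(x = 1:3, y = c("a", "b", "c"))   # hypothetical example table

# The new column is appended on the right by default
with_z <- add_column(df, z = c(TRUE, FALSE, TRUE))

# A length-1 value is recycled; .before places the column in front of x
z_first <- add_column(df, z = 0, .before = "x")
```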
rename(.data, …)
Rename columns.
rename(iris, Length = Sepal.Length)
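rename() uses the form new_name = old_name and keeps every other column in place. select() also renames this way but drops the columns it does not name, as the sketch below shows. It assumes a recent dplyr release.

```r
library(dplyr)

# rename(): all five columns kept, two of them renamed
renamed <- rename(iris, Length = Sepal.Length, Width = Sepal.Width)

# select() with the same syntax keeps only the renamed column
picked <- select(iris, Length = Sepal.Length)
```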
MAKE NEW VARIABLES
EXTRACT CASES
Data Transformation with dplyr : : CHEAT SHEET