Work with Strings with Stringr Cheat Sheet, Cheat Sheet of Java Programming

Cheat sheet on the stringr package that provides a set of internally consistent tools for working with character strings, i.e. sequences of characters surrounded by quotation marks.

Typology: Cheat Sheet

2019/2020

Uploaded on 10/23/2020

judyth
Join and Split

str_c(..., sep = "", collapse = NULL)
    Join multiple strings into a single string.
    Example: str_c(letters, LETTERS)

str_c(..., sep = "", collapse = NULL)
    Collapse a vector of strings into a single string.
    Example: str_c(letters, collapse = "")

str_dup(string, times)
    Repeat each string the number of times given by times.
    Example: str_dup(fruit, times = 2)

str_split_fixed(string, pattern, n)
    Split a vector of strings into a matrix of substrings (splitting at occurrences of a pattern match). Also str_split to return a list of substrings.
    Example: str_split_fixed(fruit, " ", n = 2)

glue::glue(..., .sep = "", .envir = parent.frame(), .open = "{", .close = "}")
    Create a string from strings and {expressions} to evaluate.
    Example: glue::glue("Pi is {pi}")

glue::glue_data(.x, ..., .sep = "", .envir = parent.frame(), .open = "{", .close = "}")
    Use a data frame, list, or environment to create a string from strings and {expressions} to evaluate.
    Example: glue::glue_data(mtcars, "{rownames(mtcars)} has {hp} hp")
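The difference between sep and collapse, and between the list and matrix forms of splitting, is easier to see on a tiny input. The following is a minimal sketch; the vector parts and the sample sentence are made up for illustration and are not part of the cheat sheet.

```r
library(stringr)

parts <- c("a", "b", "c")  # hypothetical sample data

# sep is placed between the arguments, element by element
pairs <- str_c(parts, c("1", "2", "3"), sep = "-")
# pairs is c("a-1", "b-2", "c-3")

# collapse joins the elements of the result into one string
joined <- str_c(parts, collapse = ", ")
# joined is "a, b, c"

# str_split() returns a list with one character vector per input string
as_list <- str_split("apple pie recipe", " ")

# str_split_fixed() returns a character matrix with n columns;
# the last column keeps whatever is left after n - 1 splits
as_matrix <- str_split_fixed("apple pie recipe", " ", n = 2)
# as_matrix has one row: "apple", "pie recipe"
```

The matrix form is convenient when every string should yield the same number of pieces; the list form copes with a varying number of pieces per string.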
Mutate Strings

str_sub() <- value
    Replace substrings by identifying the substrings with str_sub() and assigning into the results.
    Example: str_sub(fruit, 1, 3) <- "str"

str_replace(string, pattern, replacement)
    Replace the first matched pattern in each string.
    Example: str_replace(fruit, "a", "-")

str_replace_all(string, pattern, replacement)
    Replace all matched patterns in each string.
    Example: str_replace_all(fruit, "a", "-")

str_to_lower(string, locale = "en") [1]
    Convert strings to lower case.
    Example: str_to_lower(sentences)

str_to_upper(string, locale = "en") [1]
    Convert strings to upper case.
    Example: str_to_upper(sentences)

str_to_title(string, locale = "en") [1]
    Convert strings to title case.
    Example: str_to_title(sentences)
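To make the "first match" versus "all matches" distinction concrete, here is a short sketch on a made-up vector x (an assumption for illustration; the cheat sheet itself uses the built-in fruit and sentences data).

```r
library(stringr)

x <- c("banana", "papaya")  # hypothetical sample data

# Only the first "a" in each string is replaced
first_only <- str_replace(x, "a", "-")
# first_only is c("b-nana", "p-paya")

# Every "a" in each string is replaced
every_one <- str_replace_all(x, "a", "-")
# every_one is c("b-n-n-", "p-p-y-")

# Assigning into str_sub() modifies the vector itself
y <- x
str_sub(y, 1, 3) <- "str"
# y is c("strana", "straya")

# Case conversion
title <- str_to_title("a string")
# title is "A String"
```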
Helpers

str_conv(string, encoding)
    Override the encoding of a string.
    Example: str_conv(fruit, "ISO-8859-1")

str_view(string, pattern, match = NA)
    View HTML rendering of first regex match in each string.
    Example: str_view(fruit, "[aeiou]")

str_view_all(string, pattern, match = NA)
    View HTML rendering of all regex matches.
    Example: str_view_all(fruit, "[aeiou]")

str_wrap(string, width = 80, indent = 0, exdent = 0)
    Wrap strings into nicely formatted paragraphs.
    Example: str_wrap(sentences, 20)
RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at stringr.tidyverse.org • Diagrams from @LVaudor ! • stringr 1.2.0 • Updated: 2017-10
Work with strings with stringr :: CHEAT SHEET
Detect Matches

str_detect(string, pattern)
    Detect the presence of a pattern match in a string.
    Example: str_detect(fruit, "a")

str_which(string, pattern)
    Find the indexes of strings that contain a pattern match.
    Example: str_which(fruit, "a")

str_count(string, pattern)
    Count the number of matches in a string.
    Example: str_count(fruit, "a")

str_locate(string, pattern)
    Locate the positions of pattern matches in a string. Also str_locate_all.
    Example: str_locate(fruit, "a")
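The four detection functions return different shapes for the same input, which a small example may make clearer. This is a sketch on a made-up three-element vector x, not the fruit dataset used in the cheat sheet.

```r
library(stringr)

x <- c("apple", "banana", "cherry")  # hypothetical sample data

# One logical value per string
has_a <- str_detect(x, "a")
# has_a is c(TRUE, TRUE, FALSE)

# Positions in x of the strings that match
which_a <- str_which(x, "a")
# which_a is c(1, 2)

# Number of matches inside each string
n_a <- str_count(x, "a")
# n_a is c(1, 3, 0)

# Integer matrix with columns "start" and "end" for the first match;
# strings without a match get NA in both columns
loc <- str_locate(x, "a")
```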
Manage Lengths
str_length(string)
    The width of strings (i.e. number of code points, which generally equals the number of characters).
    Example: str_length(fruit)

str_pad(string, width, side = c("left", "right", "both"), pad = " ")
    Pad strings to constant width.
    Example: str_pad(fruit, 17)

str_trunc(string, width, side = c("right", "left", "center"), ellipsis = "...")
    Truncate the width of strings, replacing content with ellipsis.
    Example: str_trunc(fruit, 3)

str_trim(string, side = c("both", "left", "right"))
    Trim whitespace from the start and/or end of a string.
    Example: str_trim(fruit)
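A brief sketch of the length helpers follows. The input strings are made up for illustration; note in particular that the width passed to str_trunc() includes the ellipsis.

```r
library(stringr)

x <- c("kiwi", "  pear  ", "watermelon")  # hypothetical sample data

lens <- str_length(x)
# lens is c(4, 8, 10); the spaces around "pear" are counted

# Padding goes on the left by default (side = "left")
left_padded <- str_pad("kiwi", 8)
# left_padded is "    kiwi"

right_padded <- str_pad("kiwi", 8, side = "right", pad = ".")
# right_padded is "kiwi...."

# width = 7 leaves room for 4 characters plus the 3-character ellipsis
short <- str_trunc("watermelon", 7)
# short is "wate..."

trimmed <- str_trim("  pear  ")
# trimmed is "pear"
```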
Order Strings

str_order(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...) [1]
    Return the vector of indexes that sorts a character vector.
    Example: x[str_order(x)]

str_sort(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...) [1]
    Sort a character vector.
    Example: str_sort(x)
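The relationship between str_order() and str_sort(), and the effect of numeric = TRUE, can be sketched as follows. The vector x is a made-up example, and the default "en" locale is assumed.

```r
library(stringr)

x <- c("item10", "item2", "item1")  # hypothetical sample data

# Default ordering compares character by character, so "item10" sorts before "item2"
alphabetical <- str_sort(x)
# alphabetical is c("item1", "item10", "item2")

# numeric = TRUE compares runs of digits as numbers
natural <- str_sort(x, numeric = TRUE)
# natural is c("item1", "item2", "item10")

# str_order() returns the positions that would sort x
idx <- str_order(x, numeric = TRUE)
# idx is c(3, 2, 1)

# Indexing with str_order() gives the same result as str_sort()
same <- identical(x[str_order(x)], str_sort(x))
```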
The stringr package provides a set of internally consistent tools for working with character strings, i.e. sequences of characters surrounded by quotation marks.
Subset Strings

str_sub(string, start = 1L, end = -1L)
    Extract substrings from a character vector.
    Example: str_sub(fruit, 1, 3); str_sub(fruit, -2)

str_subset(string, pattern)
    Return only the strings that contain a pattern match.
    Example: str_subset(fruit, "b")

str_extract(string, pattern)
    Return the first pattern match found in each string, as a vector. Also str_extract_all to return every pattern match.
    Example: str_extract(fruit, "[aeiou]")

str_match(string, pattern)
    Return the first pattern match found in each string, as a matrix with a column for each ( ) group in pattern. Also str_match_all.
    Example: str_match(sentences, "(a|the) ([^ ]+)")
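The subsetting functions differ mainly in what they return: substrings by position, whole strings, matched text, or matched groups. A minimal sketch on made-up inputs (not the fruit or sentences data):

```r
library(stringr)

x <- c("apple", "banana", "cherry")  # hypothetical sample data

# By position: first three characters, and the last two (negative start counts from the end)
head3 <- str_sub(x, 1, 3)
# head3 is c("app", "ban", "che")
tail2 <- str_sub(x, -2)
# tail2 is c("le", "na", "ry")

# Whole strings that contain a match
with_b <- str_subset(x, "b")
# with_b is "banana"

# First matched text per string, as a character vector
first_vowel <- str_extract(x, "[aeiou]")
# first_vowel is c("a", "a", "e")

# Every match, as a list with one element per input string
all_vowels <- str_extract_all("banana", "[aeiou]")[[1]]
# all_vowels is c("a", "a", "a")

# Full match plus one column per ( ) group
m <- str_match("the quick fox", "(a|the) ([^ ]+)")
# m is a 1 x 3 matrix: "the quick", "the", "quick"
```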
[1] See bit.ly/ISO639-1 for a complete list of locales.


Need to Know

Pattern arguments in stringr are interpreted as regular expressions after any special characters have been parsed.

In R, you write regular expressions as strings, sequences of characters surrounded by double quotes ("") or single quotes (''). Some characters cannot be represented directly in an R string. These must be represented as special characters, sequences of characters that have a specific meaning, e.g.

Special character    Represents
\\                   \
\"                   "
\n                   new line

Run ?"'" to see a complete list.

Because of this, whenever a \ appears in a regular expression, you must write it as \\ in the string that represents the regular expression. Use writeLines() to see how R views your string after all special characters have been parsed.

writeLines("\\.")
# \.
writeLines("\\ is a backslash")
# \ is a backslash

INTERPRETATION

Patterns in stringr are interpreted as regexes. To change this default, wrap the pattern in one of:

regex(pattern, ignore_case = FALSE, multiline = FALSE, comments = FALSE, dotall = FALSE, ...)
    Modifies a regex to ignore case, match the end of lines as well as the end of strings, allow R comments within regexes, and/or have . match everything including \n.
    Example: str_detect("I", regex("i", TRUE))

fixed()
    Matches raw bytes but will miss some characters that can be represented in multiple ways (fast).
    Example: str_detect("\u0130", fixed("i"))

coll()
    Matches raw bytes and will use locale specific collation rules to recognize characters that can be represented in multiple ways (slow).
    Example: str_detect("\u0130", coll("i", TRUE, locale = "tr"))

boundary()
    Matches boundaries between characters, line_breaks, sentences, or words.
    Example: str_split(sentences, boundary("word"))

Regular Expressions

Regular expressions, or regexps, are a concise language for describing patterns in strings.

MATCH CHARACTERS
see <- function(rx) str_view_all("abc ABC 123\t.!?\\(){}\n", rx)

string         regexp           matches                                        example
(type this)    (to mean this)   (which matches this)
a (etc.)       a (etc.)         a (etc.)                                       see("a")
\\.            \.               .                                              see("\\.")
\\!            \!               !                                              see("\\!")
\\?            \?               ?                                              see("\\?")
\\\\           \\               \                                              see("\\\\")
\\(            \(               (                                              see("\\(")
\\)            \)               )                                              see("\\)")
\\{            \{               {                                              see("\\{")
\\}            \}               }                                              see("\\}")
\\n            \n               new line (return)                              see("\\n")
\\t            \t               tab                                            see("\\t")
\\s            \s               any whitespace (\S for non-whitespaces)        see("\\s")
\\d            \d               any digit (\D for non-digits)                  see("\\d")
\\w            \w               any word character (\W for non-word chars)     see("\\w")
\\b            \b               word boundaries                                see("\\b")
               [:digit:] *      digits                                         see("[:digit:]")
               [:alpha:] *      letters                                        see("[:alpha:]")
               [:lower:] *      lowercase letters                              see("[:lower:]")
               [:upper:] *      uppercase letters                              see("[:upper:]")
               [:alnum:] *      letters and numbers                            see("[:alnum:]")
               [:punct:] *      punctuation                                    see("[:punct:]")
               [:graph:] *      letters, numbers, and punctuation              see("[:graph:]")
               [:space:] *      space characters (i.e. \s)                     see("[:space:]")
               [:blank:] *      space and tab (but not new line)               see("[:blank:]")
               .                every character except a new line              see(".")

* Many base R functions require classes to be wrapped in a second set of [ ], e.g. [[:digit:]]

ALTERNATES
alt <- function(rx) str_view_all("abcde", rx)

regexp     matches         example
ab|d       or              alt("ab|d")
[abe]      one of          alt("[abe]")
[^abe]     anything but    alt("[^abe]")
[a-c]      range           alt("[a-c]")

ANCHORS
anchor <- function(rx) str_view_all("aaa", rx)

regexp     matches            example
^a         start of string    anchor("^a")
a$         end of string      anchor("a$")

LOOK AROUNDS
look <- function(rx) str_view_all("bacad", rx)

regexp     matches            example
a(?=c)     followed by        look("a(?=c)")
a(?!c)     not followed by    look("a(?!c)")
(?<=b)a    preceded by        look("(?<=b)a")
(?<!b)a    not preceded by    look("(?<!b)a")

QUANTIFIERS
quant <- function(rx) str_view_all(".a.aa.aaa", rx)

regexp     matches              example
a?         zero or one          quant("a?")
a*         zero or more         quant("a*")
a+         one or more          quant("a+")
a{n}       exactly n            quant("a{2}")
a{n, }     n or more            quant("a{2,}")
a{n, m}    between n and m      quant("a{2,4}")

GROUPS
ref <- function(rx) str_view_all("abbaab", rx)

Use parentheses to set precedence (order of evaluation) and create groups.

regexp     matches            example
(ab|d)e    sets precedence    alt("(ab|d)e")

Use an escaped number to refer to and duplicate parentheses groups that occur earlier in a pattern. Refer to each group by its order of appearance.

string         regexp           matches                  example
(type this)    (to mean this)   (which matches this)     (the result is the same as ref("abba"))
\\1            \1 (etc.)        first () group, etc.     ref("(a)(b)\\2\\1")
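The double-backslash rule and the pattern wrappers described above can be checked with a few direct calls. This is a minimal sketch; the input strings are made up for illustration, and only functions named in this cheat sheet are used.

```r
library(stringr)

# The regex \. (a literal dot) is typed as "\\." in R source code
dot <- "\\."
writeLines(dot)  # shows the string as the regex engine sees it

literal_hit  <- str_detect("a.b", dot)  # TRUE: there is a real dot
literal_miss <- str_detect("ab", dot)   # FALSE: no dot in the string

# An unescaped . matches any character except a new line
any_char <- str_detect("ab", ".")       # TRUE

# fixed() compares the pattern literally, so the dot needs no escaping
n_dots <- str_count("a.b.c", fixed("."))  # 2

# regex() changes how the pattern is matched, here ignoring case
case_free <- str_detect("I", regex("i", ignore_case = TRUE))  # TRUE

# Back-references: \\2 and \\1 repeat what groups 2 and 1 matched
palindrome <- str_detect("abba", "(a)(b)\\2\\1")  # TRUE

# Quantifier: first run of two or more "a"
run <- str_extract(".a.aa.aaa", "a{2,}")  # "aa"

# Look-ahead: an "a" that is followed by "c" (the "c" is not part of the match)
ahead <- str_extract_all("bacad", "a(?=c)")[[1]]  # "a"
```

Wrapping a pattern in fixed() is often the simplest choice when the text to find contains regex metacharacters and no pattern features are needed.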