

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at stringr.tidyverse.org • Diagrams from @LVaudor! • stringr 1.2.0 • Updated: 2017-
The stringr package provides a set of internally consistent tools for working with character strings, i.e. sequences of characters surrounded by quotation marks.
Regular Expressions
Regular expressions, or regexps, are a concise language for describing patterns in strings.

Need to Know
Pattern arguments in stringr are interpreted as regular expressions after any special characters have been parsed.

In R, you write regular expressions as strings, sequences of characters surrounded by quotes ("") or single quotes (''). Some characters cannot be represented directly in an R string. These must be represented as special characters, sequences of characters that have a specific meaning, e.g.

Special Character  Represents
\\                 \
\"                 "
\n                 new line

Run ?"'" to see a complete list.

Because of this, whenever a \ appears in a regular expression, you must write it as \\ in the string that represents the regular expression.

Use writeLines() to see how R views your string after all special characters have been parsed.

writeLines("\\.")
# \.
writeLines("\\ is a backslash")
# \ is a backslash

Interpretation
Patterns in stringr are interpreted as regexes. To change this default, wrap the pattern in one of:

regex(pattern, ignore_case = FALSE, multiline = FALSE, comments = FALSE, dotall = FALSE, ...)
  Modifies a regex to ignore case, match the start and end of each line as well as of the whole string, allow whitespace and # comments within the regex, and/or have . match everything, including \n.
  str_detect("I", regex("i", TRUE))
fixed()
  Matches raw bytes, but will miss some characters that can be represented in multiple ways (fast).
  str_detect("\u0130", fixed("i"))
coll()
  Uses locale-specific collation rules to recognize characters that can be represented in multiple ways (slow).
  str_detect("\u0130", coll("i", TRUE, locale = "tr"))
boundary()
  Matches boundaries between characters, line breaks, sentences, or words.
  str_split(sentences, boundary("word"))

Match Characters
see <- function(rx) str_view_all("abc ABC 123\t.!?\\(){}\n", rx)

string (type this)  regexp (to mean this)  matches (which matches this)                     example
a (etc.)            a (etc.)               a (etc.)                                         see("a")
\\.                 \.                     .                                                see("\\.")
\\!                 \!                     !                                                see("\\!")
\\?                 \?                     ?                                                see("\\?")
\\\\                \\                     \                                                see("\\\\")
\\(                 \(                     (                                                see("\\(")
\\)                 \)                     )                                                see("\\)")
\\{                 \{                     {                                                see("\\{")
\\}                 \}                     }                                                see("\\}")
\\n                 \n                     new line (return)                                see("\\n")
\\t                 \t                     tab                                              see("\\t")
\\s                 \s                     any whitespace (\S for non-whitespace)           see("\\s")
\\d                 \d                     any digit (\D for non-digits)                    see("\\d")
\\w                 \w                     any word character (\W for non-word characters)  see("\\w")
\\b                 \b                     word boundaries                                  see("\\b")
[:digit:] (1)       [:digit:]              digits                                           see("[:digit:]")
[:alpha:] (1)       [:alpha:]              letters                                          see("[:alpha:]")
[:lower:] (1)       [:lower:]              lowercase letters                                see("[:lower:]")
[:upper:] (1)       [:upper:]              uppercase letters                                see("[:upper:]")
[:alnum:] (1)       [:alnum:]              letters and numbers                              see("[:alnum:]")
[:punct:] (1)       [:punct:]              punctuation                                      see("[:punct:]")
[:graph:] (1)       [:graph:]              letters, numbers, and punctuation                see("[:graph:]")
[:space:] (1)       [:space:]              space characters (i.e. \s)                       see("[:space:]")
[:blank:] (1)       [:blank:]              space and tab (but not new line)                 see("[:blank:]")
.                   .                      every character except a new line                see(".")

(1) Many base R functions require classes to be wrapped in a second set of [ ], e.g. [[:digit:]].

Alternates
alt <- function(rx) str_view_all("abcde", rx)

regexp    matches          example
ab|d      or               alt("ab|d")
[abe]     one of           alt("[abe]")
[^abe]    anything but     alt("[^abe]")
[a-c]     range            alt("[a-c]")

Anchors
anchor <- function(rx) str_view_all("aaa", rx)

regexp    matches          example
^a        start of string  anchor("^a")
a$        end of string    anchor("a$")

Look Arounds
look <- function(rx) str_view_all("bacad", rx)

regexp    matches          example
a(?=c)    followed by      look("a(?=c)")
a(?!c)    not followed by  look("a(?!c)")
(?<=b)a   preceded by      look("(?<=b)a")
(?<!b)a   not preceded by  look("(?<!b)a")

Quantifiers
quant <- function(rx) str_view_all(".a.aa.aaa", rx)

regexp    matches          example
a?        zero or one      quant("a?")
a*        zero or more     quant("a*")
a+        one or more      quant("a+")
a{n}      exactly n        quant("a{2}")
a{n,}     n or more        quant("a{2,}")
a{n,m}    between n and m  quant("a{2,4}")

Groups
ref <- function(rx) str_view_all("abbaab", rx)

Use parentheses to set precedence (order of evaluation) and create groups.

regexp    matches          example
(ab|d)e   sets precedence  alt("(ab|d)e")

Use an escaped number to refer to and duplicate parentheses groups that occur earlier in a pattern. Refer to each group by its order of appearance.

string (type this)  regexp (to mean this)  matches (which matches this)  example (the result is the same as ref("abba"))
\\1                 \1 (etc.)              first () group, etc.          ref("(a)(b)\\2\\1")
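The Need to Know section describes two levels of parsing. R parses the string literal first, then the regex engine parses the result. Both steps can be checked directly. The sketch below assumes the stringr package is installed. The variable `pattern` and the sample strings are illustrative only.

```r
library(stringr)

# The literal "\\." is parsed by R into two characters: a backslash and a dot
pattern <- "\\."
nchar(pattern)        # 2
writeLines(pattern)   # prints \.

# The regex engine then reads \. as "a literal dot"
str_detect(c("a.b", "axb"), pattern)   # TRUE FALSE

# An unescaped "." is a wildcard, so both strings match
str_detect(c("a.b", "axb"), ".")       # TRUE TRUE

# A literal backslash needs \\ in the regex, so "\\\\" in the R string
str_detect("C:\\temp", "\\\\")         # TRUE
```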
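This sketch shows how each wrapper in the Interpretation section changes what a pattern means, and assumes stringr is installed. It uses str_detect(), str_count() and str_split(). These return plain values rather than the HTML view that str_view_all() produces, which makes the results easy to compare. The sample strings are illustrative.

```r
library(stringr)

# regex(): ignore_case = TRUE lets "i" match "I"
str_detect("I", regex("i", ignore_case = TRUE))   # TRUE
str_detect("I", "i")                              # FALSE, matching is case-sensitive by default

# fixed(): the pattern is taken literally, so "." is just a dot
str_count("a.b.c", fixed("."))   # 2
str_count("a.b.c", ".")          # 5, since the regex "." matches any character

# fixed() compares bytes, so the dotted capital I (U+0130) is not "i"
str_detect("\u0130", fixed("i"))                                     # FALSE
# coll() with the Turkish locale treats U+0130 as the capital of "i"
str_detect("\u0130", coll("i", ignore_case = TRUE, locale = "tr"))   # TRUE

# boundary(): split on word boundaries; punctuation-only pieces are dropped
str_split("The quick, brown fox.", boundary("word"))[[1]]
# "The" "quick" "brown" "fox"
```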
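The Match Characters table can also be read as returned values rather than highlighted views. This sketch runs a few of its patterns against the same test string that see() uses, and assumes stringr is installed. The final line uses base R's grepl() to illustrate the footnote about wrapping classes in a second set of brackets.

```r
library(stringr)

x <- "abc ABC 123\t.!?\\(){}\n"   # the test string from see()

str_extract_all(x, "\\d")[[1]]         # "1" "2" "3"
str_extract_all(x, "[:upper:]")[[1]]   # "A" "B" "C"
str_count(x, "\\w")                    # 9 letters and digits
str_count(x, "\\s")                    # 4: two spaces, a tab, a new line
str_count(x, "[:blank:]")              # 3: spaces and tab, but not the new line
str_count(x, ".")                      # 20: every character except the new line

# Base R's grepl() needs the class inside a second set of brackets
grepl("[[:digit:]]", "abc 123")        # TRUE
```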
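The alternation, precedence and anchor examples can be checked the same way, reusing their subject strings "abcde" and "aaa". This is a minimal sketch that assumes stringr is installed. It uses str_extract_all() and str_locate() in place of the str_view_all() helpers.

```r
library(stringr)

# Alternates on "abcde"
str_extract_all("abcde", "ab|d")[[1]]     # "ab" "d"
str_extract_all("abcde", "[abe]")[[1]]    # "a" "b" "e"
str_extract_all("abcde", "[^abe]")[[1]]   # "c" "d"
str_extract_all("abcde", "[a-c]")[[1]]    # "a" "b" "c"

# Parentheses set precedence: (ab|d)e means "abe" or "de"
str_extract_all("abcde", "(ab|d)e")[[1]]  # "de"

# Anchors on "aaa": str_locate() returns a start/end matrix
str_locate("aaa", "^a")   # start 1, end 1
str_locate("aaa", "a$")   # start 3, end 3
```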
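For the look-around patterns on "bacad", this sketch reports which "a" each pattern picks out. It assumes stringr is installed. `first_match()` is a hypothetical helper written for this example; it is not part of stringr or the cheat sheet.

```r
library(stringr)

# Hypothetical helper: start position of the first match in "bacad"
first_match <- function(rx) unname(str_locate("bacad", rx)[1, "start"])

first_match("a(?=c)")    # 2: the "a" followed by "c"
first_match("a(?!c)")    # 4: the "a" followed by "d"
first_match("(?<=b)a")   # 2: the "a" preceded by "b"
first_match("(?<!b)a")   # 4: the "a" preceded by "c"

# Look-arounds are zero-width, so only the "a" itself is returned
str_extract("bacad", "(?<=b)a(?=c)")          # "a"

# Practical use: take the digits after "$" without the "$"
str_extract("Total: $42", "(?<=\\$)\\d+")     # "42"
```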
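This sketch applies the quantifier patterns to ".a.aa.aaa" and returns the matched text. It assumes stringr is installed. a? and a* are left out of the extraction calls because they also produce empty matches, which make the output harder to read. The color/colour line is an illustrative use of ?.

```r
library(stringr)

x <- ".a.aa.aaa"   # the subject string from quant()

str_extract_all(x, "a+")[[1]]       # "a" "aa" "aaa"
str_extract_all(x, "a{2}")[[1]]     # "aa" "aa"
str_extract_all(x, "a{2,}")[[1]]    # "aa" "aaa"
str_extract_all(x, "a{2,4}")[[1]]   # "aa" "aaa"

# ? makes the preceding item optional: "u" may appear zero or one time
str_detect(c("color", "colour", "colouur"), "^colou?r$")   # TRUE TRUE FALSE
```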
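To show what the backreference example ref("(a)(b)\\2\\1") actually matches, this sketch uses str_extract() and str_match(). It assumes stringr is installed. The "bookkeeper" line is an illustrative use of a backreference that finds doubled letters.

```r
library(stringr)

# (a) is group 1 and (b) is group 2; \2\1 repeats them in reverse,
# so the whole pattern is equivalent to "abba"
str_extract("abbaab", "(a)(b)\\2\\1")   # "abba"

# str_match() returns the full match followed by each captured group
str_match("abbaab", "(a)(b)\\2\\1")     # "abba" "a" "b"

# (.)\1 matches any character immediately repeated
str_extract_all("bookkeeper", "(.)\\1")[[1]]   # "oo" "kk" "ee"
```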