Regular Expressions: Understanding and Using Patterns for String Matching in Python, Schemes and Mind Maps of Mathematics

An introduction to regular expressions, a powerful tool for specifying and matching patterns in strings. Regular expressions are used to define regular languages, which can be matched against strings using Python's re module. This document covers the basics of regular expression matching using the match() and search() functions, as well as the use of metacharacters, repetition, character classes, and capture groups. It also includes examples of using these techniques to extract specific information from strings.

What you will learn

  • What is a regular language and how is it specified with a regular expression?
  • What are capture groups and how are they used in regular expressions?
  • What is the difference between match() and search() functions in Python's re module?
  • How do metacharacters work in regular expressions?
  • How does the match() function in Python's re module work?

Typology: Schemes and Mind Maps

2021/2022

Uploaded on 09/27/2022

lalitdiya

Regular Expressions

https://xkcd.com/208/

Matching with match()

A plain string with no special characters is itself a regular expression, so let’s explore the re module using simple string patterns.

re's match(pattern, string) function applies a pattern to a string:

>>> import re
>>> re.match(r'foo', 'foobar')
<_sre.SRE_Match object; span=(0, 3), match='foo'>
>>> re.match(r'oo', 'foobar')

match returns a Match object if the string begins with the pattern, or None if it does not.

Notice that we use a special raw string syntax (the r prefix) for regular expressions. Normal Python strings use backslash (\) as an escape character, and regexes also use backslash extensively, so using raw strings avoids having to double-escape special regex forms that use backslash.
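To make the double-escaping point concrete, here is a minimal sketch comparing a raw-string pattern with its normal-string spelling. The pattern a\.b and the sample text are illustrative choices, not taken from the lecture.

```python
import re

# Raw string: the backslash is passed to the regex engine unchanged.
raw_pattern = r'a\.b'
# Normal string: the backslash must be doubled to survive Python's escaping.
escaped_pattern = 'a\\.b'

# Both spellings denote the same four-character pattern text: a \ . b
same_pattern = (raw_pattern == escaped_pattern)

# \. matches only a literal period, so 'axb' is not matched.
result = re.findall(raw_pattern, 'a.b axb')
print(same_pattern, result)
```

The raw form is usually easier to read because the pattern appears exactly as the regex engine sees it.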

Finding Matches with search() and findall()

search(pattern, string) is like match, but it finds the first occurrence of pattern in string, wherever it occurs in the string (not just the beginning).

>>> re.match(r'oo', 'foobar')
>>> re.search(r'oo', 'foobar')
<_sre.SRE_Match object; span=(1, 3), match='oo'>

Note the span=(1, 3) in the returned match object. It specifies the location within the string that contained the match, using the same indexing scheme used in slices, i.e., from beginning index inclusive to ending index exclusive.
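Because the span follows slice indexing, it can be used directly to slice the original string. The sketch below reuses the 'foobar' example; the variable names are illustrative.

```python
import re

text = 'foobar'
m = re.search(r'oo', text)

# span() returns (start, end): start inclusive, end exclusive, like a slice.
start, end = m.span()

# Slicing the original string with the span recovers the matched text.
matched_by_slice = text[start:end]
print(start, end, matched_by_slice)
```

Slicing with the span gives the same text as calling m.group().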

findall returns a list of all non-overlapping substrings matched by the regex pattern.

>>> re.findall(r'na', 'nana nana nana nana Batman!')
['na', 'na', 'na', 'na', 'na', 'na', 'na', 'na']

Using the Match Object

Since match and search return a Match object if a match is found, or None if no match is found, a common programming idiom is to test the Match object directly.

>>> m = re.match(r'foo', 'foobar')
>>> if m:
...     print('Match found: ' + m.group())
...
Match found: foo

Most of the examples in this lecture will use findall for simplicity and to demonstrate multiple matches in a single string.
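The same idiom works with search, and it helps to handle the None case explicitly, since calling group() on None raises an AttributeError. The helper name describe_match below is hypothetical and used only for illustration.

```python
import re

def describe_match(pattern, string):
    """Hypothetical helper: report the first match of pattern in string."""
    m = re.search(pattern, string)
    if m:  # a Match object is truthy
        return 'Match found: ' + m.group()
    return 'No match'  # search returned None

found = describe_match(r'oo', 'foobar')
missing = describe_match(r'xyz', 'foobar')
print(found)
print(missing)
```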

Metacharacters

Regexes are much more powerful when you add metacharacters. We’ll learn the basics of:

  • . - Match any character (except a newline, by default)
  • \ - Escape special characters
  • | - Or operator
  • ^ - Match at the beginning of a string/line
  • $ - Match at the end of a string/line
  • * - Match 0 or more of the preceding regex
  • + - Match 1 or more of the preceding regex
  • ? - Match 0 or 1 of the preceding regex
  • { } - Bounded repetition
  • [ ] - Character class
  • ( ) - Capture group within a matched substring
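As a quick sketch of the first few entries in this list, the following applies ., \, ^ and $ to one sample string. The sample text and variable names are made up for illustration.

```python
import re

text = 'cat. cot cut'

# . matches any single character, so all three words match.
any_char = re.findall(r'c.t', text)

# \. escapes the dot, so only a literal period after 't' matches.
literal_dot = re.findall(r't\.', text)

# ^ anchors the pattern to the beginning of the string.
at_start = re.findall(r'^c.t', text)

# $ anchors the pattern to the end of the string.
at_end = re.findall(r'c.t$', text)

print(any_char, literal_dot, at_start, at_end)
```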

Repetition

* matches 0 or more of the preceding regex

>>> re.findall(r'a.a*', 'abra abra cadabra')
['ab', 'a a', 'a ', 'ada']

+ matches 1 or more of the preceding regex

>>> re.findall(r'a.+a', 'abra abra cadabra')
['abra abra cadabra']

Notice that .+ performed a greedy match - it matched as many characters as possible. We can make it non-greedy by adding a ?:

>>> re.findall(r'a.+?a', 'abra abra cadabra')
['abra', 'abra', 'ada']

? after an ordinary character matches 0 or 1 of them

>>> re.findall(r'ab?a', 'aba anna abba aa')
['aba', 'aa']

{ } bounds the repetition by an arbitrary number; here {2} matches exactly two of the preceding regex

>>> re.findall(r'ab{2}a', 'aba anna abba abbba')
['abba']
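The braces also accept a range. As a sketch going slightly beyond the lecture's example, {m,n} matches between m and n repetitions and {m,} matches m or more. The variable names are illustrative.

```python
import re

text = 'aba anna abba abbba'

# {2}: exactly two b's between the a's.
exactly_two = re.findall(r'ab{2}a', text)

# {1,3}: between one and three b's.
one_to_three = re.findall(r'ab{1,3}a', text)

# {2,}: two or more b's, with no upper bound.
two_or_more = re.findall(r'ab{2,}a', text)

print(exactly_two, one_to_three, two_or_more)
```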

Character Classes and Alternatives

[ ] creates an arbitrary character class

>>> re.findall(r'[rmpl]ain', 'the rain in spain falls mainly in the plain')
['rain', 'pain', 'main', 'lain']

You can specify ranges of characters in a character class.

>>> re.findall(r'[0-9]+', '500 Tech Parkway, Atlanta, GA 30332')
['500', '30332']
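Ranges are not limited to digits, and several ranges can share one class. The sketch below reuses the address string with letter ranges; the patterns are illustrative additions rather than part of the lecture.

```python
import re

address = '500 Tech Parkway, Atlanta, GA 30332'

# Two ranges in one class: any run of upper- or lower-case ASCII letters.
words = re.findall(r'[A-Za-z]+', address)

# A range combined with bounded repetition: exactly two capital letters in a row.
two_capitals = re.findall(r'[A-Z]{2}', address)

print(words, two_capitals)
```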

You can specify alternative patterns to match with |, which you can read as "or."

>>> re.findall(r'rain|plain', 'the rain in spain falls mainly in the plain')
['rain', 'plain']

Match Capture Groups

Capture groups allow you to match on a pattern but capture a substring of what was matched. This is particularly useful in extracting element text from XML-like documents where your pattern includes the open and close tags but you only want the text between the tags.

>>> activities = '''
... <ul>
...   <li>eat</li>
...   <li>sleep</li>
...   <li>code</li>
... </ul>
... '''
>>> re.findall(r'<li>(.+)</li>', activities)
['eat', 'sleep', 'code']
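Capture groups are also available on a Match object through group(). As a minimal sketch: group(0) is the whole match and group(1) is the first captured substring. When a pattern has more than one group, findall returns one tuple per match. The <li> line and the key=value sample string below are invented for illustration.

```python
import re

line = '<li>sleep</li>'
m = re.search(r'<li>(.+)</li>', line)

whole = m.group(0)  # the entire matched text, tags included
inner = m.group(1)  # only the text captured by the parentheses

# Two capture groups: findall returns a list of (name, number) tuples.
pairs = re.findall(r'([a-z]+)=([0-9]+)', 'eat=3 sleep=8 code=12')

print(whole, inner, pairs)
```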