Python standard library: re

The re module in Python's standard library provides regular expression matching. Its members include:
Match
Pattern
RegexFlag
Scanner
T
TEMPLATE
U
UNICODE
compile()
copyreg
enum
error
escape()
findall() Matches all occurrences of a pattern (unlike search(), which only matches the first one) and returns them as a list of strings or tuples.
finditer()
fullmatch()
functools
match() Checks for a match at the beginning of a string. Compare with search().
purge()
search() Matches one regular expression anywhere in a string. Compare with match() and findall().
split() Creates a list from a string. The regular expression is used to determine where the string is divided.
sre_compile
sre_parse
sub() Replaces matched text with a constant value or with the value a function returns.
subn()
template()
_MAXCACHE
_cache
_compile()
_compile_repl
_expand()
_locale()
_pickle()
_special_chars_map
_subx()
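
Several of the functions listed above have no description. The following is a minimal sketch of how finditer(), subn(), escape() and split() behave. The sample strings and variable names are made up for illustration.

```python
import re

text = 'foo 42 bar 18 baz 19 x'

# finditer() yields Match objects, so the position of each match is available.
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r'\d+', text)]
print(positions)
# [('42', 4, 6), ('18', 11, 13), ('19', 18, 20)]

# subn() works like sub() but also returns the number of replacements.
replaced, count = re.subn(r'\d+', 'XX', text)
print(replaced, count)
# foo XX bar XX baz XX x 3

# escape() backslash-escapes special characters so that text is matched literally.
literal = re.search(re.escape('1+1'), 'is 1+1=2?')
print(literal.group())
# 1+1

# split() divides a string wherever the pattern matches.
parts = re.split(r'\s*,\s*', 'a , b,c ,d')
print(parts)
# ['a', 'b', 'c', 'd']
```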

flags

Some functions in the re module (such as re.compile()) have a flags parameter which specifies further characteristics of the regular expression. The value of this parameter is formed by combining one or more of the following values with the bitwise OR operator (|):
Short  Long           Inline  Value  Description
re.A   re.ASCII       (?a)    256    \w etc. match ASCII characters only (by default, \w etc. match Unicode characters)
       re.DEBUG               128    print debug information about the compiled regexp
re.I   re.IGNORECASE  (?i)    2      match case-insensitively
re.L   re.LOCALE      (?L)    4      \w etc. and case-insensitive matching depend on the current locale
re.M   re.MULTILINE   (?m)    8      ^ matches at the start of the string and after each newline ($ likewise matches before each newline)
re.S   re.DOTALL      (?s)    16     . matches any character, including a newline
re.X   re.VERBOSE     (?x)    64     allows more readable regexps: whitespace in the pattern is ignored and # starts a comment
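
The effect of some of these flags can be sketched as follows. The sample text and the variable names are invented for this illustration; the flags themselves are the ones listed above.

```python
import re

text = 'First line\nsecond LINE'

# Flags are combined with the bitwise OR operator.
flags = re.I | re.M
print(int(flags))                                      # 10 (2 | 8)

# Without re.M, ^ only matches at the very start of the string.
starts_default = re.findall(r'^\w+', text)             # ['First']
starts_multiline = re.findall(r'^\w+', text, re.M)     # ['First', 'second']

# The same flags can be passed as an argument or written inline.
ends_arg = re.findall(r'line$', text, flags)           # ['line', 'LINE']
ends_inline = re.findall(r'(?im)line$', text)          # ['line', 'LINE']

# Without re.S, . stops at the newline.
spans_default = re.search(r'First.*LINE', text)        # None
spans_dotall = re.search(r'First.*LINE', text, re.S)   # a Match object

# re.X ignores whitespace in the pattern and allows comments.
decimal = re.compile(r"""
    \d+        # integer part
    (\.\d+)?   # optional fractional part
""", re.X)
print(bool(decimal.fullmatch('3.14')))                 # True
```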

Simple script

#!/usr/bin/python
import re

re_number = re.compile(r'\d+')


for i in ['foo', 'bar 42 baz', 'hello', 'etc', '20' ]:

    if re_number.match(i):
       print (i + " is a number")

    if re_number.search(i):
       print (i + " contains a number")
    
    # bar 42 baz contains a number
    # 20 is a number
    # 20 contains a number


print ("---")

for found in re.findall(r'(\w+)\s+(\d+)', 'foo 42 bar 18 baz 19 x'):
    print (found[0] + ': ' + found[1])
    # foo: 42
    # bar: 18
    # baz: 19

print ("---")

print (re.sub(r'\d+', 'XX', 'foo 42 bar 18 baz 19 x'))
# foo XX bar XX baz XX x
Github repository about-python, path: /standard-library/re/script.py
re.search() returns a re.Match object if the pattern is found, and None otherwise.
#!/usr/bin/python

import re

if re.search(r'\d\d\d', 'one 234 five six'):
   print ("matched")
   # matched
else:
   print ("didn't match")
Github repository about-python, path: /standard-library/re/search-1.py
#!/usr/bin/python

import re

match = re.search(r'(\d\d\d|\w\w\w)', 'one 234 five s')
if match:
   print (match.group())
   # one
   print (match.group(1))
   # one
else:
   print ("didn't match")
Github repository about-python, path: /standard-library/re/search-2.py
Using if match := re.search(…) combines the assignment of re.search()'s return value and the if test in one line (the := operator requires Python 3.8 or later):
import re

if match := re.search(r'foo:\s+(\d+),\s+bar:\s+(\d+)', 'hello foo:   42, bar:  1001xyz'):

   print (match.group() )  # foo:   42, bar:  1001
   print (match.group(1))  # 42
   print (match.group(2))  # 1001
Github repository about-python, path: /standard-library/re/search-3.py

findall

#!/usr/bin/python

import re

re_numbers = re.compile(r'\d+')

for found in re_numbers.findall('foo 42 bar 18 baz 19 x'):
    print (found)
    # 42
    # 18
    # 19
Github repository about-python, path: /standard-library/re/findall.py

Return a list of tuples

In the following example, the pattern contains parentheses (capture groups). Each match is returned as a tuple whose elements hold the text matched by the respective group.
import re
for pair in re.findall(r'(\w+): (\d+)', 'foo: 42; bar: 99; baz: 0'):
    print(pair[0] + ' = ' + pair[1])
    # foo = 42
    # bar = 99
    # baz = 0

search() vs match()

re.search() looks for a match anywhere in the text, while re.match() only matches at the start of the text.
Both re.search() and re.match() return a re.Match object if the pattern matches, and None otherwise.
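
The difference can be sketched with a small comparison, which also includes fullmatch() from the member list (it requires the entire string to match). The sample strings and variable names are chosen only for illustration.

```python
import re

pattern = re.compile(r'\d+')

# match() succeeds only if the pattern matches at the start of the string.
at_start = pattern.match('42 apples')        # Match object for '42'
not_at_start = pattern.match('apples 42')    # None

# search() scans the string and returns the first match it finds.
anywhere = pattern.search('apples 42')       # Match object for '42'
print(anywhere.span())                       # (7, 9)

# fullmatch() succeeds only if the whole string matches the pattern.
whole = pattern.fullmatch('42')              # Match object for '42'
not_whole = pattern.fullmatch('42 apples')   # None
```

Because a failed match returns None, the result can be tested directly in an if statement.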

sub

Replace a range

Replace every character in the range g to p (inclusive) with an asterisk.
Note the unintuitive order of parameters: First the pattern, then the replacement and only then the text on which the replacement is to take place.
#!/usr/bin/python
import re

txt = 'abc defghi jklmn opq rstu vwx yz'

print(re.sub('[g-p]', '*', txt))
#
#  abc def*** ***** **q rstu vwx yz
#

Replace with the result of a function

import re
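def double(m):
    # m is a re.Match object; m.group(0) is the matched text as a str.
    print(type(m.group(0)))  # <class 'str'>, printed once per match
    return str(2 * int(m.group(0)))

print(re.sub(r'\d+', double, 'foo 42 bar 99 baz'))
# foo 84 bar 198 baz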

Iterate over words in a text

The following example iterates over the words in a piece of text and skips punctuation:
import re

txt = """\
Foo, bar and baz. Those three words! Do
new lines work, too? Yes: they do.\
"""

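# The text ends with a separator, so split() returns a trailing empty string;
# the list comprehension drops it.
words = [w for w in re.split('[ .,?;:!\n]+', txt) if w]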

for word in words:
    print(word)
Github repository about-python, path: /standard-library/re/split-text-into-words.py
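
As an alternative sketch (not taken from the repository script), the words can be matched directly with findall() instead of splitting on the separators. The shortened sample text and the variable names are assumptions for this illustration.

```python
import re

sample = 'Foo, bar and baz. Those three words!'

# Splitting on punctuation leaves an empty string where the text
# ends with a separator.
by_split = re.split('[ .,?;:!\n]+', sample)
print(by_split)
# ['Foo', 'bar', 'and', 'baz', 'Those', 'three', 'words', '']

# Matching runs of word characters returns only the words themselves.
by_findall = re.findall(r'\w+', sample)
print(by_findall)
# ['Foo', 'bar', 'and', 'baz', 'Those', 'three', 'words']
```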

Extract first line from a text

import re

text = """\
This is the first line.
The second one.
The final one."""

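# By default, . does not match a newline, so .* stops at the end of the first line.
re_1st_line = re.compile('.*')

first_line = re_1st_line.match(text)

print(first_line[0])
# This is the first line.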
Github repository about-python, path: /standard-library/re/extract-first-line-from-text.py

Using the returned search() object in an if statement

With the walrus operator (:=, available since Python 3.8), the object returned by search() can be assigned and tested in the same if statement:
import re

reNumbers = re.compile(r'(\d+)')

def getNumber(txt):

   if m := reNumbers.search(txt):
      print('The extracted number is ' + m.group(1))

   else:
      print('No number found in ' + txt)


getNumber('hello world')
getNumber('the number is 42, what else?')
Github repository about-python, path: /standard-library/re/search-if.py

See also

Some simple examples
There is also a third-party library named regex that is compatible with re but offers additional functionality and more thorough Unicode support.
The additional functionality includes the ability to match Unicode properties (\p{…}).
standard library
Raw strings (like r'foo\bar\baz') are helpful for dealing with the many backslashes used in regular expressions.
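
Why raw strings help can be sketched with the \b escape, which means "backspace" in an ordinary string literal but "word boundary" in a regular expression. The sample text and variable names are made up for illustration.

```python
import re

# In an ordinary string literal, '\b' is a single backspace character;
# in a raw string, r'\b' is a backslash followed by the letter b.
print(len('\b'), len(r'\b'))
# 1 2

text = 'foo food'

# Raw string: the regex engine sees \b and treats it as a word boundary.
raw = re.findall(r'\bfoo\b', text)           # ['foo']

# Ordinary string: the pattern contains literal backspace characters.
not_raw = re.findall('\bfoo\b', text)        # []

# Without a raw string, every backslash has to be doubled.
doubled = re.findall('\\bfoo\\b', text)      # ['foo']
```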
