Python Regular Expressions
Revision as of 00:29, 16 March 2022
External
Internal
TODO
PROCESS: https://docs.python.org/3/howto/regex.html#regex-howto
Overview
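A regular expression is an ordinary Python string. It is usually written as a raw string literal, r"...", so that backslashes reach the regular expression engine unchanged instead of being interpreted as Python string escapes.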
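A small illustration of why raw strings matter. In an ordinary string literal "\b" is a backspace character, while in a raw string r"\b" reaches the engine as the word-boundary escape. The sample text is made up for the example.

```python
import re

# r"\d+" and "\\d+" are the same pattern: a backslash followed by d and +.
assert r"\d+" == "\\d+"

text = "a cat sat"
# Ordinary string: "\b" is a backspace (\x08), so this looks for backspaces.
plain = re.search("\bcat\b", text)
# Raw string: r"\b" is the regex word-boundary assertion.
raw = re.search(r"\bcat\b", text)

print(plain)         # None
print(raw.group(0))  # cat
```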
Metacharacters
(...)
Used to capture groups.
^
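As the first character inside square brackets, negates the set: matches any one character that is not listed. Outside square brackets, ^ anchors the match at the beginning of the string.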
[^a]
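A quick sketch contrasting the two roles of ^ (negation inside a class, anchor outside it); the input strings are arbitrary examples.

```python
import re

# Leading ^ inside a class negates it: any one character except 'a'.
print(re.findall(r"[^a]", "banana"))  # ['b', 'n', 'n']

# ^ outside a class anchors the match at the start of the string.
print(re.findall(r"^b", "banana"))    # ['b']
print(re.findall(r"^a", "banana"))    # []

# With re.MULTILINE, ^ also matches right after each newline.
print(re.findall(r"^\w", "ab\ncd", re.MULTILINE))  # ['a', 'c']
```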
. (dot)
Stands for "any one character" except a newline (newlines are matched too when the re.DOTALL flag is set). To match an actual dot, escape it:
\.
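The difference between an unescaped and an escaped dot, plus the effect of re.DOTALL, on a made-up input:

```python
import re

text = "abc a.c a\nc"
# . matches any character except a newline...
print(re.findall(r"a.c", text))             # ['abc', 'a.c']
# ...unless re.DOTALL is given.
print(re.findall(r"a.c", text, re.DOTALL))  # ['abc', 'a.c', 'a\nc']
# \. matches only a literal dot.
print(re.findall(r"a\.c", text))            # ['a.c']
```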
NOT Metacharacters
Curly braces that do not form a valid repetition quantifier, such as {3} or {2,5}, are matched literally, without any escaping:
{...}
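A sketch of when braces act as a quantifier and when they are taken literally; the strings are illustrative only.

```python
import re

# {2} forms a valid quantifier: exactly two 'o' characters.
print(re.findall(r"o{2}", "foo boo bo"))                # ['oo', 'oo']
# {{color}} is not a valid quantifier, so the braces are taken literally.
print(re.findall(r"{{color}}", "a {{color}} car"))      # ['{{color}}']
# Escaping the braces also works and makes the intent explicit.
print(re.findall(r"\{\{color\}\}", "a {{color}} car"))  # ['{{color}}']
```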
Patterns
An optional group (zero or one occurrence of the group):
(...)?
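A minimal example of an optional group, assuming a made-up "number with optional decimal part" pattern. A group that does not take part in the match is reported as None.

```python
import re

# (\.\d+)? is an optional group: it may match once or not at all.
p = re.compile(r"^(\d+)(\.\d+)?$")

m1 = p.match("42")
m2 = p.match("42.5")

# A group that did not take part in the match is reported as None.
print(m1.groups())  # ('42', None)
print(m2.groups())  # ('42', '.5')
```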
Replacing Regular Expression Occurrences
import re
s = "this is a {{color}} car"
print(re.sub(r"{{color}}", 'blue', s))
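re.sub() replaces every occurrence by default. A short sketch of limiting the number of replacements with count, and of re.subn(), which also reports how many replacements were made; the input is made up.

```python
import re

s = "red car, red hat, red door"
# count limits how many occurrences are replaced (0, the default, means all).
print(re.sub(r"red", "blue", s, count=1))  # blue car, red hat, red door
# re.subn() returns the new string and the number of replacements made.
new, n = re.subn(r"red", "blue", s)
print(new, n)  # blue car, blue hat, blue door 3
```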
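Strip one leading and one trailing single quote:
s = "'something'"
# Remove a leading quote, then a trailing quote.
s2 = re.sub(r"'$", '', re.sub(r"^'", '', s))
assert 'something' == s2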
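The two nested re.sub() calls can also be written as a single pass using alternation (|). This is one possible variant, not the only way; the second input is a made-up edge case showing how it differs from str.strip().

```python
import re

s = "'something'"
# ^' matches a quote at the start, '$ a quote at the end; | tries either.
s2 = re.sub(r"^'|'$", "", s)
assert "something" == s2

# Unlike s.strip("'"), this removes at most one quote from each end.
assert re.sub(r"^'|'$", "", "''x''") == "'x'"
assert "''x''".strip("'") == "x"
```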
Capture groups and use them in the replacement:
s = 'this is red'
s2 = re.sub(r'^(this is).*$', '\\1 blue', s)
assert 'this is blue' == s2
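Groups can also be named with (?P<name>...) and referenced in the replacement with \g<name>. The group names below are chosen for illustration. The \g<1> form is useful when a digit follows the reference.

```python
import re

s = 'this is red'
# (?P<name>...) names a group; \g<name> refers to it in the replacement.
s2 = re.sub(r'^(?P<prefix>this is) (?P<color>\w+)$',
            r'\g<prefix> blue (was \g<color>)', s)
assert 'this is blue (was red)' == s2

# \g<1> avoids ambiguity when a digit follows: '\10' would mean group 10.
s3 = re.sub(r'^(\w+)$', r'\g<1>0', 'abc')
assert 'abc0' == s3
```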
To build a regular expression dynamically, use a raw f-string, rf'...':
s = 'this is a red string'
color = 'red'
s2 = re.sub(rf'{color}', 'blue', s)
assert 'this is a blue string' == s2
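An interpolated value is read as regex syntax, so a value containing metacharacters may not match as intended. A sketch, using a made-up price string, of guarding it with re.escape(), and of doubling braces so that a quantifier survives f-string formatting:

```python
import re

s = 'it costs $1.50 today'
price = '$1.50'
# re.escape() neutralizes $ and ., which would otherwise be metacharacters.
s2 = re.sub(rf'{re.escape(price)}', 'two dollars', s)
assert 'it costs two dollars today' == s2

# In an f-string, literal braces must be doubled: {{4}} becomes {4}.
label = 'year'
assert re.fullmatch(rf'{label}-\d{{4}}', 'year-2022')
```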
Match a new line:
r'\n'
Match not a new line:
r'[^\n]'
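Both patterns in use on a two-line sample string: [^\n]+ picks up each line, and \n works as a separator for re.split().

```python
import re

text = 'first line\nsecond line'
# [^\n]+ matches runs of characters that are not newlines.
print(re.findall(r'[^\n]+', text))  # ['first line', 'second line']
# \n matches the newline itself, so it can be used as a separator.
print(re.split(r'\n', text))        # ['first line', 'second line']
```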
Match a Pattern and Pick Up Groups
import re
p = re.compile(r'^(\w+):(\w+)-(\w+)$')
s = 'abc:mnp-xyz'
m = p.match(s)
if m:
    assert 'abc:mnp-xyz' == m.group(0)
    assert 'abc' == m.group(1)
    assert 'mnp' == m.group(2)
    assert 'xyz' == m.group(3)
Groups are 1-based. group(0) represents the entire match.
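A few other ways to read groups from the same match object, reusing the pattern above. Note that the tuple returned by groups() is indexed from 0, so groups()[0] is group(1).

```python
import re

p = re.compile(r'^(\w+):(\w+)-(\w+)$')
m = p.match('abc:mnp-xyz')

# group() accepts several numbers and returns a tuple.
assert ('abc', 'xyz') == m.group(1, 3)
# groups() returns every group as a tuple; the tuple itself is 0-based.
assert ('abc', 'mnp', 'xyz') == m.groups()
assert m.groups()[0] == m.group(1)
# m[n] is shorthand for m.group(n).
assert 'abc' == m[1]
# span(n) gives the start and end offsets of group n.
assert (4, 7) == m.span(2)
```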
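Note: m.groups() does not take a group index. Its optional argument is a default value for groups that did not participate in the match, so m.groups(1) returns the whole tuple with every None replaced by 1. To read a single group, use m.group(n): for an optional group (as in '....()?()?') that did not participate, it returns None, and it raises IndexError only when n is larger than the number of groups in the pattern. Retrieving the groups once as a tuple and testing its elements also works, keeping in mind that the tuple is 0-based:
groups = m.groups()  # all groups as a tuple; groups[0] is group(1)
if groups[0]:
    ...
...
if groups[4]:  # group(5)
    ...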
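A short check of how groups(), group() and IndexError behave when a pattern has optional groups. The pattern and input are made up for illustration; here the last optional group does not participate.

```python
import re

# Two optional groups; the last one does not participate in this match.
m = re.match(r'(\w+)(-\d+)?(:\w+)?', 'abc-12')

print(m.groups())    # ('abc', '-12', None)
# The argument to groups() is a default for missing groups, not an index.
print(m.groups(''))  # ('abc', '-12', '')
# group(n) for a non-participating group returns None...
print(m.group(3))    # None
# ...and raises IndexError only when n exceeds the number of groups.
try:
    m.group(4)
except IndexError as e:
    print('IndexError:', e)
```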
Scan a String
import re
match = re.search(pattern, string)  # first match anywhere in string, or None
if match:
    process(match)  # process() stands for your own handling code
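A sketch on a made-up string contrasting re.search(), which scans the whole string, with re.match(), which only tries at the start, and showing re.finditer() for visiting every occurrence rather than just the first:

```python
import re

s = 'abc 123 def 456'
# re.search() scans the whole string and returns the first match (or None).
assert '123' == re.search(r'\d+', s).group(0)
# re.match() only tries at the start of the string.
assert re.match(r'\d+', s) is None
# re.finditer() yields a Match object for every non-overlapping match.
for m in re.finditer(r'\d+', s):
    print(m.start(), m.group(0))
# 4 123
# 12 456
```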