<!DOCTYPE html>
|
||
|
||
<html xmlns="http://www.w3.org/1999/xhtml">
|
||
<head>
|
||
<meta charset="utf-8" />
|
||
<title>re — Regular expression operations — Python 3.7.4 documentation</title>
|
||
<link rel="stylesheet" href="../_static/pydoctheme.css" type="text/css" />
|
||
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
|
||
|
||
<script type="text/javascript" id="documentation_options" data-url_root="../" src="../_static/documentation_options.js"></script>
|
||
<script type="text/javascript" src="../_static/jquery.js"></script>
|
||
<script type="text/javascript" src="../_static/underscore.js"></script>
|
||
<script type="text/javascript" src="../_static/doctools.js"></script>
|
||
<script type="text/javascript" src="../_static/language_data.js"></script>
|
||
|
||
<script type="text/javascript" src="../_static/sidebar.js"></script>
|
||
|
||
<link rel="search" type="application/opensearchdescription+xml"
|
||
title="Search within Python 3.7.4 documentation"
|
||
href="../_static/opensearch.xml"/>
|
||
<link rel="author" title="About these documents" href="../about.html" />
|
||
<link rel="index" title="Index" href="../genindex.html" />
|
||
<link rel="search" title="Search" href="../search.html" />
|
||
<link rel="copyright" title="Copyright" href="../copyright.html" />
|
||
<link rel="next" title="difflib — Helpers for computing deltas" href="difflib.html" />
|
||
<link rel="prev" title="string — Common string operations" href="string.html" />
|
||
<link rel="shortcut icon" type="image/png" href="../_static/py.png" />
|
||
<link rel="canonical" href="https://docs.python.org/3/library/re.html" />
|
||
|
||
<script type="text/javascript" src="../_static/copybutton.js"></script>
|
||
<script type="text/javascript" src="../_static/switchers.js"></script>
|
||
|
||
|
||
|
||
<style>
|
||
@media only screen {
|
||
table.full-width-table {
|
||
width: 100%;
|
||
}
|
||
}
|
||
</style>
|
||
|
||
|
||
</head><body>
|
||
|
||
|
||
|
||
<div class="document">
|
||
<div class="documentwrapper">
|
||
<div class="bodywrapper">
|
||
<div class="body" role="main">
|
||
|
||
<div class="section" id="module-re">
|
||
<span id="re-regular-expression-operations"></span><h1><a class="reference internal" href="#module-re" title="re: Regular expression operations."><code class="xref py py-mod docutils literal notranslate"><span class="pre">re</span></code></a> — Regular expression operations<a class="headerlink" href="#module-re" title="Permalink to this headline">¶</a></h1>
|
||
<p><strong>Source code:</strong> <a class="reference external" href="https://github.com/python/cpython/tree/3.7/Lib/re.py">Lib/re.py</a></p>
|
||
<hr class="docutils" />
|
||
<p>This module provides regular expression matching operations similar to
|
||
those found in Perl.</p>
|
||
<p>Both patterns and strings to be searched can be Unicode strings (<a class="reference internal" href="stdtypes.html#str" title="str"><code class="xref py py-class docutils literal notranslate"><span class="pre">str</span></code></a>)
|
||
as well as 8-bit strings (<a class="reference internal" href="stdtypes.html#bytes" title="bytes"><code class="xref py py-class docutils literal notranslate"><span class="pre">bytes</span></code></a>).
|
||
However, Unicode strings and 8-bit strings cannot be mixed:
|
||
that is, you cannot match a Unicode string with a byte pattern or
|
||
vice-versa; similarly, when asking for a substitution, the replacement
|
||
string must be of the same type as both the pattern and the search string.</p>
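<p>As a minimal sketch of the type rule above (the sample text and variable names are illustrative only), a <code class="docutils literal notranslate"><span class="pre">str</span></code> pattern works on <code class="docutils literal notranslate"><span class="pre">str</span></code> input, a <code class="docutils literal notranslate"><span class="pre">bytes</span></code> pattern works on <code class="docutils literal notranslate"><span class="pre">bytes</span></code> input, and mixing the two is expected to raise <code class="docutils literal notranslate"><span class="pre">TypeError</span></code>:</p>

```python
import re

# A str pattern searches str text; a bytes pattern searches bytes.
text_match = re.search(r"\d+", "order 42")
bytes_match = re.search(rb"\d+", b"order 42")
print(text_match.group())   # 42
print(bytes_match.group())  # b'42'

# Mixing a bytes pattern with a str subject is rejected.
mixed_error = None
try:
    re.search(rb"\d+", "order 42")
except TypeError as exc:
    mixed_error = type(exc).__name__
print(mixed_error)          # TypeError
```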
|
||
<p>Regular expressions use the backslash character (<code class="docutils literal notranslate"><span class="pre">'\'</span></code>) to indicate
|
||
special forms or to allow special characters to be used without invoking
|
||
their special meaning. This collides with Python’s usage of the same
|
||
character for the same purpose in string literals; for example, to match
|
||
a literal backslash, one might have to write <code class="docutils literal notranslate"><span class="pre">'\\\\'</span></code> as the pattern
|
||
string, because the regular expression must be <code class="docutils literal notranslate"><span class="pre">\\</span></code>, and each
|
||
backslash must be expressed as <code class="docutils literal notranslate"><span class="pre">\\</span></code> inside a regular Python string
|
||
literal.</p>
|
||
<p>The solution is to use Python’s raw string notation for regular expression
|
||
patterns; backslashes are not handled in any special way in a string literal
|
||
prefixed with <code class="docutils literal notranslate"><span class="pre">'r'</span></code>. So <code class="docutils literal notranslate"><span class="pre">r"\n"</span></code> is a two-character string containing
|
||
<code class="docutils literal notranslate"><span class="pre">'\'</span></code> and <code class="docutils literal notranslate"><span class="pre">'n'</span></code>, while <code class="docutils literal notranslate"><span class="pre">"\n"</span></code> is a one-character string containing a
|
||
newline. Usually patterns will be expressed in Python code using this raw
|
||
string notation.</p>
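<p>The difference between the two notations can be checked directly. The following sketch uses an illustrative Windows-style path (an assumption, not part of the original text) to show that the four-backslash literal and the raw two-backslash literal are the same pattern:</p>

```python
import re

newline_len = len("\n")    # 1: a single newline character
raw_len = len(r"\n")       # 2: a backslash followed by 'n'
print(newline_len, raw_len)

# Both spellings produce the two-character regular expression \\,
# which matches one literal backslash.
same_pattern = "\\\\" == r"\\"
path = "C:\\temp"          # the 7-character string C:\temp
backslash_at = re.search(r"\\", path).start()
print(same_pattern, backslash_at)   # True 2
```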
|
||
<p>Most regular expression operations are available both as module-level functions and as methods on
<a class="reference internal" href="#re-objects"><span class="std std-ref">compiled regular expressions</span></a>. The functions are shortcuts
that don’t require you to compile a regex object first, but they lack some
|
||
fine-tuning parameters.</p>
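<p>A short sketch of the two styles, assuming an illustrative sample string. The optional starting position passed to the compiled pattern's <code class="docutils literal notranslate"><span class="pre">search()</span></code> method is one example of a fine-tuning parameter that the module-level function does not take:</p>

```python
import re

text = "spam eggs spam"

# Module-level shortcut: the pattern is passed on every call.
first = re.search(r"spam", text)

# Compiled pattern: the same operation as a method, plus an optional
# starting position for the search.
pattern = re.compile(r"spam")
second = pattern.search(text, 1)   # start searching at index 1

print(first.start())    # 0
print(second.start())   # 10
```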
|
||
<div class="admonition seealso">
|
||
<p class="admonition-title">See also</p>
|
||
<p>The third-party <a class="reference external" href="https://pypi.org/project/regex/">regex</a> module,
|
||
which has an API compatible with the standard library <a class="reference internal" href="#module-re" title="re: Regular expression operations."><code class="xref py py-mod docutils literal notranslate"><span class="pre">re</span></code></a> module,
|
||
but offers additional functionality and more thorough Unicode support.</p>
|
||
</div>
|
||
<div class="section" id="regular-expression-syntax">
|
||
<span id="re-syntax"></span><h2>Regular Expression Syntax<a class="headerlink" href="#regular-expression-syntax" title="Permalink to this headline">¶</a></h2>
|
||
<p>A regular expression (or RE) specifies a set of strings that matches it; the
|
||
functions in this module let you check if a particular string matches a given
|
||
regular expression (or if a given regular expression matches a particular
|
||
string, which comes down to the same thing).</p>
|
||
<p>Regular expressions can be concatenated to form new regular expressions; if <em>A</em>
|
||
and <em>B</em> are both regular expressions, then <em>AB</em> is also a regular expression.
|
||
In general, if a string <em>p</em> matches <em>A</em> and another string <em>q</em> matches <em>B</em>, the
string <em>pq</em> will match <em>AB</em>. This holds unless <em>A</em> or <em>B</em> contains low-precedence
operations, there are boundary conditions between <em>A</em> and <em>B</em>, or either has numbered group
references. Thus, complex expressions can easily be constructed from simpler
|
||
primitive expressions like the ones described here. For details of the theory
|
||
and implementation of regular expressions, consult the Friedl book <a class="reference internal" href="#frie09" id="id1"><span>[Frie09]</span></a>,
|
||
or almost any textbook about compiler construction.</p>
|
||
<p>A brief explanation of the format of regular expressions follows. For further
|
||
information and a gentler presentation, consult the <a class="reference internal" href="../howto/regex.html#regex-howto"><span class="std std-ref">Regular Expression HOWTO</span></a>.</p>
|
||
<p>Regular expressions can contain both special and ordinary characters. Most
|
||
ordinary characters, like <code class="docutils literal notranslate"><span class="pre">'A'</span></code>, <code class="docutils literal notranslate"><span class="pre">'a'</span></code>, or <code class="docutils literal notranslate"><span class="pre">'0'</span></code>, are the simplest regular
|
||
expressions; they simply match themselves. You can concatenate ordinary
|
||
characters, so <code class="docutils literal notranslate"><span class="pre">last</span></code> matches the string <code class="docutils literal notranslate"><span class="pre">'last'</span></code>. (In the rest of this
|
||
section, we’ll write RE’s in <code class="docutils literal notranslate"><span class="pre">this</span> <span class="pre">special</span> <span class="pre">style</span></code>, usually without quotes, and
|
||
strings to be matched <code class="docutils literal notranslate"><span class="pre">'in</span> <span class="pre">single</span> <span class="pre">quotes'</span></code>.)</p>
|
||
<p>Some characters, like <code class="docutils literal notranslate"><span class="pre">'|'</span></code> or <code class="docutils literal notranslate"><span class="pre">'('</span></code>, are special. Special
|
||
characters either stand for classes of ordinary characters, or affect
|
||
how the regular expressions around them are interpreted.</p>
|
||
<p>Repetition qualifiers (<code class="docutils literal notranslate"><span class="pre">*</span></code>, <code class="docutils literal notranslate"><span class="pre">+</span></code>, <code class="docutils literal notranslate"><span class="pre">?</span></code>, <code class="docutils literal notranslate"><span class="pre">{m,n}</span></code>, etc.) cannot be
|
||
directly nested. This avoids ambiguity with the non-greedy modifier suffix
|
||
<code class="docutils literal notranslate"><span class="pre">?</span></code>, and with other modifiers in other implementations. To apply a second
|
||
repetition to an inner repetition, parentheses may be used. For example,
|
||
the expression <code class="docutils literal notranslate"><span class="pre">(?:a{6})*</span></code> matches any multiple of six <code class="docutils literal notranslate"><span class="pre">'a'</span></code> characters.</p>
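<p>The following sketch illustrates the rule. The directly nested pattern <code class="docutils literal notranslate"><span class="pre">a{6}*</span></code> is a made-up counterexample, and the variable names are illustrative:</p>

```python
import re

# Directly nesting qualifiers is rejected when the pattern is compiled.
try:
    re.compile(r"a{6}*")
    nested_ok = True
except re.error:
    nested_ok = False

# Wrapping the inner repetition in a group makes the outer one legal.
six_as = re.compile(r"(?:a{6})*")
twelve = six_as.fullmatch("a" * 12) is not None
seven = six_as.fullmatch("a" * 7) is not None

print(nested_ok, twelve, seven)   # False True False
```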
|
||
<p>The special characters are:</p>
|
||
<dl class="simple" id="index-0">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">.</span></code></dt><dd><p>(Dot.) In the default mode, this matches any character except a newline. If
|
||
the <a class="reference internal" href="#re.DOTALL" title="re.DOTALL"><code class="xref py py-const docutils literal notranslate"><span class="pre">DOTALL</span></code></a> flag has been specified, this matches any character
|
||
including a newline.</p>
|
||
</dd>
|
||
</dl>
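<p>For instance, with an illustrative three-character string containing a newline, the effect of <code class="docutils literal notranslate"><span class="pre">DOTALL</span></code> on the dot can be sketched as:</p>

```python
import re

text = "a\nb"
default_dot = re.findall(r".", text)            # newline is skipped
dotall_dot = re.findall(r".", text, re.DOTALL)  # newline is matched too

print(default_dot)   # ['a', 'b']
print(dotall_dot)    # ['a', '\n', 'b']
```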
|
||
<dl class="simple" id="index-1">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">^</span></code></dt><dd><p>(Caret.) Matches the start of the string, and in <a class="reference internal" href="#re.MULTILINE" title="re.MULTILINE"><code class="xref py py-const docutils literal notranslate"><span class="pre">MULTILINE</span></code></a> mode also
|
||
matches immediately after each newline.</p>
|
||
</dd>
|
||
</dl>
|
||
<dl class="simple" id="index-2">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">$</span></code></dt><dd><p>Matches the end of the string or just before the newline at the end of the
|
||
string, and in <a class="reference internal" href="#re.MULTILINE" title="re.MULTILINE"><code class="xref py py-const docutils literal notranslate"><span class="pre">MULTILINE</span></code></a> mode also matches before a newline. <code class="docutils literal notranslate"><span class="pre">foo</span></code>
|
||
matches both ‘foo’ and ‘foobar’, while the regular expression <code class="docutils literal notranslate"><span class="pre">foo$</span></code> matches
|
||
only ‘foo’. More interestingly, searching for <code class="docutils literal notranslate"><span class="pre">foo.$</span></code> in <code class="docutils literal notranslate"><span class="pre">'foo1\nfoo2\n'</span></code>
|
||
matches ‘foo2’ normally, but ‘foo1’ in <a class="reference internal" href="#re.MULTILINE" title="re.MULTILINE"><code class="xref py py-const docutils literal notranslate"><span class="pre">MULTILINE</span></code></a> mode; searching for
|
||
a single <code class="docutils literal notranslate"><span class="pre">$</span></code> in <code class="docutils literal notranslate"><span class="pre">'foo\n'</span></code> will find two (empty) matches: one just before
|
||
the newline, and one at the end of the string.</p>
|
||
</dd>
|
||
</dl>
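<p>The anchor behaviour described for <code class="docutils literal notranslate"><span class="pre">^</span></code> and <code class="docutils literal notranslate"><span class="pre">$</span></code> can be reproduced with a few calls. This is a sketch: the <code class="docutils literal notranslate"><span class="pre">'one\ntwo'</span></code> string and the variable names are illustrative additions, while the <code class="docutils literal notranslate"><span class="pre">foo</span></code> cases come from the text above:</p>

```python
import re

# ^ matches only at the start of the string unless MULTILINE is set.
starts_default = re.findall(r"^\w+", "one\ntwo")
starts_multi = re.findall(r"^\w+", "one\ntwo", re.MULTILINE)
print(starts_default, starts_multi)   # ['one'] ['one', 'two']

# $ matches at the end of the string, or before a newline in MULTILINE mode.
text = "foo1\nfoo2\n"
end_default = re.search(r"foo.$", text).group()
end_multi = re.search(r"foo.$", text, re.MULTILINE).group()
print(end_default, end_multi)         # foo2 foo1

# A bare $ in 'foo\n' yields two empty matches.
dollar_positions = [m.start() for m in re.finditer(r"$", "foo\n")]
print(dollar_positions)               # [3, 4]
```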
|
||
<dl class="simple" id="index-3">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">*</span></code></dt><dd><p>Causes the resulting RE to match 0 or more repetitions of the preceding RE, as
|
||
many repetitions as are possible. <code class="docutils literal notranslate"><span class="pre">ab*</span></code> will match ‘a’, ‘ab’, or ‘a’ followed
|
||
by any number of ‘b’s.</p>
|
||
</dd>
|
||
</dl>
|
||
<dl class="simple" id="index-4">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">+</span></code></dt><dd><p>Causes the resulting RE to match 1 or more repetitions of the preceding RE.
|
||
<code class="docutils literal notranslate"><span class="pre">ab+</span></code> will match ‘a’ followed by any non-zero number of ‘b’s; it will not
|
||
match just ‘a’.</p>
|
||
</dd>
|
||
</dl>
|
||
<dl class="simple" id="index-5">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">?</span></code></dt><dd><p>Causes the resulting RE to match 0 or 1 repetitions of the preceding RE.
|
||
<code class="docutils literal notranslate"><span class="pre">ab?</span></code> will match either ‘a’ or ‘ab’.</p>
|
||
</dd>
|
||
</dl>
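<p>A compact way to compare <code class="docutils literal notranslate"><span class="pre">*</span></code>, <code class="docutils literal notranslate"><span class="pre">+</span></code>, and <code class="docutils literal notranslate"><span class="pre">?</span></code> is to test the same candidate strings against each pattern. The candidate list below is illustrative:</p>

```python
import re

candidates = ["a", "ab", "abbb"]

# fullmatch requires the whole candidate to match the pattern.
star = [s for s in candidates if re.fullmatch(r"ab*", s)]      # zero or more 'b'
plus = [s for s in candidates if re.fullmatch(r"ab+", s)]      # one or more 'b'
optional = [s for s in candidates if re.fullmatch(r"ab?", s)]  # zero or one 'b'

print(star)       # ['a', 'ab', 'abbb']
print(plus)       # ['ab', 'abbb']
print(optional)   # ['a', 'ab']
```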
|
||
<dl class="simple" id="index-6">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">*?</span></code>, <code class="docutils literal notranslate"><span class="pre">+?</span></code>, <code class="docutils literal notranslate"><span class="pre">??</span></code></dt><dd><p>The <code class="docutils literal notranslate"><span class="pre">'*'</span></code>, <code class="docutils literal notranslate"><span class="pre">'+'</span></code>, and <code class="docutils literal notranslate"><span class="pre">'?'</span></code> qualifiers are all <em class="dfn">greedy</em>; they match
|
||
as much text as possible. Sometimes this behaviour isn’t desired; if the RE
|
||
<code class="docutils literal notranslate"><span class="pre">&lt;.*&gt;</span></code> is matched against <code class="docutils literal notranslate"><span class="pre">'&lt;a&gt;</span> <span class="pre">b</span> <span class="pre">&lt;c&gt;'</span></code>, it will match the entire
string, and not just <code class="docutils literal notranslate"><span class="pre">'&lt;a&gt;'</span></code>. Adding <code class="docutils literal notranslate"><span class="pre">?</span></code> after the qualifier makes it
perform the match in <em class="dfn">non-greedy</em> or <em class="dfn">minimal</em> fashion; as <em>few</em>
characters as possible will be matched. Using the RE <code class="docutils literal notranslate"><span class="pre">&lt;.*?&gt;</span></code> will match
only <code class="docutils literal notranslate"><span class="pre">'&lt;a&gt;'</span></code>.</p>
|
||
</dd>
|
||
</dl>
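<p>The greedy and non-greedy cases from the entry above can be run as written. The variable names are illustrative:</p>

```python
import re

text = "<a> b <c>"
greedy = re.search(r"<.*>", text).group()    # consumes as much as possible
lazy = re.search(r"<.*?>", text).group()     # stops at the first '>'

print(greedy)   # <a> b <c>
print(lazy)     # <a>
```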
|
||
<dl class="simple" id="index-7">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">{m}</span></code></dt><dd><p>Specifies that exactly <em>m</em> copies of the previous RE should be matched; fewer
|
||
matches cause the entire RE not to match. For example, <code class="docutils literal notranslate"><span class="pre">a{6}</span></code> will match
|
||
exactly six <code class="docutils literal notranslate"><span class="pre">'a'</span></code> characters, but not five.</p>
|
||
</dd>
|
||
<dt><code class="docutils literal notranslate"><span class="pre">{m,n}</span></code></dt><dd><p>Causes the resulting RE to match from <em>m</em> to <em>n</em> repetitions of the preceding
|
||
RE, attempting to match as many repetitions as possible. For example,
|
||
<code class="docutils literal notranslate"><span class="pre">a{3,5}</span></code> will match from 3 to 5 <code class="docutils literal notranslate"><span class="pre">'a'</span></code> characters. Omitting <em>m</em> specifies a
|
||
lower bound of zero, and omitting <em>n</em> specifies an infinite upper bound. As an
|
||
example, <code class="docutils literal notranslate"><span class="pre">a{4,}b</span></code> will match <code class="docutils literal notranslate"><span class="pre">'aaaab'</span></code> or a thousand <code class="docutils literal notranslate"><span class="pre">'a'</span></code> characters
|
||
followed by a <code class="docutils literal notranslate"><span class="pre">'b'</span></code>, but not <code class="docutils literal notranslate"><span class="pre">'aaab'</span></code>. The comma may not be omitted or the
|
||
modifier would be confused with the previously described form.</p>
|
||
</dd>
|
||
<dt><code class="docutils literal notranslate"><span class="pre">{m,n}?</span></code></dt><dd><p>Causes the resulting RE to match from <em>m</em> to <em>n</em> repetitions of the preceding
|
||
RE, attempting to match as <em>few</em> repetitions as possible. This is the
|
||
non-greedy version of the previous qualifier. For example, on the
|
||
6-character string <code class="docutils literal notranslate"><span class="pre">'aaaaaa'</span></code>, <code class="docutils literal notranslate"><span class="pre">a{3,5}</span></code> will match 5 <code class="docutils literal notranslate"><span class="pre">'a'</span></code> characters,
|
||
while <code class="docutils literal notranslate"><span class="pre">a{3,5}?</span></code> will only match 3 characters.</p>
|
||
</dd>
|
||
</dl>
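<p>The bounded-repetition examples above can be checked with a short sketch (variable names are illustrative):</p>

```python
import re

six = "a" * 6

# {m}: exactly m copies are required.
exact_ok = re.fullmatch(r"a{6}", six) is not None
exact_short = re.fullmatch(r"a{6}", "a" * 5) is not None

# {m,n} is greedy; {m,n}? takes as few repetitions as possible.
greedy = re.match(r"a{3,5}", six).group()
lazy = re.match(r"a{3,5}?", six).group()

# Omitting n leaves the upper bound open.
open_ok = re.fullmatch(r"a{4,}b", "aaaab") is not None
open_short = re.fullmatch(r"a{4,}b", "aaab") is not None

print(exact_ok, exact_short)   # True False
print(greedy, lazy)            # aaaaa aaa
print(open_ok, open_short)     # True False
```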
|
||
<dl id="index-8">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">\</span></code></dt><dd><p>Either escapes special characters (permitting you to match characters like
|
||
<code class="docutils literal notranslate"><span class="pre">'*'</span></code>, <code class="docutils literal notranslate"><span class="pre">'?'</span></code>, and so forth), or signals a special sequence; special
|
||
sequences are discussed below.</p>
|
||
<p>If you’re not using a raw string to express the pattern, remember that Python
also uses the backslash as an escape character in string literals; if the escape
sequence isn’t recognized by Python’s parser, the backslash and subsequent
character are included in the resulting string. However, if Python would
recognize the resulting sequence, the backslash should be doubled. This
is complicated and hard to understand, so it’s highly recommended that you use
raw strings for all but the simplest expressions.</p>
|
||
</dd>
|
||
</dl>
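<p>As a small sketch of escaping (the sample strings are illustrative), a backslash in front of a special character makes it match literally:</p>

```python
import re

# \* and \? match the literal characters instead of acting as qualifiers.
stars = re.findall(r"\*", "2*3*4")
question = re.search(r"why\?", "why?").group()

# Without a raw string, the backslash has to be doubled in the literal.
same_pattern = "\\*" == r"\*"

print(stars)          # ['*', '*']
print(question)       # why?
print(same_pattern)   # True
```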
|
||
<dl id="index-9">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">[]</span></code></dt><dd><p>Used to indicate a set of characters. In a set:</p>
|
||
<ul class="simple">
|
||
<li><p>Characters can be listed individually, e.g. <code class="docutils literal notranslate"><span class="pre">[amk]</span></code> will match <code class="docutils literal notranslate"><span class="pre">'a'</span></code>,
|
||
<code class="docutils literal notranslate"><span class="pre">'m'</span></code>, or <code class="docutils literal notranslate"><span class="pre">'k'</span></code>.</p></li>
|
||
</ul>
|
||
<ul class="simple" id="index-10">
|
||
<li><p>Ranges of characters can be indicated by giving two characters and separating
|
||
them by a <code class="docutils literal notranslate"><span class="pre">'-'</span></code>, for example <code class="docutils literal notranslate"><span class="pre">[a-z]</span></code> will match any lowercase ASCII letter,
|
||
<code class="docutils literal notranslate"><span class="pre">[0-5][0-9]</span></code> will match all the two-digit numbers from <code class="docutils literal notranslate"><span class="pre">00</span></code> to <code class="docutils literal notranslate"><span class="pre">59</span></code>, and
|
||
<code class="docutils literal notranslate"><span class="pre">[0-9A-Fa-f]</span></code> will match any hexadecimal digit. If <code class="docutils literal notranslate"><span class="pre">-</span></code> is escaped (e.g.
|
||
<code class="docutils literal notranslate"><span class="pre">[a\-z]</span></code>) or if it’s placed as the first or last character
|
||
(e.g. <code class="docutils literal notranslate"><span class="pre">[-a]</span></code> or <code class="docutils literal notranslate"><span class="pre">[a-]</span></code>), it will match a literal <code class="docutils literal notranslate"><span class="pre">'-'</span></code>.</p></li>
|
||
<li><p>Special characters lose their special meaning inside sets. For example,
|
||
<code class="docutils literal notranslate"><span class="pre">[(+*)]</span></code> will match any of the literal characters <code class="docutils literal notranslate"><span class="pre">'('</span></code>, <code class="docutils literal notranslate"><span class="pre">'+'</span></code>,
|
||
<code class="docutils literal notranslate"><span class="pre">'*'</span></code>, or <code class="docutils literal notranslate"><span class="pre">')'</span></code>.</p></li>
|
||
</ul>
|
||
<ul class="simple" id="index-11">
|
||
<li><p>Character classes such as <code class="docutils literal notranslate"><span class="pre">\w</span></code> or <code class="docutils literal notranslate"><span class="pre">\S</span></code> (defined below) are also accepted
|
||
inside a set, although the characters they match depend on whether
|
||
<a class="reference internal" href="#re.ASCII" title="re.ASCII"><code class="xref py py-const docutils literal notranslate"><span class="pre">ASCII</span></code></a> or <a class="reference internal" href="#re.LOCALE" title="re.LOCALE"><code class="xref py py-const docutils literal notranslate"><span class="pre">LOCALE</span></code></a> mode is in force.</p></li>
|
||
</ul>
|
||
<ul class="simple" id="index-12">
|
||
<li><p>Characters that are not within a range can be matched by <em class="dfn">complementing</em>
|
||
the set. If the first character of the set is <code class="docutils literal notranslate"><span class="pre">'^'</span></code>, all the characters
|
||
that are <em>not</em> in the set will be matched. For example, <code class="docutils literal notranslate"><span class="pre">[^5]</span></code> will match
|
||
any character except <code class="docutils literal notranslate"><span class="pre">'5'</span></code>, and <code class="docutils literal notranslate"><span class="pre">[^^]</span></code> will match any character except
|
||
<code class="docutils literal notranslate"><span class="pre">'^'</span></code>. <code class="docutils literal notranslate"><span class="pre">^</span></code> has no special meaning if it’s not the first character in
|
||
the set.</p></li>
|
||
<li><p>To match a literal <code class="docutils literal notranslate"><span class="pre">']'</span></code> inside a set, precede it with a backslash, or
|
||
place it at the beginning of the set. For example, <code class="docutils literal notranslate"><span class="pre">[()[\]{}]</span></code> and
<code class="docutils literal notranslate"><span class="pre">[]()[{}]</span></code> will each match a literal <code class="docutils literal notranslate"><span class="pre">']'</span></code>, as well as parentheses, a left bracket, and braces.</p></li>
|
||
</ul>
|
||
<ul class="simple">
|
||
<li><p>Support of nested sets and set operations as in <a class="reference external" href="https://unicode.org/reports/tr18/">Unicode Technical
|
||
Standard #18</a> might be added in the future. This would change the
|
||
syntax, so to facilitate this change a <a class="reference internal" href="exceptions.html#FutureWarning" title="FutureWarning"><code class="xref py py-exc docutils literal notranslate"><span class="pre">FutureWarning</span></code></a> will be raised
|
||
in ambiguous cases for the time being.
|
||
That includes sets starting with a literal <code class="docutils literal notranslate"><span class="pre">'['</span></code> or containing literal
|
||
character sequences <code class="docutils literal notranslate"><span class="pre">'--'</span></code>, <code class="docutils literal notranslate"><span class="pre">'&amp;&amp;'</span></code>, <code class="docutils literal notranslate"><span class="pre">'~~'</span></code>, and <code class="docutils literal notranslate"><span class="pre">'||'</span></code>. To
avoid a warning, escape them with a backslash.</p></li>
|
||
</ul>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.7: </span><a class="reference internal" href="exceptions.html#FutureWarning" title="FutureWarning"><code class="xref py py-exc docutils literal notranslate"><span class="pre">FutureWarning</span></code></a> is raised if a character set contains constructs
|
||
that will change semantically in the future.</p>
|
||
</div>
|
||
</dd>
|
||
</dl>
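<p>The set rules listed above can be sketched with a handful of calls. The input strings and variable names are illustrative, and the patterns avoid the constructs that would trigger a <code class="docutils literal notranslate"><span class="pre">FutureWarning</span></code>:</p>

```python
import re

listed = re.findall(r"[amk]", "make")                       # individual characters
in_range = re.fullmatch(r"[0-5][0-9]", "59") is not None    # ranges
out_of_range = re.fullmatch(r"[0-5][0-9]", "60") is not None
literal_dash = re.findall(r"[a\-z]", "a-z b")               # escaped '-'
no_specials = re.findall(r"[(+*)]", "(1+2)*3")              # specials are literal
complemented = re.findall(r"[^5]", "1525")                  # everything except '5'
brackets = re.findall(r"[()[\]{}]", "f(x)[0]")              # escaped ']'

print(listed)                   # ['m', 'a', 'k']
print(in_range, out_of_range)   # True False
print(literal_dash)             # ['a', '-', 'z']
print(no_specials)              # ['(', '+', ')', '*']
print(complemented)             # ['1', '2']
print(brackets)                 # ['(', ')', '[', ']']
```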
|
||
<dl class="simple" id="index-13">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">|</span></code></dt><dd><p><code class="docutils literal notranslate"><span class="pre">A|B</span></code>, where <em>A</em> and <em>B</em> can be arbitrary REs, creates a regular expression that
|
||
will match either <em>A</em> or <em>B</em>. An arbitrary number of REs can be separated by the
|
||
<code class="docutils literal notranslate"><span class="pre">'|'</span></code> in this way. This can be used inside groups (see below) as well. As
|
||
the target string is scanned, REs separated by <code class="docutils literal notranslate"><span class="pre">'|'</span></code> are tried from left to
|
||
right. When one pattern completely matches, that branch is accepted. This means
|
||
that once <em>A</em> matches, <em>B</em> will not be tested further, even if it would
|
||
produce a longer overall match. In other words, the <code class="docutils literal notranslate"><span class="pre">'|'</span></code> operator is never
|
||
greedy. To match a literal <code class="docutils literal notranslate"><span class="pre">'|'</span></code>, use <code class="docutils literal notranslate"><span class="pre">\|</span></code>, or enclose it inside a
|
||
character class, as in <code class="docutils literal notranslate"><span class="pre">[|]</span></code>.</p>
|
||
</dd>
|
||
</dl>
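<p>A brief sketch of the left-to-right behaviour of <code class="docutils literal notranslate"><span class="pre">'|'</span></code>, using illustrative patterns and inputs:</p>

```python
import re

# The first branch that matches wins, even if a later one would match more.
first_wins = re.search(r"a|ab", "ab").group()
longer_first = re.search(r"ab|a", "ab").group()

# Two ways to match a literal '|'.
escaped = re.findall(r"\|", "x|y")
in_class = re.findall(r"[|]", "x|y")

print(first_wins, longer_first)   # a ab
print(escaped, in_class)          # ['|'] ['|']
```

<p>Ordering the alternatives from longest to shortest is one way to get the longer match when the branches share a prefix.</p>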
|
||
<dl class="simple" id="index-14">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">(...)</span></code></dt><dd><p>Matches whatever regular expression is inside the parentheses, and indicates the
|
||
start and end of a group; the contents of a group can be retrieved after a match
|
||
has been performed, and can be matched later in the string with the <code class="docutils literal notranslate"><span class="pre">\number</span></code>
|
||
special sequence, described below. To match the literals <code class="docutils literal notranslate"><span class="pre">'('</span></code> or <code class="docutils literal notranslate"><span class="pre">')'</span></code>,
|
||
use <code class="docutils literal notranslate"><span class="pre">\(</span></code> or <code class="docutils literal notranslate"><span class="pre">\)</span></code>, or enclose them inside a character class: <code class="docutils literal notranslate"><span class="pre">[(]</span></code>, <code class="docutils literal notranslate"><span class="pre">[)]</span></code>.</p>
|
||
</dd>
|
||
</dl>
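<p>The sketch below, with illustrative sample text, retrieves a group after a match, refers back to it with <code class="docutils literal notranslate"><span class="pre">\1</span></code>, and matches literal parentheses:</p>

```python
import re

# Group 1 captures a word; \1 requires the same text to appear again.
m = re.search(r"(\w+) \1", "hello hello world")
whole, word = m.group(0), m.group(1)

# Escaped parentheses match the literal characters.
literal_parens = re.findall(r"\(\d+\)", "f(1) g(22)")

print(whole)            # hello hello
print(word)             # hello
print(literal_parens)   # ['(1)', '(22)']
```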
|
||
<dl class="simple" id="index-15">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">(?...)</span></code></dt><dd><p>This is an extension notation (a <code class="docutils literal notranslate"><span class="pre">'?'</span></code> following a <code class="docutils literal notranslate"><span class="pre">'('</span></code> is not meaningful
|
||
otherwise). The first character after the <code class="docutils literal notranslate"><span class="pre">'?'</span></code> determines what the meaning
|
||
and further syntax of the construct is. Extensions usually do not create a new
|
||
group; <code class="docutils literal notranslate"><span class="pre">(?P&lt;name&gt;...)</span></code> is the only exception to this rule. Following are the
|
||
currently supported extensions.</p>
|
||
</dd>
|
||
<dt><code class="docutils literal notranslate"><span class="pre">(?aiLmsux)</span></code></dt><dd><p>(One or more letters from the set <code class="docutils literal notranslate"><span class="pre">'a'</span></code>, <code class="docutils literal notranslate"><span class="pre">'i'</span></code>, <code class="docutils literal notranslate"><span class="pre">'L'</span></code>, <code class="docutils literal notranslate"><span class="pre">'m'</span></code>,
<code class="docutils literal notranslate"><span class="pre">'s'</span></code>, <code class="docutils literal notranslate"><span class="pre">'u'</span></code>, <code class="docutils literal notranslate"><span class="pre">'x'</span></code>.) The group matches the empty string; the
letters set the corresponding flags: <a class="reference internal" href="#re.A" title="re.A"><code class="xref py py-const docutils literal notranslate"><span class="pre">re.A</span></code></a> (ASCII-only matching),
<a class="reference internal" href="#re.I" title="re.I"><code class="xref py py-const docutils literal notranslate"><span class="pre">re.I</span></code></a> (ignore case), <a class="reference internal" href="#re.L" title="re.L"><code class="xref py py-const docutils literal notranslate"><span class="pre">re.L</span></code></a> (locale dependent),
<a class="reference internal" href="#re.M" title="re.M"><code class="xref py py-const docutils literal notranslate"><span class="pre">re.M</span></code></a> (multi-line), <a class="reference internal" href="#re.S" title="re.S"><code class="xref py py-const docutils literal notranslate"><span class="pre">re.S</span></code></a> (dot matches all),
<code class="xref py py-const docutils literal notranslate"><span class="pre">re.U</span></code> (Unicode matching), and <a class="reference internal" href="#re.X" title="re.X"><code class="xref py py-const docutils literal notranslate"><span class="pre">re.X</span></code></a> (verbose),
for the entire regular expression.
(The flags are described in <a class="reference internal" href="#contents-of-module-re"><span class="std std-ref">Module Contents</span></a>.)
This is useful if you wish to include the flags as part of the
regular expression, instead of passing a <em>flags</em> argument to the
<a class="reference internal" href="#re.compile" title="re.compile"><code class="xref py py-func docutils literal notranslate"><span class="pre">re.compile()</span></code></a> function. Flags should be placed at the start of the
expression string.</p>
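<p>As a minimal sketch of this equivalence (the sample text and the variable names are illustrative assumptions), an inline <code class="docutils literal notranslate"><span class="pre">(?i)</span></code> at the start of a pattern should produce the same matches as passing <code class="docutils literal notranslate"><span class="pre">re.IGNORECASE</span></code> to <code class="docutils literal notranslate"><span class="pre">re.compile()</span></code>:</p>

```python
import re

# Inline flag: (?i) at the very start applies to the whole pattern.
inline = re.compile(r'(?i)spam')
# Same intent, expressed through the flags argument instead.
explicit = re.compile(r'spam', re.IGNORECASE)

text = 'Spam, SPAM and spam'   # hypothetical sample input
inline_hits = inline.findall(text)
explicit_hits = explicit.findall(text)

print(inline_hits)     # ['Spam', 'SPAM', 'spam']
print(explicit_hits)   # ['Spam', 'SPAM', 'spam']
```

<p>The inline form can be convenient when the pattern is stored as plain text (for instance in a configuration file) and no separate flags value can be passed along with it.</p>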
|
||
</dd>
|
||
</dl>
|
||
<dl id="index-16">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">(?:...)</span></code></dt><dd><p>A non-capturing version of regular parentheses. Matches whatever regular
|
||
expression is inside the parentheses, but the substring matched by the group
|
||
<em>cannot</em> be retrieved after performing a match or referenced later in the
|
||
pattern.</p>
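<p>The difference from ordinary parentheses shows up in the groups reported by the match object. The following sketch uses an arbitrary sample string and illustrative variable names:</p>

```python
import re

# (ab) is a capturing group, so it becomes group 1 and (c) becomes group 2.
capturing = re.match(r'(ab)+(c)', 'ababc')
# (?:ab) only groups for the + quantifier; (c) is now group 1.
non_capturing = re.match(r'(?:ab)+(c)', 'ababc')

capturing_groups = capturing.groups()
non_capturing_groups = non_capturing.groups()

print(capturing_groups)       # ('ab', 'c')
print(non_capturing_groups)   # ('c',)
```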
|
||
</dd>
|
||
<dt><code class="docutils literal notranslate"><span class="pre">(?aiLmsux-imsx:...)</span></code></dt><dd><p>(Zero or more letters from the set <code class="docutils literal notranslate"><span class="pre">'a'</span></code>, <code class="docutils literal notranslate"><span class="pre">'i'</span></code>, <code class="docutils literal notranslate"><span class="pre">'L'</span></code>, <code class="docutils literal notranslate"><span class="pre">'m'</span></code>,
<code class="docutils literal notranslate"><span class="pre">'s'</span></code>, <code class="docutils literal notranslate"><span class="pre">'u'</span></code>, <code class="docutils literal notranslate"><span class="pre">'x'</span></code>, optionally followed by <code class="docutils literal notranslate"><span class="pre">'-'</span></code> followed by
one or more letters from the set <code class="docutils literal notranslate"><span class="pre">'i'</span></code>, <code class="docutils literal notranslate"><span class="pre">'m'</span></code>, <code class="docutils literal notranslate"><span class="pre">'s'</span></code>, <code class="docutils literal notranslate"><span class="pre">'x'</span></code>.)
The letters set or remove the corresponding flags:
<a class="reference internal" href="#re.A" title="re.A"><code class="xref py py-const docutils literal notranslate"><span class="pre">re.A</span></code></a> (ASCII-only matching), <a class="reference internal" href="#re.I" title="re.I"><code class="xref py py-const docutils literal notranslate"><span class="pre">re.I</span></code></a> (ignore case),
<a class="reference internal" href="#re.L" title="re.L"><code class="xref py py-const docutils literal notranslate"><span class="pre">re.L</span></code></a> (locale dependent), <a class="reference internal" href="#re.M" title="re.M"><code class="xref py py-const docutils literal notranslate"><span class="pre">re.M</span></code></a> (multi-line),
<a class="reference internal" href="#re.S" title="re.S"><code class="xref py py-const docutils literal notranslate"><span class="pre">re.S</span></code></a> (dot matches all), <code class="xref py py-const docutils literal notranslate"><span class="pre">re.U</span></code> (Unicode matching),
and <a class="reference internal" href="#re.X" title="re.X"><code class="xref py py-const docutils literal notranslate"><span class="pre">re.X</span></code></a> (verbose), for that part of the expression only.
(The flags are described in <a class="reference internal" href="#contents-of-module-re"><span class="std std-ref">Module Contents</span></a>.)</p>
|
||
<p>The letters <code class="docutils literal notranslate"><span class="pre">'a'</span></code>, <code class="docutils literal notranslate"><span class="pre">'L'</span></code> and <code class="docutils literal notranslate"><span class="pre">'u'</span></code> are mutually exclusive when used
as inline flags, so they can’t be combined or follow <code class="docutils literal notranslate"><span class="pre">'-'</span></code>. Instead,
when one of them appears in an inline group, it overrides the matching mode
in the enclosing group. In Unicode patterns, <code class="docutils literal notranslate"><span class="pre">(?a:...)</span></code> switches to
ASCII-only matching, and <code class="docutils literal notranslate"><span class="pre">(?u:...)</span></code> switches to Unicode matching
(default). In bytes patterns, <code class="docutils literal notranslate"><span class="pre">(?L:...)</span></code> switches to locale-dependent
matching, and <code class="docutils literal notranslate"><span class="pre">(?a:...)</span></code> switches to ASCII-only matching (default).
This override is only in effect within the inline group, and the
original matching mode is restored outside of the group.</p>
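<p>A short sketch of the scoped forms, assuming Python 3.7 or later for <code class="docutils literal notranslate"><span class="pre">(?a:...)</span></code>. The sample strings and variable names are made up for illustration:</p>

```python
import re

# (?i:...) enables case-insensitive matching for "hello" only.
scoped = re.compile(r'(?i:hello) World')
scoped_ok = scoped.fullmatch('HELLO World') is not None     # True
scoped_bad = scoped.fullmatch('HELLO world') is not None    # False

# (?-i:...) switches re.IGNORECASE off again, but only for "Mc".
partial = re.compile(r'(?-i:Mc)donald', re.IGNORECASE)
partial_ok = partial.fullmatch('McDONALD') is not None      # True
partial_bad = partial.fullmatch('mcdonald') is not None     # False

# (?a:...) restricts \d to ASCII digits inside the group of a str pattern.
arabic_digits = '\u0661\u0662\u0663'   # ARABIC-INDIC DIGITS ONE, TWO, THREE
unicode_mode = re.fullmatch(r'\d+', arabic_digits) is not None       # True
ascii_mode = re.fullmatch(r'(?a:\d+)', arabic_digits) is not None    # False
```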
|
||
<div class="versionadded">
|
||
<p><span class="versionmodified added">New in version 3.6.</span></p>
|
||
</div>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.7: </span>The letters <code class="docutils literal notranslate"><span class="pre">'a'</span></code>, <code class="docutils literal notranslate"><span class="pre">'L'</span></code> and <code class="docutils literal notranslate"><span class="pre">'u'</span></code> can also be used in a group.</p>
|
||
</div>
|
||
</dd>
|
||
</dl>
|
||
<dl id="index-17">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">(?P<name>...)</span></code></dt><dd><p>Similar to regular parentheses, but the substring matched by the group is
|
||
accessible via the symbolic group name <em>name</em>. Group names must be valid
|
||
Python identifiers, and each group name must be defined only once within a
|
||
regular expression. A symbolic group is also a numbered group, just as if
|
||
the group were not named.</p>
|
||
<p>Named groups can be referenced in three contexts. If the pattern is
|
||
<code class="docutils literal notranslate"><span class="pre">(?P<quote>['"]).*?(?P=quote)</span></code> (i.e. matching a string quoted with either
|
||
single or double quotes):</p>
|
||
<table class="docutils align-center">
|
||
<colgroup>
|
||
<col style="width: 53%" />
|
||
<col style="width: 47%" />
|
||
</colgroup>
|
||
<thead>
|
||
<tr class="row-odd"><th class="head"><p>Context of reference to group “quote”</p></th>
|
||
<th class="head"><p>Ways to reference it</p></th>
|
||
</tr>
|
||
</thead>
|
||
<tbody>
|
||
<tr class="row-even"><td><p>in the same pattern itself</p></td>
|
||
<td><ul class="simple">
|
||
<li><p><code class="docutils literal notranslate"><span class="pre">(?P=quote)</span></code> (as shown)</p></li>
|
||
<li><p><code class="docutils literal notranslate"><span class="pre">\1</span></code></p></li>
|
||
</ul>
|
||
</td>
|
||
</tr>
|
||
<tr class="row-odd"><td><p>when processing match object <em>m</em></p></td>
|
||
<td><ul class="simple">
|
||
<li><p><code class="docutils literal notranslate"><span class="pre">m.group('quote')</span></code></p></li>
|
||
<li><p><code class="docutils literal notranslate"><span class="pre">m.end('quote')</span></code> (etc.)</p></li>
|
||
</ul>
|
||
</td>
|
||
</tr>
|
||
<tr class="row-even"><td><p>in a string passed to the <em>repl</em>
|
||
argument of <code class="docutils literal notranslate"><span class="pre">re.sub()</span></code></p></td>
|
||
<td><ul class="simple">
|
||
<li><p><code class="docutils literal notranslate"><span class="pre">\g<quote></span></code></p></li>
|
||
<li><p><code class="docutils literal notranslate"><span class="pre">\g<1></span></code></p></li>
|
||
<li><p><code class="docutils literal notranslate"><span class="pre">\1</span></code></p></li>
|
||
</ul>
|
||
</td>
|
||
</tr>
|
||
</tbody>
|
||
</table>
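<p>The three contexts in the table can be exercised with the quoted-string pattern from this entry. This is only a sketch; the input text and the variable names are assumptions chosen for the example:</p>

```python
import re

# Context 1: (?P=quote) refers back to the named group inside the pattern.
pattern = re.compile(r'''(?P<quote>['"]).*?(?P=quote)''')

text = 'say "hi" now'   # hypothetical input
m = pattern.search(text)

# Context 2: the match object accepts the name as well as the number.
whole = m.group(0)             # '"hi"'
by_name = m.group('quote')     # '"'
by_number = m.group(1)         # '"'
quote_end = m.end('quote')     # 5, the index just past the opening quote

# Context 3: the replacement string can use \g<quote> (or \g<1>, or \1).
replaced = pattern.sub(r'\g<quote>X\g<quote>', text)   # 'say "X" now'

# Mismatched quotes do not match, because the closing quote must equal
# whatever the group captured.
mismatch = pattern.search('\'oops"')                   # None
```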
|
||
</dd>
|
||
</dl>
|
||
<dl class="simple" id="index-18">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">(?P=name)</span></code></dt><dd><p>A backreference to a named group; it matches whatever text was matched by the
|
||
earlier group named <em>name</em>.</p>
|
||
</dd>
|
||
</dl>
|
||
<dl class="simple" id="index-19">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">(?#...)</span></code></dt><dd><p>A comment; the contents of the parentheses are simply ignored.</p>
|
||
</dd>
|
||
</dl>
|
||
<dl class="simple" id="index-20">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">(?=...)</span></code></dt><dd><p>Matches if <code class="docutils literal notranslate"><span class="pre">...</span></code> matches next, but doesn’t consume any of the string. This is
|
||
called a <em class="dfn">lookahead assertion</em>. For example, <code class="docutils literal notranslate"><span class="pre">Isaac</span> <span class="pre">(?=Asimov)</span></code> will match
|
||
<code class="docutils literal notranslate"><span class="pre">'Isaac</span> <span class="pre">'</span></code> only if it’s followed by <code class="docutils literal notranslate"><span class="pre">'Asimov'</span></code>.</p>
|
||
</dd>
|
||
</dl>
|
||
<dl class="simple" id="index-21">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">(?!...)</span></code></dt><dd><p>Matches if <code class="docutils literal notranslate"><span class="pre">...</span></code> doesn’t match next. This is a <em class="dfn">negative lookahead assertion</em>.
|
||
For example, <code class="docutils literal notranslate"><span class="pre">Isaac</span> <span class="pre">(?!Asimov)</span></code> will match <code class="docutils literal notranslate"><span class="pre">'Isaac</span> <span class="pre">'</span></code> only if it’s <em>not</em>
|
||
followed by <code class="docutils literal notranslate"><span class="pre">'Asimov'</span></code>.</p>
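<p>Both lookahead assertions can be tried with the patterns quoted in these entries. The sketch below uses illustrative variable names; note that the text inside the assertion is checked but never becomes part of the match:</p>

```python
import re

# Positive lookahead: 'Asimov' must follow, but it is not consumed.
ahead = re.search(r'Isaac (?=Asimov)', 'Isaac Asimov')
ahead_text = ahead.group(0)    # 'Isaac '
ahead_end = ahead.end()        # 6, the match stops before 'Asimov'
ahead_none = re.search(r'Isaac (?=Asimov)', 'Isaac Newton')      # None

# Negative lookahead: matches only when 'Asimov' does not follow.
negative = re.search(r'Isaac (?!Asimov)', 'Isaac Newton')
negative_text = negative.group(0)                                # 'Isaac '
negative_none = re.search(r'Isaac (?!Asimov)', 'Isaac Asimov')   # None
```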
|
||
</dd>
|
||
</dl>
|
||
<dl id="index-22">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">(?<=...)</span></code></dt><dd><p>Matches if the current position in the string is preceded by a match for <code class="docutils literal notranslate"><span class="pre">...</span></code>
|
||
that ends at the current position. This is called a <em class="dfn">positive lookbehind
|
||
assertion</em>. <code class="docutils literal notranslate"><span class="pre">(?<=abc)def</span></code> will find a match in <code class="docutils literal notranslate"><span class="pre">'abcdef'</span></code>, since the
|
||
lookbehind will back up 3 characters and check if the contained pattern matches.
|
||
The contained pattern must only match strings of some fixed length, meaning that
|
||
<code class="docutils literal notranslate"><span class="pre">abc</span></code> or <code class="docutils literal notranslate"><span class="pre">a|b</span></code> are allowed, but <code class="docutils literal notranslate"><span class="pre">a*</span></code> and <code class="docutils literal notranslate"><span class="pre">a{3,4}</span></code> are not. Note that
|
||
patterns which start with positive lookbehind assertions will not match at the
|
||
beginning of the string being searched; you will most likely want to use the
|
||
<a class="reference internal" href="#re.search" title="re.search"><code class="xref py py-func docutils literal notranslate"><span class="pre">search()</span></code></a> function rather than the <a class="reference internal" href="#re.match" title="re.match"><code class="xref py py-func docutils literal notranslate"><span class="pre">match()</span></code></a> function:</p>
|
||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">re</span>
<span class="gp">>>> </span><span class="n">m</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s1">'(?<=abc)def'</span><span class="p">,</span> <span class="s1">'abcdef'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="go">'def'</span>
</pre></div>
</div>
|
||
<p>This example looks for a word following a hyphen:</p>
|
||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">m</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="sa">r</span><span class="s1">'(?<=-)\w+'</span><span class="p">,</span> <span class="s1">'spam-egg'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="go">'egg'</span>
</pre></div>
</div>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.5: </span>Added support for group references of fixed length.</p>
|
||
</div>
|
||
</dd>
|
||
</dl>
|
||
<dl class="simple" id="index-23">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">(?<!...)</span></code></dt><dd><p>Matches if the current position in the string is not preceded by a match for
|
||
<code class="docutils literal notranslate"><span class="pre">...</span></code>. This is called a <em class="dfn">negative lookbehind assertion</em>. Similar to
|
||
positive lookbehind assertions, the contained pattern must only match strings of
|
||
some fixed length. Patterns which start with negative lookbehind assertions may
|
||
match at the beginning of the string being searched.</p>
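<p>A small sketch of a negative lookbehind, using made-up sample strings and variable names. It also shows that a variable-width body is rejected when the pattern is compiled:</p>

```python
import re

# Numbers that are not immediately preceded by a minus sign.
unsigned = re.findall(r'(?<!-)\b\d+', '10 -20 30')    # ['10', '30']

# Unlike a positive lookbehind, this can succeed at position 0,
# because there is nothing before the start that could match '-'.
at_start = re.match(r'(?<!-)\d+', '42').group(0)      # '42'

# The contained pattern must have a fixed length; a+ does not.
try:
    re.compile(r'(?<!a+)b')
    variable_width_accepted = True
except re.error:
    variable_width_accepted = False
```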
|
||
</dd>
|
||
<dt><code class="docutils literal notranslate"><span class="pre">(?(id/name)yes-pattern|no-pattern)</span></code></dt><dd><p>Tries to match <code class="docutils literal notranslate"><span class="pre">yes-pattern</span></code> if the group with the given <em>id</em> or
<em>name</em> exists, and <code class="docutils literal notranslate"><span class="pre">no-pattern</span></code> if it doesn’t. <code class="docutils literal notranslate"><span class="pre">no-pattern</span></code> is
optional and can be omitted. For example,
<code class="docutils literal notranslate"><span class="pre">(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)</span></code> is a poor email matching pattern, which
matches <code class="docutils literal notranslate"><span class="pre">'<user@host.com>'</span></code> as well as <code class="docutils literal notranslate"><span class="pre">'user@host.com'</span></code>, but
neither <code class="docutils literal notranslate"><span class="pre">'<user@host.com'</span></code> nor <code class="docutils literal notranslate"><span class="pre">'user@host.com>'</span></code>.</p>
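<p>The four cases named above can be checked directly. In this sketch the list of candidates and the variable names are illustrative; the pattern is the one quoted in this entry:</p>

```python
import re

# Group 1 is the optional '<'. If it took part in the match, a closing '>'
# is required; otherwise the address must run to the end of the string.
email = re.compile(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)')

candidates = [
    '<user@host.com>',
    'user@host.com',
    '<user@host.com',
    'user@host.com>',
]
accepted = [s for s in candidates if email.match(s)]

print(accepted)   # ['<user@host.com>', 'user@host.com']
```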
|
||
</dd>
|
||
</dl>
|
||
<p>The special sequences consist of <code class="docutils literal notranslate"><span class="pre">'\'</span></code> and a character from the list below.
If the character following the backslash is not an ASCII digit or an ASCII letter, then the
resulting RE will match that character. For example, <code class="docutils literal notranslate"><span class="pre">\$</span></code> matches the
character <code class="docutils literal notranslate"><span class="pre">'$'</span></code>.</p>
|
||
<dl class="simple" id="index-24">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">\number</span></code></dt><dd><p>Matches the contents of the group of the same number. Groups are numbered
|
||
starting from 1. For example, <code class="docutils literal notranslate"><span class="pre">(.+)</span> <span class="pre">\1</span></code> matches <code class="docutils literal notranslate"><span class="pre">'the</span> <span class="pre">the'</span></code> or <code class="docutils literal notranslate"><span class="pre">'55</span> <span class="pre">55'</span></code>,
|
||
but not <code class="docutils literal notranslate"><span class="pre">'thethe'</span></code> (note the space after the group). This special sequence
|
||
can only be used to match one of the first 99 groups. If the first digit of
|
||
<em>number</em> is 0, or <em>number</em> is 3 octal digits long, it will not be interpreted as
|
||
a group match, but as the character with octal value <em>number</em>. Inside the
|
||
<code class="docutils literal notranslate"><span class="pre">'['</span></code> and <code class="docutils literal notranslate"><span class="pre">']'</span></code> of a character class, all numeric escapes are treated as
|
||
characters.</p>
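<p>The group-reference and octal readings of a numeric escape can be compared side by side. This is a sketch with illustrative variable names, using the sample strings from the text above:</p>

```python
import re

# \1 must repeat exactly what group 1 matched, after a single space.
doubled = re.compile(r'(.+) \1')
samples = ['the the', '55 55', 'thethe']
matched = [s for s in samples if doubled.fullmatch(s)]     # ['the the', '55 55']

# Three octal digits are read as a character code, not a group reference.
octal_is_char = re.fullmatch(r'\101', 'A') is not None     # True, 0o101 == 65 == ord('A')

# Inside a character class a numeric escape is always a character.
class_is_char = re.fullmatch(r'[\1]', '\x01') is not None  # True
```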
|
||
</dd>
|
||
</dl>
|
||
<dl class="simple" id="index-25">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">\A</span></code></dt><dd><p>Matches only at the start of the string.</p>
|
||
</dd>
|
||
</dl>
|
||
<dl id="index-26">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">\b</span></code></dt><dd><p>Matches the empty string, but only at the beginning or end of a word.
|
||
A word is defined as a sequence of word characters. Note that formally,
|
||
<code class="docutils literal notranslate"><span class="pre">\b</span></code> is defined as the boundary between a <code class="docutils literal notranslate"><span class="pre">\w</span></code> and a <code class="docutils literal notranslate"><span class="pre">\W</span></code> character
|
||
(or vice versa), or between <code class="docutils literal notranslate"><span class="pre">\w</span></code> and the beginning/end of the string.
|
||
This means that <code class="docutils literal notranslate"><span class="pre">r'\bfoo\b'</span></code> matches <code class="docutils literal notranslate"><span class="pre">'foo'</span></code>, <code class="docutils literal notranslate"><span class="pre">'foo.'</span></code>, <code class="docutils literal notranslate"><span class="pre">'(foo)'</span></code>,
|
||
<code class="docutils literal notranslate"><span class="pre">'bar</span> <span class="pre">foo</span> <span class="pre">baz'</span></code> but not <code class="docutils literal notranslate"><span class="pre">'foobar'</span></code> or <code class="docutils literal notranslate"><span class="pre">'foo3'</span></code>.</p>
|
||
<p>By default, word characters in Unicode patterns are Unicode alphanumerics and the underscore, but
this can be changed by using the <a class="reference internal" href="#re.ASCII" title="re.ASCII"><code class="xref py py-const docutils literal notranslate"><span class="pre">ASCII</span></code></a> flag. Word boundaries are
determined by the current locale if the <a class="reference internal" href="#re.LOCALE" title="re.LOCALE"><code class="xref py py-const docutils literal notranslate"><span class="pre">LOCALE</span></code></a> flag is used.
Inside a character class, <code class="docutils literal notranslate"><span class="pre">\b</span></code> represents the backspace character, for
compatibility with Python’s string literals.</p>
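<p>The word-boundary examples listed in this entry can be verified with a short sketch (variable names are illustrative):</p>

```python
import re

word = re.compile(r'\bfoo\b')
samples = ['foo', 'foo.', '(foo)', 'bar foo baz', 'foobar', 'foo3']
found = [s for s in samples if word.search(s)]

print(found)   # ['foo', 'foo.', '(foo)', 'bar foo baz']

# Inside a character class, \b is the backspace character (U+0008).
backspace = re.search(r'[\b]', 'a\x08b')
backspace_pos = backspace.start()   # 1
```

<p>Writing the pattern as a raw string matters here: in an ordinary string literal, <code class="docutils literal notranslate"><span class="pre">'\b'</span></code> is already converted to a backspace character before the regular expression parser sees it.</p>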
|
||
</dd>
|
||
</dl>
|
||
<dl class="simple" id="index-27">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">\B</span></code></dt><dd><p>Matches the empty string, but only when it is <em>not</em> at the beginning or end
|
||
of a word. This means that <code class="docutils literal notranslate"><span class="pre">r'py\B'</span></code> matches <code class="docutils literal notranslate"><span class="pre">'python'</span></code>, <code class="docutils literal notranslate"><span class="pre">'py3'</span></code>,
|
||
<code class="docutils literal notranslate"><span class="pre">'py2'</span></code>, but not <code class="docutils literal notranslate"><span class="pre">'py'</span></code>, <code class="docutils literal notranslate"><span class="pre">'py.'</span></code>, or <code class="docutils literal notranslate"><span class="pre">'py!'</span></code>.
|
||
<code class="docutils literal notranslate"><span class="pre">\B</span></code> is just the opposite of <code class="docutils literal notranslate"><span class="pre">\b</span></code>, so word characters in Unicode
|
||
patterns are Unicode alphanumerics or the underscore, although this can
|
||
be changed by using the <a class="reference internal" href="#re.ASCII" title="re.ASCII"><code class="xref py py-const docutils literal notranslate"><span class="pre">ASCII</span></code></a> flag. Word boundaries are
|
||
determined by the current locale if the <a class="reference internal" href="#re.LOCALE" title="re.LOCALE"><code class="xref py py-const docutils literal notranslate"><span class="pre">LOCALE</span></code></a> flag is used.</p>
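<p>A corresponding sketch for <code class="docutils literal notranslate"><span class="pre">\B</span></code>, using the sample strings from this entry and illustrative variable names:</p>

```python
import re

# py\B requires another word character directly after 'py'.
inside = re.compile(r'py\B')
samples = ['python', 'py3', 'py2', 'py', 'py.', 'py!']
found = [s for s in samples if inside.search(s)]

print(found)   # ['python', 'py3', 'py2']
```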
|
||
</dd>
|
||
</dl>
|
||
<dl class="simple" id="index-28">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">\d</span></code></dt><dd><dl class="simple">
|
||
<dt>For Unicode (str) patterns:</dt><dd><p>Matches any Unicode decimal digit (that is, any character in
|
||
Unicode character category [Nd]). This includes <code class="docutils literal notranslate"><span class="pre">[0-9]</span></code>, and
|
||
also many other digit characters. If the <a class="reference internal" href="#re.ASCII" title="re.ASCII"><code class="xref py py-const docutils literal notranslate"><span class="pre">ASCII</span></code></a> flag is
|
||
used only <code class="docutils literal notranslate"><span class="pre">[0-9]</span></code> is matched.</p>
|
||
</dd>
|
||
<dt>For 8-bit (bytes) patterns:</dt><dd><p>Matches any decimal digit; this is equivalent to <code class="docutils literal notranslate"><span class="pre">[0-9]</span></code>.</p>
|
||
</dd>
|
||
</dl>
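<p>The effect of the <code class="docutils literal notranslate"><span class="pre">ASCII</span></code> flag on <code class="docutils literal notranslate"><span class="pre">\d</span></code> can be seen with a string that mixes ASCII digits and another script's digits. The sample text and names below are assumptions made for this sketch:</p>

```python
import re

# ASCII "42" followed by ARABIC-INDIC DIGIT FOUR and DIGIT TWO.
text = 'room 42, \u0664\u0662'

unicode_digits = re.findall(r'\d+', text)             # ['42', '\u0664\u0662']
ascii_digits = re.findall(r'\d+', text, re.ASCII)     # ['42']

# In a bytes pattern, \d is always [0-9].
bytes_digits = re.findall(rb'\d+', b'room 42')        # [b'42']
```

<p>When the input is later passed to code that only understands ASCII digits, using <code class="docutils literal notranslate"><span class="pre">[0-9]</span></code> or the <code class="docutils literal notranslate"><span class="pre">ASCII</span></code> flag may be the safer choice.</p>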
|
||
</dd>
|
||
</dl>
|
||
<dl class="simple" id="index-29">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">\D</span></code></dt><dd><p>Matches any character which is not a decimal digit. This is
|
||
the opposite of <code class="docutils literal notranslate"><span class="pre">\d</span></code>. If the <a class="reference internal" href="#re.ASCII" title="re.ASCII"><code class="xref py py-const docutils literal notranslate"><span class="pre">ASCII</span></code></a> flag is used this
|
||
becomes the equivalent of <code class="docutils literal notranslate"><span class="pre">[^0-9]</span></code>.</p>
|
||
</dd>
|
||
</dl>
|
||
<dl class="simple" id="index-30">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">\s</span></code></dt><dd><dl class="simple">
|
||
<dt>For Unicode (str) patterns:</dt><dd><p>Matches Unicode whitespace characters (which includes
|
||
<code class="docutils literal notranslate"><span class="pre">[</span> <span class="pre">\t\n\r\f\v]</span></code>, and also many other characters, for example the
|
||
non-breaking spaces mandated by typography rules in many
|
||
languages). If the <a class="reference internal" href="#re.ASCII" title="re.ASCII"><code class="xref py py-const docutils literal notranslate"><span class="pre">ASCII</span></code></a> flag is used, only
|
||
<code class="docutils literal notranslate"><span class="pre">[</span> <span class="pre">\t\n\r\f\v]</span></code> is matched.</p>
|
||
</dd>
|
||
<dt>For 8-bit (bytes) patterns:</dt><dd><p>Matches characters considered whitespace in the ASCII character set;
|
||
this is equivalent to <code class="docutils literal notranslate"><span class="pre">[</span> <span class="pre">\t\n\r\f\v]</span></code>.</p>
|
||
</dd>
|
||
</dl>
|
||
</dd>
|
||
</dl>
|
||
<dl class="simple" id="index-31">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">\S</span></code></dt><dd><p>Matches any character which is not a whitespace character. This is
|
||
the opposite of <code class="docutils literal notranslate"><span class="pre">\s</span></code>. If the <a class="reference internal" href="#re.ASCII" title="re.ASCII"><code class="xref py py-const docutils literal notranslate"><span class="pre">ASCII</span></code></a> flag is used this
|
||
becomes the equivalent of <code class="docutils literal notranslate"><span class="pre">[^</span> <span class="pre">\t\n\r\f\v]</span></code>.</p>
|
||
</dd>
|
||
</dl>
|
||
<dl class="simple" id="index-32">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">\w</span></code></dt><dd><dl class="simple">
|
||
<dt>For Unicode (str) patterns:</dt><dd><p>Matches Unicode word characters; this includes most characters
|
||
that can be part of a word in any language, as well as numbers and
|
||
the underscore. If the <a class="reference internal" href="#re.ASCII" title="re.ASCII"><code class="xref py py-const docutils literal notranslate"><span class="pre">ASCII</span></code></a> flag is used, only
|
||
<code class="docutils literal notranslate"><span class="pre">[a-zA-Z0-9_]</span></code> is matched.</p>
|
||
</dd>
|
||
<dt>For 8-bit (bytes) patterns:</dt><dd><p>Matches characters considered alphanumeric in the ASCII character set;
|
||
this is equivalent to <code class="docutils literal notranslate"><span class="pre">[a-zA-Z0-9_]</span></code>. If the <a class="reference internal" href="#re.LOCALE" title="re.LOCALE"><code class="xref py py-const docutils literal notranslate"><span class="pre">LOCALE</span></code></a> flag is
|
||
used, matches characters considered alphanumeric in the current locale
|
||
and the underscore.</p>
|
||
</dd>
|
||
</dl>
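<p>A brief sketch of how the <code class="docutils literal notranslate"><span class="pre">ASCII</span></code> flag changes what <code class="docutils literal notranslate"><span class="pre">\w</span></code> accepts in a Unicode pattern. The sample text and variable names are illustrative:</p>

```python
import re

# "naive" with i-diaeresis (U+00EF) and "cafe" with e-acute (U+00E9).
text = 'na\u00efve caf\u00e9_1'

# Unicode matching: accented letters are word characters.
unicode_words = re.findall(r'\w+', text)
# ASCII-only matching: the accented letters split the words.
ascii_words = re.findall(r'\w+', text, re.ASCII)

print(ascii_words)   # ['na', 've', 'caf', '_1']
```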
|
||
</dd>
|
||
</dl>
|
||
<dl class="simple" id="index-33">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">\W</span></code></dt><dd><p>Matches any character which is not a word character. This is
the opposite of <code class="docutils literal notranslate"><span class="pre">\w</span></code>. If the <a class="reference internal" href="#re.ASCII" title="re.ASCII"><code class="xref py py-const docutils literal notranslate"><span class="pre">ASCII</span></code></a> flag is used this
becomes the equivalent of <code class="docutils literal notranslate"><span class="pre">[^a-zA-Z0-9_]</span></code>. If the <a class="reference internal" href="#re.LOCALE" title="re.LOCALE"><code class="xref py py-const docutils literal notranslate"><span class="pre">LOCALE</span></code></a> flag is
used, matches characters which are neither alphanumeric in the current locale
nor the underscore.</p>
|
||
</dd>
|
||
</dl>
|
||
<dl class="simple" id="index-34">
|
||
<dt><code class="docutils literal notranslate"><span class="pre">\Z</span></code></dt><dd><p>Matches only at the end of the string.</p>
|
||
</dd>
|
||
</dl>
|
||
<p id="index-35">Most of the standard escapes supported by Python string literals are also
|
||
accepted by the regular expression parser:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span>\<span class="n">a</span>      \<span class="n">b</span>      \<span class="n">f</span>      \<span class="n">n</span>
\<span class="n">r</span>      \<span class="n">t</span>      \<span class="n">u</span>      \<span class="n">U</span>
\<span class="n">v</span>      \<span class="n">x</span>      \\
</pre></div>
</div>
|
||
<p>(Note that <code class="docutils literal notranslate"><span class="pre">\b</span></code> is used to represent word boundaries, and means “backspace”
|
||
only inside character classes.)</p>
|
||
<p><code class="docutils literal notranslate"><span class="pre">'\u'</span></code> and <code class="docutils literal notranslate"><span class="pre">'\U'</span></code> escape sequences are only recognized in Unicode
|
||
patterns. In bytes patterns they are errors. Unknown escapes of ASCII
|
||
letters are reserved for future use and treated as errors.</p>
|
||
<p>Octal escapes are included in a limited form. If the first digit is a 0, or if
|
||
there are three octal digits, it is considered an octal escape. Otherwise, it is
|
||
a group reference. As for string literals, octal escapes are always at most
|
||
three digits in length.</p>
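<p>The escape rules above can be probed with a small helper. The function <code class="docutils literal notranslate"><span class="pre">compiles()</span></code> is a hypothetical convenience written for this sketch and is not part of the module:</p>

```python
import re

# \u (str patterns only), \x and a three-digit octal escape each name one character.
abc = re.fullmatch(r'\u0041\x42\103', 'ABC') is not None   # True

def compiles(pattern):
    """Return True if re.compile() accepts the pattern, False on re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

unknown_letter = compiles(r'\q')      # False: unknown escape of an ASCII letter
u_in_bytes = compiles(rb'\u0041')     # False: \u is an error in bytes patterns
escaped_punct = compiles(r'\$')       # True: backslash plus punctuation is a literal
```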
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.3: </span>The <code class="docutils literal notranslate"><span class="pre">'\u'</span></code> and <code class="docutils literal notranslate"><span class="pre">'\U'</span></code> escape sequences have been added.</p>
|
||
</div>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.6: </span>Unknown escapes consisting of <code class="docutils literal notranslate"><span class="pre">'\'</span></code> and an ASCII letter now are errors.</p>
|
||
</div>
|
||
</div>
|
||
<div class="section" id="module-contents">
|
||
<span id="contents-of-module-re"></span><h2>Module Contents<a class="headerlink" href="#module-contents" title="Permalink to this headline">¶</a></h2>
|
||
<p>The module defines several functions, constants, and an exception. Some of the
functions are simplified versions of the full-featured methods for compiled
regular expressions. Most non-trivial applications use the compiled
form.</p>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.6: </span>Flag constants are now instances of <code class="xref py py-class docutils literal notranslate"><span class="pre">RegexFlag</span></code>, which is a subclass of
|
||
<a class="reference internal" href="enum.html#enum.IntFlag" title="enum.IntFlag"><code class="xref py py-class docutils literal notranslate"><span class="pre">enum.IntFlag</span></code></a>.</p>
|
||
</div>
|
||
<dl class="function">
|
||
<dt id="re.compile">
|
||
<code class="descclassname">re.</code><code class="descname">compile</code><span class="sig-paren">(</span><em>pattern</em>, <em>flags=0</em><span class="sig-paren">)</span><a class="headerlink" href="#re.compile" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Compile a regular expression pattern into a <a class="reference internal" href="#re-objects"><span class="std std-ref">regular expression object</span></a>, which can be used for matching using its
|
||
<a class="reference internal" href="#re.Pattern.match" title="re.Pattern.match"><code class="xref py py-func docutils literal notranslate"><span class="pre">match()</span></code></a>, <a class="reference internal" href="#re.Pattern.search" title="re.Pattern.search"><code class="xref py py-func docutils literal notranslate"><span class="pre">search()</span></code></a> and other methods, described
|
||
below.</p>
|
||
<p>The expression’s behaviour can be modified by specifying a <em>flags</em> value.
|
||
Values can be any of the following variables, combined using bitwise OR (the
|
||
<code class="docutils literal notranslate"><span class="pre">|</span></code> operator).</p>
|
||
<p>The sequence</p>
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="n">prog</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="n">pattern</span><span class="p">)</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">prog</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="n">string</span><span class="p">)</span>
</pre></div>
</div>
<p>is equivalent to</p>
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="n">result</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="n">pattern</span><span class="p">,</span> <span class="n">string</span><span class="p">)</span>
</pre></div>
</div>
|
||
<p>but using <a class="reference internal" href="#re.compile" title="re.compile"><code class="xref py py-func docutils literal notranslate"><span class="pre">re.compile()</span></code></a> and saving the resulting regular expression
|
||
object for reuse is more efficient when the expression will be used several
|
||
times in a single program.</p>
|
||
<div class="admonition note">
|
||
<p class="admonition-title">Note</p>
|
||
<p>The compiled versions of the most recent patterns passed to
|
||
<a class="reference internal" href="#re.compile" title="re.compile"><code class="xref py py-func docutils literal notranslate"><span class="pre">re.compile()</span></code></a> and the module-level matching functions are cached, so
|
||
programs that use only a few regular expressions at a time needn’t worry
|
||
about compiling regular expressions.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="data">
|
||
<dt id="re.A">
|
||
<code class="descclassname">re.</code><code class="descname">A</code><a class="headerlink" href="#re.A" title="Permalink to this definition">¶</a></dt>
|
||
<dt id="re.ASCII">
|
||
<code class="descclassname">re.</code><code class="descname">ASCII</code><a class="headerlink" href="#re.ASCII" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Make <code class="docutils literal notranslate"><span class="pre">\w</span></code>, <code class="docutils literal notranslate"><span class="pre">\W</span></code>, <code class="docutils literal notranslate"><span class="pre">\b</span></code>, <code class="docutils literal notranslate"><span class="pre">\B</span></code>, <code class="docutils literal notranslate"><span class="pre">\d</span></code>, <code class="docutils literal notranslate"><span class="pre">\D</span></code>, <code class="docutils literal notranslate"><span class="pre">\s</span></code> and <code class="docutils literal notranslate"><span class="pre">\S</span></code>
|
||
perform ASCII-only matching instead of full Unicode matching. This is only
|
||
meaningful for Unicode patterns, and is ignored for byte patterns.
|
||
Corresponds to the inline flag <code class="docutils literal notranslate"><span class="pre">(?a)</span></code>.</p>
|
||
<p>Note that for backward compatibility, the <code class="xref py py-const docutils literal notranslate"><span class="pre">re.U</span></code> flag still
|
||
exists (as well as its synonym <code class="xref py py-const docutils literal notranslate"><span class="pre">re.UNICODE</span></code> and its embedded
|
||
counterpart <code class="docutils literal notranslate"><span class="pre">(?u)</span></code>), but these are redundant in Python 3 since
|
||
matches are Unicode by default for strings (and Unicode matching
|
||
isn’t allowed for bytes).</p>
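<p>As a small illustration, the sketch below compares default Unicode matching with ASCII-only matching on a sample string containing two accented letters. The sample text and variable names are assumptions made for the example.</p>

```python
import re

# Sample text "naïve café", written with escapes to avoid encoding ambiguity.
text = "na\u00efve caf\u00e9"

unicode_words = re.findall(r"\w+", text)          # default: full Unicode matching
ascii_words = re.findall(r"\w+", text, re.ASCII)  # \w restricted to [a-zA-Z0-9_]
inline_words = re.findall(r"(?a)\w+", text)       # same restriction via the inline flag

print(unicode_words)
print(ascii_words)
```

<p>With the flag set, the accented letters no longer count as word characters, so each word is cut where one occurs.</p>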
|
||
</dd></dl>
|
||
|
||
<dl class="data">
|
||
<dt id="re.DEBUG">
|
||
<code class="descclassname">re.</code><code class="descname">DEBUG</code><a class="headerlink" href="#re.DEBUG" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Display debug information about the compiled expression.
|
||
No corresponding inline flag.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="data">
|
||
<dt id="re.I">
|
||
<code class="descclassname">re.</code><code class="descname">I</code><a class="headerlink" href="#re.I" title="Permalink to this definition">¶</a></dt>
|
||
<dt id="re.IGNORECASE">
|
||
<code class="descclassname">re.</code><code class="descname">IGNORECASE</code><a class="headerlink" href="#re.IGNORECASE" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Perform case-insensitive matching; expressions like <code class="docutils literal notranslate"><span class="pre">[A-Z]</span></code> will also
|
||
match lowercase letters. Full Unicode matching (such as <code class="docutils literal notranslate"><span class="pre">Ü</span></code> matching
|
||
<code class="docutils literal notranslate"><span class="pre">ü</span></code>) also works unless the <a class="reference internal" href="#re.ASCII" title="re.ASCII"><code class="xref py py-const docutils literal notranslate"><span class="pre">re.ASCII</span></code></a> flag is used to disable
|
||
non-ASCII matches. The current locale does not change the effect of this
|
||
flag unless the <a class="reference internal" href="#re.LOCALE" title="re.LOCALE"><code class="xref py py-const docutils literal notranslate"><span class="pre">re.LOCALE</span></code></a> flag is also used.
|
||
Corresponds to the inline flag <code class="docutils literal notranslate"><span class="pre">(?i)</span></code>.</p>
|
||
<p>Note that when the Unicode patterns <code class="docutils literal notranslate"><span class="pre">[a-z]</span></code> or <code class="docutils literal notranslate"><span class="pre">[A-Z]</span></code> are used in
|
||
combination with the <a class="reference internal" href="#re.IGNORECASE" title="re.IGNORECASE"><code class="xref py py-const docutils literal notranslate"><span class="pre">IGNORECASE</span></code></a> flag, they will match the 52 ASCII
|
||
letters and 4 additional non-ASCII letters: ‘İ’ (U+0130, Latin capital
|
||
letter I with dot above), ‘ı’ (U+0131, Latin small letter dotless i),
|
||
‘ſ’ (U+017F, Latin small letter long s) and ‘K’ (U+212A, Kelvin sign).
|
||
If the <a class="reference internal" href="#re.ASCII" title="re.ASCII"><code class="xref py py-const docutils literal notranslate"><span class="pre">ASCII</span></code></a> flag is used, only letters ‘a’ to ‘z’
|
||
and ‘A’ to ‘Z’ are matched.</p>
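<p>The behaviour described above can be checked with a short sketch. It tests the Kelvin sign against <code class="docutils literal notranslate"><span class="pre">[a-z]</span></code> with and without the ASCII flag; the variable names are illustrative only.</p>

```python
import re

kelvin = "\u212a"  # KELVIN SIGN, one of the four non-ASCII letters listed above

# Unicode pattern with IGNORECASE: the Kelvin sign counts as a case variant of 'k'.
unicode_hit = re.fullmatch(r"[a-z]", kelvin, re.IGNORECASE)

# Adding ASCII limits case-insensitive matching to the 52 ASCII letters.
ascii_hit = re.fullmatch(r"[a-z]", kelvin, re.IGNORECASE | re.ASCII)

# Full Unicode case-insensitive matching: capital U-umlaut against the small form.
umlaut_hit = re.fullmatch("\u00dc", "\u00fc", re.IGNORECASE)

print(unicode_hit is not None, ascii_hit is not None, umlaut_hit is not None)
```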
|
||
</dd></dl>
|
||
|
||
<dl class="data">
|
||
<dt id="re.L">
|
||
<code class="descclassname">re.</code><code class="descname">L</code><a class="headerlink" href="#re.L" title="Permalink to this definition">¶</a></dt>
|
||
<dt id="re.LOCALE">
|
||
<code class="descclassname">re.</code><code class="descname">LOCALE</code><a class="headerlink" href="#re.LOCALE" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Make <code class="docutils literal notranslate"><span class="pre">\w</span></code>, <code class="docutils literal notranslate"><span class="pre">\W</span></code>, <code class="docutils literal notranslate"><span class="pre">\b</span></code>, <code class="docutils literal notranslate"><span class="pre">\B</span></code> and case-insensitive matching
|
||
dependent on the current locale. This flag can be used only with bytes
|
||
patterns. The use of this flag is discouraged because the locale mechanism
is very unreliable, handles only one “culture” at a time, and
works only with 8-bit locales. Unicode matching is already enabled by default
|
||
in Python 3 for Unicode (str) patterns, and it is able to handle different
|
||
locales/languages.
|
||
Corresponds to the inline flag <code class="docutils literal notranslate"><span class="pre">(?L)</span></code>.</p>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.6: </span><a class="reference internal" href="#re.LOCALE" title="re.LOCALE"><code class="xref py py-const docutils literal notranslate"><span class="pre">re.LOCALE</span></code></a> can be used only with bytes patterns and is
|
||
not compatible with <a class="reference internal" href="#re.ASCII" title="re.ASCII"><code class="xref py py-const docutils literal notranslate"><span class="pre">re.ASCII</span></code></a>.</p>
|
||
</div>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.7: </span>Compiled regular expression objects with the <a class="reference internal" href="#re.LOCALE" title="re.LOCALE"><code class="xref py py-const docutils literal notranslate"><span class="pre">re.LOCALE</span></code></a> flag no
|
||
longer depend on the locale at compile time. Only the locale at
|
||
matching time affects the result of matching.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="data">
|
||
<dt id="re.M">
|
||
<code class="descclassname">re.</code><code class="descname">M</code><a class="headerlink" href="#re.M" title="Permalink to this definition">¶</a></dt>
|
||
<dt id="re.MULTILINE">
|
||
<code class="descclassname">re.</code><code class="descname">MULTILINE</code><a class="headerlink" href="#re.MULTILINE" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>When specified, the pattern character <code class="docutils literal notranslate"><span class="pre">'^'</span></code> matches at the beginning of the
|
||
string and at the beginning of each line (immediately following each newline);
|
||
and the pattern character <code class="docutils literal notranslate"><span class="pre">'$'</span></code> matches at the end of the string and at the
|
||
end of each line (immediately preceding each newline). By default, <code class="docutils literal notranslate"><span class="pre">'^'</span></code>
|
||
matches only at the beginning of the string, and <code class="docutils literal notranslate"><span class="pre">'$'</span></code> only at the end of the
|
||
string and immediately before the newline (if any) at the end of the string.
|
||
Corresponds to the inline flag <code class="docutils literal notranslate"><span class="pre">(?m)</span></code>.</p>
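<p>The following sketch contrasts the default anchors with their multiline behaviour on a hypothetical three-line string that ends with a newline.</p>

```python
import re

text = "one\ntwo\nthree\n"  # sample input, assumed for illustration

# '^' by default matches only at the very start of the string.
first_default = re.findall(r"^\w+", text)
# With MULTILINE it also matches right after each newline.
first_multi = re.findall(r"^\w+", text, re.MULTILINE)

# '$' by default matches at the end and just before a trailing newline.
last_default = re.findall(r"\w+$", text)
# With MULTILINE it also matches just before every newline.
last_multi = re.findall(r"\w+$", text, re.MULTILINE)

print(first_default, first_multi, last_default, last_multi)
```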
|
||
</dd></dl>
|
||
|
||
<dl class="data">
|
||
<dt id="re.S">
|
||
<code class="descclassname">re.</code><code class="descname">S</code><a class="headerlink" href="#re.S" title="Permalink to this definition">¶</a></dt>
|
||
<dt id="re.DOTALL">
|
||
<code class="descclassname">re.</code><code class="descname">DOTALL</code><a class="headerlink" href="#re.DOTALL" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Make the <code class="docutils literal notranslate"><span class="pre">'.'</span></code> special character match any character at all, including a
|
||
newline; without this flag, <code class="docutils literal notranslate"><span class="pre">'.'</span></code> will match anything <em>except</em> a newline.
|
||
Corresponds to the inline flag <code class="docutils literal notranslate"><span class="pre">(?s)</span></code>.</p>
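<p>A short sketch of the difference, using an assumed two-line input:</p>

```python
import re

text = "a\nb"  # sample input with a newline between the two letters

default_match = re.fullmatch(r"a.b", text)            # '.' refuses the newline
dotall_match = re.fullmatch(r"a.b", text, re.DOTALL)  # '.' accepts the newline
inline_match = re.fullmatch(r"(?s)a.b", text)         # same, via the inline flag

print(default_match is None, dotall_match is not None, inline_match is not None)
```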
|
||
</dd></dl>
|
||
|
||
<dl class="data">
|
||
<dt id="re.X">
|
||
<code class="descclassname">re.</code><code class="descname">X</code><a class="headerlink" href="#re.X" title="Permalink to this definition">¶</a></dt>
|
||
<dt id="re.VERBOSE">
|
||
<code class="descclassname">re.</code><code class="descname">VERBOSE</code><a class="headerlink" href="#re.VERBOSE" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p id="index-36">This flag allows you to write regular expressions that look nicer and are
|
||
more readable by allowing you to visually separate logical sections of the
|
||
pattern and add comments. Whitespace within the pattern is ignored, except
|
||
when in a character class, or when preceded by an unescaped backslash,
|
||
or within tokens like <code class="docutils literal notranslate"><span class="pre">*?</span></code>, <code class="docutils literal notranslate"><span class="pre">(?:</span></code> or <code class="docutils literal notranslate"><span class="pre">(?P&lt;...&gt;</span></code>.
|
||
When a line contains a <code class="docutils literal notranslate"><span class="pre">#</span></code> that is not in a character class and is not
|
||
preceded by an unescaped backslash, all characters from the leftmost such
|
||
<code class="docutils literal notranslate"><span class="pre">#</span></code> through the end of the line are ignored.</p>
|
||
<p>This means that the two following regular expression objects that match a
|
||
decimal number are functionally equivalent:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="n">a</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s2">"""\d + # the integral part</span>
|
||
<span class="s2"> \. # the decimal point</span>
|
||
<span class="s2"> \d * # some fractional digits"""</span><span class="p">,</span> <span class="n">re</span><span class="o">.</span><span class="n">X</span><span class="p">)</span>
|
||
<span class="n">b</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s2">"\d+\.\d*"</span><span class="p">)</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>Corresponds to the inline flag <code class="docutils literal notranslate"><span class="pre">(?x)</span></code>.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="re.search">
|
||
<code class="descclassname">re.</code><code class="descname">search</code><span class="sig-paren">(</span><em>pattern</em>, <em>string</em>, <em>flags=0</em><span class="sig-paren">)</span><a class="headerlink" href="#re.search" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Scan through <em>string</em> looking for the first location where the regular expression
|
||
<em>pattern</em> produces a match, and return a corresponding <a class="reference internal" href="#match-objects"><span class="std std-ref">match object</span></a>. Return <code class="docutils literal notranslate"><span class="pre">None</span></code> if no position in the string matches the
|
||
pattern; note that this is different from finding a zero-length match at some
|
||
point in the string.</p>
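<p>The distinction between no match and a zero-length match can be seen in a small sketch. The patterns are illustrative assumptions.</p>

```python
import re

# 'x*' can match zero characters, so it succeeds immediately at position 0.
empty = re.search(r"x*", "abc")

# 'x+' needs at least one 'x', and the string has none, so the result is None.
missing = re.search(r"x+", "abc")

# An ordinary match found later in the string.
later = re.search(r"c", "abc")

print(empty.span(), missing, later.span())
```

<p>Because a match object is always truthy, testing the result with <code class="docutils literal notranslate"><span class="pre">if</span></code> still distinguishes the zero-length match from <code class="docutils literal notranslate"><span class="pre">None</span></code>.</p>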
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="re.match">
|
||
<code class="descclassname">re.</code><code class="descname">match</code><span class="sig-paren">(</span><em>pattern</em>, <em>string</em>, <em>flags=0</em><span class="sig-paren">)</span><a class="headerlink" href="#re.match" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>If zero or more characters at the beginning of <em>string</em> match the regular
|
||
expression <em>pattern</em>, return a corresponding <a class="reference internal" href="#match-objects"><span class="std std-ref">match object</span></a>. Return <code class="docutils literal notranslate"><span class="pre">None</span></code> if the string does not match the pattern;
|
||
note that this is different from a zero-length match.</p>
|
||
<p>Note that even in <a class="reference internal" href="#re.MULTILINE" title="re.MULTILINE"><code class="xref py py-const docutils literal notranslate"><span class="pre">MULTILINE</span></code></a> mode, <a class="reference internal" href="#re.match" title="re.match"><code class="xref py py-func docutils literal notranslate"><span class="pre">re.match()</span></code></a> will only match
|
||
at the beginning of the string and not at the beginning of each line.</p>
|
||
<p>If you want to locate a match anywhere in <em>string</em>, use <a class="reference internal" href="#re.search" title="re.search"><code class="xref py py-func docutils literal notranslate"><span class="pre">search()</span></code></a>
|
||
instead (see also <a class="reference internal" href="#search-vs-match"><span class="std std-ref">search() vs. match()</span></a>).</p>
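<p>As an illustration of the note on multiline mode, the sketch below looks for a word that starts the second line of an assumed two-line string.</p>

```python
import re

text = "first\nsecond"  # sample input, assumed for illustration

# match() is anchored at the start of the string, even in MULTILINE mode.
match_result = re.match(r"^second", text, re.MULTILINE)

# search() scans the string, so '^' can succeed at the start of the second line.
search_result = re.search(r"^second", text, re.MULTILINE)

print(match_result, search_result.span())
```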
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="re.fullmatch">
|
||
<code class="descclassname">re.</code><code class="descname">fullmatch</code><span class="sig-paren">(</span><em>pattern</em>, <em>string</em>, <em>flags=0</em><span class="sig-paren">)</span><a class="headerlink" href="#re.fullmatch" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>If the whole <em>string</em> matches the regular expression <em>pattern</em>, return a
|
||
corresponding <a class="reference internal" href="#match-objects"><span class="std std-ref">match object</span></a>. Return <code class="docutils literal notranslate"><span class="pre">None</span></code> if the
|
||
string does not match the pattern; note that this is different from a
|
||
zero-length match.</p>
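<p>A brief sketch comparing whole-string matching with prefix matching; the pattern and inputs are hypothetical.</p>

```python
import re

digits = r"\d+"  # illustrative pattern

whole = re.fullmatch(digits, "12345")           # entire string is digits
rejected = re.fullmatch(digits, "12345abc")     # trailing letters: None
prefix = re.match(digits, "12345abc")           # match() accepts a matching prefix
empty_ok = re.fullmatch(r"\d*", "")             # zero-length match, not None

print(whole.group(), rejected, prefix.group(), empty_ok.span())
```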
|
||
<div class="versionadded">
|
||
<p><span class="versionmodified added">New in version 3.4.</span></p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="re.split">
|
||
<code class="descclassname">re.</code><code class="descname">split</code><span class="sig-paren">(</span><em>pattern</em>, <em>string</em>, <em>maxsplit=0</em>, <em>flags=0</em><span class="sig-paren">)</span><a class="headerlink" href="#re.split" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Split <em>string</em> by the occurrences of <em>pattern</em>. If capturing parentheses are
|
||
used in <em>pattern</em>, then the text of all groups in the pattern is also returned
|
||
as part of the resulting list. If <em>maxsplit</em> is nonzero, at most <em>maxsplit</em>
|
||
splits occur, and the remainder of the string is returned as the final element
|
||
of the list.</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="sa">r</span><span class="s1">'\W+'</span><span class="p">,</span> <span class="s1">'Words, words, words.'</span><span class="p">)</span>
|
||
<span class="go">['Words', 'words', 'words', '']</span>
|
||
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="sa">r</span><span class="s1">'(\W+)'</span><span class="p">,</span> <span class="s1">'Words, words, words.'</span><span class="p">)</span>
|
||
<span class="go">['Words', ', ', 'words', ', ', 'words', '.', '']</span>
|
||
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="sa">r</span><span class="s1">'\W+'</span><span class="p">,</span> <span class="s1">'Words, words, words.'</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
|
||
<span class="go">['Words', 'words, words.']</span>
|
||
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">'[a-f]+'</span><span class="p">,</span> <span class="s1">'0a3B9'</span><span class="p">,</span> <span class="n">flags</span><span class="o">=</span><span class="n">re</span><span class="o">.</span><span class="n">IGNORECASE</span><span class="p">)</span>
|
||
<span class="go">['0', '3', '9']</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>If there are capturing groups in the separator and it matches at the start of
|
||
the string, the result will start with an empty string. The same holds for
|
||
the end of the string:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="sa">r</span><span class="s1">'(\W+)'</span><span class="p">,</span> <span class="s1">'...words, words...'</span><span class="p">)</span>
|
||
<span class="go">['', '...', 'words', ', ', 'words', '...', '']</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>That way, separator components are always found at the same relative
|
||
indices within the result list.</p>
|
||
<p>Empty matches for the pattern split the string only when not adjacent
|
||
to a previous empty match.</p>
|
||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="sa">r</span><span class="s1">'\b'</span><span class="p">,</span> <span class="s1">'Words, words, words.'</span><span class="p">)</span>
|
||
<span class="go">['', 'Words', ', ', 'words', ', ', 'words', '.']</span>
|
||
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="sa">r</span><span class="s1">'\W*'</span><span class="p">,</span> <span class="s1">'...words...'</span><span class="p">)</span>
|
||
<span class="go">['', '', 'w', 'o', 'r', 'd', 's', '', '']</span>
|
||
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="sa">r</span><span class="s1">'(\W*)'</span><span class="p">,</span> <span class="s1">'...words...'</span><span class="p">)</span>
|
||
<span class="go">['', '...', '', '', 'w', '', 'o', '', 'r', '', 'd', '', 's', '...', '', '', '']</span>
|
||
</pre></div>
|
||
</div>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.1: </span>Added the optional flags argument.</p>
|
||
</div>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.7: </span>Added support for splitting on a pattern that could match an empty string.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="re.findall">
|
||
<code class="descclassname">re.</code><code class="descname">findall</code><span class="sig-paren">(</span><em>pattern</em>, <em>string</em>, <em>flags=0</em><span class="sig-paren">)</span><a class="headerlink" href="#re.findall" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Return all non-overlapping matches of <em>pattern</em> in <em>string</em>, as a list of
|
||
strings. The <em>string</em> is scanned left-to-right, and matches are returned in
|
||
the order found. If one or more groups are present in the pattern, return a
|
||
list of groups; this will be a list of tuples if the pattern has more than
|
||
one group. Empty matches are included in the result.</p>
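<p>How the shape of the result depends on the number of groups can be sketched as follows, using an assumed key=value string.</p>

```python
import re

text = "a=1, b=22"  # sample input, assumed for illustration

# No groups: a list of the full matches.
no_groups = re.findall(r"\w+=\d+", text)

# One group: a list of that group's text only.
one_group = re.findall(r"\w+=(\d+)", text)

# Two groups: a list of tuples, one tuple per match.
two_groups = re.findall(r"(\w+)=(\d+)", text)

print(no_groups, one_group, two_groups)
```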
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.7: </span>Non-empty matches can now start just after a previous empty match.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="re.finditer">
|
||
<code class="descclassname">re.</code><code class="descname">finditer</code><span class="sig-paren">(</span><em>pattern</em>, <em>string</em>, <em>flags=0</em><span class="sig-paren">)</span><a class="headerlink" href="#re.finditer" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Return an <a class="reference internal" href="../glossary.html#term-iterator"><span class="xref std std-term">iterator</span></a> yielding <a class="reference internal" href="#match-objects"><span class="std std-ref">match objects</span></a> over
|
||
all non-overlapping matches for the RE <em>pattern</em> in <em>string</em>. The <em>string</em>
|
||
is scanned left-to-right, and matches are returned in the order found. Empty
|
||
matches are included in the result.</p>
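<p>Since each item is a match object, positions are available as well as the matched text. A minimal sketch with an assumed input:</p>

```python
import re

text = "a1 bb22 ccc333"  # sample input, assumed for illustration

# Collect the matched text together with its start and end offsets.
found = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]

print(found)
```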
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.7: </span>Non-empty matches can now start just after a previous empty match.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="re.sub">
|
||
<code class="descclassname">re.</code><code class="descname">sub</code><span class="sig-paren">(</span><em>pattern</em>, <em>repl</em>, <em>string</em>, <em>count=0</em>, <em>flags=0</em><span class="sig-paren">)</span><a class="headerlink" href="#re.sub" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Return the string obtained by replacing the leftmost non-overlapping occurrences
|
||
of <em>pattern</em> in <em>string</em> by the replacement <em>repl</em>. If the pattern isn’t found,
|
||
<em>string</em> is returned unchanged. <em>repl</em> can be a string or a function; if it is
|
||
a string, any backslash escapes in it are processed. That is, <code class="docutils literal notranslate"><span class="pre">\n</span></code> is
|
||
converted to a single newline character, <code class="docutils literal notranslate"><span class="pre">\r</span></code> is converted to a carriage return, and
|
||
so forth. Unknown escapes of ASCII letters are reserved for future use and
|
||
treated as errors. Other unknown escapes such as <code class="docutils literal notranslate"><span class="pre">\&</span></code> are left alone.
|
||
Backreferences, such
|
||
as <code class="docutils literal notranslate"><span class="pre">\6</span></code>, are replaced with the substring matched by group 6 in the pattern.
|
||
For example:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">sub</span><span class="p">(</span><span class="sa">r</span><span class="s1">'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):'</span><span class="p">,</span>
|
||
<span class="gp">... </span> <span class="sa">r</span><span class="s1">'static PyObject*\npy_\1(void)\n{'</span><span class="p">,</span>
|
||
<span class="gp">... </span> <span class="s1">'def myfunc():'</span><span class="p">)</span>
|
||
<span class="go">'static PyObject*\npy_myfunc(void)\n{'</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>If <em>repl</em> is a function, it is called for every non-overlapping occurrence of
|
||
<em>pattern</em>. The function takes a single <a class="reference internal" href="#match-objects"><span class="std std-ref">match object</span></a>
|
||
argument, and returns the replacement string. For example:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="k">def</span> <span class="nf">dashrepl</span><span class="p">(</span><span class="n">matchobj</span><span class="p">):</span>
|
||
<span class="gp">... </span> <span class="k">if</span> <span class="n">matchobj</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="o">==</span> <span class="s1">'-'</span><span class="p">:</span> <span class="k">return</span> <span class="s1">' '</span>
|
||
<span class="gp">... </span> <span class="k">else</span><span class="p">:</span> <span class="k">return</span> <span class="s1">'-'</span>
|
||
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">sub</span><span class="p">(</span><span class="s1">'-{1,2}'</span><span class="p">,</span> <span class="n">dashrepl</span><span class="p">,</span> <span class="s1">'pro----gram-files'</span><span class="p">)</span>
|
||
<span class="go">'pro--gram files'</span>
|
||
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">sub</span><span class="p">(</span><span class="sa">r</span><span class="s1">'\sAND\s'</span><span class="p">,</span> <span class="s1">' & '</span><span class="p">,</span> <span class="s1">'Baked Beans And Spam'</span><span class="p">,</span> <span class="n">flags</span><span class="o">=</span><span class="n">re</span><span class="o">.</span><span class="n">IGNORECASE</span><span class="p">)</span>
|
||
<span class="go">'Baked Beans & Spam'</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>The pattern may be a string or a <a class="reference internal" href="#re-objects"><span class="std std-ref">pattern object</span></a>.</p>
|
||
<p>The optional argument <em>count</em> is the maximum number of pattern occurrences to be
|
||
replaced; <em>count</em> must be a non-negative integer. If omitted or zero, all
|
||
occurrences will be replaced. Empty matches for the pattern are replaced only
|
||
when not adjacent to a previous empty match, so <code class="docutils literal notranslate"><span class="pre">sub('x*',</span> <span class="pre">'-',</span> <span class="pre">'abxd')</span></code> returns
|
||
<code class="docutils literal notranslate"><span class="pre">'-a-b--d-'</span></code>.</p>
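<p>The effect of <em>count</em> and the empty-match rule can be sketched as below. The first input is an assumption for illustration; the second call repeats the example from the paragraph above and reflects the behaviour documented for Python 3.7.</p>

```python
import re

# Replace only the first two occurrences, scanning from the left.
limited = re.sub(r"a", "X", "banana", count=2)

# count omitted (or zero): every occurrence is replaced.
unlimited = re.sub(r"a", "X", "banana")

# Empty matches are replaced too, except right after a previous empty match.
with_empty = re.sub("x*", "-", "abxd")

print(limited, unlimited, with_empty)
```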
|
||
<p id="index-37">In string-type <em>repl</em> arguments, in addition to the character escapes and
|
||
backreferences described above,
|
||
<code class="docutils literal notranslate"><span class="pre">\g&lt;name&gt;</span></code> will use the substring matched by the group named <code class="docutils literal notranslate"><span class="pre">name</span></code>, as
defined by the <code class="docutils literal notranslate"><span class="pre">(?P&lt;name&gt;...)</span></code> syntax. <code class="docutils literal notranslate"><span class="pre">\g&lt;number&gt;</span></code> uses the corresponding
group number; <code class="docutils literal notranslate"><span class="pre">\g&lt;2&gt;</span></code> is therefore equivalent to <code class="docutils literal notranslate"><span class="pre">\2</span></code>, but isn’t ambiguous
in a replacement such as <code class="docutils literal notranslate"><span class="pre">\g&lt;2&gt;0</span></code>. <code class="docutils literal notranslate"><span class="pre">\20</span></code> would be interpreted as a
reference to group 20, not a reference to group 2 followed by the literal
character <code class="docutils literal notranslate"><span class="pre">'0'</span></code>. The backreference <code class="docutils literal notranslate"><span class="pre">\g&lt;0&gt;</span></code> substitutes in the entire
|
||
substring matched by the RE.</p>
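<p>The named, numbered and whole-match forms can be sketched as follows. The patterns and inputs are hypothetical, and the final case only shows that the ambiguous numeric form is rejected when the pattern has no group 10.</p>

```python
import re

# Named groups referenced by name in the replacement.
swapped = re.sub(r"(?P<year>\d{4})-(?P<month>\d{2})", r"\g<month>/\g<year>", "2019-07")

# Group 1 followed by a literal '0'; the explicit form avoids the ambiguity.
padded = re.sub(r"(\d)", r"\g<1>0", "5")

# The whole match, wrapped in angle brackets.
wrapped = re.sub(r"\w+", r"<\g<0>>", "hi there")

# The short form reads as a reference to group 10, which does not exist here.
try:
    re.sub(r"(\d)", r"\10", "5")
    ambiguous_failed = False
except re.error:
    ambiguous_failed = True

print(swapped, padded, wrapped, ambiguous_failed)
```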
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.1: </span>Added the optional flags argument.</p>
|
||
</div>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.5: </span>Unmatched groups are replaced with an empty string.</p>
|
||
</div>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.6: </span>Unknown escapes in <em>pattern</em> consisting of <code class="docutils literal notranslate"><span class="pre">'\'</span></code> and an ASCII letter
|
||
are now errors.</p>
|
||
</div>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.7: </span>Unknown escapes in <em>repl</em> consisting of <code class="docutils literal notranslate"><span class="pre">'\'</span></code> and an ASCII letter
|
||
are now errors.</p>
|
||
</div>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.7: </span>Empty matches for the pattern are replaced when adjacent to a previous
|
||
non-empty match.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="re.subn">
|
||
<code class="descclassname">re.</code><code class="descname">subn</code><span class="sig-paren">(</span><em>pattern</em>, <em>repl</em>, <em>string</em>, <em>count=0</em>, <em>flags=0</em><span class="sig-paren">)</span><a class="headerlink" href="#re.subn" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Perform the same operation as <a class="reference internal" href="#re.sub" title="re.sub"><code class="xref py py-func docutils literal notranslate"><span class="pre">sub()</span></code></a>, but return a tuple <code class="docutils literal notranslate"><span class="pre">(new_string,</span>
|
||
<span class="pre">number_of_subs_made)</span></code>.</p>
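<p>A short sketch of the returned tuple, using an assumed input with two runs of whitespace:</p>

```python
import re

# Collapse each run of whitespace to a single space and count the replacements.
new_string, number_of_subs_made = re.subn(r"\s+", " ", "a  b \t c")

# When nothing matches, the string comes back unchanged with a count of zero.
unchanged, zero_subs = re.subn(r"\d+", "#", "no digits here")

print(new_string, number_of_subs_made, zero_subs)
```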
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.1: </span>Added the optional flags argument.</p>
|
||
</div>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.5: </span>Unmatched groups are replaced with an empty string.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="re.escape">
|
||
<code class="descclassname">re.</code><code class="descname">escape</code><span class="sig-paren">(</span><em>pattern</em><span class="sig-paren">)</span><a class="headerlink" href="#re.escape" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Escape special characters in <em>pattern</em>.
|
||
This is useful if you want to match an arbitrary literal string that may
|
||
have regular expression metacharacters in it. For example:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">re</span><span class="o">.</span><span class="n">escape</span><span class="p">(</span><span class="s1">'python.exe'</span><span class="p">))</span>
|
||
<span class="go">python\.exe</span>
|
||
|
||
<span class="gp">>>> </span><span class="n">legal_chars</span> <span class="o">=</span> <span class="n">string</span><span class="o">.</span><span class="n">ascii_lowercase</span> <span class="o">+</span> <span class="n">string</span><span class="o">.</span><span class="n">digits</span> <span class="o">+</span> <span class="s2">"!#$%&'*+-.^_`|~:"</span>
|
||
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="s1">'[</span><span class="si">%s</span><span class="s1">]+'</span> <span class="o">%</span> <span class="n">re</span><span class="o">.</span><span class="n">escape</span><span class="p">(</span><span class="n">legal_chars</span><span class="p">))</span>
|
||
<span class="go">[abcdefghijklmnopqrstuvwxyz0123456789!\#\$%\&'\*\+\-\.\^_`\|\~:]+</span>
|
||
|
||
<span class="gp">>>> </span><span class="n">operators</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'+'</span><span class="p">,</span> <span class="s1">'-'</span><span class="p">,</span> <span class="s1">'*'</span><span class="p">,</span> <span class="s1">'/'</span><span class="p">,</span> <span class="s1">'**'</span><span class="p">]</span>
|
||
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="s1">'|'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="n">re</span><span class="o">.</span><span class="n">escape</span><span class="p">,</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">operators</span><span class="p">,</span> <span class="n">reverse</span><span class="o">=</span><span class="kc">True</span><span class="p">))))</span>
|
||
<span class="go">/|\-|\+|\*\*|\*</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>This function must not be used for the replacement string in <a class="reference internal" href="#re.sub" title="re.sub"><code class="xref py py-func docutils literal notranslate"><span class="pre">sub()</span></code></a>
|
||
and <a class="reference internal" href="#re.subn" title="re.subn"><code class="xref py py-func docutils literal notranslate"><span class="pre">subn()</span></code></a>; only backslashes should be escaped. For example:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">digits_re</span> <span class="o">=</span> <span class="sa">r</span><span class="s1">'\d+'</span>
|
||
<span class="gp">>>> </span><span class="n">sample</span> <span class="o">=</span> <span class="s1">'/usr/sbin/sendmail - 0 errors, 12 warnings'</span>
|
||
<span class="gp">>>> </span><span class="nb">print</span><span class="p">(</span><span class="n">re</span><span class="o">.</span><span class="n">sub</span><span class="p">(</span><span class="n">digits_re</span><span class="p">,</span> <span class="n">digits_re</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s1">'</span><span class="se">\\</span><span class="s1">'</span><span class="p">,</span> <span class="sa">r</span><span class="s1">'</span><span class="se">\\</span><span class="s1">'</span><span class="p">),</span> <span class="n">sample</span><span class="p">))</span>
|
||
<span class="go">/usr/sbin/sendmail - \d+ errors, \d+ warnings</span>
|
||
</pre></div>
|
||
</div>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.3: </span>The <code class="docutils literal notranslate"><span class="pre">'_'</span></code> character is no longer escaped.</p>
|
||
</div>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.7: </span>Only characters that can have special meaning in a regular expression
|
||
are escaped.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="re.purge">
|
||
<code class="descclassname">re.</code><code class="descname">purge</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#re.purge" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Clear the regular expression cache.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="exception">
|
||
<dt id="re.error">
|
||
<em class="property">exception </em><code class="descclassname">re.</code><code class="descname">error</code><span class="sig-paren">(</span><em>msg</em>, <em>pattern=None</em>, <em>pos=None</em><span class="sig-paren">)</span><a class="headerlink" href="#re.error" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Exception raised when a string passed to one of the functions here is not a
|
||
valid regular expression (for example, it might contain unmatched parentheses)
|
||
or when some other error occurs during compilation or matching. It is never an
|
||
error if a string contains no match for a pattern. The error instance has
|
||
the following additional attributes:</p>
|
||
<dl class="attribute">
|
||
<dt id="re.error.msg">
|
||
<code class="descname">msg</code><a class="headerlink" href="#re.error.msg" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>The unformatted error message.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="attribute">
|
||
<dt id="re.error.pattern">
|
||
<code class="descname">pattern</code><a class="headerlink" href="#re.error.pattern" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>The regular expression pattern.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="attribute">
|
||
<dt id="re.error.pos">
|
||
<code class="descname">pos</code><a class="headerlink" href="#re.error.pos" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>The index in <em>pattern</em> where compilation failed (may be <code class="docutils literal notranslate"><span class="pre">None</span></code>).</p>
|
||
</dd></dl>
|
||
|
||
<dl class="attribute">
|
||
<dt id="re.error.lineno">
|
||
<code class="descname">lineno</code><a class="headerlink" href="#re.error.lineno" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>The line corresponding to <em>pos</em> (may be <code class="docutils literal notranslate"><span class="pre">None</span></code>).</p>
|
||
</dd></dl>
|
||
|
||
<dl class="attribute">
|
||
<dt id="re.error.colno">
|
||
<code class="descname">colno</code><a class="headerlink" href="#re.error.colno" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>The column corresponding to <em>pos</em> (may be <code class="docutils literal notranslate"><span class="pre">None</span></code>).</p>
|
||
</dd></dl>
|
||
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.5: </span>Added additional attributes.</p>
|
||
</div>
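<p>The attributes above can be inspected after catching the exception. This is a small sketch that uses a deliberately invalid pattern with an unmatched parenthesis; the name <code class="docutils literal notranslate"><span class="pre">caught</span></code> is illustrative, and the exact wording of the message may differ between Python versions.</p>

```python
import re

caught = None
try:
    re.compile(r"(unclosed")   # the opening parenthesis is never closed
except re.error as exc:
    caught = exc

print(caught.msg)                    # unformatted message, without the position suffix
print(caught.pattern)                # (unclosed
print(caught.pos)                    # index in the pattern where compilation failed
print(caught.lineno, caught.colno)   # line and column derived from pos
```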
|
||
</dd></dl>
|
||
|
||
</div>
|
||
<div class="section" id="regular-expression-objects">
|
||
<span id="re-objects"></span><h2>Regular Expression Objects<a class="headerlink" href="#regular-expression-objects" title="Permalink to this headline">¶</a></h2>
|
||
<p>Compiled regular expression objects support the following methods and
|
||
attributes:</p>
|
||
<dl class="method">
|
||
<dt id="re.Pattern.search">
|
||
<code class="descclassname">Pattern.</code><code class="descname">search</code><span class="sig-paren">(</span><em>string</em><span class="optional">[</span>, <em>pos</em><span class="optional">[</span>, <em>endpos</em><span class="optional">]</span><span class="optional">]</span><span class="sig-paren">)</span><a class="headerlink" href="#re.Pattern.search" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Scan through <em>string</em> looking for the first location where this regular
|
||
expression produces a match, and return a corresponding <a class="reference internal" href="#match-objects"><span class="std std-ref">match object</span></a>. Return <code class="docutils literal notranslate"><span class="pre">None</span></code> if no position in the string matches the
|
||
pattern; note that this is different from finding a zero-length match at some
|
||
point in the string.</p>
|
||
<p>The optional second parameter <em>pos</em> gives an index in the string where the
|
||
search is to start; it defaults to <code class="docutils literal notranslate"><span class="pre">0</span></code>. This is not completely equivalent to
|
||
slicing the string; the <code class="docutils literal notranslate"><span class="pre">'^'</span></code> pattern character matches at the real beginning
|
||
of the string and at positions just after a newline, but not necessarily at the
|
||
index where the search is to start.</p>
|
||
<p>The optional parameter <em>endpos</em> limits how far the string will be searched; it
|
||
will be as if the string is <em>endpos</em> characters long, so only the characters
|
||
from <em>pos</em> to <code class="docutils literal notranslate"><span class="pre">endpos</span> <span class="pre">-</span> <span class="pre">1</span></code> will be searched for a match. If <em>endpos</em> is less
|
||
than <em>pos</em>, no match will be found; otherwise, if <em>rx</em> is a compiled regular
|
||
expression object, <code class="docutils literal notranslate"><span class="pre">rx.search(string,</span> <span class="pre">0,</span> <span class="pre">50)</span></code> is equivalent to
|
||
<code class="docutils literal notranslate"><span class="pre">rx.search(string[:50],</span> <span class="pre">0)</span></code>.</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">pattern</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="s2">"d"</span><span class="p">)</span>
|
||
<span class="gp">>>> </span><span class="n">pattern</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s2">"dog"</span><span class="p">)</span> <span class="c1"># Match at index 0</span>
|
||
<span class="go"><re.Match object; span=(0, 1), match='d'></span>
|
||
<span class="gp">>>> </span><span class="n">pattern</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s2">"dog"</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="c1"># No match; search doesn't include the "d"</span>
|
||
</pre></div>
|
||
</div>
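<p>The difference between passing <em>pos</em> or <em>endpos</em> and slicing the string can be sketched as follows. The patterns and names are illustrative; the anchors <code class="docutils literal notranslate"><span class="pre">'^'</span></code> and <code class="docutils literal notranslate"><span class="pre">'$'</span></code> are used because they make the difference visible.</p>

```python
import re

starts_with_b = re.compile(r"^b")
text = "ab"

# pos=1 starts the search at "b", but "^" still refers to the real start of the string.
print(starts_with_b.search(text, 1))     # None
# Slicing creates a new string whose real start is "b", so "^" matches.
print(starts_with_b.search(text[1:]))    # match with span=(0, 1)

ends_with_b = re.compile(r"b$")
# endpos=2 makes the search behave as if the string were only "ab".
print(ends_with_b.search("abc", 0, 2))   # match with span=(1, 2)
# endpos less than pos: no match can be found.
print(ends_with_b.search("abc", 2, 1))   # None
```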
|
||
</dd></dl>
|
||
|
||
<dl class="method">
|
||
<dt id="re.Pattern.match">
|
||
<code class="descclassname">Pattern.</code><code class="descname">match</code><span class="sig-paren">(</span><em>string</em><span class="optional">[</span>, <em>pos</em><span class="optional">[</span>, <em>endpos</em><span class="optional">]</span><span class="optional">]</span><span class="sig-paren">)</span><a class="headerlink" href="#re.Pattern.match" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>If zero or more characters at the <em>beginning</em> of <em>string</em> match this regular
|
||
expression, return a corresponding <a class="reference internal" href="#match-objects"><span class="std std-ref">match object</span></a>.
|
||
Return <code class="docutils literal notranslate"><span class="pre">None</span></code> if the string does not match the pattern; note that this is
|
||
different from a zero-length match.</p>
|
||
<p>The optional <em>pos</em> and <em>endpos</em> parameters have the same meaning as for the
|
||
<a class="reference internal" href="#re.Pattern.search" title="re.Pattern.search"><code class="xref py py-meth docutils literal notranslate"><span class="pre">search()</span></code></a> method.</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">pattern</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="s2">"o"</span><span class="p">)</span>
|
||
<span class="gp">>>> </span><span class="n">pattern</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"dog"</span><span class="p">)</span> <span class="c1"># No match as "o" is not at the start of "dog".</span>
|
||
<span class="gp">>>> </span><span class="n">pattern</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"dog"</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="c1"># Match as "o" is the 2nd character of "dog".</span>
|
||
<span class="go"><re.Match object; span=(1, 2), match='o'></span>
|
||
</pre></div>
|
||
</div>
|
||
<p>If you want to locate a match anywhere in <em>string</em>, use
|
||
<a class="reference internal" href="#re.Pattern.search" title="re.Pattern.search"><code class="xref py py-meth docutils literal notranslate"><span class="pre">search()</span></code></a> instead (see also <a class="reference internal" href="#search-vs-match"><span class="std std-ref">search() vs. match()</span></a>).</p>
|
||
</dd></dl>
|
||
|
||
<dl class="method">
|
||
<dt id="re.Pattern.fullmatch">
|
||
<code class="descclassname">Pattern.</code><code class="descname">fullmatch</code><span class="sig-paren">(</span><em>string</em><span class="optional">[</span>, <em>pos</em><span class="optional">[</span>, <em>endpos</em><span class="optional">]</span><span class="optional">]</span><span class="sig-paren">)</span><a class="headerlink" href="#re.Pattern.fullmatch" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>If the whole <em>string</em> matches this regular expression, return a corresponding
|
||
<a class="reference internal" href="#match-objects"><span class="std std-ref">match object</span></a>. Return <code class="docutils literal notranslate"><span class="pre">None</span></code> if the string does not
|
||
match the pattern; note that this is different from a zero-length match.</p>
|
||
<p>The optional <em>pos</em> and <em>endpos</em> parameters have the same meaning as for the
|
||
<a class="reference internal" href="#re.Pattern.search" title="re.Pattern.search"><code class="xref py py-meth docutils literal notranslate"><span class="pre">search()</span></code></a> method.</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">pattern</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="s2">"o[gh]"</span><span class="p">)</span>
|
||
<span class="gp">>>> </span><span class="n">pattern</span><span class="o">.</span><span class="n">fullmatch</span><span class="p">(</span><span class="s2">"dog"</span><span class="p">)</span> <span class="c1"># No match as "o" is not at the start of "dog".</span>
|
||
<span class="gp">>>> </span><span class="n">pattern</span><span class="o">.</span><span class="n">fullmatch</span><span class="p">(</span><span class="s2">"ogre"</span><span class="p">)</span> <span class="c1"># No match as the full string does not match.</span>
|
||
<span class="gp">>>> </span><span class="n">pattern</span><span class="o">.</span><span class="n">fullmatch</span><span class="p">(</span><span class="s2">"doggie"</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="c1"># Matches within given limits.</span>
|
||
<span class="go"><re.Match object; span=(1, 3), match='og'></span>
|
||
</pre></div>
|
||
</div>
|
||
<div class="versionadded">
|
||
<p><span class="versionmodified added">New in version 3.4.</span></p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="method">
|
||
<dt id="re.Pattern.split">
|
||
<code class="descclassname">Pattern.</code><code class="descname">split</code><span class="sig-paren">(</span><em>string</em>, <em>maxsplit=0</em><span class="sig-paren">)</span><a class="headerlink" href="#re.Pattern.split" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Identical to the <a class="reference internal" href="#re.split" title="re.split"><code class="xref py py-func docutils literal notranslate"><span class="pre">split()</span></code></a> function, using the compiled pattern.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="method">
|
||
<dt id="re.Pattern.findall">
|
||
<code class="descclassname">Pattern.</code><code class="descname">findall</code><span class="sig-paren">(</span><em>string</em><span class="optional">[</span>, <em>pos</em><span class="optional">[</span>, <em>endpos</em><span class="optional">]</span><span class="optional">]</span><span class="sig-paren">)</span><a class="headerlink" href="#re.Pattern.findall" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Similar to the <a class="reference internal" href="#re.findall" title="re.findall"><code class="xref py py-func docutils literal notranslate"><span class="pre">findall()</span></code></a> function, using the compiled pattern, but
|
||
also accepts optional <em>pos</em> and <em>endpos</em> parameters that limit the search
|
||
region, as for <a class="reference internal" href="#re.search" title="re.search"><code class="xref py py-meth docutils literal notranslate"><span class="pre">search()</span></code></a>.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="method">
|
||
<dt id="re.Pattern.finditer">
|
||
<code class="descclassname">Pattern.</code><code class="descname">finditer</code><span class="sig-paren">(</span><em>string</em><span class="optional">[</span>, <em>pos</em><span class="optional">[</span>, <em>endpos</em><span class="optional">]</span><span class="optional">]</span><span class="sig-paren">)</span><a class="headerlink" href="#re.Pattern.finditer" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Similar to the <a class="reference internal" href="#re.finditer" title="re.finditer"><code class="xref py py-func docutils literal notranslate"><span class="pre">finditer()</span></code></a> function, using the compiled pattern, but
|
||
also accepts optional <em>pos</em> and <em>endpos</em> parameters that limit the search
|
||
region, as for <a class="reference internal" href="#re.search" title="re.search"><code class="xref py py-meth docutils literal notranslate"><span class="pre">search()</span></code></a>.</p>
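<p>A short sketch of how <em>pos</em> and <em>endpos</em> restrict the compiled-pattern versions of <code class="docutils literal notranslate"><span class="pre">findall()</span></code> and <code class="docutils literal notranslate"><span class="pre">finditer()</span></code>. The sample string and names are illustrative.</p>

```python
import re

digit = re.compile(r"\d")
text = "1a2b3c"

print(digit.findall(text))          # ['1', '2', '3']
print(digit.findall(text, 2, 5))    # ['2', '3'], only text[2:5] is scanned

# Spans reported by finditer() are still relative to the whole string.
spans = [m.span() for m in digit.finditer(text, 2, 5)]
print(spans)                        # [(2, 3), (4, 5)]
```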
|
||
</dd></dl>
|
||
|
||
<dl class="method">
|
||
<dt id="re.Pattern.sub">
|
||
<code class="descclassname">Pattern.</code><code class="descname">sub</code><span class="sig-paren">(</span><em>repl</em>, <em>string</em>, <em>count=0</em><span class="sig-paren">)</span><a class="headerlink" href="#re.Pattern.sub" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Identical to the <a class="reference internal" href="#re.sub" title="re.sub"><code class="xref py py-func docutils literal notranslate"><span class="pre">sub()</span></code></a> function, using the compiled pattern.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="method">
|
||
<dt id="re.Pattern.subn">
|
||
<code class="descclassname">Pattern.</code><code class="descname">subn</code><span class="sig-paren">(</span><em>repl</em>, <em>string</em>, <em>count=0</em><span class="sig-paren">)</span><a class="headerlink" href="#re.Pattern.subn" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Identical to the <a class="reference internal" href="#re.subn" title="re.subn"><code class="xref py py-func docutils literal notranslate"><span class="pre">subn()</span></code></a> function, using the compiled pattern.</p>
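<p>For illustration, the compiled-pattern methods <code class="docutils literal notranslate"><span class="pre">split()</span></code>, <code class="docutils literal notranslate"><span class="pre">sub()</span></code> and <code class="docutils literal notranslate"><span class="pre">subn()</span></code> can be exercised on one sample string. This is a sketch with made-up data, not an excerpt from the module.</p>

```python
import re

digits = re.compile(r"\d+")
text = "a1b22c333"

print(digits.split(text))               # ['a', 'b', 'c', '']
print(digits.split(text, maxsplit=1))   # ['a', 'b22c333']
print(digits.sub("#", text))            # a#b#c#
print(digits.sub("#", text, count=2))   # a#b#c333
print(digits.subn("#", text))           # ('a#b#c#', 3)
```

<p>The optional limits are passed by keyword here, which keeps the calls readable.</p>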
|
||
</dd></dl>
|
||
|
||
<dl class="attribute">
|
||
<dt id="re.Pattern.flags">
|
||
<code class="descclassname">Pattern.</code><code class="descname">flags</code><a class="headerlink" href="#re.Pattern.flags" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>The regex matching flags. This is a combination of the flags given to
|
||
<a class="reference internal" href="#re.compile" title="re.compile"><code class="xref py py-func docutils literal notranslate"><span class="pre">compile()</span></code></a>, any <code class="docutils literal notranslate"><span class="pre">(?...)</span></code> inline flags in the pattern, and implicit
|
||
flags such as <code class="xref py py-data docutils literal notranslate"><span class="pre">UNICODE</span></code> if the pattern is a Unicode string.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="attribute">
|
||
<dt id="re.Pattern.groups">
|
||
<code class="descclassname">Pattern.</code><code class="descname">groups</code><a class="headerlink" href="#re.Pattern.groups" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>The number of capturing groups in the pattern.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="attribute">
|
||
<dt id="re.Pattern.groupindex">
|
||
<code class="descclassname">Pattern.</code><code class="descname">groupindex</code><a class="headerlink" href="#re.Pattern.groupindex" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>A dictionary mapping any symbolic group names defined by <code class="docutils literal notranslate"><span class="pre">(?P<id>)</span></code> to group
|
||
numbers. The dictionary is empty if no symbolic groups were used in the
|
||
pattern.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="attribute">
|
||
<dt id="re.Pattern.pattern">
|
||
<code class="descclassname">Pattern.</code><code class="descname">pattern</code><a class="headerlink" href="#re.Pattern.pattern" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>The pattern string from which the pattern object was compiled.</p>
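<p>The attributes of a compiled pattern can be inspected as in the following sketch. The date-like pattern and the names <code class="docutils literal notranslate"><span class="pre">date</span></code> and <code class="docutils literal notranslate"><span class="pre">inline</span></code> are assumptions made for the example.</p>

```python
import re

date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(\d{2})", re.IGNORECASE)

print(date.pattern)                        # (?P<year>\d{4})-(?P<month>\d{2})-(\d{2})
print(date.groups)                         # 3
print(dict(date.groupindex))               # {'year': 1, 'month': 2}
print(bool(date.flags & re.IGNORECASE))    # True, passed to compile()
print(bool(date.flags & re.UNICODE))       # True, implicit for a str pattern
print(bool(date.flags & re.MULTILINE))     # False, never requested

# Inline flags in the pattern are reflected in the flags attribute as well.
inline = re.compile(r"(?m)^x")
print(bool(inline.flags & re.MULTILINE))   # True
```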
|
||
</dd></dl>
|
||
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.7: </span>Added support for <a class="reference internal" href="copy.html#copy.copy" title="copy.copy"><code class="xref py py-func docutils literal notranslate"><span class="pre">copy.copy()</span></code></a> and <a class="reference internal" href="copy.html#copy.deepcopy" title="copy.deepcopy"><code class="xref py py-func docutils literal notranslate"><span class="pre">copy.deepcopy()</span></code></a>. Compiled
|
||
regular expression objects are considered atomic.</p>
|
||
</div>
|
||
</div>
|
||
<div class="section" id="match-objects">
|
||
<span id="id2"></span><h2>Match Objects<a class="headerlink" href="#match-objects" title="Permalink to this headline">¶</a></h2>
|
||
<p>Match objects always have a boolean value of <code class="docutils literal notranslate"><span class="pre">True</span></code>.
|
||
Since <a class="reference internal" href="#re.Pattern.match" title="re.Pattern.match"><code class="xref py py-meth docutils literal notranslate"><span class="pre">match()</span></code></a> and <a class="reference internal" href="#re.Pattern.search" title="re.Pattern.search"><code class="xref py py-meth docutils literal notranslate"><span class="pre">search()</span></code></a> return <code class="docutils literal notranslate"><span class="pre">None</span></code>
|
||
when there is no match, you can test whether there was a match with a simple
|
||
<code class="docutils literal notranslate"><span class="pre">if</span></code> statement:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="n">match</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="n">pattern</span><span class="p">,</span> <span class="n">string</span><span class="p">)</span>
|
||
<span class="k">if</span> <span class="n">match</span><span class="p">:</span>
|
||
<span class="n">process</span><span class="p">(</span><span class="n">match</span><span class="p">)</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>Match objects support the following methods and attributes:</p>
|
||
<dl class="method">
|
||
<dt id="re.Match.expand">
|
||
<code class="descclassname">Match.</code><code class="descname">expand</code><span class="sig-paren">(</span><em>template</em><span class="sig-paren">)</span><a class="headerlink" href="#re.Match.expand" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Return the string obtained by doing backslash substitution on the template
|
||
string <em>template</em>, as done by the <a class="reference internal" href="#re.Pattern.sub" title="re.Pattern.sub"><code class="xref py py-meth docutils literal notranslate"><span class="pre">sub()</span></code></a> method.
|
||
Escapes such as <code class="docutils literal notranslate"><span class="pre">\n</span></code> are converted to the appropriate characters,
|
||
and numeric backreferences (<code class="docutils literal notranslate"><span class="pre">\1</span></code>, <code class="docutils literal notranslate"><span class="pre">\2</span></code>) and named backreferences
|
||
(<code class="docutils literal notranslate"><span class="pre">\g<1></span></code>, <code class="docutils literal notranslate"><span class="pre">\g<name></span></code>) are replaced by the contents of the
|
||
corresponding group.</p>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.5: </span>Unmatched groups are replaced with an empty string.</p>
|
||
</div>
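<p>A brief sketch of template expansion, using an illustrative pattern with one numbered and one named group:</p>

```python
import re

m = re.match(r"(\w+) (?P<last>\w+)", "Isaac Newton")
print(m.expand(r"\g<last>, \1"))      # Newton, Isaac
print(m.expand(r"\2\t\g<1>"))         # Newton, a tab character, then Isaac

# Since Python 3.5, a group that did not take part in the match expands to ''.
opt = re.match(r"(a)(b)?", "a")
print(repr(opt.expand(r"[\1|\2]")))   # '[a|]'
```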
|
||
</dd></dl>
|
||
|
||
<dl class="method">
|
||
<dt id="re.Match.group">
|
||
<code class="descclassname">Match.</code><code class="descname">group</code><span class="sig-paren">(</span><span class="optional">[</span><em>group1</em>, <em>...</em><span class="optional">]</span><span class="sig-paren">)</span><a class="headerlink" href="#re.Match.group" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Returns one or more subgroups of the match. If there is a single argument, the
|
||
result is a single string; if there are multiple arguments, the result is a
|
||
tuple with one item per argument. Without arguments, <em>group1</em> defaults to zero
|
||
(the whole match is returned). If a <em>groupN</em> argument is zero, the corresponding
|
||
return value is the entire matching string; if it is in the inclusive range
|
||
[1..99], it is the string matching the corresponding parenthesized group. If a
|
||
group number is negative or larger than the number of groups defined in the
|
||
pattern, an <a class="reference internal" href="exceptions.html#IndexError" title="IndexError"><code class="xref py py-exc docutils literal notranslate"><span class="pre">IndexError</span></code></a> exception is raised. If a group is contained in a
|
||
part of the pattern that did not match, the corresponding result is <code class="docutils literal notranslate"><span class="pre">None</span></code>.
|
||
If a group is contained in a part of the pattern that matched multiple times,
|
||
the last match is returned.</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">m</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="sa">r</span><span class="s2">"(\w+) (\w+)"</span><span class="p">,</span> <span class="s2">"Isaac Newton, physicist"</span><span class="p">)</span>
|
||
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="c1"># The entire match</span>
|
||
<span class="go">'Isaac Newton'</span>
|
||
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># The first parenthesized subgroup.</span>
|
||
<span class="go">'Isaac'</span>
|
||
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span> <span class="c1"># The second parenthesized subgroup.</span>
|
||
<span class="go">'Newton'</span>
|
||
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span> <span class="c1"># Multiple arguments give us a tuple.</span>
|
||
<span class="go">('Isaac', 'Newton')</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>If the regular expression uses the <code class="docutils literal notranslate"><span class="pre">(?P<name>...)</span></code> syntax, the <em>groupN</em>
|
||
arguments may also be strings identifying groups by their group name. If a
|
||
string argument is not used as a group name in the pattern, an <a class="reference internal" href="exceptions.html#IndexError" title="IndexError"><code class="xref py py-exc docutils literal notranslate"><span class="pre">IndexError</span></code></a>
|
||
exception is raised.</p>
|
||
<p>A moderately complicated example:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">m</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="sa">r</span><span class="s2">"(?P<first_name>\w+) (?P<last_name>\w+)"</span><span class="p">,</span> <span class="s2">"Malcolm Reynolds"</span><span class="p">)</span>
|
||
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="s1">'first_name'</span><span class="p">)</span>
|
||
<span class="go">'Malcolm'</span>
|
||
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="s1">'last_name'</span><span class="p">)</span>
|
||
<span class="go">'Reynolds'</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>Named groups can also be referred to by their index:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
|
||
<span class="go">'Malcolm'</span>
|
||
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
|
||
<span class="go">'Reynolds'</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>If a group matches multiple times, only the last match is accessible:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">m</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="sa">r</span><span class="s2">"(..)+"</span><span class="p">,</span> <span class="s2">"a1b2c3"</span><span class="p">)</span> <span class="c1"># Matches 3 times.</span>
|
||
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># Returns only the last match.</span>
|
||
<span class="go">'c3'</span>
|
||
</pre></div>
|
||
</div>
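<p>The two failure modes described above, a group that did not participate in the match and a group that does not exist, behave differently. The following sketch uses an illustrative two-alternative pattern:</p>

```python
import re

m = re.match(r"(a)|(b)", "a")
print(m.group(1))    # a
print(m.group(2))    # None: the second alternative did not take part in the match

try:
    m.group(3)       # the pattern defines only two groups
except IndexError as exc:
    print("IndexError:", exc)

try:
    m.group("missing")   # no group with this name exists
except IndexError as exc:
    print("IndexError:", exc)
```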
|
||
</dd></dl>
|
||
|
||
<dl class="method">
|
||
<dt id="re.Match.__getitem__">
|
||
<code class="descclassname">Match.</code><code class="descname">__getitem__</code><span class="sig-paren">(</span><em>g</em><span class="sig-paren">)</span><a class="headerlink" href="#re.Match.__getitem__" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>This is identical to <code class="docutils literal notranslate"><span class="pre">m.group(g)</span></code>. This allows easier access to
|
||
an individual group from a match:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">m</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="sa">r</span><span class="s2">"(\w+) (\w+)"</span><span class="p">,</span> <span class="s2">"Isaac Newton, physicist"</span><span class="p">)</span>
|
||
<span class="gp">>>> </span><span class="n">m</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="c1"># The entire match</span>
|
||
<span class="go">'Isaac Newton'</span>
|
||
<span class="gp">>>> </span><span class="n">m</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="c1"># The first parenthesized subgroup.</span>
|
||
<span class="go">'Isaac'</span>
|
||
<span class="gp">>>> </span><span class="n">m</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="c1"># The second parenthesized subgroup.</span>
|
||
<span class="go">'Newton'</span>
|
||
</pre></div>
|
||
</div>
|
||
<div class="versionadded">
|
||
<p><span class="versionmodified added">New in version 3.6.</span></p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="method">
|
||
<dt id="re.Match.groups">
|
||
<code class="descclassname">Match.</code><code class="descname">groups</code><span class="sig-paren">(</span><em>default=None</em><span class="sig-paren">)</span><a class="headerlink" href="#re.Match.groups" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Return a tuple containing all the subgroups of the match, from 1 up to however
|
||
many groups are in the pattern. The <em>default</em> argument is used for groups that
|
||
did not participate in the match; it defaults to <code class="docutils literal notranslate"><span class="pre">None</span></code>.</p>
|
||
<p>For example:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">m</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="sa">r</span><span class="s2">"(\d+)\.(\d+)"</span><span class="p">,</span> <span class="s2">"24.1632"</span><span class="p">)</span>
|
||
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">groups</span><span class="p">()</span>
|
||
<span class="go">('24', '1632')</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>If we make the decimal point and everything after it optional, some groups
might not participate in the match. These groups will default to <code class="docutils literal notranslate"><span class="pre">None</span></code> unless
|
||
the <em>default</em> argument is given:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">m</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="sa">r</span><span class="s2">"(\d+)\.?(\d+)?"</span><span class="p">,</span> <span class="s2">"24"</span><span class="p">)</span>
|
||
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">groups</span><span class="p">()</span> <span class="c1"># Second group defaults to None.</span>
|
||
<span class="go">('24', None)</span>
|
||
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">groups</span><span class="p">(</span><span class="s1">'0'</span><span class="p">)</span> <span class="c1"># Now, the second group defaults to '0'.</span>
|
||
<span class="go">('24', '0')</span>
|
||
</pre></div>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="method">
|
||
<dt id="re.Match.groupdict">
|
||
<code class="descclassname">Match.</code><code class="descname">groupdict</code><span class="sig-paren">(</span><em>default=None</em><span class="sig-paren">)</span><a class="headerlink" href="#re.Match.groupdict" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Return a dictionary containing all the <em>named</em> subgroups of the match, keyed by
|
||
the subgroup name. The <em>default</em> argument is used for groups that did not
|
||
participate in the match; it defaults to <code class="docutils literal notranslate"><span class="pre">None</span></code>. For example:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">m</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="sa">r</span><span class="s2">"(?P<first_name>\w+) (?P<last_name>\w+)"</span><span class="p">,</span> <span class="s2">"Malcolm Reynolds"</span><span class="p">)</span>
|
||
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">groupdict</span><span class="p">()</span>
|
||
<span class="go">{'first_name': 'Malcolm', 'last_name': 'Reynolds'}</span>
|
||
</pre></div>
|
||
</div>
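<p>The following minimal sketch illustrates the <em>default</em> argument. The pattern
with an optional last name is a hypothetical variation on the example above, not
part of the original example:</p>

```python
import re

# Hypothetical variation: the second named group is optional.
m = re.match(r"(?P<first_name>\w+)(?: (?P<last_name>\w+))?", "Malcolm")

# The group that did not participate defaults to None ...
print(m.groupdict())           # {'first_name': 'Malcolm', 'last_name': None}

# ... unless another default is supplied.
print(m.groupdict("unknown"))  # {'first_name': 'Malcolm', 'last_name': 'unknown'}
```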
|
||
</dd></dl>
|
||
|
||
<dl class="method">
|
||
<dt id="re.Match.start">
|
||
<code class="descclassname">Match.</code><code class="descname">start</code><span class="sig-paren">(</span><span class="optional">[</span><em>group</em><span class="optional">]</span><span class="sig-paren">)</span><a class="headerlink" href="#re.Match.start" title="Permalink to this definition">¶</a></dt>
|
||
<dt id="re.Match.end">
|
||
<code class="descclassname">Match.</code><code class="descname">end</code><span class="sig-paren">(</span><span class="optional">[</span><em>group</em><span class="optional">]</span><span class="sig-paren">)</span><a class="headerlink" href="#re.Match.end" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Return the indices of the start and end of the substring matched by <em>group</em>;
|
||
<em>group</em> defaults to zero (meaning the whole matched substring). Return <code class="docutils literal notranslate"><span class="pre">-1</span></code> if
|
||
<em>group</em> exists but did not contribute to the match. For a match object <em>m</em>, and
|
||
a group <em>g</em> that did contribute to the match, the substring matched by group <em>g</em>
|
||
(equivalent to <code class="docutils literal notranslate"><span class="pre">m.group(g)</span></code>) is</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="n">m</span><span class="o">.</span><span class="n">string</span><span class="p">[</span><span class="n">m</span><span class="o">.</span><span class="n">start</span><span class="p">(</span><span class="n">g</span><span class="p">):</span><span class="n">m</span><span class="o">.</span><span class="n">end</span><span class="p">(</span><span class="n">g</span><span class="p">)]</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>Note that <code class="docutils literal notranslate"><span class="pre">m.start(group)</span></code> will equal <code class="docutils literal notranslate"><span class="pre">m.end(group)</span></code> if <em>group</em> matched a
|
||
null string. For example, after <code class="docutils literal notranslate"><span class="pre">m</span> <span class="pre">=</span> <span class="pre">re.search('b(c?)',</span> <span class="pre">'cba')</span></code>,
|
||
<code class="docutils literal notranslate"><span class="pre">m.start(0)</span></code> is 1, <code class="docutils literal notranslate"><span class="pre">m.end(0)</span></code> is 2, <code class="docutils literal notranslate"><span class="pre">m.start(1)</span></code> and <code class="docutils literal notranslate"><span class="pre">m.end(1)</span></code> are both
|
||
2, and <code class="docutils literal notranslate"><span class="pre">m.start(2)</span></code> raises an <a class="reference internal" href="exceptions.html#IndexError" title="IndexError"><code class="xref py py-exc docutils literal notranslate"><span class="pre">IndexError</span></code></a> exception.</p>
|
||
<p>An example that will remove <em>remove_this</em> from email addresses:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">email</span> <span class="o">=</span> <span class="s2">"tony@tiremove_thisger.net"</span>
|
||
<span class="gp">>>> </span><span class="n">m</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s2">"remove_this"</span><span class="p">,</span> <span class="n">email</span><span class="p">)</span>
|
||
<span class="gp">>>> </span><span class="n">email</span><span class="p">[:</span><span class="n">m</span><span class="o">.</span><span class="n">start</span><span class="p">()]</span> <span class="o">+</span> <span class="n">email</span><span class="p">[</span><span class="n">m</span><span class="o">.</span><span class="n">end</span><span class="p">():]</span>
|
||
<span class="go">'tony@tiger.net'</span>
|
||
</pre></div>
|
||
</div>
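<p>As a small sketch of the <code class="docutils literal notranslate"><span class="pre">-1</span></code> case described above, the following reuses the
optional-fraction pattern from the <code class="docutils literal notranslate"><span class="pre">groups()</span></code> example; the variable names are
illustrative only:</p>

```python
import re

m = re.match(r"(\d+)\.?(\d+)?", "24")

# Group 1 took part in the match, so slicing with start/end reproduces it.
first = m.string[m.start(1):m.end(1)]
print(first == m.group(1))     # True

# Group 2 exists in the pattern but did not contribute to the match.
print(m.start(2), m.end(2))    # -1 -1
```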
|
||
</dd></dl>
|
||
|
||
<dl class="method">
|
||
<dt id="re.Match.span">
|
||
<code class="descclassname">Match.</code><code class="descname">span</code><span class="sig-paren">(</span><span class="optional">[</span><em>group</em><span class="optional">]</span><span class="sig-paren">)</span><a class="headerlink" href="#re.Match.span" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>For a match <em>m</em>, return the 2-tuple <code class="docutils literal notranslate"><span class="pre">(m.start(group),</span> <span class="pre">m.end(group))</span></code>. Note
|
||
that if <em>group</em> did not contribute to the match, this is <code class="docutils literal notranslate"><span class="pre">(-1,</span> <span class="pre">-1)</span></code>.
|
||
<em>group</em> defaults to zero, the entire match.</p>
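<p>A short sketch of both cases, using a hypothetical two-alternative pattern in
which only one group can take part in any given match:</p>

```python
import re

# Only one of the two alternatives can participate in a match.
m = re.match(r"(a)|(b)", "a")

whole = m.span()     # group 0, the entire match
taken = m.span(1)    # the group that matched
unused = m.span(2)   # the group that did not contribute

print(whole, taken, unused)  # (0, 1) (0, 1) (-1, -1)
```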
|
||
</dd></dl>
|
||
|
||
<dl class="attribute">
|
||
<dt id="re.Match.pos">
|
||
<code class="descclassname">Match.</code><code class="descname">pos</code><a class="headerlink" href="#re.Match.pos" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>The value of <em>pos</em> which was passed to the <a class="reference internal" href="#re.Pattern.search" title="re.Pattern.search"><code class="xref py py-meth docutils literal notranslate"><span class="pre">search()</span></code></a> or
|
||
<a class="reference internal" href="#re.Pattern.match" title="re.Pattern.match"><code class="xref py py-meth docutils literal notranslate"><span class="pre">match()</span></code></a> method of a <a class="reference internal" href="#re-objects"><span class="std std-ref">regex object</span></a>. This is
|
||
the index into the string at which the RE engine started looking for a match.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="attribute">
|
||
<dt id="re.Match.endpos">
|
||
<code class="descclassname">Match.</code><code class="descname">endpos</code><a class="headerlink" href="#re.Match.endpos" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>The value of <em>endpos</em> which was passed to the <a class="reference internal" href="#re.Pattern.search" title="re.Pattern.search"><code class="xref py py-meth docutils literal notranslate"><span class="pre">search()</span></code></a> or
|
||
<a class="reference internal" href="#re.Pattern.match" title="re.Pattern.match"><code class="xref py py-meth docutils literal notranslate"><span class="pre">match()</span></code></a> method of a <a class="reference internal" href="#re-objects"><span class="std std-ref">regex object</span></a>. This is
|
||
the index into the string beyond which the RE engine will not go.</p>
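<p>The sketch below shows how <em>pos</em> and <em>endpos</em> are reflected on the match
object. The pattern and text are made up for illustration. Note that the
module-level <code class="docutils literal notranslate"><span class="pre">re.search()</span></code> and <code class="docutils literal notranslate"><span class="pre">re.match()</span></code> functions do not accept these
arguments, so a compiled pattern is used:</p>

```python
import re

pattern = re.compile(r"\d+")
text = "abc 123 def 456"

# Start looking at index 8; endpos is not given, so it is len(text).
m1 = pattern.search(text, 8)
print(m1.group(), m1.pos, m1.endpos)   # 456 8 15

# Restrict the search to text[0:6], which cuts '123' short.
m2 = pattern.search(text, 0, 6)
print(m2.group(), m2.pos, m2.endpos)   # 12 0 6
```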
|
||
</dd></dl>
|
||
|
||
<dl class="attribute">
|
||
<dt id="re.Match.lastindex">
|
||
<code class="descclassname">Match.</code><code class="descname">lastindex</code><a class="headerlink" href="#re.Match.lastindex" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>The integer index of the last matched capturing group, or <code class="docutils literal notranslate"><span class="pre">None</span></code> if no group
|
||
was matched at all. For example, the expressions <code class="docutils literal notranslate"><span class="pre">(a)b</span></code>, <code class="docutils literal notranslate"><span class="pre">((a)(b))</span></code>, and
|
||
<code class="docutils literal notranslate"><span class="pre">((ab))</span></code> will have <code class="docutils literal notranslate"><span class="pre">lastindex</span> <span class="pre">==</span> <span class="pre">1</span></code> if applied to the string <code class="docutils literal notranslate"><span class="pre">'ab'</span></code>, while
|
||
the expression <code class="docutils literal notranslate"><span class="pre">(a)(b)</span></code> will have <code class="docutils literal notranslate"><span class="pre">lastindex</span> <span class="pre">==</span> <span class="pre">2</span></code>, if applied to the same
|
||
string.</p>
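<p>The claims in this paragraph can be checked with a few lines of code; the
variable names below are illustrative only:</p>

```python
import re

# Patterns taken from the paragraph above, all applied to the string 'ab'.
patterns = [r"(a)b", r"((a)(b))", r"((ab))", r"(a)(b)"]
results = {p: re.match(p, "ab").lastindex for p in patterns}
print(results)
# {'(a)b': 1, '((a)(b))': 1, '((ab))': 1, '(a)(b)': 2}

# With no capturing group in the pattern, lastindex is None.
no_groups = re.match(r"ab", "ab").lastindex
print(no_groups)  # None
```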
|
||
</dd></dl>
|
||
|
||
<dl class="attribute">
|
||
<dt id="re.Match.lastgroup">
|
||
<code class="descclassname">Match.</code><code class="descname">lastgroup</code><a class="headerlink" href="#re.Match.lastgroup" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>The name of the last matched capturing group, or <code class="docutils literal notranslate"><span class="pre">None</span></code> if the group didn’t
|
||
have a name, or if no group was matched at all.</p>
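<p>One possible use, sketched here with a hypothetical pattern, is to find out
which of several named alternatives matched:</p>

```python
import re

# Hypothetical pattern with two named alternatives.
token = re.compile(r"(?P<word>[a-z]+)|(?P<number>\d+)")

kind_a = token.match("abc").lastgroup
kind_b = token.match("42").lastgroup
print(kind_a, kind_b)   # word number

# An unnamed group, or no group at all, gives None.
unnamed = re.match(r"(\d+)", "42").lastgroup
nogroup = re.match(r"\d+", "42").lastgroup
print(unnamed, nogroup)  # None None
```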
|
||
</dd></dl>
|
||
|
||
<dl class="attribute">
|
||
<dt id="re.Match.re">
|
||
<code class="descclassname">Match.</code><code class="descname">re</code><a class="headerlink" href="#re.Match.re" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>The <a class="reference internal" href="#re-objects"><span class="std std-ref">regular expression object</span></a> whose <a class="reference internal" href="#re.Pattern.match" title="re.Pattern.match"><code class="xref py py-meth docutils literal notranslate"><span class="pre">match()</span></code></a> or
|
||
<a class="reference internal" href="#re.Pattern.search" title="re.Pattern.search"><code class="xref py py-meth docutils literal notranslate"><span class="pre">search()</span></code></a> method produced this match instance.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="attribute">
|
||
<dt id="re.Match.string">
|
||
<code class="descclassname">Match.</code><code class="descname">string</code><a class="headerlink" href="#re.Match.string" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>The string passed to <a class="reference internal" href="#re.Pattern.match" title="re.Pattern.match"><code class="xref py py-meth docutils literal notranslate"><span class="pre">match()</span></code></a> or <a class="reference internal" href="#re.Pattern.search" title="re.Pattern.search"><code class="xref py py-meth docutils literal notranslate"><span class="pre">search()</span></code></a>.</p>
|
||
</dd></dl>
|
||
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.7: </span>Added support for <a class="reference internal" href="copy.html#copy.copy" title="copy.copy"><code class="xref py py-func docutils literal notranslate"><span class="pre">copy.copy()</span></code></a> and <a class="reference internal" href="copy.html#copy.deepcopy" title="copy.deepcopy"><code class="xref py py-func docutils literal notranslate"><span class="pre">copy.deepcopy()</span></code></a>. Match objects
|
||
are considered atomic.</p>
|
||
</div>
|
||
</div>
|
||
<div class="section" id="regular-expression-examples">
|
||
<span id="re-examples"></span><h2>Regular Expression Examples<a class="headerlink" href="#regular-expression-examples" title="Permalink to this headline">¶</a></h2>
|
||
<div class="section" id="checking-for-a-pair">
|
||
<h3>Checking for a Pair<a class="headerlink" href="#checking-for-a-pair" title="Permalink to this headline">¶</a></h3>
|
||
<p>In this example, we’ll use the following helper function to display match
|
||
objects a little more gracefully:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">displaymatch</span><span class="p">(</span><span class="n">match</span><span class="p">):</span>
|
||
<span class="k">if</span> <span class="n">match</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||
<span class="k">return</span> <span class="kc">None</span>
|
||
<span class="k">return</span> <span class="s1">'<Match: </span><span class="si">%r</span><span class="s1">, groups=</span><span class="si">%r</span><span class="s1">>'</span> <span class="o">%</span> <span class="p">(</span><span class="n">match</span><span class="o">.</span><span class="n">group</span><span class="p">(),</span> <span class="n">match</span><span class="o">.</span><span class="n">groups</span><span class="p">())</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>Suppose you are writing a poker program where a player’s hand is represented as
|
||
a 5-character string with each character representing a card, “a” for ace, “k”
|
||
for king, “q” for queen, “j” for jack, “t” for 10, and “2” through “9”
|
||
representing the card with that value.</p>
|
||
<p>To see if a given string is a valid hand, one could do the following:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">valid</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s2">"^[a2-9tjqk]</span><span class="si">{5}</span><span class="s2">$"</span><span class="p">)</span>
|
||
<span class="gp">>>> </span><span class="n">displaymatch</span><span class="p">(</span><span class="n">valid</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"akt5q"</span><span class="p">))</span> <span class="c1"># Valid.</span>
|
||
<span class="go">"<Match: 'akt5q', groups=()>"</span>
|
||
<span class="gp">>>> </span><span class="n">displaymatch</span><span class="p">(</span><span class="n">valid</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"akt5e"</span><span class="p">))</span> <span class="c1"># Invalid.</span>
|
||
<span class="gp">>>> </span><span class="n">displaymatch</span><span class="p">(</span><span class="n">valid</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"akt"</span><span class="p">))</span> <span class="c1"># Invalid.</span>
|
||
<span class="gp">>>> </span><span class="n">displaymatch</span><span class="p">(</span><span class="n">valid</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"727ak"</span><span class="p">))</span> <span class="c1"># Valid.</span>
|
||
<span class="go">"<Match: '727ak', groups=()>"</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>That last hand, <code class="docutils literal notranslate"><span class="pre">"727ak"</span></code>, contained a pair, or two of the same valued cards.
|
||
To match this with a regular expression, one could use backreferences as such:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">pair</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s2">".*(.).*\1"</span><span class="p">)</span>
|
||
<span class="gp">>>> </span><span class="n">displaymatch</span><span class="p">(</span><span class="n">pair</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"717ak"</span><span class="p">))</span> <span class="c1"># Pair of 7s.</span>
|
||
<span class="go">"<Match: '717', groups=('7',)>"</span>
|
||
<span class="gp">>>> </span><span class="n">displaymatch</span><span class="p">(</span><span class="n">pair</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"718ak"</span><span class="p">))</span> <span class="c1"># No pairs.</span>
|
||
<span class="gp">>>> </span><span class="n">displaymatch</span><span class="p">(</span><span class="n">pair</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"354aa"</span><span class="p">))</span> <span class="c1"># Pair of aces.</span>
|
||
<span class="go">"<Match: '354aa', groups=('a',)>"</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>To find out what card the pair consists of, one could use the
|
||
<a class="reference internal" href="#re.Match.group" title="re.Match.group"><code class="xref py py-meth docutils literal notranslate"><span class="pre">group()</span></code></a> method of the match object in the following manner:</p>
|
||
<div class="highlight-pycon3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">pair</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"717ak"</span><span class="p">)</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
|
||
<span class="go">'7'</span>
|
||
|
||
<span class="go"># Error because pair.match() returns None, which doesn't have a group() method:</span>
|
||
<span class="gp">>>> </span><span class="n">pair</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"718ak"</span><span class="p">)</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
|
||
<span class="gt">Traceback (most recent call last):</span>
|
||
File <span class="nb">"<pyshell#23>"</span>, line <span class="m">1</span>, in <span class="n"><module></span>
|
||
<span class="n">pair</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"718ak"</span><span class="p">)</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
|
||
<span class="gr">AttributeError</span>: <span class="n">'NoneType' object has no attribute 'group'</span>
|
||
|
||
<span class="gp">>>> </span><span class="n">pair</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"354aa"</span><span class="p">)</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
|
||
<span class="go">'a'</span>
|
||
</pre></div>
|
||
</div>
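<p>To avoid the <code class="docutils literal notranslate"><span class="pre">AttributeError</span></code> shown above, the result can be checked for
<code class="docutils literal notranslate"><span class="pre">None</span></code> before calling <code class="docutils literal notranslate"><span class="pre">group()</span></code>. The helper name <code class="docutils literal notranslate"><span class="pre">pair_card</span></code> below is
hypothetical and only one way to write this:</p>

```python
import re

pair = re.compile(r".*(.).*\1")

def pair_card(hand):
    """Return the card that forms a pair in hand, or None if there is no pair."""
    m = pair.match(hand)
    # match() returns None when nothing matches, so test before using group().
    return m.group(1) if m is not None else None

print(pair_card("717ak"))  # 7
print(pair_card("718ak"))  # None
print(pair_card("354aa"))  # a
```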
|
||
</div>
|
||
<div class="section" id="simulating-scanf">
|
||
<h3>Simulating scanf()<a class="headerlink" href="#simulating-scanf" title="Permalink to this headline">¶</a></h3>
|
||
<p id="index-38">Python does not currently have an equivalent to <code class="xref c c-func docutils literal notranslate"><span class="pre">scanf()</span></code>. Regular
|
||
expressions are generally more powerful, though also more verbose, than
|
||
<code class="xref c c-func docutils literal notranslate"><span class="pre">scanf()</span></code> format strings. The table below offers some more-or-less
|
||
equivalent mappings between <code class="xref c c-func docutils literal notranslate"><span class="pre">scanf()</span></code> format tokens and regular
|
||
expressions.</p>
|
||
<table class="docutils align-center">
|
||
<colgroup>
|
||
<col style="width: 42%" />
|
||
<col style="width: 58%" />
|
||
</colgroup>
|
||
<thead>
|
||
<tr class="row-odd"><th class="head"><p><code class="xref c c-func docutils literal notranslate"><span class="pre">scanf()</span></code> Token</p></th>
|
||
<th class="head"><p>Regular Expression</p></th>
|
||
</tr>
|
||
</thead>
|
||
<tbody>
|
||
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">%c</span></code></p></td>
|
||
<td><p><code class="docutils literal notranslate"><span class="pre">.</span></code></p></td>
|
||
</tr>
|
||
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">%5c</span></code></p></td>
|
||
<td><p><code class="docutils literal notranslate"><span class="pre">.{5}</span></code></p></td>
|
||
</tr>
|
||
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">%d</span></code></p></td>
|
||
<td><p><code class="docutils literal notranslate"><span class="pre">[-+]?\d+</span></code></p></td>
|
||
</tr>
|
||
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">%e</span></code>, <code class="docutils literal notranslate"><span class="pre">%E</span></code>, <code class="docutils literal notranslate"><span class="pre">%f</span></code>, <code class="docutils literal notranslate"><span class="pre">%g</span></code></p></td>
|
||
<td><p><code class="docutils literal notranslate"><span class="pre">[-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?</span></code></p></td>
|
||
</tr>
|
||
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">%i</span></code></p></td>
|
||
<td><p><code class="docutils literal notranslate"><span class="pre">[-+]?(0[xX][\dA-Fa-f]+|0[0-7]*|\d+)</span></code></p></td>
|
||
</tr>
|
||
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">%o</span></code></p></td>
|
||
<td><p><code class="docutils literal notranslate"><span class="pre">[-+]?[0-7]+</span></code></p></td>
|
||
</tr>
|
||
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">%s</span></code></p></td>
|
||
<td><p><code class="docutils literal notranslate"><span class="pre">\S+</span></code></p></td>
|
||
</tr>
|
||
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">%u</span></code></p></td>
|
||
<td><p><code class="docutils literal notranslate"><span class="pre">\d+</span></code></p></td>
|
||
</tr>
|
||
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">%x</span></code>, <code class="docutils literal notranslate"><span class="pre">%X</span></code></p></td>
|
||
<td><p><code class="docutils literal notranslate"><span class="pre">[-+]?(0[xX])?[\dA-Fa-f]+</span></code></p></td>
|
||
</tr>
|
||
</tbody>
|
||
</table>
|
||
<p>To extract the filename and numbers from a string like</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="o">/</span><span class="n">usr</span><span class="o">/</span><span class="n">sbin</span><span class="o">/</span><span class="n">sendmail</span> <span class="o">-</span> <span class="mi">0</span> <span class="n">errors</span><span class="p">,</span> <span class="mi">4</span> <span class="n">warnings</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>you would use a <code class="xref c c-func docutils literal notranslate"><span class="pre">scanf()</span></code> format like</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="o">%</span><span class="n">s</span> <span class="o">-</span> <span class="o">%</span><span class="n">d</span> <span class="n">errors</span><span class="p">,</span> <span class="o">%</span><span class="n">d</span> <span class="n">warnings</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>The equivalent regular expression would be</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="p">(</span>\<span class="n">S</span><span class="o">+</span><span class="p">)</span> <span class="o">-</span> <span class="p">(</span>\<span class="n">d</span><span class="o">+</span><span class="p">)</span> <span class="n">errors</span><span class="p">,</span> <span class="p">(</span>\<span class="n">d</span><span class="o">+</span><span class="p">)</span> <span class="n">warnings</span>
|
||
</pre></div>
|
||
</div>
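<p>As a minimal sketch, the expression can be applied to the sample line as
follows. The variable names are illustrative, and the numeric groups are
converted with <code class="docutils literal notranslate"><span class="pre">int()</span></code> because regular expression groups are always returned as
strings:</p>

```python
import re

line = "/usr/sbin/sendmail - 0 errors, 4 warnings"
m = re.match(r"(\S+) - (\d+) errors, (\d+) warnings", line)

# Unlike scanf(), the groups come back as strings and need converting.
filename = m.group(1)
n_errors = int(m.group(2))
n_warnings = int(m.group(3))

print(filename, n_errors, n_warnings)  # /usr/sbin/sendmail 0 4
```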
|
||
</div>
|
||
<div class="section" id="search-vs-match">
|
||
<span id="id3"></span><h3>search() vs. match()<a class="headerlink" href="#search-vs-match" title="Permalink to this headline">¶</a></h3>
|
||
<p>Python offers two different primitive operations based on regular expressions:
|
||
<a class="reference internal" href="#re.match" title="re.match"><code class="xref py py-func docutils literal notranslate"><span class="pre">re.match()</span></code></a> checks for a match only at the beginning of the string, while
|
||
<a class="reference internal" href="#re.search" title="re.search"><code class="xref py py-func docutils literal notranslate"><span class="pre">re.search()</span></code></a> checks for a match anywhere in the string (this is what Perl
|
||
does by default).</p>
|
||
<p>For example:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"c"</span><span class="p">,</span> <span class="s2">"abcdef"</span><span class="p">)</span> <span class="c1"># No match</span>
|
||
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s2">"c"</span><span class="p">,</span> <span class="s2">"abcdef"</span><span class="p">)</span> <span class="c1"># Match</span>
|
||
<span class="go"><re.Match object; span=(2, 3), match='c'></span>
|
||
</pre></div>
|
||
</div>
|
||
<p>Regular expressions beginning with <code class="docutils literal notranslate"><span class="pre">'^'</span></code> can be used with <a class="reference internal" href="#re.search" title="re.search"><code class="xref py py-func docutils literal notranslate"><span class="pre">search()</span></code></a> to
|
||
restrict the match to the beginning of the string:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"c"</span><span class="p">,</span> <span class="s2">"abcdef"</span><span class="p">)</span> <span class="c1"># No match</span>
|
||
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s2">"^c"</span><span class="p">,</span> <span class="s2">"abcdef"</span><span class="p">)</span> <span class="c1"># No match</span>
|
||
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s2">"^a"</span><span class="p">,</span> <span class="s2">"abcdef"</span><span class="p">)</span> <span class="c1"># Match</span>
|
||
<span class="go"><re.Match object; span=(0, 1), match='a'></span>
|
||
</pre></div>
|
||
</div>
|
||
<p>Note however that in <a class="reference internal" href="#re.MULTILINE" title="re.MULTILINE"><code class="xref py py-const docutils literal notranslate"><span class="pre">MULTILINE</span></code></a> mode <a class="reference internal" href="#re.match" title="re.match"><code class="xref py py-func docutils literal notranslate"><span class="pre">match()</span></code></a> only matches at the
|
||
beginning of the string, whereas using <a class="reference internal" href="#re.search" title="re.search"><code class="xref py py-func docutils literal notranslate"><span class="pre">search()</span></code></a> with a regular expression
|
||
beginning with <code class="docutils literal notranslate"><span class="pre">'^'</span></code> will match at the beginning of each line.</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s1">'X'</span><span class="p">,</span> <span class="s1">'A</span><span class="se">\n</span><span class="s1">B</span><span class="se">\n</span><span class="s1">X'</span><span class="p">,</span> <span class="n">re</span><span class="o">.</span><span class="n">MULTILINE</span><span class="p">)</span> <span class="c1"># No match</span>
|
||
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s1">'^X'</span><span class="p">,</span> <span class="s1">'A</span><span class="se">\n</span><span class="s1">B</span><span class="se">\n</span><span class="s1">X'</span><span class="p">,</span> <span class="n">re</span><span class="o">.</span><span class="n">MULTILINE</span><span class="p">)</span> <span class="c1"># Match</span>
|
||
<span class="go"><re.Match object; span=(4, 5), match='X'></span>
|
||
</pre></div>
|
||
</div>
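<p>The difference can also be seen by collecting every position where <code class="docutils literal notranslate"><span class="pre">'^'</span></code> is allowed to match. The following is a minimal sketch; the variable names <code class="docutils literal notranslate"><span class="pre">text</span></code> and <code class="docutils literal notranslate"><span class="pre">line_starts</span></code> are illustrative only:</p>

```python
import re

text = "A\nB\nX"

# match() is anchored at the very start of the string, even with MULTILINE.
anchored = re.match("X", text, re.MULTILINE)
print(anchored)  # None

# With MULTILINE, '^' matches at the start of every line, so findall()
# returns the first word character of each line.
line_starts = re.findall(r"^\w", text, re.MULTILINE)
print(line_starts)  # ['A', 'B', 'X']
```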
|
||
</div>
|
||
<div class="section" id="making-a-phonebook">
|
||
<h3>Making a Phonebook<a class="headerlink" href="#making-a-phonebook" title="Permalink to this headline">¶</a></h3>
|
||
<p><a class="reference internal" href="#re.split" title="re.split"><code class="xref py py-func docutils literal notranslate"><span class="pre">split()</span></code></a> splits a string into a list of substrings at each match of the given pattern. This
function is well suited to converting textual data into data structures that Python can
easily read and modify, as demonstrated in the following example that
creates a phonebook.</p>
|
||
<p>First, here is the input. Normally it may come from a file; here we are using
triple-quoted string syntax:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">text</span> <span class="o">=</span> <span class="s2">"""Ross McFluff: 834.345.1254 155 Elm Street</span>
|
||
<span class="gp">...</span><span class="s2"></span>
|
||
<span class="gp">... </span><span class="s2">Ronald Heathmore: 892.345.3428 436 Finley Avenue</span>
|
||
<span class="gp">... </span><span class="s2">Frank Burger: 925.541.7625 662 South Dogwood Way</span>
|
||
<span class="gp">...</span><span class="s2"></span>
|
||
<span class="gp">...</span><span class="s2"></span>
|
||
<span class="gp">... </span><span class="s2">Heather Albrecht: 548.326.4584 919 Park Place"""</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>The entries are separated by one or more newlines. Now we convert the string
|
||
into a list with each nonempty line having its own entry:</p>
|
||
<div class="highlight-pycon3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">entries</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">"</span><span class="se">\n</span><span class="s2">+"</span><span class="p">,</span> <span class="n">text</span><span class="p">)</span>
|
||
<span class="gp">>>> </span><span class="n">entries</span>
|
||
<span class="go">['Ross McFluff: 834.345.1254 155 Elm Street',</span>
|
||
<span class="go">'Ronald Heathmore: 892.345.3428 436 Finley Avenue',</span>
|
||
<span class="go">'Frank Burger: 925.541.7625 662 South Dogwood Way',</span>
|
||
<span class="go">'Heather Albrecht: 548.326.4584 919 Park Place']</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>Finally, split each entry into a list with first name, last name, telephone
number, and address. We use the <code class="docutils literal notranslate"><span class="pre">maxsplit</span></code> parameter of <a class="reference internal" href="#re.split" title="re.split"><code class="xref py py-func docutils literal notranslate"><span class="pre">split()</span></code></a>
because the address itself contains spaces, which our splitting pattern would otherwise match:</p>
|
||
<div class="highlight-pycon3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="p">[</span><span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">":? "</span><span class="p">,</span> <span class="n">entry</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="k">for</span> <span class="n">entry</span> <span class="ow">in</span> <span class="n">entries</span><span class="p">]</span>
|
||
<span class="go">[['Ross', 'McFluff', '834.345.1254', '155 Elm Street'],</span>
|
||
<span class="go">['Ronald', 'Heathmore', '892.345.3428', '436 Finley Avenue'],</span>
|
||
<span class="go">['Frank', 'Burger', '925.541.7625', '662 South Dogwood Way'],</span>
|
||
<span class="go">['Heather', 'Albrecht', '548.326.4584', '919 Park Place']]</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>The <code class="docutils literal notranslate"><span class="pre">:?</span></code> pattern matches the colon after the last name, so that it does not
|
||
occur in the result list. With a <code class="docutils literal notranslate"><span class="pre">maxsplit</span></code> of <code class="docutils literal notranslate"><span class="pre">4</span></code>, we could separate the
|
||
house number from the street name:</p>
|
||
<div class="highlight-pycon3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="p">[</span><span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">":? "</span><span class="p">,</span> <span class="n">entry</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span> <span class="k">for</span> <span class="n">entry</span> <span class="ow">in</span> <span class="n">entries</span><span class="p">]</span>
|
||
<span class="go">[['Ross', 'McFluff', '834.345.1254', '155', 'Elm Street'],</span>
|
||
<span class="go">['Ronald', 'Heathmore', '892.345.3428', '436', 'Finley Avenue'],</span>
|
||
<span class="go">['Frank', 'Burger', '925.541.7625', '662', 'South Dogwood Way'],</span>
|
||
<span class="go">['Heather', 'Albrecht', '548.326.4584', '919', 'Park Place']]</span>
|
||
</pre></div>
|
||
</div>
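<p>Once the entries are split, they can be loaded into a dictionary for lookup by name. This is only a sketch of one possible layout: the <code class="docutils literal notranslate"><span class="pre">phonebook</span></code> name and the choice of keys are assumptions made for illustration, and <code class="docutils literal notranslate"><span class="pre">maxsplit</span></code> is passed by keyword to make its role explicit:</p>

```python
import re

text = """Ross McFluff: 834.345.1254 155 Elm Street

Ronald Heathmore: 892.345.3428 436 Finley Avenue
Frank Burger: 925.541.7625 662 South Dogwood Way


Heather Albrecht: 548.326.4584 919 Park Place"""

# Hypothetical layout: full name -> {"phone": ..., "address": ...}
phonebook = {}
for entry in re.split(r"\n+", text):
    # maxsplit=3 keeps the address, which contains spaces, in one piece.
    first, last, phone, address = re.split(r":? ", entry, maxsplit=3)
    phonebook[f"{first} {last}"] = {"phone": phone, "address": address}

print(phonebook["Frank Burger"]["phone"])  # 925.541.7625
```

<p>Unpacking into four names also acts as a light sanity check: an entry that does not split into exactly four fields raises <code class="docutils literal notranslate"><span class="pre">ValueError</span></code> instead of being stored silently.</p>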
|
||
</div>
|
||
<div class="section" id="text-munging">
|
||
<h3>Text Munging<a class="headerlink" href="#text-munging" title="Permalink to this headline">¶</a></h3>
|
||
<p><a class="reference internal" href="#re.sub" title="re.sub"><code class="xref py py-func docutils literal notranslate"><span class="pre">sub()</span></code></a> replaces every occurrence of a pattern with a string or the
|
||
result of a function. This example demonstrates using <a class="reference internal" href="#re.sub" title="re.sub"><code class="xref py py-func docutils literal notranslate"><span class="pre">sub()</span></code></a> with
|
||
a function to “munge” text, or randomize the order of all the characters
|
||
in each word of a sentence except for the first and last characters:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">random</span>
<span class="gp">>>> </span><span class="k">def</span> <span class="nf">repl</span><span class="p">(</span><span class="n">m</span><span class="p">):</span>
<span class="gp">... </span>    <span class="n">inner_word</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">2</span><span class="p">))</span>
<span class="gp">... </span>    <span class="n">random</span><span class="o">.</span><span class="n">shuffle</span><span class="p">(</span><span class="n">inner_word</span><span class="p">)</span>
<span class="gp">... </span>    <span class="k">return</span> <span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="o">+</span> <span class="s2">""</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">inner_word</span><span class="p">)</span> <span class="o">+</span> <span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
|
||
<span class="gp">>>> </span><span class="n">text</span> <span class="o">=</span> <span class="s2">"Professor Abdolmalek, please report your absences promptly."</span>
|
||
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">sub</span><span class="p">(</span><span class="sa">r</span><span class="s2">"(\w)(\w+)(\w)"</span><span class="p">,</span> <span class="n">repl</span><span class="p">,</span> <span class="n">text</span><span class="p">)</span>
|
||
<span class="go">'Poefsrosr Aealmlobdk, pslaee reorpt your abnseces plmrptoy.'</span>
|
||
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">sub</span><span class="p">(</span><span class="sa">r</span><span class="s2">"(\w)(\w+)(\w)"</span><span class="p">,</span> <span class="n">repl</span><span class="p">,</span> <span class="n">text</span><span class="p">)</span>
|
||
<span class="go">'Pofsroser Aodlambelk, plasee reoprt yuor asnebces potlmrpy.'</span>
|
||
</pre></div>
|
||
</div>
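<p>Because the shuffling is random, the output differs from run to run. What stays fixed are the invariants of the replacement function, and those can be checked directly. The following sketch repeats the function so that it is self-contained; the name <code class="docutils literal notranslate"><span class="pre">munged</span></code> is illustrative:</p>

```python
import random
import re

def repl(m):
    inner_word = list(m.group(2))
    random.shuffle(inner_word)
    return m.group(1) + "".join(inner_word) + m.group(3)

text = "Professor Abdolmalek, please report your absences promptly."
munged = re.sub(r"(\w)(\w+)(\w)", repl, text)

# Only inner letters move, so the overall length is unchanged.
assert len(munged) == len(text)

# Every whitespace-separated chunk keeps its first character, its last
# character (which may be punctuation), and the same collection of characters.
for before, after in zip(text.split(), munged.split()):
    assert before[0] == after[0]
    assert before[-1] == after[-1]
    assert sorted(before) == sorted(after)
```

<p>Words of one or two characters are left untouched, since the pattern requires at least three word characters to match.</p>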
|
||
</div>
|
||
<div class="section" id="finding-all-adverbs">
|
||
<h3>Finding all Adverbs<a class="headerlink" href="#finding-all-adverbs" title="Permalink to this headline">¶</a></h3>
|
||
<p><a class="reference internal" href="#re.findall" title="re.findall"><code class="xref py py-func docutils literal notranslate"><span class="pre">findall()</span></code></a> matches <em>all</em> occurrences of a pattern, not just the first
|
||
one as <a class="reference internal" href="#re.search" title="re.search"><code class="xref py py-func docutils literal notranslate"><span class="pre">search()</span></code></a> does. For example, if a writer wanted to
|
||
find all of the adverbs in some text, they might use <a class="reference internal" href="#re.findall" title="re.findall"><code class="xref py py-func docutils literal notranslate"><span class="pre">findall()</span></code></a> in
|
||
the following manner:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">text</span> <span class="o">=</span> <span class="s2">"He was carefully disguised but captured quickly by police."</span>
|
||
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">findall</span><span class="p">(</span><span class="sa">r</span><span class="s2">"\w+ly"</span><span class="p">,</span> <span class="n">text</span><span class="p">)</span>
|
||
<span class="go">['carefully', 'quickly']</span>
|
||
</pre></div>
|
||
</div>
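<p>Note that the pattern is a spelling heuristic, not a grammatical test: it matches any run of word characters that ends in <code class="docutils literal notranslate"><span class="pre">ly</span></code>, including the beginning of a longer word. The sketch below uses a made-up sentence to show the effect and how word boundaries (<code class="docutils literal notranslate"><span class="pre">\b</span></code>) narrow it:</p>

```python
import re

text = "The family was flying home surprisingly early."

# Without boundaries, 'fly' is picked out of the middle of 'flying'.
loose = re.findall(r"\w+ly", text)
print(loose)   # ['family', 'fly', 'surprisingly', 'early']

# With \b on both sides, only whole words ending in 'ly' are returned.
strict = re.findall(r"\b\w+ly\b", text)
print(strict)  # ['family', 'surprisingly', 'early']
```

<p>Even the stricter pattern still returns <code class="docutils literal notranslate"><span class="pre">'family'</span></code>, which is not an adverb; a regular expression can only approximate this kind of classification.</p>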
|
||
</div>
|
||
<div class="section" id="finding-all-adverbs-and-their-positions">
|
||
<h3>Finding all Adverbs and their Positions<a class="headerlink" href="#finding-all-adverbs-and-their-positions" title="Permalink to this headline">¶</a></h3>
|
||
<p>If one wants more information about each match of a pattern than just the matched
text, <a class="reference internal" href="#re.finditer" title="re.finditer"><code class="xref py py-func docutils literal notranslate"><span class="pre">finditer()</span></code></a> is useful as it provides <a class="reference internal" href="#match-objects"><span class="std std-ref">match objects</span></a> instead of strings. Continuing with the previous example, if
a writer wanted to find all of the adverbs <em>and their positions</em> in
some text, they would use <a class="reference internal" href="#re.finditer" title="re.finditer"><code class="xref py py-func docutils literal notranslate"><span class="pre">finditer()</span></code></a> in the following manner:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">text</span> <span class="o">=</span> <span class="s2">"He was carefully disguised but captured quickly by police."</span>
|
||
<span class="gp">>>> </span><span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="n">re</span><span class="o">.</span><span class="n">finditer</span><span class="p">(</span><span class="sa">r</span><span class="s2">"\w+ly"</span><span class="p">,</span> <span class="n">text</span><span class="p">):</span>
|
||
<span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="s1">'</span><span class="si">%02d</span><span class="s1">-</span><span class="si">%02d</span><span class="s1">: </span><span class="si">%s</span><span class="s1">'</span> <span class="o">%</span> <span class="p">(</span><span class="n">m</span><span class="o">.</span><span class="n">start</span><span class="p">(),</span> <span class="n">m</span><span class="o">.</span><span class="n">end</span><span class="p">(),</span> <span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">0</span><span class="p">)))</span>
|
||
<span class="go">07-16: carefully</span>
|
||
<span class="go">40-47: quickly</span>
|
||
</pre></div>
|
||
</div>
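<p>The reported positions are ordinary slice indices into the searched string, so they can be stored and reused later. A brief sketch, in which the <code class="docutils literal notranslate"><span class="pre">hits</span></code> list is an illustrative name:</p>

```python
import re

text = "He was carefully disguised but captured quickly by police."

# Collect (start, end, matched text) for every match.
hits = [(m.start(), m.end(), m.group()) for m in re.finditer(r"\w+ly", text)]
print(hits)  # [(7, 16, 'carefully'), (40, 47, 'quickly')]

# start and end slice the original string back to the matched text.
for start, end, word in hits:
    assert text[start:end] == word
```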
|
||
</div>
|
||
<div class="section" id="raw-string-notation">
|
||
<h3>Raw String Notation<a class="headerlink" href="#raw-string-notation" title="Permalink to this headline">¶</a></h3>
|
||
<p>Raw string notation (<code class="docutils literal notranslate"><span class="pre">r"text"</span></code>) keeps regular expressions sane. Without it,
|
||
every backslash (<code class="docutils literal notranslate"><span class="pre">'\'</span></code>) in a regular expression would have to be prefixed with
|
||
another one to escape it. For example, the two following lines of code are
|
||
functionally identical:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="sa">r</span><span class="s2">"\W(.)\1\W"</span><span class="p">,</span> <span class="s2">" ff "</span><span class="p">)</span>
|
||
<span class="go"><re.Match object; span=(0, 4), match=' ff '></span>
|
||
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"</span><span class="se">\\</span><span class="s2">W(.)</span><span class="se">\\</span><span class="s2">1</span><span class="se">\\</span><span class="s2">W"</span><span class="p">,</span> <span class="s2">" ff "</span><span class="p">)</span>
|
||
<span class="go"><re.Match object; span=(0, 4), match=' ff '></span>
|
||
</pre></div>
|
||
</div>
|
||
<p>When one wants to match a literal backslash, it must be escaped in the regular
|
||
expression. With raw string notation, this means <code class="docutils literal notranslate"><span class="pre">r"\\"</span></code>. Without raw string
|
||
notation, one must use <code class="docutils literal notranslate"><span class="pre">"\\\\"</span></code>, making the following lines of code
|
||
functionally identical:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="sa">r</span><span class="s2">"</span><span class="se">\\</span><span class="s2">"</span><span class="p">,</span> <span class="sa">r</span><span class="s2">"</span><span class="se">\\</span><span class="s2">"</span><span class="p">)</span>
|
||
<span class="go"><re.Match object; span=(0, 1), match='\\'></span>
|
||
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"</span><span class="se">\\\\</span><span class="s2">"</span><span class="p">,</span> <span class="sa">r</span><span class="s2">"</span><span class="se">\\</span><span class="s2">"</span><span class="p">)</span>
|
||
<span class="go"><re.Match object; span=(0, 1), match='\\'></span>
|
||
</pre></div>
|
||
</div>
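<p>The effect of the raw prefix is easiest to see by comparing string lengths, and by trying an escape that Python and <code class="docutils literal notranslate"><span class="pre">re</span></code> interpret differently. The variable names in this sketch are illustrative:</p>

```python
import re

# A regular string literal processes escapes; a raw literal keeps the backslash.
plain_newline = "\n"    # one character: a newline
raw_newline = r"\n"     # two characters: a backslash and 'n'
print(len(plain_newline), len(raw_newline))  # 1 2

# re understands the two-character \n escape itself, so both patterns match.
both_match = (re.search(plain_newline, "a\nb") is not None
              and re.search(raw_newline, "a\nb") is not None)
print(both_match)  # True

# \b is different: in a regular literal it is a backspace character,
# so only the raw form reaches re as a word boundary.
raw_hit = re.search(r"\bcat\b", "a cat sat")
plain_hit = re.search("\bcat\b", "a cat sat")
print(raw_hit is not None, plain_hit is None)  # True True
```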
|
||
</div>
|
||
<div class="section" id="writing-a-tokenizer">
|
||
<h3>Writing a Tokenizer<a class="headerlink" href="#writing-a-tokenizer" title="Permalink to this headline">¶</a></h3>
|
||
<p>A <a class="reference external" href="https://en.wikipedia.org/wiki/Lexical_analysis">tokenizer or scanner</a>
|
||
analyzes a string to categorize groups of characters. This is a useful first
|
||
step in writing a compiler or interpreter.</p>
|
||
<p>The text categories are specified with regular expressions. The technique is
|
||
to combine those into a single master regular expression and to loop over
|
||
successive matches:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">collections</span>
<span class="kn">import</span> <span class="nn">re</span>

<span class="n">Token</span> <span class="o">=</span> <span class="n">collections</span><span class="o">.</span><span class="n">namedtuple</span><span class="p">(</span><span class="s1">'Token'</span><span class="p">,</span> <span class="p">[</span><span class="s1">'type'</span><span class="p">,</span> <span class="s1">'value'</span><span class="p">,</span> <span class="s1">'line'</span><span class="p">,</span> <span class="s1">'column'</span><span class="p">])</span>

<span class="k">def</span> <span class="nf">tokenize</span><span class="p">(</span><span class="n">code</span><span class="p">):</span>
    <span class="n">keywords</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'IF'</span><span class="p">,</span> <span class="s1">'THEN'</span><span class="p">,</span> <span class="s1">'ENDIF'</span><span class="p">,</span> <span class="s1">'FOR'</span><span class="p">,</span> <span class="s1">'NEXT'</span><span class="p">,</span> <span class="s1">'GOSUB'</span><span class="p">,</span> <span class="s1">'RETURN'</span><span class="p">}</span>
    <span class="n">token_specification</span> <span class="o">=</span> <span class="p">[</span>
        <span class="p">(</span><span class="s1">'NUMBER'</span><span class="p">,</span> <span class="sa">r</span><span class="s1">'\d+(\.\d*)?'</span><span class="p">),</span> <span class="c1"># Integer or decimal number</span>
        <span class="p">(</span><span class="s1">'ASSIGN'</span><span class="p">,</span> <span class="sa">r</span><span class="s1">':='</span><span class="p">),</span> <span class="c1"># Assignment operator</span>
        <span class="p">(</span><span class="s1">'END'</span><span class="p">,</span> <span class="sa">r</span><span class="s1">';'</span><span class="p">),</span> <span class="c1"># Statement terminator</span>
        <span class="p">(</span><span class="s1">'ID'</span><span class="p">,</span> <span class="sa">r</span><span class="s1">'[A-Za-z]+'</span><span class="p">),</span> <span class="c1"># Identifiers</span>
        <span class="p">(</span><span class="s1">'OP'</span><span class="p">,</span> <span class="sa">r</span><span class="s1">'[+\-*/]'</span><span class="p">),</span> <span class="c1"># Arithmetic operators</span>
        <span class="p">(</span><span class="s1">'NEWLINE'</span><span class="p">,</span> <span class="sa">r</span><span class="s1">'\n'</span><span class="p">),</span> <span class="c1"># Line endings</span>
        <span class="p">(</span><span class="s1">'SKIP'</span><span class="p">,</span> <span class="sa">r</span><span class="s1">'[ \t]+'</span><span class="p">),</span> <span class="c1"># Skip over spaces and tabs</span>
        <span class="p">(</span><span class="s1">'MISMATCH'</span><span class="p">,</span> <span class="sa">r</span><span class="s1">'.'</span><span class="p">),</span> <span class="c1"># Any other character</span>
    <span class="p">]</span>
    <span class="n">tok_regex</span> <span class="o">=</span> <span class="s1">'|'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="s1">'(?P<</span><span class="si">%s</span><span class="s1">></span><span class="si">%s</span><span class="s1">)'</span> <span class="o">%</span> <span class="n">pair</span> <span class="k">for</span> <span class="n">pair</span> <span class="ow">in</span> <span class="n">token_specification</span><span class="p">)</span>
    <span class="n">line_num</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="n">line_start</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">mo</span> <span class="ow">in</span> <span class="n">re</span><span class="o">.</span><span class="n">finditer</span><span class="p">(</span><span class="n">tok_regex</span><span class="p">,</span> <span class="n">code</span><span class="p">):</span>
        <span class="n">kind</span> <span class="o">=</span> <span class="n">mo</span><span class="o">.</span><span class="n">lastgroup</span>
        <span class="n">value</span> <span class="o">=</span> <span class="n">mo</span><span class="o">.</span><span class="n">group</span><span class="p">()</span>
        <span class="n">column</span> <span class="o">=</span> <span class="n">mo</span><span class="o">.</span><span class="n">start</span><span class="p">()</span> <span class="o">-</span> <span class="n">line_start</span>
        <span class="k">if</span> <span class="n">kind</span> <span class="o">==</span> <span class="s1">'NUMBER'</span><span class="p">:</span>
            <span class="n">value</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="n">value</span><span class="p">)</span> <span class="k">if</span> <span class="s1">'.'</span> <span class="ow">in</span> <span class="n">value</span> <span class="k">else</span> <span class="nb">int</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
        <span class="k">elif</span> <span class="n">kind</span> <span class="o">==</span> <span class="s1">'ID'</span> <span class="ow">and</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">keywords</span><span class="p">:</span>
            <span class="n">kind</span> <span class="o">=</span> <span class="n">value</span>
        <span class="k">elif</span> <span class="n">kind</span> <span class="o">==</span> <span class="s1">'NEWLINE'</span><span class="p">:</span>
            <span class="n">line_start</span> <span class="o">=</span> <span class="n">mo</span><span class="o">.</span><span class="n">end</span><span class="p">()</span>
            <span class="n">line_num</span> <span class="o">+=</span> <span class="mi">1</span>
            <span class="k">continue</span>
        <span class="k">elif</span> <span class="n">kind</span> <span class="o">==</span> <span class="s1">'SKIP'</span><span class="p">:</span>
            <span class="k">continue</span>
        <span class="k">elif</span> <span class="n">kind</span> <span class="o">==</span> <span class="s1">'MISMATCH'</span><span class="p">:</span>
            <span class="k">raise</span> <span class="ne">RuntimeError</span><span class="p">(</span><span class="n">f</span><span class="s1">'</span><span class="si">{value!r}</span><span class="s1"> unexpected on line </span><span class="si">{line_num}</span><span class="s1">'</span><span class="p">)</span>
        <span class="k">yield</span> <span class="n">Token</span><span class="p">(</span><span class="n">kind</span><span class="p">,</span> <span class="n">value</span><span class="p">,</span> <span class="n">line_num</span><span class="p">,</span> <span class="n">column</span><span class="p">)</span>

<span class="n">statements</span> <span class="o">=</span> <span class="s1">'''</span>
<span class="s1">    IF quantity THEN</span>
<span class="s1">        total := total + price * quantity;</span>
<span class="s1">        tax := price * 0.05;</span>
<span class="s1">    ENDIF;</span>
<span class="s1">'''</span>

<span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">tokenize</span><span class="p">(</span><span class="n">statements</span><span class="p">):</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">token</span><span class="p">)</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>The tokenizer produces the following output:</p>
|
||
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="n">Token</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="s1">'IF'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">'IF'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span>
|
||
<span class="n">Token</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="s1">'ID'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">'quantity'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">7</span><span class="p">)</span>
|
||
<span class="n">Token</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="s1">'THEN'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">'THEN'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">16</span><span class="p">)</span>
|
||
<span class="n">Token</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="s1">'ID'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">'total'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">8</span><span class="p">)</span>
|
||
<span class="n">Token</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="s1">'ASSIGN'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">':='</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">14</span><span class="p">)</span>
|
||
<span class="n">Token</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="s1">'ID'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">'total'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">17</span><span class="p">)</span>
|
||
<span class="n">Token</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="s1">'OP'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">'+'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">23</span><span class="p">)</span>
|
||
<span class="n">Token</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="s1">'ID'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">'price'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">25</span><span class="p">)</span>
|
||
<span class="n">Token</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="s1">'OP'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">'*'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">31</span><span class="p">)</span>
|
||
<span class="n">Token</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="s1">'ID'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">'quantity'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">33</span><span class="p">)</span>
|
||
<span class="n">Token</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="s1">'END'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">';'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">41</span><span class="p">)</span>
|
||
<span class="n">Token</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="s1">'ID'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">'tax'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">8</span><span class="p">)</span>
|
||
<span class="n">Token</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="s1">'ASSIGN'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">':='</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">12</span><span class="p">)</span>
<span class="n">Token</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="s1">'ID'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">'price'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">15</span><span class="p">)</span>
<span class="n">Token</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="s1">'OP'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">'*'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">21</span><span class="p">)</span>
<span class="n">Token</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="s1">'NUMBER'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="mf">0.05</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">23</span><span class="p">)</span>
<span class="n">Token</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="s1">'END'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">';'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">27</span><span class="p">)</span>
<span class="n">Token</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="s1">'ENDIF'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">'ENDIF'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span>
<span class="n">Token</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="s1">'END'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">';'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">9</span><span class="p">)</span>
</pre></div>
</div>
<dl class="citation">
<dt class="label" id="frie09"><span class="brackets"><a class="fn-backref" href="#id1">Frie09</a></span></dt>
<dd><p>Friedl, Jeffrey. Mastering Regular Expressions. 3rd ed., O’Reilly
Media, 2009. The third edition of the book no longer covers Python at all,
but the first edition covered writing good regular expression patterns in
great detail.</p>
</dd>
</dl>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="footer">
© <a href="../copyright.html">Copyright</a> 2001-2019, Python Software Foundation.
<br />
The Python Software Foundation is a non-profit corporation.
<a href="https://www.python.org/psf/donations/">Please donate.</a>
<br />
Last updated on Jul 13, 2019.
<a href="../bugs.html">Found a bug</a>?
<br />
Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 2.0.1.
</div>
</body>
</html>