<!DOCTYPE html>
|
||
|
||
<html xmlns="http://www.w3.org/1999/xhtml">
|
||
<head>
|
||
<meta charset="utf-8" />
|
||
<title>unicodedata — Unicode Database — Python 3.7.4 documentation</title>
|
||
<link rel="stylesheet" href="../_static/pydoctheme.css" type="text/css" />
|
||
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
|
||
|
||
<script type="text/javascript" id="documentation_options" data-url_root="../" src="../_static/documentation_options.js"></script>
|
||
<script type="text/javascript" src="../_static/jquery.js"></script>
|
||
<script type="text/javascript" src="../_static/underscore.js"></script>
|
||
<script type="text/javascript" src="../_static/doctools.js"></script>
|
||
<script type="text/javascript" src="../_static/language_data.js"></script>
|
||
|
||
<script type="text/javascript" src="../_static/sidebar.js"></script>
|
||
|
||
<link rel="search" type="application/opensearchdescription+xml"
|
||
title="Search within Python 3.7.4 documentation"
|
||
href="../_static/opensearch.xml"/>
|
||
<link rel="author" title="About these documents" href="../about.html" />
|
||
<link rel="index" title="Index" href="../genindex.html" />
|
||
<link rel="search" title="Search" href="../search.html" />
|
||
<link rel="copyright" title="Copyright" href="../copyright.html" />
|
||
<link rel="next" title="stringprep — Internet String Preparation" href="stringprep.html" />
|
||
<link rel="prev" title="textwrap — Text wrapping and filling" href="textwrap.html" />
|
||
<link rel="shortcut icon" type="image/png" href="../_static/py.png" />
|
||
<link rel="canonical" href="https://docs.python.org/3/library/unicodedata.html" />
|
||
|
||
<script type="text/javascript" src="../_static/copybutton.js"></script>
|
||
<script type="text/javascript" src="../_static/switchers.js"></script>
|
||
|
||
|
||
|
||
<style>
|
||
@media only screen {
|
||
table.full-width-table {
|
||
width: 100%;
|
||
}
|
||
}
|
||
</style>
|
||
|
||
|
||
</head><body>
|
||
|
||
<div class="related" role="navigation" aria-label="related navigation">
|
||
<h3>Navigation</h3>
|
||
<ul>
|
||
<li class="right" style="margin-right: 10px">
|
||
<a href="../genindex.html" title="General Index"
|
||
accesskey="I">index</a></li>
|
||
<li class="right" >
|
||
<a href="../py-modindex.html" title="Python Module Index"
|
||
>modules</a> |</li>
|
||
<li class="right" >
|
||
<a href="stringprep.html" title="stringprep — Internet String Preparation"
|
||
accesskey="N">next</a> |</li>
|
||
<li class="right" >
|
||
<a href="textwrap.html" title="textwrap — Text wrapping and filling"
|
||
accesskey="P">previous</a> |</li>
|
||
<li><img src="../_static/py.png" alt=""
|
||
style="vertical-align: middle; margin-top: -1px"/></li>
|
||
<li><a href="https://www.python.org/">Python</a> »</li>
|
||
<li>
|
||
<span class="language_switcher_placeholder">en</span>
|
||
<span class="version_switcher_placeholder">3.7.4</span>
|
||
<a href="../index.html">Documentation </a> »
|
||
</li>
|
||
|
||
<li class="nav-item nav-item-1"><a href="index.html" >The Python Standard Library</a> »</li>
|
||
<li class="nav-item nav-item-2"><a href="text.html" accesskey="U">Text Processing Services</a> »</li>
|
||
<li class="right">
|
||
|
||
|
||
<div class="inline-search" style="display: none" role="search">
|
||
<form class="inline-search" action="../search.html" method="get">
|
||
<input placeholder="Quick search" type="text" name="q" />
|
||
<input type="submit" value="Go" />
|
||
<input type="hidden" name="check_keywords" value="yes" />
|
||
<input type="hidden" name="area" value="default" />
|
||
</form>
|
||
</div>
|
||
<script type="text/javascript">$('.inline-search').show(0);</script>
|
||
|
|
||
</li>
|
||
|
||
</ul>
|
||
</div>
|
||
|
||
<div class="document">
|
||
<div class="documentwrapper">
|
||
<div class="bodywrapper">
|
||
<div class="body" role="main">
|
||
|
||
<div class="section" id="module-unicodedata">
|
||
<span id="unicodedata-unicode-database"></span><h1><a class="reference internal" href="#module-unicodedata" title="unicodedata: Access the Unicode Database."><code class="xref py py-mod docutils literal notranslate"><span class="pre">unicodedata</span></code></a> — Unicode Database<a class="headerlink" href="#module-unicodedata" title="Permalink to this headline">¶</a></h1>
|
||
<hr class="docutils" id="index-0" />
|
||
<p>This module provides access to the Unicode Character Database (UCD), which
defines character properties for all Unicode characters. The data contained in
this database is compiled from the <a class="reference external" href="http://www.unicode.org/Public/11.0.0/ucd">UCD version 11.0.0</a>.</p>
|
||
<p>The module uses the same names and symbols as defined by Unicode
|
||
Standard Annex #44, <a class="reference external" href="http://www.unicode.org/reports/tr44/tr44-6.html">“Unicode Character Database”</a>. It defines the
|
||
following functions:</p>
|
||
<dl class="function">
|
||
<dt id="unicodedata.lookup">
|
||
<code class="descclassname">unicodedata.</code><code class="descname">lookup</code><span class="sig-paren">(</span><em>name</em><span class="sig-paren">)</span><a class="headerlink" href="#unicodedata.lookup" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Look up a character by name. If a character with the given name is found, return
|
||
the corresponding character. If not found, <a class="reference internal" href="exceptions.html#KeyError" title="KeyError"><code class="xref py py-exc docutils literal notranslate"><span class="pre">KeyError</span></code></a> is raised.</p>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.3: </span>Support for name aliases <a class="footnote-reference brackets" href="#id3" id="id1">1</a> and named sequences <a class="footnote-reference brackets" href="#id4" id="id2">2</a> has been added.</p>
|
||
</div>
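<p>As a minimal sketch of handling the <code class="docutils literal notranslate"><span class="pre">KeyError</span></code> case, the helper below wraps the call and returns a fallback value instead. The name <code class="docutils literal notranslate"><span class="pre">safe_lookup</span></code> and its <code class="docutils literal notranslate"><span class="pre">fallback</span></code> parameter are illustrative assumptions, not part of the module.</p>

```python
import unicodedata


def safe_lookup(name, fallback=None):
    """Return the character called *name*, or *fallback* if the name is unknown."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        # lookup() raises KeyError when no character has this name.
        return fallback


cedilla = safe_lookup("LATIN CAPITAL LETTER C WITH CEDILLA")
missing = safe_lookup("NO SUCH CHARACTER NAME")
```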
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="unicodedata.name">
|
||
<code class="descclassname">unicodedata.</code><code class="descname">name</code><span class="sig-paren">(</span><em>chr</em><span class="optional">[</span>, <em>default</em><span class="optional">]</span><span class="sig-paren">)</span><a class="headerlink" href="#unicodedata.name" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Returns the name assigned to the character <em>chr</em> as a string. If no
|
||
name is defined, <em>default</em> is returned, or, if not given, <a class="reference internal" href="exceptions.html#ValueError" title="ValueError"><code class="xref py py-exc docutils literal notranslate"><span class="pre">ValueError</span></code></a> is
|
||
raised.</p>
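<p>The <em>default</em> argument is useful for text that may contain unnamed characters such as control codes. The following sketch assumes a hypothetical helper called <code class="docutils literal notranslate"><span class="pre">describe</span></code> and an arbitrary placeholder string.</p>

```python
import unicodedata


def describe(text):
    """Return the Unicode name of each character, with a placeholder for unnamed ones."""
    # Control characters such as the newline have no name, so the default is used.
    return [unicodedata.name(ch, "(no name)") for ch in text]


names = describe("A/\n")
```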
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="unicodedata.decimal">
|
||
<code class="descclassname">unicodedata.</code><code class="descname">decimal</code><span class="sig-paren">(</span><em>chr</em><span class="optional">[</span>, <em>default</em><span class="optional">]</span><span class="sig-paren">)</span><a class="headerlink" href="#unicodedata.decimal" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Returns the decimal value assigned to the character <em>chr</em> as an integer.
|
||
If no such value is defined, <em>default</em> is returned, or, if not given,
|
||
<a class="reference internal" href="exceptions.html#ValueError" title="ValueError"><code class="xref py py-exc docutils literal notranslate"><span class="pre">ValueError</span></code></a> is raised.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="unicodedata.digit">
|
||
<code class="descclassname">unicodedata.</code><code class="descname">digit</code><span class="sig-paren">(</span><em>chr</em><span class="optional">[</span>, <em>default</em><span class="optional">]</span><span class="sig-paren">)</span><a class="headerlink" href="#unicodedata.digit" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Returns the digit value assigned to the character <em>chr</em> as an integer.
|
||
If no such value is defined, <em>default</em> is returned, or, if not given,
|
||
<a class="reference internal" href="exceptions.html#ValueError" title="ValueError"><code class="xref py py-exc docutils literal notranslate"><span class="pre">ValueError</span></code></a> is raised.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="unicodedata.numeric">
|
||
<code class="descclassname">unicodedata.</code><code class="descname">numeric</code><span class="sig-paren">(</span><em>chr</em><span class="optional">[</span>, <em>default</em><span class="optional">]</span><span class="sig-paren">)</span><a class="headerlink" href="#unicodedata.numeric" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Returns the numeric value assigned to the character <em>chr</em> as a float.
|
||
If no such value is defined, <em>default</em> is returned, or, if not given,
|
||
<a class="reference internal" href="exceptions.html#ValueError" title="ValueError"><code class="xref py py-exc docutils literal notranslate"><span class="pre">ValueError</span></code></a> is raised.</p>
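<p>The decimal, digit, and numeric properties are progressively broader, which a side-by-side query can illustrate. This sketch passes <code class="docutils literal notranslate"><span class="pre">None</span></code> as <em>default</em> so that no exception is raised; the helper name <code class="docutils literal notranslate"><span class="pre">numeric_properties</span></code> is an assumption made for the example.</p>

```python
import unicodedata


def numeric_properties(ch):
    """Return (decimal, digit, numeric) for *ch*, using None where a value is undefined."""
    return (
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )


ascii_nine = numeric_properties("9")            # DIGIT NINE
superscript_two = numeric_properties("\u00b2")  # SUPERSCRIPT TWO
one_half = numeric_properties("\u00bd")         # VULGAR FRACTION ONE HALF
```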
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="unicodedata.category">
|
||
<code class="descclassname">unicodedata.</code><code class="descname">category</code><span class="sig-paren">(</span><em>chr</em><span class="sig-paren">)</span><a class="headerlink" href="#unicodedata.category" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Returns the general category assigned to the character <em>chr</em> as
|
||
a string.</p>
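<p>One possible use of the general category is to tally the kinds of characters in a string. The helper <code class="docutils literal notranslate"><span class="pre">category_counts</span></code> below is a hypothetical name used only for this sketch.</p>

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count how many characters of *text* fall into each general category."""
    return Counter(unicodedata.category(ch) for ch in text)


# 'A' is Lu, 'b' is Ll, ' ' is Zs, '1' is Nd and '!' is Po.
counts = category_counts("Ab 1!")
```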
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="unicodedata.bidirectional">
|
||
<code class="descclassname">unicodedata.</code><code class="descname">bidirectional</code><span class="sig-paren">(</span><em>chr</em><span class="sig-paren">)</span><a class="headerlink" href="#unicodedata.bidirectional" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Returns the bidirectional class assigned to the character <em>chr</em> as
|
||
a string. If no such value is defined, an empty string is returned.</p>
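<p>As a rough illustration, the bidirectional class can be used to detect whether a string contains strongly right-to-left characters. The function name <code class="docutils literal notranslate"><span class="pre">has_rtl</span></code> is an assumption, and the check is deliberately simplistic: it does not implement the Unicode bidirectional algorithm.</p>

```python
import unicodedata


def has_rtl(text):
    """Return True if *text* contains a character with strong right-to-left direction."""
    # 'R' covers right-to-left letters such as Hebrew, 'AL' covers Arabic letters.
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


latin_only = has_rtl("abc")
with_hebrew = has_rtl("abc \u05d0")   # HEBREW LETTER ALEF
with_arabic = has_rtl("\u0627")       # ARABIC LETTER ALEF
```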
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="unicodedata.combining">
|
||
<code class="descclassname">unicodedata.</code><code class="descname">combining</code><span class="sig-paren">(</span><em>chr</em><span class="sig-paren">)</span><a class="headerlink" href="#unicodedata.combining" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Returns the canonical combining class assigned to the character <em>chr</em>
|
||
as an integer. Returns <code class="docutils literal notranslate"><span class="pre">0</span></code> if no combining class is defined.</p>
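<p>A common application, sketched here under the assumption that dropping every character with a non-zero combining class is acceptable for the task, is removing accents from text. The helper name <code class="docutils literal notranslate"><span class="pre">strip_marks</span></code> is hypothetical.</p>

```python
import unicodedata


def strip_marks(text):
    """Remove combining marks from *text* after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    # Base characters have combining class 0; marks such as COMBINING CEDILLA do not.
    return "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)


plain = strip_marks("\u00c7\u00e9")  # C WITH CEDILLA, E WITH ACUTE
```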
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="unicodedata.east_asian_width">
|
||
<code class="descclassname">unicodedata.</code><code class="descname">east_asian_width</code><span class="sig-paren">(</span><em>chr</em><span class="sig-paren">)</span><a class="headerlink" href="#unicodedata.east_asian_width" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Returns the East Asian width assigned to the character <em>chr</em> as
a string.</p>
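<p>The width property can serve as a rough estimate of how many terminal columns a string occupies. The sketch below assumes that wide (<code class="docutils literal notranslate"><span class="pre">W</span></code>) and fullwidth (<code class="docutils literal notranslate"><span class="pre">F</span></code>) characters take two columns and everything else takes one; it ignores combining marks and ambiguous-width characters, so it is only an approximation. The name <code class="docutils literal notranslate"><span class="pre">display_width</span></code> is hypothetical.</p>

```python
import unicodedata


def display_width(text):
    """Estimate the number of terminal columns needed to display *text*."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


# 'A' is narrow, U+4E2D is wide, U+FF21 (FULLWIDTH LATIN CAPITAL LETTER A) is fullwidth.
width = display_width("A\u4e2d\uff21")
```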
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="unicodedata.mirrored">
|
||
<code class="descclassname">unicodedata.</code><code class="descname">mirrored</code><span class="sig-paren">(</span><em>chr</em><span class="sig-paren">)</span><a class="headerlink" href="#unicodedata.mirrored" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Returns the mirrored property assigned to the character <em>chr</em> as
|
||
an integer. Returns <code class="docutils literal notranslate"><span class="pre">1</span></code> if the character has been identified as a “mirrored”
|
||
character in bidirectional text, <code class="docutils literal notranslate"><span class="pre">0</span></code> otherwise.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="unicodedata.decomposition">
|
||
<code class="descclassname">unicodedata.</code><code class="descname">decomposition</code><span class="sig-paren">(</span><em>chr</em><span class="sig-paren">)</span><a class="headerlink" href="#unicodedata.decomposition" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Returns the character decomposition mapping assigned to the character
|
||
<em>chr</em> as a string. An empty string is returned if no such mapping is
|
||
defined.</p>
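<p>The returned mapping is a space-separated list of hexadecimal code points, optionally preceded by a formatting tag in angle brackets for compatibility mappings. The following sketch turns that string back into characters; the helper name <code class="docutils literal notranslate"><span class="pre">decomposition_chars</span></code> is an assumption, and it performs a single decomposition step only, not a full recursive normalization.</p>

```python
import string
import unicodedata


def decomposition_chars(ch):
    """Return the characters named in the decomposition mapping of *ch*."""
    fields = unicodedata.decomposition(ch).split()
    # Keep only the hexadecimal code points; a formatting tag is not valid hex.
    code_points = [f for f in fields if all(c in string.hexdigits for c in f)]
    return "".join(chr(int(f, 16)) for f in code_points)


canonical = decomposition_chars("\u00c7")      # LATIN CAPITAL LETTER C WITH CEDILLA
compatibility = decomposition_chars("\u2160")  # ROMAN NUMERAL ONE
no_mapping = decomposition_chars("A")
```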
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="unicodedata.normalize">
|
||
<code class="descclassname">unicodedata.</code><code class="descname">normalize</code><span class="sig-paren">(</span><em>form</em>, <em>unistr</em><span class="sig-paren">)</span><a class="headerlink" href="#unicodedata.normalize" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Return the normal form <em>form</em> for the Unicode string <em>unistr</em>. Valid values for
|
||
<em>form</em> are ‘NFC’, ‘NFKC’, ‘NFD’, and ‘NFKD’.</p>
|
||
<p>The Unicode standard defines various normalization forms of a Unicode string,
|
||
based on the definition of canonical equivalence and compatibility equivalence.
|
||
In Unicode, several characters can be expressed in various ways. For example, the
|
||
character U+00C7 (LATIN CAPITAL LETTER C WITH CEDILLA) can also be expressed as
|
||
the sequence U+0043 (LATIN CAPITAL LETTER C) U+0327 (COMBINING CEDILLA).</p>
|
||
<p>For each character, there are two normal forms: normal form C and normal form D.
|
||
Normal form D (NFD) is also known as canonical decomposition, and translates
|
||
each character into its decomposed form. Normal form C (NFC) first applies a
|
||
canonical decomposition, then composes pre-combined characters again.</p>
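<p>Using the cedilla example from the text, a short sketch shows that NFD and NFC convert between the precomposed and decomposed spellings. The variable names are chosen only for illustration.</p>

```python
import unicodedata

composed = "\u00c7"     # LATIN CAPITAL LETTER C WITH CEDILLA
decomposed = "C\u0327"  # LATIN CAPITAL LETTER C + COMBINING CEDILLA

# NFD splits the precomposed character into base letter plus combining mark.
nfd = unicodedata.normalize("NFD", composed)
# NFC decomposes first and then recombines into the precomposed character.
nfc = unicodedata.normalize("NFC", decomposed)
```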
|
||
<p>In addition to these two forms, there are two additional normal forms based on
|
||
compatibility equivalence. In Unicode, certain characters are supported which
|
||
normally would be unified with other characters. For example, U+2160 (ROMAN
|
||
NUMERAL ONE) is really the same thing as U+0049 (LATIN CAPITAL LETTER I).
|
||
However, it is supported in Unicode for compatibility with existing character
|
||
sets (e.g. gb2312).</p>
|
||
<p>The normal form KD (NFKD) will apply the compatibility decomposition, i.e.
|
||
replace all compatibility characters with their equivalents. The normal form KC
|
||
(NFKC) first applies the compatibility decomposition, followed by the canonical
|
||
composition.</p>
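<p>The difference between the canonical and compatibility forms can be seen with the Roman numeral mentioned earlier. This is a small sketch; the variable names are illustrative.</p>

```python
import unicodedata

roman_one = "\u2160"  # ROMAN NUMERAL ONE

# The canonical forms leave the compatibility character untouched.
nfc = unicodedata.normalize("NFC", roman_one)
# The compatibility forms replace it with LATIN CAPITAL LETTER I.
nfkd = unicodedata.normalize("NFKD", roman_one)
nfkc = unicodedata.normalize("NFKC", roman_one)
```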
|
||
<p>Even if two Unicode strings are normalized and look the same to
|
||
a human reader, if one has combining characters and the other
|
||
doesn’t, they may not compare equal.</p>
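<p>One way to avoid surprises when comparing strings is to bring both operands to the same normal form first. The helper <code class="docutils literal notranslate"><span class="pre">equal_normalized</span></code> is a hypothetical name, and the choice of NFC as the default form is an assumption made for this sketch.</p>

```python
import unicodedata


def equal_normalized(a, b, form="NFC"):
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


precomposed = "\u00c7"
with_combining = "C\u0327"

raw_equal = precomposed == with_combining
normalized_equal = equal_normalized(precomposed, with_combining)
```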
|
||
</dd></dl>
|
||
|
||
<p>In addition, the module exposes the following constant:</p>
|
||
<dl class="data">
|
||
<dt id="unicodedata.unidata_version">
|
||
<code class="descclassname">unicodedata.</code><code class="descname">unidata_version</code><a class="headerlink" href="#unicodedata.unidata_version" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>The version of the Unicode database used in this module.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="data">
|
||
<dt id="unicodedata.ucd_3_2_0">
|
||
<code class="descclassname">unicodedata.</code><code class="descname">ucd_3_2_0</code><a class="headerlink" href="#unicodedata.ucd_3_2_0" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>This is an object that has the same methods as the entire module, but uses the
|
||
Unicode database version 3.2 instead, for applications that require this
|
||
specific version of the Unicode database (such as IDNA).</p>
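<p>As a sketch of how the two databases differ, the example below queries a character that was added to Unicode well after version 3.2. The variable names are illustrative, and the value of <code class="docutils literal notranslate"><span class="pre">unidata_version</span></code> depends on the Python build in use.</p>

```python
import unicodedata

current_version = unicodedata.unidata_version  # depends on the Python release
legacy = unicodedata.ucd_3_2_0

face = "\U0001F600"  # GRINNING FACE, not assigned in Unicode 3.2

current_name = unicodedata.name(face, None)
# The 3.2 database has no entry for this code point, so the default is returned.
legacy_name = legacy.name(face, None)
```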
|
||
</dd></dl>
|
||
|
||
<p>Examples:</p>
|
||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">unicodedata</span>
|
||
<span class="gp">>>> </span><span class="n">unicodedata</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="s1">'LEFT CURLY BRACKET'</span><span class="p">)</span>
|
||
<span class="go">'{'</span>
|
||
<span class="gp">>>> </span><span class="n">unicodedata</span><span class="o">.</span><span class="n">name</span><span class="p">(</span><span class="s1">'/'</span><span class="p">)</span>
|
||
<span class="go">'SOLIDUS'</span>
|
||
<span class="gp">>>> </span><span class="n">unicodedata</span><span class="o">.</span><span class="n">decimal</span><span class="p">(</span><span class="s1">'9'</span><span class="p">)</span>
|
||
<span class="go">9</span>
|
||
<span class="gp">>>> </span><span class="n">unicodedata</span><span class="o">.</span><span class="n">decimal</span><span class="p">(</span><span class="s1">'a'</span><span class="p">)</span>
|
||
<span class="gt">Traceback (most recent call last):</span>
|
||
File <span class="nb">"<stdin>"</span>, line <span class="m">1</span>, in <span class="n"><module></span>
|
||
<span class="gr">ValueError</span>: <span class="n">not a decimal</span>
|
||
<span class="gp">>>> </span><span class="n">unicodedata</span><span class="o">.</span><span class="n">category</span><span class="p">(</span><span class="s1">'A'</span><span class="p">)</span> <span class="c1"># 'L'etter, 'u'ppercase</span>
|
||
<span class="go">'Lu'</span>
|
||
<span class="gp">>>> </span><span class="n">unicodedata</span><span class="o">.</span><span class="n">bidirectional</span><span class="p">(</span><span class="s1">'</span><span class="se">\u0660</span><span class="s1">'</span><span class="p">)</span> <span class="c1"># 'A'rabic, 'N'umber</span>
|
||
<span class="go">'AN'</span>
|
||
</pre></div>
|
||
</div>
|
||
<p class="rubric">Footnotes</p>
|
||
<dl class="footnote brackets">
|
||
<dt class="label" id="id3"><span class="brackets"><a class="fn-backref" href="#id1">1</a></span></dt>
|
||
<dd><p><a class="reference external" href="http://www.unicode.org/Public/11.0.0/ucd/NameAliases.txt">http://www.unicode.org/Public/11.0.0/ucd/NameAliases.txt</a></p>
|
||
</dd>
|
||
<dt class="label" id="id4"><span class="brackets"><a class="fn-backref" href="#id2">2</a></span></dt>
|
||
<dd><p><a class="reference external" href="http://www.unicode.org/Public/11.0.0/ucd/NamedSequences.txt">http://www.unicode.org/Public/11.0.0/ucd/NamedSequences.txt</a></p>
|
||
</dd>
|
||
</dl>
|
||
</div>
|
||
|
||
|
||
</div>
|
||
</div>
|
||
</div>
|
||
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
|
||
<div class="sphinxsidebarwrapper">
|
||
<h4>Previous topic</h4>
|
||
<p class="topless"><a href="textwrap.html"
|
||
title="previous chapter"><code class="xref py py-mod docutils literal notranslate"><span class="pre">textwrap</span></code> — Text wrapping and filling</a></p>
|
||
<h4>Next topic</h4>
|
||
<p class="topless"><a href="stringprep.html"
|
||
title="next chapter"><code class="xref py py-mod docutils literal notranslate"><span class="pre">stringprep</span></code> — Internet String Preparation</a></p>
|
||
<div role="note" aria-label="source link">
|
||
<h3>This Page</h3>
|
||
<ul class="this-page-menu">
|
||
<li><a href="../bugs.html">Report a Bug</a></li>
|
||
<li>
|
||
<a href="https://github.com/python/cpython/blob/3.7/Doc/library/unicodedata.rst"
|
||
rel="nofollow">Show Source
|
||
</a>
|
||
</li>
|
||
</ul>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
<div class="clearer"></div>
|
||
</div>
|
||
|
||
<div class="footer">
|
||
© <a href="../copyright.html">Copyright</a> 2001-2019, Python Software Foundation.
|
||
<br />
|
||
The Python Software Foundation is a non-profit corporation.
|
||
<a href="https://www.python.org/psf/donations/">Please donate.</a>
|
||
<br />
|
||
Last updated on Jul 13, 2019.
|
||
<a href="../bugs.html">Found a bug</a>?
|
||
<br />
|
||
Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 2.0.1.
|
||
</div>
|
||
|
||
</body>
|
||
</html>