<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title>Unicode Objects and Codecs — Python 3.7.4 documentation</title>
<link rel="stylesheet" href="../_static/pydoctheme.css" type="text/css" />
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />

<script type="text/javascript" id="documentation_options" data-url_root="../" src="../_static/documentation_options.js"></script>
<script type="text/javascript" src="../_static/jquery.js"></script>
<script type="text/javascript" src="../_static/underscore.js"></script>
<script type="text/javascript" src="../_static/doctools.js"></script>
<script type="text/javascript" src="../_static/language_data.js"></script>

<script type="text/javascript" src="../_static/sidebar.js"></script>

<link rel="search" type="application/opensearchdescription+xml"
title="Search within Python 3.7.4 documentation"
href="../_static/opensearch.xml"/>
<link rel="author" title="About these documents" href="../about.html" />
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="copyright" title="Copyright" href="../copyright.html" />
<link rel="next" title="Tuple Objects" href="tuple.html" />
<link rel="prev" title="Byte Array Objects" href="bytearray.html" />
<link rel="shortcut icon" type="image/png" href="../_static/py.png" />
<link rel="canonical" href="https://docs.python.org/3/c-api/unicode.html" />

<script type="text/javascript" src="../_static/copybutton.js"></script>
<script type="text/javascript" src="../_static/switchers.js"></script>

<style>
@media only screen {
table.full-width-table {
width: 100%;
}
}
</style>

</head><body>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">

<div class="section" id="unicode-objects-and-codecs">
<span id="unicodeobjects"></span><h1>Unicode Objects and Codecs<a class="headerlink" href="#unicode-objects-and-codecs" title="Permalink to this headline">¶</a></h1>
<div class="section" id="unicode-objects">
<h2>Unicode Objects<a class="headerlink" href="#unicode-objects" title="Permalink to this headline">¶</a></h2>
<p>Since the implementation of <span class="target" id="index-0"></span><a class="pep reference external" href="https://www.python.org/dev/peps/pep-0393"><strong>PEP 393</strong></a> in Python 3.3, Unicode objects internally
use a variety of representations, in order to allow handling the complete range
of Unicode characters while staying memory efficient. There are special cases
for strings where all code points are below 128, 256, or 65536; otherwise, code
points must be below 1114112 (which is the full Unicode range).</p>
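<p>The thresholds above can be pictured with a small standalone C helper. This is only a sketch of the size classes named in the paragraph, not CPython's implementation: the function name <code class="docutils literal notranslate"><span class="pre">storage_width_for</span></code> is hypothetical, and the byte widths per class follow PEP 393 (one byte for the ASCII and Latin-1 ranges, two bytes below 65536, four bytes otherwise).</p>

```c
#include <stdint.h>

/* Hypothetical helper (not part of the Python C API): bytes per code point
 * needed for a string whose largest code point is max_cp, following the
 * thresholds described above. Returns 0 if max_cp is outside Unicode. */
static int storage_width_for(uint32_t max_cp)
{
    if (max_cp < 128)       /* ASCII range */
        return 1;
    if (max_cp < 256)       /* Latin-1 range */
        return 1;
    if (max_cp < 65536)     /* fits in 16 bits */
        return 2;
    if (max_cp < 1114112)   /* up to U+10FFFF */
        return 4;
    return 0;               /* not a valid code point */
}
```

<p>A single character above U+FFFF is therefore enough to move a whole string into the four-byte class, which is why the narrower cases exist.</p>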
<p><a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE*</span></code></a> and UTF-8 representations are created on demand and cached
in the Unicode object. The <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE*</span></code></a> representation is deprecated
and inefficient; it should be avoided in performance- or memory-sensitive
situations.</p>
<p>Due to the transition between the old APIs and the new APIs, Unicode objects
can internally be in two states depending on how they were created:</p>
<ul class="simple">
<li><p>“canonical” Unicode objects are all objects created by a non-deprecated
Unicode API. They use the most efficient representation allowed by the
implementation.</p></li>
<li><p>“legacy” Unicode objects have been created through one of the deprecated
APIs (typically <a class="reference internal" href="#c.PyUnicode_FromUnicode" title="PyUnicode_FromUnicode"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_FromUnicode()</span></code></a>) and only bear the
<a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE*</span></code></a> representation; you will have to call
<a class="reference internal" href="#c.PyUnicode_READY" title="PyUnicode_READY"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_READY()</span></code></a> on them before calling any other API.</p></li>
</ul>
<div class="section" id="unicode-type">
<h3>Unicode Type<a class="headerlink" href="#unicode-type" title="Permalink to this headline">¶</a></h3>
<p>These are the basic Unicode object types used for the Unicode implementation in
Python:</p>
<dl class="type">
<dt id="c.Py_UCS4">
<code class="descname">Py_UCS4</code><a class="headerlink" href="#c.Py_UCS4" title="Permalink to this definition">¶</a></dt>
<dt id="c.Py_UCS2">
<code class="descname">Py_UCS2</code><a class="headerlink" href="#c.Py_UCS2" title="Permalink to this definition">¶</a></dt>
<dt id="c.Py_UCS1">
<code class="descname">Py_UCS1</code><a class="headerlink" href="#c.Py_UCS1" title="Permalink to this definition">¶</a></dt>
<dd><p>These types are typedefs for unsigned integer types wide enough to contain
characters of 32 bits, 16 bits and 8 bits, respectively. When dealing with
single Unicode characters, use <a class="reference internal" href="#c.Py_UCS4" title="Py_UCS4"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UCS4</span></code></a>.</p>
<div class="versionadded">
<p><span class="versionmodified added">New in version 3.3.</span></p>
</div>
</dd></dl>

<dl class="type">
<dt id="c.Py_UNICODE">
<code class="descname">Py_UNICODE</code><a class="headerlink" href="#c.Py_UNICODE" title="Permalink to this definition">¶</a></dt>
<dd><p>This is a typedef of <code class="xref c c-type docutils literal notranslate"><span class="pre">wchar_t</span></code>, which is a 16-bit type or 32-bit type
depending on the platform.</p>
<div class="versionchanged">
<p><span class="versionmodified changed">Changed in version 3.3: </span>In previous versions, this was a 16-bit type or a 32-bit type depending on
whether you selected a “narrow” or “wide” Unicode version of Python at
build time.</p>
</div>
</dd></dl>

<dl class="type">
<dt id="c.PyASCIIObject">
<code class="descname">PyASCIIObject</code><a class="headerlink" href="#c.PyASCIIObject" title="Permalink to this definition">¶</a></dt>
<dt id="c.PyCompactUnicodeObject">
<code class="descname">PyCompactUnicodeObject</code><a class="headerlink" href="#c.PyCompactUnicodeObject" title="Permalink to this definition">¶</a></dt>
<dt id="c.PyUnicodeObject">
<code class="descname">PyUnicodeObject</code><a class="headerlink" href="#c.PyUnicodeObject" title="Permalink to this definition">¶</a></dt>
<dd><p>These subtypes of <a class="reference internal" href="structures.html#c.PyObject" title="PyObject"><code class="xref c c-type docutils literal notranslate"><span class="pre">PyObject</span></code></a> represent a Python Unicode object. In
almost all cases, they shouldn’t be used directly, since all API functions
that deal with Unicode objects take and return <a class="reference internal" href="structures.html#c.PyObject" title="PyObject"><code class="xref c c-type docutils literal notranslate"><span class="pre">PyObject</span></code></a> pointers.</p>
<div class="versionadded">
<p><span class="versionmodified added">New in version 3.3.</span></p>
</div>
</dd></dl>

<dl class="var">
<dt id="c.PyUnicode_Type">
<a class="reference internal" href="type.html#c.PyTypeObject" title="PyTypeObject">PyTypeObject</a> <code class="descname">PyUnicode_Type</code><a class="headerlink" href="#c.PyUnicode_Type" title="Permalink to this definition">¶</a></dt>
<dd><p>This instance of <a class="reference internal" href="type.html#c.PyTypeObject" title="PyTypeObject"><code class="xref c c-type docutils literal notranslate"><span class="pre">PyTypeObject</span></code></a> represents the Python Unicode type. It
is exposed to Python code as <code class="docutils literal notranslate"><span class="pre">str</span></code>.</p>
</dd></dl>

<p>The following APIs are really C macros and can be used to do fast checks and to
access internal read-only data of Unicode objects:</p>
<dl class="function">
<dt id="c.PyUnicode_Check">
int <code class="descname">PyUnicode_Check</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *o</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_Check" title="Permalink to this definition">¶</a></dt>
<dd><p>Return true if the object <em>o</em> is a Unicode object or an instance of a Unicode
subtype.</p>
</dd></dl>

<dl class="function">
<dt id="c.PyUnicode_CheckExact">
int <code class="descname">PyUnicode_CheckExact</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *o</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_CheckExact" title="Permalink to this definition">¶</a></dt>
<dd><p>Return true if the object <em>o</em> is a Unicode object, but not an instance of a
subtype.</p>
</dd></dl>

<dl class="function">
<dt id="c.PyUnicode_READY">
int <code class="descname">PyUnicode_READY</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *o</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_READY" title="Permalink to this definition">¶</a></dt>
<dd><p>Ensure the string object <em>o</em> is in the “canonical” representation. This is
required before using any of the access macros described below.</p>
<p>Returns <code class="docutils literal notranslate"><span class="pre">0</span></code> on success and <code class="docutils literal notranslate"><span class="pre">-1</span></code> with an exception set on failure, which in
particular happens if memory allocation fails.</p>
<div class="versionadded">
<p><span class="versionmodified added">New in version 3.3.</span></p>
</div>
</dd></dl>

<dl class="function">
<dt id="c.PyUnicode_GET_LENGTH">
Py_ssize_t <code class="descname">PyUnicode_GET_LENGTH</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *o</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_GET_LENGTH" title="Permalink to this definition">¶</a></dt>
<dd><p>Return the length of the Unicode string, in code points. <em>o</em> has to be a
Unicode object in the “canonical” representation (not checked).</p>
<div class="versionadded">
<p><span class="versionmodified added">New in version 3.3.</span></p>
</div>
</dd></dl>

<dl class="function">
<dt id="c.PyUnicode_1BYTE_DATA">
<a class="reference internal" href="#c.Py_UCS1" title="Py_UCS1">Py_UCS1</a>* <code class="descname">PyUnicode_1BYTE_DATA</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *o</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_1BYTE_DATA" title="Permalink to this definition">¶</a></dt>
<dt id="c.PyUnicode_2BYTE_DATA">
<a class="reference internal" href="#c.Py_UCS2" title="Py_UCS2">Py_UCS2</a>* <code class="descname">PyUnicode_2BYTE_DATA</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *o</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_2BYTE_DATA" title="Permalink to this definition">¶</a></dt>
<dt id="c.PyUnicode_4BYTE_DATA">
<a class="reference internal" href="#c.Py_UCS4" title="Py_UCS4">Py_UCS4</a>* <code class="descname">PyUnicode_4BYTE_DATA</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *o</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_4BYTE_DATA" title="Permalink to this definition">¶</a></dt>
<dd><p>Return a pointer to the canonical representation cast to UCS1, UCS2 or UCS4
integer types for direct character access. No checks are performed to verify
that the canonical representation has the correct character size; use
<a class="reference internal" href="#c.PyUnicode_KIND" title="PyUnicode_KIND"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_KIND()</span></code></a> to select the right macro. Make sure
<a class="reference internal" href="#c.PyUnicode_READY" title="PyUnicode_READY"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_READY()</span></code></a> has been called before accessing this.</p>
<div class="versionadded">
<p><span class="versionmodified added">New in version 3.3.</span></p>
</div>
</dd></dl>

<dl class="macro">
<dt id="c.PyUnicode_WCHAR_KIND">
<code class="descname">PyUnicode_WCHAR_KIND</code><a class="headerlink" href="#c.PyUnicode_WCHAR_KIND" title="Permalink to this definition">¶</a></dt>
<dt id="c.PyUnicode_1BYTE_KIND">
<code class="descname">PyUnicode_1BYTE_KIND</code><a class="headerlink" href="#c.PyUnicode_1BYTE_KIND" title="Permalink to this definition">¶</a></dt>
<dt id="c.PyUnicode_2BYTE_KIND">
<code class="descname">PyUnicode_2BYTE_KIND</code><a class="headerlink" href="#c.PyUnicode_2BYTE_KIND" title="Permalink to this definition">¶</a></dt>
<dt id="c.PyUnicode_4BYTE_KIND">
<code class="descname">PyUnicode_4BYTE_KIND</code><a class="headerlink" href="#c.PyUnicode_4BYTE_KIND" title="Permalink to this definition">¶</a></dt>
<dd><p>Return values of the <a class="reference internal" href="#c.PyUnicode_KIND" title="PyUnicode_KIND"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_KIND()</span></code></a> macro.</p>
<div class="versionadded">
<p><span class="versionmodified added">New in version 3.3.</span></p>
</div>
</dd></dl>

<dl class="function">
<dt id="c.PyUnicode_KIND">
int <code class="descname">PyUnicode_KIND</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *o</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_KIND" title="Permalink to this definition">¶</a></dt>
<dd><p>Return one of the PyUnicode kind constants (see above) that indicates how many
bytes per character this Unicode object uses to store its data. <em>o</em> has to
be a Unicode object in the “canonical” representation (not checked).</p>
<div class="versionadded">
<p><span class="versionmodified added">New in version 3.3.</span></p>
</div>
</dd></dl>

<dl class="function">
<dt id="c.PyUnicode_DATA">
void* <code class="descname">PyUnicode_DATA</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *o</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_DATA" title="Permalink to this definition">¶</a></dt>
<dd><p>Return a void pointer to the raw Unicode buffer. <em>o</em> has to be a Unicode
object in the “canonical” representation (not checked).</p>
<div class="versionadded">
<p><span class="versionmodified added">New in version 3.3.</span></p>
</div>
</dd></dl>

<dl class="function">
<dt id="c.PyUnicode_WRITE">
void <code class="descname">PyUnicode_WRITE</code><span class="sig-paren">(</span>int<em> kind</em>, void<em> *data</em>, Py_ssize_t<em> index</em>, <a class="reference internal" href="#c.Py_UCS4" title="Py_UCS4">Py_UCS4</a><em> value</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_WRITE" title="Permalink to this definition">¶</a></dt>
<dd><p>Write into a canonical representation <em>data</em> (as obtained with
<a class="reference internal" href="#c.PyUnicode_DATA" title="PyUnicode_DATA"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_DATA()</span></code></a>). This macro does not do any sanity checks and is
intended for usage in loops. The caller should cache the <em>kind</em> value and
<em>data</em> pointer as obtained from other macro calls. <em>index</em> is the index in
the string (starts at 0) and <em>value</em> is the new code point value which should
be written to that location.</p>
<div class="versionadded">
<p><span class="versionmodified added">New in version 3.3.</span></p>
</div>
</dd></dl>

<dl class="function">
<dt id="c.PyUnicode_READ">
<a class="reference internal" href="#c.Py_UCS4" title="Py_UCS4">Py_UCS4</a> <code class="descname">PyUnicode_READ</code><span class="sig-paren">(</span>int<em> kind</em>, void<em> *data</em>, Py_ssize_t<em> index</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_READ" title="Permalink to this definition">¶</a></dt>
<dd><p>Read a code point from a canonical representation <em>data</em> (as obtained with
<a class="reference internal" href="#c.PyUnicode_DATA" title="PyUnicode_DATA"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_DATA()</span></code></a>). No checks or ready calls are performed.</p>
<div class="versionadded">
<p><span class="versionmodified added">New in version 3.3.</span></p>
</div>
</dd></dl>

<dl class="function">
<dt id="c.PyUnicode_READ_CHAR">
<a class="reference internal" href="#c.Py_UCS4" title="Py_UCS4">Py_UCS4</a> <code class="descname">PyUnicode_READ_CHAR</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *o</em>, Py_ssize_t<em> index</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_READ_CHAR" title="Permalink to this definition">¶</a></dt>
<dd><p>Read a character from a Unicode object <em>o</em>, which must be in the “canonical”
representation. This is less efficient than <a class="reference internal" href="#c.PyUnicode_READ" title="PyUnicode_READ"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_READ()</span></code></a> if you
do multiple consecutive reads.</p>
<div class="versionadded">
<p><span class="versionmodified added">New in version 3.3.</span></p>
</div>
</dd></dl>

<dl class="function">
<dt id="c.PyUnicode_MAX_CHAR_VALUE">
<code class="descname">PyUnicode_MAX_CHAR_VALUE</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *o</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_MAX_CHAR_VALUE" title="Permalink to this definition">¶</a></dt>
<dd><p>Return the maximum code point that is suitable for creating another string
based on <em>o</em>, which must be in the “canonical” representation. This is
always an approximation but more efficient than iterating over the string.</p>
<div class="versionadded">
<p><span class="versionmodified added">New in version 3.3.</span></p>
</div>
</dd></dl>

<dl class="function">
<dt id="c.PyUnicode_ClearFreeList">
int <code class="descname">PyUnicode_ClearFreeList</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_ClearFreeList" title="Permalink to this definition">¶</a></dt>
<dd><p>Clear the free list. Return the total number of freed items.</p>
</dd></dl>

<dl class="function">
<dt id="c.PyUnicode_GET_SIZE">
Py_ssize_t <code class="descname">PyUnicode_GET_SIZE</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *o</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_GET_SIZE" title="Permalink to this definition">¶</a></dt>
<dd><p>Return the size of the deprecated <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> representation, in
code units (this includes surrogate pairs as 2 units). <em>o</em> has to be a
Unicode object (not checked).</p>
<div class="deprecated">
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0: </span>Part of the old-style Unicode API; please migrate to using
<a class="reference internal" href="#c.PyUnicode_GET_LENGTH" title="PyUnicode_GET_LENGTH"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_GET_LENGTH()</span></code></a>.</p>
</div>
</dd></dl>
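<p>The difference between code units and code points only shows up where the deprecated representation is 16 bits wide: a code point above U+FFFF is then stored as a surrogate pair and counts as two units. The standalone C sketch below models that count; it does not call the Python API, and the helper name <code class="docutils literal notranslate"><span class="pre">utf16_code_units</span></code> is hypothetical.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model (not part of the Python C API): number of 16-bit
 * code units needed to store `n` code points, counting every code point
 * above U+FFFF as a surrogate pair (two units). */
static size_t utf16_code_units(const uint32_t *cps, size_t n)
{
    size_t units = 0;
    for (size_t i = 0; i < n; i++)
        units += (cps[i] > 0xFFFF) ? 2 : 1;
    return units;
}
```

<p>For a string holding U+0041 and U+1F600, this model yields three code units for two code points, which is the kind of mismatch that makes a size in code units a poor stand-in for a length in code points.</p>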

<dl class="function">
<dt id="c.PyUnicode_GET_DATA_SIZE">
Py_ssize_t <code class="descname">PyUnicode_GET_DATA_SIZE</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *o</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_GET_DATA_SIZE" title="Permalink to this definition">¶</a></dt>
<dd><p>Return the size of the deprecated <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> representation in
bytes. <em>o</em> has to be a Unicode object (not checked).</p>
<div class="deprecated">
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0: </span>Part of the old-style Unicode API; please migrate to using
<a class="reference internal" href="#c.PyUnicode_GET_LENGTH" title="PyUnicode_GET_LENGTH"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_GET_LENGTH()</span></code></a>.</p>
</div>
</dd></dl>

<dl class="function">
<dt id="c.PyUnicode_AS_UNICODE">
<a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a>* <code class="descname">PyUnicode_AS_UNICODE</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *o</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_AS_UNICODE" title="Permalink to this definition">¶</a></dt>
<dt id="c.PyUnicode_AS_DATA">
const char* <code class="descname">PyUnicode_AS_DATA</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *o</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_AS_DATA" title="Permalink to this definition">¶</a></dt>
<dd><p>Return a pointer to a <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> representation of the object. The
returned buffer is always terminated with an extra null code point. It
may also contain embedded null code points, which would cause the string
to be truncated when used in most C functions. The <code class="docutils literal notranslate"><span class="pre">AS_DATA</span></code> form
casts the pointer to <code class="xref c c-type docutils literal notranslate"><span class="pre">const</span> <span class="pre">char</span> <span class="pre">*</span></code>. The <em>o</em> argument has to be
a Unicode object (not checked).</p>
<div class="versionchanged">
<p><span class="versionmodified changed">Changed in version 3.3: </span>This macro is now inefficient, because in many cases the
<a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> representation does not exist and needs to be created,
and it can fail (return <em>NULL</em> with an exception set). Try to port the
code to use the new <code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_nBYTE_DATA()</span></code> macros or use
<a class="reference internal" href="#c.PyUnicode_WRITE" title="PyUnicode_WRITE"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_WRITE()</span></code></a> or <a class="reference internal" href="#c.PyUnicode_READ" title="PyUnicode_READ"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_READ()</span></code></a>.</p>
</div>
<div class="deprecated">
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0: </span>Part of the old-style Unicode API; please migrate to using the
<code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_nBYTE_DATA()</span></code> family of macros.</p>
</div>
</dd></dl>

</div>
<div class="section" id="unicode-character-properties">
<h3>Unicode Character Properties<a class="headerlink" href="#unicode-character-properties" title="Permalink to this headline">¶</a></h3>
<p>Unicode provides many different character properties. The most commonly needed
ones are available through these macros, which are mapped to C functions
depending on the Python configuration.</p>
<dl class="function">
<dt id="c.Py_UNICODE_ISSPACE">
int <code class="descname">Py_UNICODE_ISSPACE</code><span class="sig-paren">(</span><a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> ch</em><span class="sig-paren">)</span><a class="headerlink" href="#c.Py_UNICODE_ISSPACE" title="Permalink to this definition">¶</a></dt>
<dd><p>Return <code class="docutils literal notranslate"><span class="pre">1</span></code> or <code class="docutils literal notranslate"><span class="pre">0</span></code> depending on whether <em>ch</em> is a whitespace character.</p>
</dd></dl>

<dl class="function">
<dt id="c.Py_UNICODE_ISLOWER">
int <code class="descname">Py_UNICODE_ISLOWER</code><span class="sig-paren">(</span><a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> ch</em><span class="sig-paren">)</span><a class="headerlink" href="#c.Py_UNICODE_ISLOWER" title="Permalink to this definition">¶</a></dt>
<dd><p>Return <code class="docutils literal notranslate"><span class="pre">1</span></code> or <code class="docutils literal notranslate"><span class="pre">0</span></code> depending on whether <em>ch</em> is a lowercase character.</p>
</dd></dl>

<dl class="function">
<dt id="c.Py_UNICODE_ISUPPER">
int <code class="descname">Py_UNICODE_ISUPPER</code><span class="sig-paren">(</span><a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> ch</em><span class="sig-paren">)</span><a class="headerlink" href="#c.Py_UNICODE_ISUPPER" title="Permalink to this definition">¶</a></dt>
<dd><p>Return <code class="docutils literal notranslate"><span class="pre">1</span></code> or <code class="docutils literal notranslate"><span class="pre">0</span></code> depending on whether <em>ch</em> is an uppercase character.</p>
</dd></dl>

<dl class="function">
<dt id="c.Py_UNICODE_ISTITLE">
int <code class="descname">Py_UNICODE_ISTITLE</code><span class="sig-paren">(</span><a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> ch</em><span class="sig-paren">)</span><a class="headerlink" href="#c.Py_UNICODE_ISTITLE" title="Permalink to this definition">¶</a></dt>
<dd><p>Return <code class="docutils literal notranslate"><span class="pre">1</span></code> or <code class="docutils literal notranslate"><span class="pre">0</span></code> depending on whether <em>ch</em> is a titlecase character.</p>
</dd></dl>

<dl class="function">
<dt id="c.Py_UNICODE_ISLINEBREAK">
int <code class="descname">Py_UNICODE_ISLINEBREAK</code><span class="sig-paren">(</span><a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> ch</em><span class="sig-paren">)</span><a class="headerlink" href="#c.Py_UNICODE_ISLINEBREAK" title="Permalink to this definition">¶</a></dt>
<dd><p>Return <code class="docutils literal notranslate"><span class="pre">1</span></code> or <code class="docutils literal notranslate"><span class="pre">0</span></code> depending on whether <em>ch</em> is a linebreak character.</p>
</dd></dl>

<dl class="function">
<dt id="c.Py_UNICODE_ISDECIMAL">
int <code class="descname">Py_UNICODE_ISDECIMAL</code><span class="sig-paren">(</span><a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> ch</em><span class="sig-paren">)</span><a class="headerlink" href="#c.Py_UNICODE_ISDECIMAL" title="Permalink to this definition">¶</a></dt>
<dd><p>Return <code class="docutils literal notranslate"><span class="pre">1</span></code> or <code class="docutils literal notranslate"><span class="pre">0</span></code> depending on whether <em>ch</em> is a decimal character.</p>
</dd></dl>

<dl class="function">
<dt id="c.Py_UNICODE_ISDIGIT">
int <code class="descname">Py_UNICODE_ISDIGIT</code><span class="sig-paren">(</span><a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> ch</em><span class="sig-paren">)</span><a class="headerlink" href="#c.Py_UNICODE_ISDIGIT" title="Permalink to this definition">¶</a></dt>
<dd><p>Return <code class="docutils literal notranslate"><span class="pre">1</span></code> or <code class="docutils literal notranslate"><span class="pre">0</span></code> depending on whether <em>ch</em> is a digit character.</p>
</dd></dl>

<dl class="function">
<dt id="c.Py_UNICODE_ISNUMERIC">
int <code class="descname">Py_UNICODE_ISNUMERIC</code><span class="sig-paren">(</span><a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> ch</em><span class="sig-paren">)</span><a class="headerlink" href="#c.Py_UNICODE_ISNUMERIC" title="Permalink to this definition">¶</a></dt>
<dd><p>Return <code class="docutils literal notranslate"><span class="pre">1</span></code> or <code class="docutils literal notranslate"><span class="pre">0</span></code> depending on whether <em>ch</em> is a numeric character.</p>
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.Py_UNICODE_ISALPHA">
|
||
int <code class="descname">Py_UNICODE_ISALPHA</code><span class="sig-paren">(</span><a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> ch</em><span class="sig-paren">)</span><a class="headerlink" href="#c.Py_UNICODE_ISALPHA" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Return <code class="docutils literal notranslate"><span class="pre">1</span></code> or <code class="docutils literal notranslate"><span class="pre">0</span></code> depending on whether <em>ch</em> is an alphabetic character.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.Py_UNICODE_ISALNUM">
|
||
int <code class="descname">Py_UNICODE_ISALNUM</code><span class="sig-paren">(</span><a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> ch</em><span class="sig-paren">)</span><a class="headerlink" href="#c.Py_UNICODE_ISALNUM" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Return <code class="docutils literal notranslate"><span class="pre">1</span></code> or <code class="docutils literal notranslate"><span class="pre">0</span></code> depending on whether <em>ch</em> is an alphanumeric character.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.Py_UNICODE_ISPRINTABLE">
|
||
int <code class="descname">Py_UNICODE_ISPRINTABLE</code><span class="sig-paren">(</span><a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> ch</em><span class="sig-paren">)</span><a class="headerlink" href="#c.Py_UNICODE_ISPRINTABLE" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Return <code class="docutils literal notranslate"><span class="pre">1</span></code> or <code class="docutils literal notranslate"><span class="pre">0</span></code> depending on whether <em>ch</em> is a printable character.
|
||
Nonprintable characters are those characters defined in the Unicode character
|
||
database as “Other” or “Separator”, excepting the ASCII space (0x20) which is
|
||
considered printable. (Note that printable characters in this context are
|
||
those which should not be escaped when <a class="reference internal" href="../library/functions.html#repr" title="repr"><code class="xref py py-func docutils literal notranslate"><span class="pre">repr()</span></code></a> is invoked on a string.
|
||
It has no bearing on the handling of strings written to <a class="reference internal" href="../library/sys.html#sys.stdout" title="sys.stdout"><code class="xref py py-data docutils literal notranslate"><span class="pre">sys.stdout</span></code></a> or
|
||
<a class="reference internal" href="../library/sys.html#sys.stderr" title="sys.stderr"><code class="xref py py-data docutils literal notranslate"><span class="pre">sys.stderr</span></code></a>.)</p>
|
||
</dd></dl>
|
||
|
||
<p>These APIs can be used for fast direct character conversions:</p>
|
||
<dl class="function">
|
||
<dt id="c.Py_UNICODE_TOLOWER">
|
||
<a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a> <code class="descname">Py_UNICODE_TOLOWER</code><span class="sig-paren">(</span><a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> ch</em><span class="sig-paren">)</span><a class="headerlink" href="#c.Py_UNICODE_TOLOWER" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Return the character <em>ch</em> converted to lower case.</p>
|
||
<div class="deprecated">
|
||
<p><span class="versionmodified deprecated">Deprecated since version 3.3: </span>This function uses simple case mappings.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.Py_UNICODE_TOUPPER">
|
||
<a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a> <code class="descname">Py_UNICODE_TOUPPER</code><span class="sig-paren">(</span><a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> ch</em><span class="sig-paren">)</span><a class="headerlink" href="#c.Py_UNICODE_TOUPPER" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Return the character <em>ch</em> converted to upper case.</p>
|
||
<div class="deprecated">
|
||
<p><span class="versionmodified deprecated">Deprecated since version 3.3: </span>This function uses simple case mappings.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.Py_UNICODE_TOTITLE">
|
||
<a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a> <code class="descname">Py_UNICODE_TOTITLE</code><span class="sig-paren">(</span><a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> ch</em><span class="sig-paren">)</span><a class="headerlink" href="#c.Py_UNICODE_TOTITLE" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Return the character <em>ch</em> converted to title case.</p>
|
||
<div class="deprecated">
|
||
<p><span class="versionmodified deprecated">Deprecated since version 3.3: </span>This function uses simple case mappings.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.Py_UNICODE_TODECIMAL">
|
||
int <code class="descname">Py_UNICODE_TODECIMAL</code><span class="sig-paren">(</span><a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> ch</em><span class="sig-paren">)</span><a class="headerlink" href="#c.Py_UNICODE_TODECIMAL" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Return the character <em>ch</em> converted to a non-negative decimal integer. Return
<code class="docutils literal notranslate"><span class="pre">-1</span></code> if this is not possible. This macro does not raise exceptions.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.Py_UNICODE_TODIGIT">
|
||
int <code class="descname">Py_UNICODE_TODIGIT</code><span class="sig-paren">(</span><a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> ch</em><span class="sig-paren">)</span><a class="headerlink" href="#c.Py_UNICODE_TODIGIT" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Return the character <em>ch</em> converted to a single digit integer. Return <code class="docutils literal notranslate"><span class="pre">-1</span></code> if
|
||
this is not possible. This macro does not raise exceptions.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.Py_UNICODE_TONUMERIC">
|
||
double <code class="descname">Py_UNICODE_TONUMERIC</code><span class="sig-paren">(</span><a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> ch</em><span class="sig-paren">)</span><a class="headerlink" href="#c.Py_UNICODE_TONUMERIC" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Return the character <em>ch</em> converted to a double. Return <code class="docutils literal notranslate"><span class="pre">-1.0</span></code> if this is not
|
||
possible. This macro does not raise exceptions.</p>
|
||
</dd></dl>
|
||
|
||
<p>These APIs can be used to work with surrogates:</p>
|
||
<dl class="macro">
|
||
<dt id="c.Py_UNICODE_IS_SURROGATE">
|
||
<code class="descname">Py_UNICODE_IS_SURROGATE</code><span class="sig-paren">(</span>ch<span class="sig-paren">)</span><a class="headerlink" href="#c.Py_UNICODE_IS_SURROGATE" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Check if <em>ch</em> is a surrogate (<code class="docutils literal notranslate"><span class="pre">0xD800</span> <span class="pre"><=</span> <span class="pre">ch</span> <span class="pre"><=</span> <span class="pre">0xDFFF</span></code>).</p>
|
||
</dd></dl>
|
||
|
||
<dl class="macro">
|
||
<dt id="c.Py_UNICODE_IS_HIGH_SURROGATE">
|
||
<code class="descname">Py_UNICODE_IS_HIGH_SURROGATE</code><span class="sig-paren">(</span>ch<span class="sig-paren">)</span><a class="headerlink" href="#c.Py_UNICODE_IS_HIGH_SURROGATE" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Check if <em>ch</em> is a high surrogate (<code class="docutils literal notranslate"><span class="pre">0xD800</span> <span class="pre"><=</span> <span class="pre">ch</span> <span class="pre"><=</span> <span class="pre">0xDBFF</span></code>).</p>
|
||
</dd></dl>
|
||
|
||
<dl class="macro">
|
||
<dt id="c.Py_UNICODE_IS_LOW_SURROGATE">
|
||
<code class="descname">Py_UNICODE_IS_LOW_SURROGATE</code><span class="sig-paren">(</span>ch<span class="sig-paren">)</span><a class="headerlink" href="#c.Py_UNICODE_IS_LOW_SURROGATE" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Check if <em>ch</em> is a low surrogate (<code class="docutils literal notranslate"><span class="pre">0xDC00</span> <span class="pre"><=</span> <span class="pre">ch</span> <span class="pre"><=</span> <span class="pre">0xDFFF</span></code>).</p>
|
||
</dd></dl>
|
||
|
||
<dl class="macro">
|
||
<dt id="c.Py_UNICODE_JOIN_SURROGATES">
|
||
<code class="descname">Py_UNICODE_JOIN_SURROGATES</code><span class="sig-paren">(</span>high, low<span class="sig-paren">)</span><a class="headerlink" href="#c.Py_UNICODE_JOIN_SURROGATES" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Join two surrogate characters and return a single Py_UCS4 value.
|
||
<em>high</em> and <em>low</em> are respectively the leading and trailing surrogates in a
|
||
surrogate pair.</p>
|
||
</dd></dl>
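<p>The surrogate ranges and the joining arithmetic can be sketched in plain C as follows. The function names are hypothetical stand-ins for the macros above, written out so that the bit manipulation is visible. Real extension code should use the macros themselves.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the surrogate macros, using the ranges
   quoted in the descriptions above. */
static int is_high_surrogate(uint32_t ch) { return 0xD800 <= ch && ch <= 0xDBFF; }
static int is_low_surrogate(uint32_t ch)  { return 0xDC00 <= ch && ch <= 0xDFFF; }

/* Combine a leading (high) and trailing (low) surrogate into a single
   code point: each surrogate contributes 10 bits, offset by 0x10000. */
static uint32_t join_surrogates(uint32_t high, uint32_t low)
{
    assert(is_high_surrogate(high) && is_low_surrogate(low));
    return 0x10000 + (((high & 0x03FF) << 10) | (low & 0x03FF));
}
```

<p>For example, the pair <code class="docutils literal notranslate"><span class="pre">0xD83D</span></code>, <code class="docutils literal notranslate"><span class="pre">0xDE00</span></code> joins to the code point <code class="docutils literal notranslate"><span class="pre">0x1F600</span></code>.</p>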
|
||
|
||
</div>
|
||
<div class="section" id="creating-and-accessing-unicode-strings">
|
||
<h3>Creating and accessing Unicode strings<a class="headerlink" href="#creating-and-accessing-unicode-strings" title="Permalink to this headline">¶</a></h3>
|
||
<p>To create Unicode objects and access their basic sequence properties, use these
|
||
APIs:</p>
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_New">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_New</code><span class="sig-paren">(</span>Py_ssize_t<em> size</em>, <a class="reference internal" href="#c.Py_UCS4" title="Py_UCS4">Py_UCS4</a><em> maxchar</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_New" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Create a new Unicode object. <em>maxchar</em> should be the true maximum code point
|
||
to be placed in the string. As an approximation, it can be rounded up to the
|
||
nearest value in the sequence 127, 255, 65535, 1114111.</p>
|
||
<p>This is the recommended way to allocate a new Unicode object. Objects
|
||
created using this function are not resizable.</p>
|
||
<div class="versionadded">
|
||
<p><span class="versionmodified added">New in version 3.3.</span></p>
|
||
</div>
|
||
</dd></dl>
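<p>The rounding rule for <em>maxchar</em> described under <code class="docutils literal notranslate"><span class="pre">PyUnicode_New()</span></code> could be written as the small helper below. The name <code class="docutils literal notranslate"><span class="pre">round_maxchar</span></code> is an assumption made for this sketch and is not part of the C API.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round a code point up to the nearest value in the
   sequence 127, 255, 65535, 1114111, suitable for passing as maxchar. */
static uint32_t round_maxchar(uint32_t ch)
{
    assert(ch <= 0x10FFFF);         /* largest valid Unicode code point */
    if (ch <= 127) {
        return 127;
    }
    if (ch <= 255) {
        return 255;
    }
    if (ch <= 65535) {
        return 65535;
    }
    return 1114111;
}
```

<p>A caller that only knows an upper bound on the code points it will store can pass the rounded value instead of scanning the data for the true maximum.</p>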
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_FromKindAndData">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_FromKindAndData</code><span class="sig-paren">(</span>int<em> kind</em>, const void<em> *buffer</em>, Py_ssize_t<em> size</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_FromKindAndData" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Create a new Unicode object with the given <em>kind</em> (possible values are
|
||
<a class="reference internal" href="#c.PyUnicode_1BYTE_KIND" title="PyUnicode_1BYTE_KIND"><code class="xref c c-macro docutils literal notranslate"><span class="pre">PyUnicode_1BYTE_KIND</span></code></a> etc., as returned by
|
||
<a class="reference internal" href="#c.PyUnicode_KIND" title="PyUnicode_KIND"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_KIND()</span></code></a>). The <em>buffer</em> must point to an array of <em>size</em>
|
||
units of 1, 2 or 4 bytes per character, as given by the kind.</p>
|
||
<div class="versionadded">
|
||
<p><span class="versionmodified added">New in version 3.3.</span></p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_FromStringAndSize">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_FromStringAndSize</code><span class="sig-paren">(</span>const char<em> *u</em>, Py_ssize_t<em> size</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_FromStringAndSize" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Create a Unicode object from the char buffer <em>u</em>. The bytes will be
|
||
interpreted as being UTF-8 encoded. The buffer is copied into the new
|
||
object. If the buffer is not <em>NULL</em>, the return value might be a shared
|
||
object, i.e. modification of the data is not allowed.</p>
|
||
<p>If <em>u</em> is <em>NULL</em>, this function behaves like <a class="reference internal" href="#c.PyUnicode_FromUnicode" title="PyUnicode_FromUnicode"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_FromUnicode()</span></code></a>
|
||
with the buffer set to <em>NULL</em>. This usage is deprecated in favor of
|
||
<a class="reference internal" href="#c.PyUnicode_New" title="PyUnicode_New"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_New()</span></code></a>.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_FromString">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a> *<code class="descname">PyUnicode_FromString</code><span class="sig-paren">(</span>const char<em> *u</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_FromString" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Create a Unicode object from a UTF-8 encoded null-terminated char buffer
|
||
<em>u</em>.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_FromFormat">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_FromFormat</code><span class="sig-paren">(</span>const char<em> *format</em>, ...<span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_FromFormat" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Take a C <code class="xref c c-func docutils literal notranslate"><span class="pre">printf()</span></code>-style <em>format</em> string and a variable number of
arguments, calculate the size of the resulting Python Unicode string and return
a string with the values formatted into it. The variable arguments must be C
types and must correspond exactly to the format characters in the ASCII-encoded
<em>format</em> string. The following format characters are allowed:</p>
|
||
<table class="docutils align-center">
|
||
<colgroup>
|
||
<col style="width: 26%" />
|
||
<col style="width: 29%" />
|
||
<col style="width: 44%" />
|
||
</colgroup>
|
||
<thead>
|
||
<tr class="row-odd"><th class="head"><p>Format Characters</p></th>
|
||
<th class="head"><p>Type</p></th>
|
||
<th class="head"><p>Comment</p></th>
|
||
</tr>
|
||
</thead>
|
||
<tbody>
|
||
<tr class="row-even"><td><p><code class="xref py py-attr docutils literal notranslate"><span class="pre">%%</span></code></p></td>
|
||
<td><p><em>n/a</em></p></td>
|
||
<td><p>The literal % character.</p></td>
|
||
</tr>
|
||
<tr class="row-odd"><td><p><code class="xref py py-attr docutils literal notranslate"><span class="pre">%c</span></code></p></td>
|
||
<td><p>int</p></td>
|
||
<td><p>A single character,
|
||
represented as a C int.</p></td>
|
||
</tr>
|
||
<tr class="row-even"><td><p><code class="xref py py-attr docutils literal notranslate"><span class="pre">%d</span></code></p></td>
|
||
<td><p>int</p></td>
|
||
<td><p>Equivalent to
|
||
<code class="docutils literal notranslate"><span class="pre">printf("%d")</span></code>. <a class="footnote-reference brackets" href="#id14" id="id1">1</a></p></td>
|
||
</tr>
|
||
<tr class="row-odd"><td><p><code class="xref py py-attr docutils literal notranslate"><span class="pre">%u</span></code></p></td>
|
||
<td><p>unsigned int</p></td>
|
||
<td><p>Equivalent to
|
||
<code class="docutils literal notranslate"><span class="pre">printf("%u")</span></code>. <a class="footnote-reference brackets" href="#id14" id="id2">1</a></p></td>
|
||
</tr>
|
||
<tr class="row-even"><td><p><code class="xref py py-attr docutils literal notranslate"><span class="pre">%ld</span></code></p></td>
|
||
<td><p>long</p></td>
|
||
<td><p>Equivalent to
|
||
<code class="docutils literal notranslate"><span class="pre">printf("%ld")</span></code>. <a class="footnote-reference brackets" href="#id14" id="id3">1</a></p></td>
|
||
</tr>
|
||
<tr class="row-odd"><td><p><code class="xref py py-attr docutils literal notranslate"><span class="pre">%li</span></code></p></td>
|
||
<td><p>long</p></td>
|
||
<td><p>Equivalent to
|
||
<code class="docutils literal notranslate"><span class="pre">printf("%li")</span></code>. <a class="footnote-reference brackets" href="#id14" id="id4">1</a></p></td>
|
||
</tr>
|
||
<tr class="row-even"><td><p><code class="xref py py-attr docutils literal notranslate"><span class="pre">%lu</span></code></p></td>
|
||
<td><p>unsigned long</p></td>
|
||
<td><p>Equivalent to
|
||
<code class="docutils literal notranslate"><span class="pre">printf("%lu")</span></code>. <a class="footnote-reference brackets" href="#id14" id="id5">1</a></p></td>
|
||
</tr>
|
||
<tr class="row-odd"><td><p><code class="xref py py-attr docutils literal notranslate"><span class="pre">%lld</span></code></p></td>
|
||
<td><p>long long</p></td>
|
||
<td><p>Equivalent to
|
||
<code class="docutils literal notranslate"><span class="pre">printf("%lld")</span></code>. <a class="footnote-reference brackets" href="#id14" id="id6">1</a></p></td>
|
||
</tr>
|
||
<tr class="row-even"><td><p><code class="xref py py-attr docutils literal notranslate"><span class="pre">%lli</span></code></p></td>
|
||
<td><p>long long</p></td>
|
||
<td><p>Equivalent to
|
||
<code class="docutils literal notranslate"><span class="pre">printf("%lli")</span></code>. <a class="footnote-reference brackets" href="#id14" id="id7">1</a></p></td>
|
||
</tr>
|
||
<tr class="row-odd"><td><p><code class="xref py py-attr docutils literal notranslate"><span class="pre">%llu</span></code></p></td>
|
||
<td><p>unsigned long long</p></td>
|
||
<td><p>Equivalent to
|
||
<code class="docutils literal notranslate"><span class="pre">printf("%llu")</span></code>. <a class="footnote-reference brackets" href="#id14" id="id8">1</a></p></td>
|
||
</tr>
|
||
<tr class="row-even"><td><p><code class="xref py py-attr docutils literal notranslate"><span class="pre">%zd</span></code></p></td>
|
||
<td><p>Py_ssize_t</p></td>
|
||
<td><p>Equivalent to
|
||
<code class="docutils literal notranslate"><span class="pre">printf("%zd")</span></code>. <a class="footnote-reference brackets" href="#id14" id="id9">1</a></p></td>
|
||
</tr>
|
||
<tr class="row-odd"><td><p><code class="xref py py-attr docutils literal notranslate"><span class="pre">%zi</span></code></p></td>
|
||
<td><p>Py_ssize_t</p></td>
|
||
<td><p>Equivalent to
|
||
<code class="docutils literal notranslate"><span class="pre">printf("%zi")</span></code>. <a class="footnote-reference brackets" href="#id14" id="id10">1</a></p></td>
|
||
</tr>
|
||
<tr class="row-even"><td><p><code class="xref py py-attr docutils literal notranslate"><span class="pre">%zu</span></code></p></td>
|
||
<td><p>size_t</p></td>
|
||
<td><p>Equivalent to
|
||
<code class="docutils literal notranslate"><span class="pre">printf("%zu")</span></code>. <a class="footnote-reference brackets" href="#id14" id="id11">1</a></p></td>
|
||
</tr>
|
||
<tr class="row-odd"><td><p><code class="xref py py-attr docutils literal notranslate"><span class="pre">%i</span></code></p></td>
|
||
<td><p>int</p></td>
|
||
<td><p>Equivalent to
|
||
<code class="docutils literal notranslate"><span class="pre">printf("%i")</span></code>. <a class="footnote-reference brackets" href="#id14" id="id12">1</a></p></td>
|
||
</tr>
|
||
<tr class="row-even"><td><p><code class="xref py py-attr docutils literal notranslate"><span class="pre">%x</span></code></p></td>
|
||
<td><p>int</p></td>
|
||
<td><p>Equivalent to
|
||
<code class="docutils literal notranslate"><span class="pre">printf("%x")</span></code>. <a class="footnote-reference brackets" href="#id14" id="id13">1</a></p></td>
|
||
</tr>
|
||
<tr class="row-odd"><td><p><code class="xref py py-attr docutils literal notranslate"><span class="pre">%s</span></code></p></td>
|
||
<td><p>const char*</p></td>
|
||
<td><p>A null-terminated C character
|
||
array.</p></td>
|
||
</tr>
|
||
<tr class="row-even"><td><p><code class="xref py py-attr docutils literal notranslate"><span class="pre">%p</span></code></p></td>
|
||
<td><p>const void*</p></td>
|
||
<td><p>The hex representation of a C
|
||
pointer. Mostly equivalent to
|
||
<code class="docutils literal notranslate"><span class="pre">printf("%p")</span></code> except that
|
||
it is guaranteed to start with
|
||
the literal <code class="docutils literal notranslate"><span class="pre">0x</span></code> regardless
|
||
of what the platform’s
|
||
<code class="docutils literal notranslate"><span class="pre">printf</span></code> yields.</p></td>
|
||
</tr>
|
||
<tr class="row-odd"><td><p><code class="xref py py-attr docutils literal notranslate"><span class="pre">%A</span></code></p></td>
|
||
<td><p>PyObject*</p></td>
|
||
<td><p>The result of calling
|
||
<a class="reference internal" href="../library/functions.html#ascii" title="ascii"><code class="xref py py-func docutils literal notranslate"><span class="pre">ascii()</span></code></a>.</p></td>
|
||
</tr>
|
||
<tr class="row-even"><td><p><code class="xref py py-attr docutils literal notranslate"><span class="pre">%U</span></code></p></td>
|
||
<td><p>PyObject*</p></td>
|
||
<td><p>A Unicode object.</p></td>
|
||
</tr>
|
||
<tr class="row-odd"><td><p><code class="xref py py-attr docutils literal notranslate"><span class="pre">%V</span></code></p></td>
|
||
<td><p>PyObject*,
|
||
const char*</p></td>
|
||
<td><p>A Unicode object (which may be
|
||
<em>NULL</em>) and a null-terminated
|
||
C character array as a second
|
||
parameter (which will be used,
|
||
if the first parameter is
|
||
<em>NULL</em>).</p></td>
|
||
</tr>
|
||
<tr class="row-even"><td><p><code class="xref py py-attr docutils literal notranslate"><span class="pre">%S</span></code></p></td>
|
||
<td><p>PyObject*</p></td>
|
||
<td><p>The result of calling
|
||
<a class="reference internal" href="object.html#c.PyObject_Str" title="PyObject_Str"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyObject_Str()</span></code></a>.</p></td>
|
||
</tr>
|
||
<tr class="row-odd"><td><p><code class="xref py py-attr docutils literal notranslate"><span class="pre">%R</span></code></p></td>
|
||
<td><p>PyObject*</p></td>
|
||
<td><p>The result of calling
|
||
<a class="reference internal" href="object.html#c.PyObject_Repr" title="PyObject_Repr"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyObject_Repr()</span></code></a>.</p></td>
|
||
</tr>
|
||
</tbody>
|
||
</table>
|
||
<p>An unrecognized format character causes all the rest of the format string to be
copied as-is to the result string, and any extra arguments are discarded.</p>
|
||
<div class="admonition note">
|
||
<p class="admonition-title">Note</p>
|
||
<p>The width formatter unit is the number of characters rather than bytes.
The precision formatter unit is the number of bytes for <code class="docutils literal notranslate"><span class="pre">"%s"</span></code> and
<code class="docutils literal notranslate"><span class="pre">"%V"</span></code> (if the <code class="docutils literal notranslate"><span class="pre">PyObject*</span></code> argument is NULL), and the number of
characters for <code class="docutils literal notranslate"><span class="pre">"%A"</span></code>, <code class="docutils literal notranslate"><span class="pre">"%U"</span></code>, <code class="docutils literal notranslate"><span class="pre">"%S"</span></code>, <code class="docutils literal notranslate"><span class="pre">"%R"</span></code> and <code class="docutils literal notranslate"><span class="pre">"%V"</span></code>
(if the <code class="docutils literal notranslate"><span class="pre">PyObject*</span></code> argument is not NULL).</p>
|
||
</div>
|
||
<dl class="footnote brackets">
|
||
<dt class="label" id="id14"><span class="brackets">1</span><span class="fn-backref">(<a href="#id1">1</a>,<a href="#id2">2</a>,<a href="#id3">3</a>,<a href="#id4">4</a>,<a href="#id5">5</a>,<a href="#id6">6</a>,<a href="#id7">7</a>,<a href="#id8">8</a>,<a href="#id9">9</a>,<a href="#id10">10</a>,<a href="#id11">11</a>,<a href="#id12">12</a>,<a href="#id13">13</a>)</span></dt>
|
||
<dd><p>For integer specifiers (d, u, ld, li, lu, lld, lli, llu, zd, zi,
|
||
zu, i, x): the 0-conversion flag has effect even when a precision is given.</p>
|
||
</dd>
|
||
</dl>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.2: </span>Support for <code class="docutils literal notranslate"><span class="pre">"%lld"</span></code> and <code class="docutils literal notranslate"><span class="pre">"%llu"</span></code> added.</p>
|
||
</div>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.3: </span>Support for <code class="docutils literal notranslate"><span class="pre">"%li"</span></code>, <code class="docutils literal notranslate"><span class="pre">"%lli"</span></code> and <code class="docutils literal notranslate"><span class="pre">"%zi"</span></code> added.</p>
|
||
</div>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.4: </span>Support for width and precision formatters for <code class="docutils literal notranslate"><span class="pre">"%s"</span></code>, <code class="docutils literal notranslate"><span class="pre">"%A"</span></code>, <code class="docutils literal notranslate"><span class="pre">"%U"</span></code>,
<code class="docutils literal notranslate"><span class="pre">"%V"</span></code>, <code class="docutils literal notranslate"><span class="pre">"%S"</span></code>, <code class="docutils literal notranslate"><span class="pre">"%R"</span></code> added.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_FromFormatV">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_FromFormatV</code><span class="sig-paren">(</span>const char<em> *format</em>, va_list<em> vargs</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_FromFormatV" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Identical to <a class="reference internal" href="#c.PyUnicode_FromFormat" title="PyUnicode_FromFormat"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_FromFormat()</span></code></a> except that it takes exactly two
arguments: the <em>format</em> string and a <code class="docutils literal notranslate"><span class="pre">va_list</span></code> holding the values to format.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_FromEncodedObject">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_FromEncodedObject</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *obj</em>, const char<em> *encoding</em>, const char<em> *errors</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_FromEncodedObject" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Decode an encoded object <em>obj</em> to a Unicode object.</p>
|
||
<p><a class="reference internal" href="../library/stdtypes.html#bytes" title="bytes"><code class="xref py py-class docutils literal notranslate"><span class="pre">bytes</span></code></a>, <a class="reference internal" href="../library/stdtypes.html#bytearray" title="bytearray"><code class="xref py py-class docutils literal notranslate"><span class="pre">bytearray</span></code></a> and other
|
||
<a class="reference internal" href="../glossary.html#term-bytes-like-object"><span class="xref std std-term">bytes-like objects</span></a>
|
||
are decoded according to the given <em>encoding</em> and using the error handling
|
||
defined by <em>errors</em>. Both can be <em>NULL</em> to have the interface use the default
|
||
values (see <a class="reference internal" href="#builtincodecs"><span class="std std-ref">Built-in Codecs</span></a> for details).</p>
|
||
<p>All other objects, including Unicode objects, cause a <a class="reference internal" href="../library/exceptions.html#TypeError" title="TypeError"><code class="xref py py-exc docutils literal notranslate"><span class="pre">TypeError</span></code></a> to be
|
||
set.</p>
|
||
<p>The API returns <em>NULL</em> if there was an error. The caller is responsible for
decref’ing the returned object.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_GetLength">
|
||
Py_ssize_t <code class="descname">PyUnicode_GetLength</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *unicode</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_GetLength" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Return the length of the Unicode object, in code points.</p>
|
||
<div class="versionadded">
|
||
<p><span class="versionmodified added">New in version 3.3.</span></p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_CopyCharacters">
|
||
Py_ssize_t <code class="descname">PyUnicode_CopyCharacters</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *to</em>, Py_ssize_t<em> to_start</em>, <a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *from</em>, Py_ssize_t<em> from_start</em>, Py_ssize_t<em> how_many</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_CopyCharacters" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Copy characters from one Unicode object into another. This function performs
|
||
character conversion when necessary and falls back to <code class="xref c c-func docutils literal notranslate"><span class="pre">memcpy()</span></code> if
|
||
possible. Returns <code class="docutils literal notranslate"><span class="pre">-1</span></code> and sets an exception on error, otherwise returns
|
||
the number of copied characters.</p>
|
||
<div class="versionadded">
|
||
<p><span class="versionmodified added">New in version 3.3.</span></p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_Fill">
|
||
Py_ssize_t <code class="descname">PyUnicode_Fill</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *unicode</em>, Py_ssize_t<em> start</em>, Py_ssize_t<em> length</em>, <a class="reference internal" href="#c.Py_UCS4" title="Py_UCS4">Py_UCS4</a><em> fill_char</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_Fill" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Fill a string with a character: write <em>fill_char</em> into
|
||
<code class="docutils literal notranslate"><span class="pre">unicode[start:start+length]</span></code>.</p>
|
||
<p>Fail if <em>fill_char</em> is greater than the string’s maximum character, or if the
|
||
string has more than 1 reference.</p>
|
||
<p>Return the number of characters written, or return <code class="docutils literal notranslate"><span class="pre">-1</span></code> and raise an
|
||
exception on error.</p>
|
||
<div class="versionadded">
|
||
<p><span class="versionmodified added">New in version 3.3.</span></p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_WriteChar">
|
||
int <code class="descname">PyUnicode_WriteChar</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *unicode</em>, Py_ssize_t<em> index</em>, <a class="reference internal" href="#c.Py_UCS4" title="Py_UCS4">Py_UCS4</a><em> character</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_WriteChar" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Write a character to a string. The string must have been created through
|
||
<a class="reference internal" href="#c.PyUnicode_New" title="PyUnicode_New"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_New()</span></code></a>. Since Unicode strings are supposed to be immutable,
|
||
the string must not be shared, or have been hashed yet.</p>
|
||
<p>This function checks that <em>unicode</em> is a Unicode object, that the index is
|
||
not out of bounds, and that the object can be modified safely (i.e. that
|
||
its reference count is one).</p>
|
||
<div class="versionadded">
|
||
<p><span class="versionmodified added">New in version 3.3.</span></p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_ReadChar">
|
||
<a class="reference internal" href="#c.Py_UCS4" title="Py_UCS4">Py_UCS4</a> <code class="descname">PyUnicode_ReadChar</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *unicode</em>, Py_ssize_t<em> index</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_ReadChar" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Read a character from a string. This function checks that <em>unicode</em> is a
|
||
Unicode object and the index is not out of bounds, in contrast to the macro
|
||
version <a class="reference internal" href="#c.PyUnicode_READ_CHAR" title="PyUnicode_READ_CHAR"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_READ_CHAR()</span></code></a>.</p>
|
||
<div class="versionadded">
|
||
<p><span class="versionmodified added">New in version 3.3.</span></p>
|
||
</div>
|
||
</dd></dl>
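<p>The bounds checking described above can be observed with a small sketch that calls <code class="docutils literal notranslate"><span class="pre">PyUnicode_ReadChar()</span></code> through <code class="docutils literal notranslate"><span class="pre">ctypes.pythonapi</span></code>. The ctypes harness and the names <code class="docutils literal notranslate"><span class="pre">read_char</span></code>, <code class="docutils literal notranslate"><span class="pre">second</span></code> and <code class="docutils literal notranslate"><span class="pre">out_of_range_rejected</span></code> are assumptions for illustration; C code would call the function directly and test the return value together with the error indicator.</p>

```python
import ctypes

# Describe the C signature: Py_UCS4 PyUnicode_ReadChar(PyObject *unicode, Py_ssize_t index)
read_char = ctypes.pythonapi.PyUnicode_ReadChar
read_char.argtypes = [ctypes.py_object, ctypes.c_ssize_t]
read_char.restype = ctypes.c_uint32   # Py_UCS4 is a 32-bit unsigned integer

text = "h\u00e9llo"

# A valid index yields the code point stored at that position.
second = read_char(text, 1)

# An index past the end sets an exception, which ctypes.pythonapi re-raises.
try:
    read_char(text, len(text))
    out_of_range_rejected = False
except IndexError:
    out_of_range_rejected = True
```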
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_Substring">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_Substring</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *str</em>, Py_ssize_t<em> start</em>, Py_ssize_t<em> end</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_Substring" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Return a substring of <em>str</em>, from character index <em>start</em> (included) to
|
||
character index <em>end</em> (excluded). Negative indices are not supported.</p>
|
||
<div class="versionadded">
|
||
<p><span class="versionmodified added">New in version 3.3.</span></p>
|
||
</div>
|
||
</dd></dl>
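<p>The half-open range convention (<em>start</em> included, <em>end</em> excluded) matches Python slicing with non-negative indices. A minimal sketch, again assuming a ctypes harness for experimentation; <code class="docutils literal notranslate"><span class="pre">substring</span></code>, <code class="docutils literal notranslate"><span class="pre">text</span></code> and <code class="docutils literal notranslate"><span class="pre">piece</span></code> are illustrative names.</p>

```python
import ctypes

# Describe the C signature:
# PyObject* PyUnicode_Substring(PyObject *str, Py_ssize_t start, Py_ssize_t end)
substring = ctypes.pythonapi.PyUnicode_Substring
substring.argtypes = [ctypes.py_object, ctypes.c_ssize_t, ctypes.c_ssize_t]
substring.restype = ctypes.py_object   # new reference; ctypes takes ownership of it

text = "h\u00e9llo w\u00f6rld"

# Characters at indices 1, 2, 3 and 4 are returned; index 5 is excluded.
piece = substring(text, 1, 5)
```

<p>For non-negative indices the result equals the Python slice <code class="docutils literal notranslate"><span class="pre">text[1:5]</span></code>.</p>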
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_AsUCS4">
|
||
<a class="reference internal" href="#c.Py_UCS4" title="Py_UCS4">Py_UCS4</a>* <code class="descname">PyUnicode_AsUCS4</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *u</em>, <a class="reference internal" href="#c.Py_UCS4" title="Py_UCS4">Py_UCS4</a><em> *buffer</em>, Py_ssize_t<em> buflen</em>, int<em> copy_null</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_AsUCS4" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Copy the string <em>u</em> into a UCS4 buffer, including a null character, if
|
||
<em>copy_null</em> is set. Returns <em>NULL</em> and sets an exception on error (in
|
||
particular, a <a class="reference internal" href="../library/exceptions.html#SystemError" title="SystemError"><code class="xref py py-exc docutils literal notranslate"><span class="pre">SystemError</span></code></a> if <em>buflen</em> is smaller than the length of
|
||
<em>u</em>). <em>buffer</em> is returned on success.</p>
|
||
<div class="versionadded">
|
||
<p><span class="versionmodified added">New in version 3.3.</span></p>
|
||
</div>
|
||
</dd></dl>
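<p>The interaction between <em>buflen</em> and <em>copy_null</em> can be sketched as follows. The buffer needs one extra slot when the null character is requested. The ctypes harness and the names <code class="docutils literal notranslate"><span class="pre">as_ucs4</span></code>, <code class="docutils literal notranslate"><span class="pre">buffer</span></code> and <code class="docutils literal notranslate"><span class="pre">too_small</span></code> are assumptions made for this illustration only.</p>

```python
import ctypes

# Describe the C signature:
# Py_UCS4* PyUnicode_AsUCS4(PyObject *u, Py_UCS4 *buffer, Py_ssize_t buflen, int copy_null)
as_ucs4 = ctypes.pythonapi.PyUnicode_AsUCS4
as_ucs4.argtypes = [
    ctypes.py_object,
    ctypes.POINTER(ctypes.c_uint32),
    ctypes.c_ssize_t,
    ctypes.c_int,
]
as_ucs4.restype = ctypes.POINTER(ctypes.c_uint32)

text = "abc"

# Room for every code point plus the trailing null requested by copy_null=1.
buffer = (ctypes.c_uint32 * (len(text) + 1))()
as_ucs4(text, buffer, len(buffer), 1)
copied = list(buffer)

# A buffer that is too short makes the call fail with SystemError.
too_small = (ctypes.c_uint32 * 2)()
try:
    as_ucs4(text, too_small, len(too_small), 1)
    overflow_detected = False
except SystemError:
    overflow_detected = True
```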
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_AsUCS4Copy">
|
||
<a class="reference internal" href="#c.Py_UCS4" title="Py_UCS4">Py_UCS4</a>* <code class="descname">PyUnicode_AsUCS4Copy</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *u</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_AsUCS4Copy" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Copy the string <em>u</em> into a new UCS4 buffer that is allocated using
|
||
<a class="reference internal" href="memory.html#c.PyMem_Malloc" title="PyMem_Malloc"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyMem_Malloc()</span></code></a>. If this fails, <em>NULL</em> is returned with a
|
||
<a class="reference internal" href="../library/exceptions.html#MemoryError" title="MemoryError"><code class="xref py py-exc docutils literal notranslate"><span class="pre">MemoryError</span></code></a> set. The returned buffer always has an extra
|
||
null code point appended.</p>
|
||
<div class="versionadded">
|
||
<p><span class="versionmodified added">New in version 3.3.</span></p>
|
||
</div>
|
||
</dd></dl>
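<p>Because the buffer is allocated with <code class="docutils literal notranslate"><span class="pre">PyMem_Malloc()</span></code>, the caller has to release it with <code class="docutils literal notranslate"><span class="pre">PyMem_Free()</span></code>. The sketch below shows that pairing through a ctypes harness, which is an assumption for experimentation; <code class="docutils literal notranslate"><span class="pre">as_ucs4_copy</span></code>, <code class="docutils literal notranslate"><span class="pre">mem_free</span></code> and <code class="docutils literal notranslate"><span class="pre">code_points</span></code> are illustrative names.</p>

```python
import ctypes

# Describe the C signature: Py_UCS4* PyUnicode_AsUCS4Copy(PyObject *u)
as_ucs4_copy = ctypes.pythonapi.PyUnicode_AsUCS4Copy
as_ucs4_copy.argtypes = [ctypes.py_object]
as_ucs4_copy.restype = ctypes.POINTER(ctypes.c_uint32)

# Describe the C signature: void PyMem_Free(void *p)
mem_free = ctypes.pythonapi.PyMem_Free
mem_free.argtypes = [ctypes.c_void_p]
mem_free.restype = None

text = "hi\U0001F600"

buf = as_ucs4_copy(text)
# Read the code points plus the extra null code point before freeing the buffer.
code_points = [buf[i] for i in range(len(text) + 1)]
mem_free(ctypes.cast(buf, ctypes.c_void_p))
```

<p>The values are copied into a Python list before the buffer is freed, since the pointer must not be dereferenced afterwards.</p>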
|
||
|
||
</div>
|
||
<div class="section" id="deprecated-py-unicode-apis">
|
||
<h3>Deprecated Py_UNICODE APIs<a class="headerlink" href="#deprecated-py-unicode-apis" title="Permalink to this headline">¶</a></h3>
|
||
<div class="deprecated">
|
||
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0.</span></p>
|
||
</div>
|
||
<p>These API functions are deprecated with the implementation of <span class="target" id="index-1"></span><a class="pep reference external" href="https://www.python.org/dev/peps/pep-0393"><strong>PEP 393</strong></a>.
|
||
Extension modules can continue using them, as they will not be removed in Python
|
||
3.x, but should be aware that using them can now incur extra time and memory costs.</p>
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_FromUnicode">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_FromUnicode</code><span class="sig-paren">(</span>const <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> *u</em>, Py_ssize_t<em> size</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_FromUnicode" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Create a Unicode object from the Py_UNICODE buffer <em>u</em> of the given size. <em>u</em>
|
||
may be <em>NULL</em> which causes the contents to be undefined. It is the user’s
|
||
responsibility to fill in the needed data. The buffer is copied into the new
|
||
object.</p>
|
||
<p>If the buffer is not <em>NULL</em>, the return value might be a shared object.
|
||
Therefore, modification of the resulting Unicode object is only allowed when
|
||
<em>u</em> is <em>NULL</em>.</p>
|
||
<p>If the buffer is <em>NULL</em>, <a class="reference internal" href="#c.PyUnicode_READY" title="PyUnicode_READY"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_READY()</span></code></a> must be called once the
|
||
string content has been filled before using any of the access macros such as
|
||
<a class="reference internal" href="#c.PyUnicode_KIND" title="PyUnicode_KIND"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_KIND()</span></code></a>.</p>
|
||
<p>Please migrate to using <a class="reference internal" href="#c.PyUnicode_FromKindAndData" title="PyUnicode_FromKindAndData"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_FromKindAndData()</span></code></a>,
|
||
<a class="reference internal" href="#c.PyUnicode_FromWideChar" title="PyUnicode_FromWideChar"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_FromWideChar()</span></code></a> or <a class="reference internal" href="#c.PyUnicode_New" title="PyUnicode_New"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_New()</span></code></a>.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_AsUnicode">
|
||
<a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a>* <code class="descname">PyUnicode_AsUnicode</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *unicode</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_AsUnicode" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Return a read-only pointer to the Unicode object’s internal
|
||
<a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> buffer, or <em>NULL</em> on error. This will create the
|
||
<a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE*</span></code></a> representation of the object if it is not yet
|
||
available. The buffer is always terminated with an extra null code point.
|
||
Note that the resulting <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> string may also contain
|
||
embedded null code points, which would cause the string to be truncated when
|
||
used in most C functions.</p>
|
||
<p>Please migrate to using <a class="reference internal" href="#c.PyUnicode_AsUCS4" title="PyUnicode_AsUCS4"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_AsUCS4()</span></code></a>,
|
||
<a class="reference internal" href="#c.PyUnicode_AsWideChar" title="PyUnicode_AsWideChar"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_AsWideChar()</span></code></a>, <a class="reference internal" href="#c.PyUnicode_ReadChar" title="PyUnicode_ReadChar"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_ReadChar()</span></code></a> or similar new
|
||
APIs.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_TransformDecimalToASCII">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_TransformDecimalToASCII</code><span class="sig-paren">(</span><a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> *s</em>, Py_ssize_t<em> size</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_TransformDecimalToASCII" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Create a Unicode object by replacing all decimal digits in the
|
||
<a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> buffer <em>s</em> of the given <em>size</em> with ASCII digits 0–9
|
||
according to their decimal value. Return <em>NULL</em> if an exception occurs.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_AsUnicodeAndSize">
|
||
<a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a>* <code class="descname">PyUnicode_AsUnicodeAndSize</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *unicode</em>, Py_ssize_t<em> *size</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_AsUnicodeAndSize" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Like <a class="reference internal" href="#c.PyUnicode_AsUnicode" title="PyUnicode_AsUnicode"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_AsUnicode()</span></code></a>, but also saves the <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a>
|
||
array length (excluding the extra null terminator) in <em>size</em>.
|
||
Note that the resulting <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE*</span></code></a> string
|
||
may contain embedded null code points, which would cause the string to be
|
||
truncated when used in most C functions.</p>
|
||
<div class="versionadded">
|
||
<p><span class="versionmodified added">New in version 3.3.</span></p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_AsUnicodeCopy">
|
||
<a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a>* <code class="descname">PyUnicode_AsUnicodeCopy</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *unicode</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_AsUnicodeCopy" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Create a copy of a Unicode string ending with a null code point. Return <em>NULL</em>
|
||
and raise a <a class="reference internal" href="../library/exceptions.html#MemoryError" title="MemoryError"><code class="xref py py-exc docutils literal notranslate"><span class="pre">MemoryError</span></code></a> exception on memory allocation failure,
|
||
otherwise return a newly allocated buffer (use <a class="reference internal" href="memory.html#c.PyMem_Free" title="PyMem_Free"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyMem_Free()</span></code></a> to free
|
||
the buffer). Note that the resulting <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE*</span></code></a> string may
|
||
contain embedded null code points, which would cause the string to be
|
||
truncated when used in most C functions.</p>
|
||
<div class="versionadded">
|
||
<p><span class="versionmodified added">New in version 3.2.</span></p>
|
||
</div>
|
||
<p>Please migrate to using <a class="reference internal" href="#c.PyUnicode_AsUCS4Copy" title="PyUnicode_AsUCS4Copy"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_AsUCS4Copy()</span></code></a> or similar new APIs.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_GetSize">
|
||
Py_ssize_t <code class="descname">PyUnicode_GetSize</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *unicode</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_GetSize" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Return the size of the deprecated <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> representation, in
|
||
code units (this includes surrogate pairs as 2 units).</p>
|
||
<p>Please migrate to using <a class="reference internal" href="#c.PyUnicode_GetLength" title="PyUnicode_GetLength"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_GetLength()</span></code></a>.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_FromObject">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_FromObject</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *obj</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_FromObject" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Copy an instance of a Unicode subtype to a new true Unicode object if
|
||
necessary. If <em>obj</em> is already a true Unicode object (not a subtype),
|
||
return the reference with incremented refcount.</p>
|
||
<p>Objects other than Unicode or its subtypes will cause a <a class="reference internal" href="../library/exceptions.html#TypeError" title="TypeError"><code class="xref py py-exc docutils literal notranslate"><span class="pre">TypeError</span></code></a>.</p>
|
||
</dd></dl>
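<p>The three cases described for <code class="docutils literal notranslate"><span class="pre">PyUnicode_FromObject()</span></code> (subtype instance, exact Unicode object, anything else) can be sketched as follows. The ctypes harness, the <code class="docutils literal notranslate"><span class="pre">Tagged</span></code> subclass and the variable names are hypothetical and exist only for this illustration.</p>

```python
import ctypes

# Describe the C signature: PyObject* PyUnicode_FromObject(PyObject *obj)
from_object = ctypes.pythonapi.PyUnicode_FromObject
from_object.argtypes = [ctypes.py_object]
from_object.restype = ctypes.py_object   # new reference; ctypes takes ownership of it


class Tagged(str):
    """Hypothetical str subclass used only for this demonstration."""


# A subtype instance is copied into a true str object.
converted = from_object(Tagged("abc"))

# An exact str object is returned as-is (same object, one more reference).
plain = "plain text"
same = from_object(plain)

# Anything that is not a str or str subtype is rejected.
try:
    from_object(b"abc")
    bytes_rejected = False
except TypeError:
    bytes_rejected = True
```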
|
||
|
||
</div>
|
||
<div class="section" id="locale-encoding">
|
||
<h3>Locale Encoding<a class="headerlink" href="#locale-encoding" title="Permalink to this headline">¶</a></h3>
|
||
<p>The current locale encoding can be used to decode text from the operating
|
||
system.</p>
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_DecodeLocaleAndSize">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_DecodeLocaleAndSize</code><span class="sig-paren">(</span>const char<em> *str</em>, Py_ssize_t<em> len</em>, const char<em> *errors</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_DecodeLocaleAndSize" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Decode a string from UTF-8 on Android, or from the current locale encoding
|
||
on other platforms. The supported
|
||
error handlers are <code class="docutils literal notranslate"><span class="pre">"strict"</span></code> and <code class="docutils literal notranslate"><span class="pre">"surrogateescape"</span></code>
|
||
(<span class="target" id="index-2"></span><a class="pep reference external" href="https://www.python.org/dev/peps/pep-0383"><strong>PEP 383</strong></a>). The decoder uses the <code class="docutils literal notranslate"><span class="pre">"strict"</span></code> error handler if
|
||
<em>errors</em> is <code class="docutils literal notranslate"><span class="pre">NULL</span></code>. <em>str</em> must end with a null character but
|
||
cannot contain embedded null characters.</p>
|
||
<p>Use <a class="reference internal" href="#c.PyUnicode_DecodeFSDefaultAndSize" title="PyUnicode_DecodeFSDefaultAndSize"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_DecodeFSDefaultAndSize()</span></code></a> to decode a string from
|
||
<code class="xref c c-data docutils literal notranslate"><span class="pre">Py_FileSystemDefaultEncoding</span></code> (the locale encoding read at
|
||
Python startup).</p>
|
||
<p>This function ignores the Python UTF-8 mode.</p>
|
||
<div class="admonition seealso">
|
||
<p class="admonition-title">See also</p>
|
||
<p>The <a class="reference internal" href="sys.html#c.Py_DecodeLocale" title="Py_DecodeLocale"><code class="xref c c-func docutils literal notranslate"><span class="pre">Py_DecodeLocale()</span></code></a> function.</p>
|
||
</div>
|
||
<div class="versionadded">
|
||
<p><span class="versionmodified added">New in version 3.3.</span></p>
|
||
</div>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.7: </span>The function now also uses the current locale encoding for the
|
||
<code class="docutils literal notranslate"><span class="pre">surrogateescape</span></code> error handler, except on Android. Previously, <a class="reference internal" href="sys.html#c.Py_DecodeLocale" title="Py_DecodeLocale"><code class="xref c c-func docutils literal notranslate"><span class="pre">Py_DecodeLocale()</span></code></a>
|
||
was used for the <code class="docutils literal notranslate"><span class="pre">surrogateescape</span></code>, and the current locale encoding was
|
||
used for <code class="docutils literal notranslate"><span class="pre">strict</span></code>.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_DecodeLocale">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_DecodeLocale</code><span class="sig-paren">(</span>const char<em> *str</em>, const char<em> *errors</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_DecodeLocale" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Similar to <a class="reference internal" href="#c.PyUnicode_DecodeLocaleAndSize" title="PyUnicode_DecodeLocaleAndSize"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_DecodeLocaleAndSize()</span></code></a>, but compute the string
|
||
length using <code class="xref c c-func docutils literal notranslate"><span class="pre">strlen()</span></code>.</p>
|
||
<div class="versionadded">
|
||
<p><span class="versionmodified added">New in version 3.3.</span></p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_EncodeLocale">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_EncodeLocale</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *unicode</em>, const char<em> *errors</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_EncodeLocale" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Encode a Unicode object to UTF-8 on Android, or to the current locale
|
||
encoding on other platforms. The
|
||
supported error handlers are <code class="docutils literal notranslate"><span class="pre">"strict"</span></code> and <code class="docutils literal notranslate"><span class="pre">"surrogateescape"</span></code>
|
||
(<span class="target" id="index-3"></span><a class="pep reference external" href="https://www.python.org/dev/peps/pep-0383"><strong>PEP 383</strong></a>). The encoder uses the <code class="docutils literal notranslate"><span class="pre">"strict"</span></code> error handler if
|
||
<em>errors</em> is <code class="docutils literal notranslate"><span class="pre">NULL</span></code>. Return a <a class="reference internal" href="../library/stdtypes.html#bytes" title="bytes"><code class="xref py py-class docutils literal notranslate"><span class="pre">bytes</span></code></a> object. <em>unicode</em> cannot
|
||
contain embedded null characters.</p>
|
||
<p>Use <a class="reference internal" href="#c.PyUnicode_EncodeFSDefault" title="PyUnicode_EncodeFSDefault"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_EncodeFSDefault()</span></code></a> to encode a string to
|
||
<code class="xref c c-data docutils literal notranslate"><span class="pre">Py_FileSystemDefaultEncoding</span></code> (the locale encoding read at
|
||
Python startup).</p>
|
||
<p>This function ignores the Python UTF-8 mode.</p>
|
||
<div class="admonition seealso">
|
||
<p class="admonition-title">See also</p>
|
||
<p>The <a class="reference internal" href="sys.html#c.Py_EncodeLocale" title="Py_EncodeLocale"><code class="xref c c-func docutils literal notranslate"><span class="pre">Py_EncodeLocale()</span></code></a> function.</p>
|
||
</div>
|
||
<div class="versionadded">
|
||
<p><span class="versionmodified added">New in version 3.3.</span></p>
|
||
</div>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.7: </span>The function now also uses the current locale encoding for the
|
||
<code class="docutils literal notranslate"><span class="pre">surrogateescape</span></code> error handler, except on Android. Previously,
|
||
<a class="reference internal" href="sys.html#c.Py_EncodeLocale" title="Py_EncodeLocale"><code class="xref c c-func docutils literal notranslate"><span class="pre">Py_EncodeLocale()</span></code></a>
|
||
was used for the <code class="docutils literal notranslate"><span class="pre">surrogateescape</span></code>, and the current locale encoding was
|
||
used for <code class="docutils literal notranslate"><span class="pre">strict</span></code>.</p>
|
||
</div>
|
||
</dd></dl>
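<p>A minimal sketch of a decode and encode round trip with the <code class="docutils literal notranslate"><span class="pre">"surrogateescape"</span></code> error handler follows. It assumes a ctypes harness for experimentation and a typical locale (for example UTF-8, Latin-1 or the C locale); the names <code class="docutils literal notranslate"><span class="pre">decode_locale</span></code>, <code class="docutils literal notranslate"><span class="pre">encode_locale</span></code> and <code class="docutils literal notranslate"><span class="pre">raw</span></code> are illustrative. How the byte <code class="docutils literal notranslate"><span class="pre">0xFF</span></code> is decoded depends on the locale, so the sketch only relies on the round trip restoring the original bytes.</p>

```python
import ctypes

# Describe the C signature:
# PyObject* PyUnicode_DecodeLocale(const char *str, const char *errors)
decode_locale = ctypes.pythonapi.PyUnicode_DecodeLocale
decode_locale.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
decode_locale.restype = ctypes.py_object

# Describe the C signature:
# PyObject* PyUnicode_EncodeLocale(PyObject *unicode, const char *errors)
encode_locale = ctypes.pythonapi.PyUnicode_EncodeLocale
encode_locale.argtypes = [ctypes.py_object, ctypes.c_char_p]
encode_locale.restype = ctypes.py_object

# Passing None sends NULL for errors, which selects the "strict" handler.
ascii_text = decode_locale(b"report.txt", None)

# Bytes the locale encoding cannot decode are escaped as lone surrogates
# and restored unchanged when encoding with the same error handler.
raw = b"caf\xff"
text = decode_locale(raw, b"surrogateescape")
round_tripped = encode_locale(text, b"surrogateescape")
```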
|
||
|
||
</div>
|
||
<div class="section" id="file-system-encoding">
|
||
<h3>File System Encoding<a class="headerlink" href="#file-system-encoding" title="Permalink to this headline">¶</a></h3>
|
||
<p>To encode and decode file names and other environment strings,
|
||
<code class="xref c c-data docutils literal notranslate"><span class="pre">Py_FileSystemDefaultEncoding</span></code> should be used as the encoding, and
|
||
<code class="xref c c-data docutils literal notranslate"><span class="pre">Py_FileSystemDefaultEncodeErrors</span></code> should be used as the error handler
|
||
(<span class="target" id="index-4"></span><a class="pep reference external" href="https://www.python.org/dev/peps/pep-0383"><strong>PEP 383</strong></a> and <span class="target" id="index-5"></span><a class="pep reference external" href="https://www.python.org/dev/peps/pep-0529"><strong>PEP 529</strong></a>). To encode file names to <a class="reference internal" href="../library/stdtypes.html#bytes" title="bytes"><code class="xref py py-class docutils literal notranslate"><span class="pre">bytes</span></code></a> during
|
||
argument parsing, the <code class="docutils literal notranslate"><span class="pre">"O&"</span></code> converter should be used, passing
|
||
<a class="reference internal" href="#c.PyUnicode_FSConverter" title="PyUnicode_FSConverter"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_FSConverter()</span></code></a> as the conversion function:</p>
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_FSConverter">
|
||
int <code class="descname">PyUnicode_FSConverter</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>*<em> obj</em>, void*<em> result</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_FSConverter" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>ParseTuple converter: encode <a class="reference internal" href="../library/stdtypes.html#str" title="str"><code class="xref py py-class docutils literal notranslate"><span class="pre">str</span></code></a> objects – obtained directly or
|
||
through the <a class="reference internal" href="../library/os.html#os.PathLike" title="os.PathLike"><code class="xref py py-class docutils literal notranslate"><span class="pre">os.PathLike</span></code></a> interface – to <a class="reference internal" href="../library/stdtypes.html#bytes" title="bytes"><code class="xref py py-class docutils literal notranslate"><span class="pre">bytes</span></code></a> using
|
||
<a class="reference internal" href="#c.PyUnicode_EncodeFSDefault" title="PyUnicode_EncodeFSDefault"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_EncodeFSDefault()</span></code></a>; <a class="reference internal" href="../library/stdtypes.html#bytes" title="bytes"><code class="xref py py-class docutils literal notranslate"><span class="pre">bytes</span></code></a> objects are output as-is.
|
||
<em>result</em> must be a <a class="reference internal" href="bytes.html#c.PyBytesObject" title="PyBytesObject"><code class="xref c c-type docutils literal notranslate"><span class="pre">PyBytesObject*</span></code></a> which must be released when it is
|
||
no longer used.</p>
|
||
<div class="versionadded">
|
||
<p><span class="versionmodified added">New in version 3.1.</span></p>
|
||
</div>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.6: </span>Accepts a <a class="reference internal" href="../glossary.html#term-path-like-object"><span class="xref std std-term">path-like object</span></a>.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<p>To decode file names to <a class="reference internal" href="../library/stdtypes.html#str" title="str"><code class="xref py py-class docutils literal notranslate"><span class="pre">str</span></code></a> during argument parsing, the <code class="docutils literal notranslate"><span class="pre">"O&"</span></code>
|
||
converter should be used, passing <a class="reference internal" href="#c.PyUnicode_FSDecoder" title="PyUnicode_FSDecoder"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_FSDecoder()</span></code></a> as the
|
||
conversion function:</p>
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_FSDecoder">
|
||
int <code class="descname">PyUnicode_FSDecoder</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>*<em> obj</em>, void*<em> result</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_FSDecoder" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>ParseTuple converter: decode <a class="reference internal" href="../library/stdtypes.html#bytes" title="bytes"><code class="xref py py-class docutils literal notranslate"><span class="pre">bytes</span></code></a> objects – obtained either
|
||
directly or indirectly through the <a class="reference internal" href="../library/os.html#os.PathLike" title="os.PathLike"><code class="xref py py-class docutils literal notranslate"><span class="pre">os.PathLike</span></code></a> interface – to
|
||
<a class="reference internal" href="../library/stdtypes.html#str" title="str"><code class="xref py py-class docutils literal notranslate"><span class="pre">str</span></code></a> using <a class="reference internal" href="#c.PyUnicode_DecodeFSDefaultAndSize" title="PyUnicode_DecodeFSDefaultAndSize"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_DecodeFSDefaultAndSize()</span></code></a>; <a class="reference internal" href="../library/stdtypes.html#str" title="str"><code class="xref py py-class docutils literal notranslate"><span class="pre">str</span></code></a>
|
||
objects are output as-is. <em>result</em> must be a <a class="reference internal" href="#c.PyUnicodeObject" title="PyUnicodeObject"><code class="xref c c-type docutils literal notranslate"><span class="pre">PyUnicodeObject*</span></code></a> which
|
||
must be released when it is no longer used.</p>
|
||
<div class="versionadded">
|
||
<p><span class="versionmodified added">New in version 3.2.</span></p>
|
||
</div>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.6: </span>Accepts a <a class="reference internal" href="../glossary.html#term-path-like-object"><span class="xref std std-term">path-like object</span></a>.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_DecodeFSDefaultAndSize">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_DecodeFSDefaultAndSize</code><span class="sig-paren">(</span>const char<em> *s</em>, Py_ssize_t<em> size</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_DecodeFSDefaultAndSize" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Decode a string using <code class="xref c c-data docutils literal notranslate"><span class="pre">Py_FileSystemDefaultEncoding</span></code> and the
|
||
<code class="xref c c-data docutils literal notranslate"><span class="pre">Py_FileSystemDefaultEncodeErrors</span></code> error handler.</p>
|
||
<p>If <code class="xref c c-data docutils literal notranslate"><span class="pre">Py_FileSystemDefaultEncoding</span></code> is not set, fall back to the
|
||
locale encoding.</p>
|
||
<p><code class="xref c c-data docutils literal notranslate"><span class="pre">Py_FileSystemDefaultEncoding</span></code> is initialized at startup from the
|
||
locale encoding and cannot be modified later. If you need to decode a string
|
||
from the current locale encoding, use
|
||
<a class="reference internal" href="#c.PyUnicode_DecodeLocaleAndSize" title="PyUnicode_DecodeLocaleAndSize"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_DecodeLocaleAndSize()</span></code></a>.</p>
|
||
<div class="admonition seealso">
|
||
<p class="admonition-title">See also</p>
|
||
<p>The <a class="reference internal" href="sys.html#c.Py_DecodeLocale" title="Py_DecodeLocale"><code class="xref c c-func docutils literal notranslate"><span class="pre">Py_DecodeLocale()</span></code></a> function.</p>
|
||
</div>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.6: </span>Use <code class="xref c c-data docutils literal notranslate"><span class="pre">Py_FileSystemDefaultEncodeErrors</span></code> error handler.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_DecodeFSDefault">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_DecodeFSDefault</code><span class="sig-paren">(</span>const char<em> *s</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_DecodeFSDefault" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Decode a null-terminated string using <code class="xref c c-data docutils literal notranslate"><span class="pre">Py_FileSystemDefaultEncoding</span></code>
|
||
and the <code class="xref c c-data docutils literal notranslate"><span class="pre">Py_FileSystemDefaultEncodeErrors</span></code> error handler.</p>
|
||
<p>If <code class="xref c c-data docutils literal notranslate"><span class="pre">Py_FileSystemDefaultEncoding</span></code> is not set, fall back to the
|
||
locale encoding.</p>
|
||
<p>Use <a class="reference internal" href="#c.PyUnicode_DecodeFSDefaultAndSize" title="PyUnicode_DecodeFSDefaultAndSize"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_DecodeFSDefaultAndSize()</span></code></a> if you know the string length.</p>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.6: </span>Use <code class="xref c c-data docutils literal notranslate"><span class="pre">Py_FileSystemDefaultEncodeErrors</span></code> error handler.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_EncodeFSDefault">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_EncodeFSDefault</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *unicode</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_EncodeFSDefault" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Encode a Unicode object to <code class="xref c c-data docutils literal notranslate"><span class="pre">Py_FileSystemDefaultEncoding</span></code> with the
|
||
<code class="xref c c-data docutils literal notranslate"><span class="pre">Py_FileSystemDefaultEncodeErrors</span></code> error handler, and return
|
||
<a class="reference internal" href="../library/stdtypes.html#bytes" title="bytes"><code class="xref py py-class docutils literal notranslate"><span class="pre">bytes</span></code></a>. Note that the resulting <a class="reference internal" href="../library/stdtypes.html#bytes" title="bytes"><code class="xref py py-class docutils literal notranslate"><span class="pre">bytes</span></code></a> object may contain
|
||
null bytes.</p>
|
||
<p>If <code class="xref c c-data docutils literal notranslate"><span class="pre">Py_FileSystemDefaultEncoding</span></code> is not set, fall back to the
|
||
locale encoding.</p>
|
||
<p><code class="xref c c-data docutils literal notranslate"><span class="pre">Py_FileSystemDefaultEncoding</span></code> is initialized at startup from the
|
||
locale encoding and cannot be modified later. If you need to encode a string
|
||
to the current locale encoding, use <a class="reference internal" href="#c.PyUnicode_EncodeLocale" title="PyUnicode_EncodeLocale"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_EncodeLocale()</span></code></a>.</p>
|
||
<div class="admonition seealso">
|
||
<p class="admonition-title">See also</p>
|
||
<p>The <a class="reference internal" href="sys.html#c.Py_EncodeLocale" title="Py_EncodeLocale"><code class="xref c c-func docutils literal notranslate"><span class="pre">Py_EncodeLocale()</span></code></a> function.</p>
|
||
</div>
|
||
<div class="versionadded">
|
||
<p><span class="versionmodified added">New in version 3.2.</span></p>
|
||
</div>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.6: </span>Use <code class="xref c c-data docutils literal notranslate"><span class="pre">Py_FileSystemDefaultEncodeErrors</span></code> error handler.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
</div>
|
||
<div class="section" id="wchar-t-support">
|
||
<h3>wchar_t Support<a class="headerlink" href="#wchar-t-support" title="Permalink to this headline">¶</a></h3>
|
||
<p><code class="xref c c-type docutils literal notranslate"><span class="pre">wchar_t</span></code> support for platforms which support it:</p>
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_FromWideChar">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_FromWideChar</code><span class="sig-paren">(</span>const wchar_t<em> *w</em>, Py_ssize_t<em> size</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_FromWideChar" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Create a Unicode object from the <code class="xref c c-type docutils literal notranslate"><span class="pre">wchar_t</span></code> buffer <em>w</em> of the given <em>size</em>.
|
||
Passing <code class="docutils literal notranslate"><span class="pre">-1</span></code> as the <em>size</em> indicates that the function must itself compute the length,
|
||
using wcslen.
|
||
Return <em>NULL</em> on failure.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_AsWideChar">
|
||
Py_ssize_t <code class="descname">PyUnicode_AsWideChar</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *unicode</em>, wchar_t<em> *w</em>, Py_ssize_t<em> size</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_AsWideChar" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Copy the Unicode object contents into the <code class="xref c c-type docutils literal notranslate"><span class="pre">wchar_t</span></code> buffer <em>w</em>. At most
|
||
<em>size</em> <code class="xref c c-type docutils literal notranslate"><span class="pre">wchar_t</span></code> characters are copied (excluding a possibly trailing
|
||
null termination character). Return the number of <code class="xref c c-type docutils literal notranslate"><span class="pre">wchar_t</span></code> characters
|
||
copied or <code class="docutils literal notranslate"><span class="pre">-1</span></code> in case of an error. Note that the resulting <code class="xref c c-type docutils literal notranslate"><span class="pre">wchar_t*</span></code>
|
||
string may or may not be null-terminated. It is the responsibility of the caller
|
||
to make sure that the <code class="xref c c-type docutils literal notranslate"><span class="pre">wchar_t*</span></code> string is null-terminated in case this is
|
||
required by the application. Also, note that the <code class="xref c c-type docutils literal notranslate"><span class="pre">wchar_t*</span></code> string
|
||
might contain null characters, which would cause the string to be truncated
|
||
when used with most C functions.</p>
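<p>The two caveats above can be sketched in plain C without the Python headers. The helpers <code class="docutils literal notranslate"><span class="pre">terminate_wide</span></code> and <code class="docutils literal notranslate"><span class="pre">has_embedded_null</span></code> are hypothetical names introduced for illustration only; they are not part of the C API. The sketch assumes the caller reserved one extra slot beyond the number of characters requested:</p>

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not part of the C API): write a terminator after
   the `copied` characters already stored in `buf`. `capacity` is the total
   number of wchar_t slots in `buf`.
   Returns 0 on success, -1 if there is no room left for the terminator. */
int terminate_wide(wchar_t *buf, size_t capacity, size_t copied)
{
    if (copied >= capacity) {
        return -1;
    }
    buf[copied] = L'\0';
    return 0;
}

/* Hypothetical helper: return 1 if the first `copied` characters contain a
   null character, which makes wcslen() and similar functions stop early;
   return 0 otherwise. */
int has_embedded_null(const wchar_t *buf, size_t copied)
{
    return wmemchr(buf, L'\0', copied) != NULL;
}
```

<p>A caller could pass the count returned by a successful copy as <em>copied</em>, then decide whether a truncated view of the string is acceptable before handing the buffer to other C functions.</p>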
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_AsWideCharString">
|
||
wchar_t* <code class="descname">PyUnicode_AsWideCharString</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *unicode</em>, Py_ssize_t<em> *size</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_AsWideCharString" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Convert the Unicode object to a wide character string. The output string
|
||
always ends with a null character. If <em>size</em> is not <em>NULL</em>, write the number
|
||
of wide characters (excluding the trailing null termination character) into
|
||
<em>*size</em>. Note that the resulting <code class="xref c c-type docutils literal notranslate"><span class="pre">wchar_t</span></code> string might contain
|
||
null characters, which would cause the string to be truncated when used with
|
||
most C functions. If <em>size</em> is <em>NULL</em> and the <code class="xref c c-type docutils literal notranslate"><span class="pre">wchar_t*</span></code> string
|
||
contains null characters, a <a class="reference internal" href="../library/exceptions.html#ValueError" title="ValueError"><code class="xref py py-exc docutils literal notranslate"><span class="pre">ValueError</span></code></a> is raised.</p>
|
||
<p>Returns a buffer allocated by <code class="xref c c-func docutils literal notranslate"><span class="pre">PyMem_New()</span></code> (use
|
||
<a class="reference internal" href="memory.html#c.PyMem_Free" title="PyMem_Free"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyMem_Free()</span></code></a> to free it) on success. On error, returns <em>NULL</em>
|
||
and <em>*size</em> is undefined. Raises a <a class="reference internal" href="../library/exceptions.html#MemoryError" title="MemoryError"><code class="xref py py-exc docutils literal notranslate"><span class="pre">MemoryError</span></code></a> if memory allocation
|
||
fails.</p>
|
||
<div class="versionadded">
|
||
<p><span class="versionmodified added">New in version 3.2.</span></p>
|
||
</div>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.7: </span>Raises a <a class="reference internal" href="../library/exceptions.html#ValueError" title="ValueError"><code class="xref py py-exc docutils literal notranslate"><span class="pre">ValueError</span></code></a> if <em>size</em> is <em>NULL</em> and the <code class="xref c c-type docutils literal notranslate"><span class="pre">wchar_t*</span></code>
|
||
string contains null characters.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
</div>
|
||
</div>
|
||
<div class="section" id="built-in-codecs">
|
||
<span id="builtincodecs"></span><h2>Built-in Codecs<a class="headerlink" href="#built-in-codecs" title="Permalink to this headline">¶</a></h2>
|
||
<p>Python provides a set of built-in codecs which are written in C for speed. All of
|
||
these codecs are directly usable via the following functions.</p>
|
||
<p>Many of the following APIs take two arguments, <em>encoding</em> and <em>errors</em>, which
have the same semantics as the parameters of the same name in the built-in <a class="reference internal" href="../library/stdtypes.html#str" title="str"><code class="xref py py-func docutils literal notranslate"><span class="pre">str()</span></code></a> string object
|
||
constructor.</p>
|
||
<p>Setting encoding to <em>NULL</em> causes the default encoding to be used,
which is UTF-8. The file system calls should use
|
||
<a class="reference internal" href="#c.PyUnicode_FSConverter" title="PyUnicode_FSConverter"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_FSConverter()</span></code></a> for encoding file names. This uses the
|
||
variable <code class="xref c c-data docutils literal notranslate"><span class="pre">Py_FileSystemDefaultEncoding</span></code> internally. This
|
||
variable should be treated as read-only: on some systems, it will be a
|
||
pointer to a static string, on others, it will change at run-time
|
||
(such as when the application invokes setlocale).</p>
|
||
<p>Error handling is set by <em>errors</em>, which may also be set to <em>NULL</em>, meaning to use
|
||
the default handling defined for the codec. Default error handling for all
|
||
built-in codecs is “strict” (<a class="reference internal" href="../library/exceptions.html#ValueError" title="ValueError"><code class="xref py py-exc docutils literal notranslate"><span class="pre">ValueError</span></code></a> is raised).</p>
|
||
<p>The codecs all use a similar interface. For simplicity, only deviations from the following
generic ones are documented.</p>
|
||
<div class="section" id="generic-codecs">
|
||
<h3>Generic Codecs<a class="headerlink" href="#generic-codecs" title="Permalink to this headline">¶</a></h3>
|
||
<p>These are the generic codec APIs:</p>
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_Decode">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_Decode</code><span class="sig-paren">(</span>const char<em> *s</em>, Py_ssize_t<em> size</em>, const char<em> *encoding</em>, const char<em> *errors</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_Decode" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Create a Unicode object by decoding <em>size</em> bytes of the encoded string <em>s</em>.
|
||
<em>encoding</em> and <em>errors</em> have the same meaning as the parameters of the same name
|
||
in the <a class="reference internal" href="../library/stdtypes.html#str" title="str"><code class="xref py py-func docutils literal notranslate"><span class="pre">str()</span></code></a> built-in function. The codec to be used is looked up
|
||
using the Python codec registry. Return <em>NULL</em> if an exception was raised by
|
||
the codec.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_AsEncodedString">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_AsEncodedString</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *unicode</em>, const char<em> *encoding</em>, const char<em> *errors</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_AsEncodedString" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Encode a Unicode object and return the result as a Python bytes object.
|
||
<em>encoding</em> and <em>errors</em> have the same meaning as the parameters of the same
|
||
name in the Unicode <a class="reference internal" href="../library/stdtypes.html#str.encode" title="str.encode"><code class="xref py py-meth docutils literal notranslate"><span class="pre">encode()</span></code></a> method. The codec to be used is looked up
|
||
using the Python codec registry. Return <em>NULL</em> if an exception was raised by
|
||
the codec.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_Encode">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_Encode</code><span class="sig-paren">(</span>const <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> *s</em>, Py_ssize_t<em> size</em>, const char<em> *encoding</em>, const char<em> *errors</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_Encode" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Encode the <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> buffer <em>s</em> of the given <em>size</em> and return a Python
|
||
bytes object. <em>encoding</em> and <em>errors</em> have the same meaning as the
|
||
parameters of the same name in the Unicode <a class="reference internal" href="../library/stdtypes.html#str.encode" title="str.encode"><code class="xref py py-meth docutils literal notranslate"><span class="pre">encode()</span></code></a> method. The codec
|
||
to be used is looked up using the Python codec registry. Return <em>NULL</em> if an
|
||
exception was raised by the codec.</p>
|
||
<div class="deprecated">
|
||
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0: </span>Part of the old-style <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> API; please migrate to using
|
||
<a class="reference internal" href="#c.PyUnicode_AsEncodedString" title="PyUnicode_AsEncodedString"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_AsEncodedString()</span></code></a>.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
</div>
|
||
<div class="section" id="utf-8-codecs">
|
||
<h3>UTF-8 Codecs<a class="headerlink" href="#utf-8-codecs" title="Permalink to this headline">¶</a></h3>
|
||
<p>These are the UTF-8 codec APIs:</p>
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_DecodeUTF8">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_DecodeUTF8</code><span class="sig-paren">(</span>const char<em> *s</em>, Py_ssize_t<em> size</em>, const char<em> *errors</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_DecodeUTF8" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Create a Unicode object by decoding <em>size</em> bytes of the UTF-8 encoded string
|
||
<em>s</em>. Return <em>NULL</em> if an exception was raised by the codec.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_DecodeUTF8Stateful">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_DecodeUTF8Stateful</code><span class="sig-paren">(</span>const char<em> *s</em>, Py_ssize_t<em> size</em>, const char<em> *errors</em>, Py_ssize_t<em> *consumed</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_DecodeUTF8Stateful" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>If <em>consumed</em> is <em>NULL</em>, behave like <a class="reference internal" href="#c.PyUnicode_DecodeUTF8" title="PyUnicode_DecodeUTF8"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_DecodeUTF8()</span></code></a>. If
|
||
<em>consumed</em> is not <em>NULL</em>, trailing incomplete UTF-8 byte sequences will not be
|
||
treated as an error. Those bytes will not be decoded and the number of bytes
|
||
that have been decoded will be stored in <em>consumed</em>.</p>
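<p>To make the idea of a trailing incomplete sequence concrete, here is a minimal standalone sketch in plain C. The function <code class="docutils literal notranslate"><span class="pre">utf8_complete_prefix</span></code> is a hypothetical helper, not part of the C API and not the decoder's actual implementation. It only inspects the lead byte of the final sequence to estimate how many bytes could be consumed, and leaves all validation to the real decoder:</p>

```c
#include <stddef.h>

/* Hypothetical helper (not part of the C API): return how many leading
   bytes of s[0..size) end on a complete UTF-8 sequence, judged only by the
   lead byte of the last sequence. Malformed input is returned unchanged
   (as `size`) so that a real decoder can report the error. */
size_t utf8_complete_prefix(const unsigned char *s, size_t size)
{
    size_t i = size;
    size_t back = 0;

    /* Step back over at most three continuation bytes (10xxxxxx). */
    while (i > 0 && back < 3 && (s[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0) {
        return size;  /* empty input, or continuation bytes only */
    }

    unsigned char lead = s[i - 1];
    size_t need;
    if (lead < 0x80) {
        need = 1;                      /* ASCII */
    } else if ((lead & 0xE0) == 0xC0) {
        need = 2;                      /* 110xxxxx */
    } else if ((lead & 0xF0) == 0xE0) {
        need = 3;                      /* 1110xxxx */
    } else if ((lead & 0xF8) == 0xF0) {
        need = 4;                      /* 11110xxx */
    } else {
        return size;                   /* invalid lead byte, not a truncation */
    }

    size_t have = size - (i - 1);      /* bytes present in the last sequence */
    return have < need ? i - 1 : size;
}
```

<p>In a streaming reader, the bytes after the returned offset would be kept and prepended to the next chunk, which is the role <em>consumed</em> plays for the stateful decoder.</p>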
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_AsUTF8String">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_AsUTF8String</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *unicode</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_AsUTF8String" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Encode a Unicode object using UTF-8 and return the result as a Python bytes
|
||
object. Error handling is “strict”. Return <em>NULL</em> if an exception was
|
||
raised by the codec.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_AsUTF8AndSize">
|
||
const char* <code class="descname">PyUnicode_AsUTF8AndSize</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *unicode</em>, Py_ssize_t<em> *size</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_AsUTF8AndSize" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Return a pointer to the UTF-8 encoding of the Unicode object, and
|
||
store the size of the encoded representation (in bytes) in <em>size</em>. The
|
||
<em>size</em> argument can be <em>NULL</em>; in this case no size will be stored. The
|
||
returned buffer always has an extra null byte appended (not included in
|
||
<em>size</em>), regardless of whether there are any other null code points.</p>
|
||
<p>In the case of an error, <em>NULL</em> is returned with an exception set and no
|
||
<em>size</em> is stored.</p>
|
||
<p>This caches the UTF-8 representation of the string in the Unicode object, and
|
||
subsequent calls will return a pointer to the same buffer. The caller is not
|
||
responsible for deallocating the buffer.</p>
|
||
<div class="versionadded">
|
||
<p><span class="versionmodified added">New in version 3.3.</span></p>
|
||
</div>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.7: </span>The return type is now <code class="docutils literal notranslate"><span class="pre">const</span> <span class="pre">char</span> <span class="pre">*</span></code> rather than <code class="docutils literal notranslate"><span class="pre">char</span> <span class="pre">*</span></code>.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_AsUTF8">
|
||
const char* <code class="descname">PyUnicode_AsUTF8</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *unicode</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_AsUTF8" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>As <a class="reference internal" href="#c.PyUnicode_AsUTF8AndSize" title="PyUnicode_AsUTF8AndSize"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_AsUTF8AndSize()</span></code></a>, but does not store the size.</p>
|
||
<div class="versionadded">
|
||
<p><span class="versionmodified added">New in version 3.3.</span></p>
|
||
</div>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.7: </span>The return type is now <code class="docutils literal notranslate"><span class="pre">const</span> <span class="pre">char</span> <span class="pre">*</span></code> rather than <code class="docutils literal notranslate"><span class="pre">char</span> <span class="pre">*</span></code>.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_EncodeUTF8">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_EncodeUTF8</code><span class="sig-paren">(</span>const <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> *s</em>, Py_ssize_t<em> size</em>, const char<em> *errors</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_EncodeUTF8" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Encode the <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> buffer <em>s</em> of the given <em>size</em> using UTF-8 and
|
||
return a Python bytes object. Return <em>NULL</em> if an exception was raised by
|
||
the codec.</p>
|
||
<div class="deprecated">
|
||
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0: </span>Part of the old-style <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> API; please migrate to using
|
||
<a class="reference internal" href="#c.PyUnicode_AsUTF8String" title="PyUnicode_AsUTF8String"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_AsUTF8String()</span></code></a>, <a class="reference internal" href="#c.PyUnicode_AsUTF8AndSize" title="PyUnicode_AsUTF8AndSize"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_AsUTF8AndSize()</span></code></a> or
|
||
<a class="reference internal" href="#c.PyUnicode_AsEncodedString" title="PyUnicode_AsEncodedString"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_AsEncodedString()</span></code></a>.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
</div>
|
||
<div class="section" id="utf-32-codecs">
|
||
<h3>UTF-32 Codecs<a class="headerlink" href="#utf-32-codecs" title="Permalink to this headline">¶</a></h3>
|
||
<p>These are the UTF-32 codec APIs:</p>
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_DecodeUTF32">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_DecodeUTF32</code><span class="sig-paren">(</span>const char<em> *s</em>, Py_ssize_t<em> size</em>, const char<em> *errors</em>, int<em> *byteorder</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_DecodeUTF32" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Decode <em>size</em> bytes from a UTF-32 encoded buffer string and return the
|
||
corresponding Unicode object. <em>errors</em> (if non-<em>NULL</em>) defines the error
|
||
handling. It defaults to “strict”.</p>
|
||
<p>If <em>byteorder</em> is non-<em>NULL</em>, the decoder starts decoding using the given byte
|
||
order:</p>
|
||
<div class="highlight-c notranslate"><div class="highlight"><pre><span></span><span class="o">*</span><span class="n">byteorder</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="o">:</span> <span class="n">little</span> <span class="n">endian</span>
|
||
<span class="o">*</span><span class="n">byteorder</span> <span class="o">==</span> <span class="mi">0</span><span class="o">:</span> <span class="n">native</span> <span class="n">order</span>
|
||
<span class="o">*</span><span class="n">byteorder</span> <span class="o">==</span> <span class="mi">1</span><span class="o">:</span> <span class="n">big</span> <span class="n">endian</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>If <code class="docutils literal notranslate"><span class="pre">*byteorder</span></code> is zero, and the first four bytes of the input data are a
|
||
byte order mark (BOM), the decoder switches to this byte order and the BOM is
|
||
not copied into the resulting Unicode string. If <code class="docutils literal notranslate"><span class="pre">*byteorder</span></code> is <code class="docutils literal notranslate"><span class="pre">-1</span></code> or
|
||
<code class="docutils literal notranslate"><span class="pre">1</span></code>, any byte order mark is copied to the output.</p>
|
||
<p>After completion, <em>*byteorder</em> is set to the current byte order at the end
|
||
of input data.</p>
|
||
<p>If <em>byteorder</em> is <em>NULL</em>, the codec starts in native order mode.</p>
|
||
<p>Return <em>NULL</em> if an exception was raised by the codec.</p>
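<p>The byte order mark check described above can be illustrated with a small standalone sketch in plain C. The function <code class="docutils literal notranslate"><span class="pre">utf32_bom_order</span></code> is a hypothetical helper, not part of the C API; it reuses the <code class="docutils literal notranslate"><span class="pre">-1</span></code> / <code class="docutils literal notranslate"><span class="pre">0</span></code> / <code class="docutils literal notranslate"><span class="pre">1</span></code> convention of <em>*byteorder</em> purely for readability:</p>

```c
#include <stddef.h>

/* Hypothetical helper (not part of the C API): inspect the first four
   bytes of a UTF-32 buffer.
   Returns -1 for a little-endian BOM (FF FE 00 00),
            1 for a big-endian BOM    (00 00 FE FF),
            0 when no BOM is present or the buffer is shorter than 4 bytes. */
int utf32_bom_order(const unsigned char *s, size_t size)
{
    if (size < 4) {
        return 0;
    }
    if (s[0] == 0xFF && s[1] == 0xFE && s[2] == 0x00 && s[3] == 0x00) {
        return -1;
    }
    if (s[0] == 0x00 && s[1] == 0x00 && s[2] == 0xFE && s[3] == 0xFF) {
        return 1;
    }
    return 0;
}
```

<p>A result of <code class="docutils literal notranslate"><span class="pre">0</span></code> from this sketch only means that no BOM was found; the data would then be read in whatever order the caller started with.</p>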
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_DecodeUTF32Stateful">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_DecodeUTF32Stateful</code><span class="sig-paren">(</span>const char<em> *s</em>, Py_ssize_t<em> size</em>, const char<em> *errors</em>, int<em> *byteorder</em>, Py_ssize_t<em> *consumed</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_DecodeUTF32Stateful" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>If <em>consumed</em> is <em>NULL</em>, behave like <a class="reference internal" href="#c.PyUnicode_DecodeUTF32" title="PyUnicode_DecodeUTF32"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_DecodeUTF32()</span></code></a>. If
|
||
<em>consumed</em> is not <em>NULL</em>, <a class="reference internal" href="#c.PyUnicode_DecodeUTF32Stateful" title="PyUnicode_DecodeUTF32Stateful"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_DecodeUTF32Stateful()</span></code></a> will not treat
|
||
trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible
|
||
by four) as an error. Those bytes will not be decoded and the number of bytes
|
||
that have been decoded will be stored in <em>consumed</em>.</p>
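<p>For the common case of a byte count that is not divisible by four, the split between decodable and leftover bytes is simple arithmetic. The sketch below uses two hypothetical helpers in plain C, which are not part of the C API, and assumes the only problem with the input is its length:</p>

```c
#include <stddef.h>

/* Hypothetical helper (not part of the C API): number of bytes that make
   up whole 4-byte UTF-32 code units at the start of a buffer. */
size_t utf32_whole_bytes(size_t size)
{
    return size - (size % 4);
}

/* Hypothetical helper: number of trailing bytes that do not yet form a
   complete code unit and would have to wait for more input. */
size_t utf32_pending_bytes(size_t size)
{
    return size % 4;
}
```

<p>With a 10-byte chunk, for instance, the first helper yields 8 and the second yields 2; the two leftover bytes would be carried over to the next chunk.</p>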
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_AsUTF32String">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_AsUTF32String</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *unicode</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_AsUTF32String" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Return a Python bytes object using the UTF-32 encoding in native byte
order. The result always starts with a BOM. Error handling is “strict”.
|
||
Return <em>NULL</em> if an exception was raised by the codec.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_EncodeUTF32">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_EncodeUTF32</code><span class="sig-paren">(</span>const <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> *s</em>, Py_ssize_t<em> size</em>, const char<em> *errors</em>, int<em> byteorder</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_EncodeUTF32" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Return a Python bytes object holding the UTF-32 encoded value of the Unicode
|
||
data in <em>s</em>. Output is written according to the following byte order:</p>
|
||
<div class="highlight-c notranslate"><div class="highlight"><pre><span></span><span class="n">byteorder</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="o">:</span> <span class="n">little</span> <span class="n">endian</span>
|
||
<span class="n">byteorder</span> <span class="o">==</span> <span class="mi">0</span><span class="o">:</span> <span class="n">native</span> <span class="n">byte</span> <span class="n">order</span> <span class="p">(</span><span class="n">writes</span> <span class="n">a</span> <span class="n">BOM</span> <span class="n">mark</span><span class="p">)</span>
|
||
<span class="n">byteorder</span> <span class="o">==</span> <span class="mi">1</span><span class="o">:</span> <span class="n">big</span> <span class="n">endian</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>If <em>byteorder</em> is <code class="docutils literal notranslate"><span class="pre">0</span></code>, the output string will always start with the Unicode BOM
|
||
mark (U+FEFF). In the other two modes, no BOM mark is prepended.</p>
|
||
<p>If <em>Py_UNICODE_WIDE</em> is not defined, surrogate pairs will be output
|
||
as a single code point.</p>
|
||
<p>Return <em>NULL</em> if an exception was raised by the codec.</p>
|
||
<div class="deprecated">
|
||
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0: </span>Part of the old-style <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> API; please migrate to using
|
||
<a class="reference internal" href="#c.PyUnicode_AsUTF32String" title="PyUnicode_AsUTF32String"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_AsUTF32String()</span></code></a> or <a class="reference internal" href="#c.PyUnicode_AsEncodedString" title="PyUnicode_AsEncodedString"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_AsEncodedString()</span></code></a>.</p>
|
||
</div>
|
||
</dd></dl>
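<p>The <em>byteorder</em> rule for encoding can be illustrated with a small standalone sketch. It does not call the CPython API and is not how CPython implements the codec; the helper names <code class="docutils literal notranslate"><span class="pre">host_is_little_endian</span></code>, <code class="docutils literal notranslate"><span class="pre">put_u32</span></code> and <code class="docutils literal notranslate"><span class="pre">utf32_encode_sketch</span></code> are hypothetical, and the input is assumed to be an array of valid code points.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the CPython API: returns 1 on a
   little-endian host and 0 on a big-endian host. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Store one 32-bit value as four bytes in the requested order. */
static void put_u32(unsigned char *out, uint32_t value, int little)
{
    for (int i = 0; i < 4; i++) {
        int shift = little ? 8 * i : 8 * (3 - i);
        out[i] = (unsigned char)((value >> shift) & 0xFF);
    }
}

/* Encode n code points into out, which must have room for 4 * (n + 1)
   bytes. byteorder: -1 little endian, 0 native order with a leading BOM,
   1 big endian. Returns the number of bytes written. */
static size_t utf32_encode_sketch(const uint32_t *cps, size_t n,
                                  int byteorder, unsigned char *out)
{
    assert(cps != NULL && out != NULL);
    int little = byteorder < 0 || (byteorder == 0 && host_is_little_endian());
    size_t pos = 0;
    if (byteorder == 0) {           /* only the native-order mode writes a BOM */
        put_u32(out + pos, 0xFEFF, little);
        pos += 4;
    }
    for (size_t i = 0; i < n; i++) {
        put_u32(out + pos, cps[i], little);
        pos += 4;
    }
    return pos;
}
```

<p>With <em>byteorder</em> set to <code class="docutils literal notranslate"><span class="pre">0</span></code> the output is four bytes longer than in the other two modes, because of the leading BOM.</p>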
|
||
|
||
</div>
|
||
<div class="section" id="utf-16-codecs">
|
||
<h3>UTF-16 Codecs<a class="headerlink" href="#utf-16-codecs" title="Permalink to this headline">¶</a></h3>
|
||
<p>These are the UTF-16 codec APIs:</p>
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_DecodeUTF16">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_DecodeUTF16</code><span class="sig-paren">(</span>const char<em> *s</em>, Py_ssize_t<em> size</em>, const char<em> *errors</em>, int<em> *byteorder</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_DecodeUTF16" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Decode <em>size</em> bytes from a UTF-16 encoded buffer string and return the
|
||
corresponding Unicode object. <em>errors</em> (if non-<em>NULL</em>) defines the error
|
||
handling. It defaults to “strict”.</p>
|
||
<p>If <em>byteorder</em> is non-<em>NULL</em>, the decoder starts decoding using the given byte
|
||
order:</p>
|
||
<div class="highlight-c notranslate"><div class="highlight"><pre><span></span><span class="o">*</span><span class="n">byteorder</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="o">:</span> <span class="n">little</span> <span class="n">endian</span>
|
||
<span class="o">*</span><span class="n">byteorder</span> <span class="o">==</span> <span class="mi">0</span><span class="o">:</span> <span class="n">native</span> <span class="n">order</span>
|
||
<span class="o">*</span><span class="n">byteorder</span> <span class="o">==</span> <span class="mi">1</span><span class="o">:</span> <span class="n">big</span> <span class="n">endian</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>If <code class="docutils literal notranslate"><span class="pre">*byteorder</span></code> is zero, and the first two bytes of the input data are a
|
||
byte order mark (BOM), the decoder switches to this byte order and the BOM is
|
||
not copied into the resulting Unicode string. If <code class="docutils literal notranslate"><span class="pre">*byteorder</span></code> is <code class="docutils literal notranslate"><span class="pre">-1</span></code> or
|
||
<code class="docutils literal notranslate"><span class="pre">1</span></code>, any byte order mark is copied to the output (where it will result in
|
||
either a <code class="docutils literal notranslate"><span class="pre">\ufeff</span></code> or a <code class="docutils literal notranslate"><span class="pre">\ufffe</span></code> character).</p>
|
||
<p>After completion, <em>*byteorder</em> is set to the current byte order at the end
|
||
of input data.</p>
|
||
<p>If <em>byteorder</em> is <em>NULL</em>, the codec starts in native order mode.</p>
|
||
<p>Return <em>NULL</em> if an exception was raised by the codec.</p>
|
||
</dd></dl>
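<p>The BOM handling described for the UTF-16 decoder can be sketched in plain C. This is a simplified illustration of the documented rule only, not the CPython implementation; the helper name <code class="docutils literal notranslate"><span class="pre">utf16_apply_bom</span></code> is hypothetical.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: applies the BOM rule to the start of a UTF-16
   buffer. Only when *byteorder is 0 is a leading BOM consumed; *byteorder
   then becomes -1 (little endian) or 1 (big endian). Returns the number
   of bytes to skip before decoding (0 or 2). */
static size_t utf16_apply_bom(const unsigned char *s, size_t size, int *byteorder)
{
    assert(s != NULL && byteorder != NULL);
    if (*byteorder != 0 || size < 2)
        return 0;                 /* explicit order or too short: skip nothing */
    if (s[0] == 0xFF && s[1] == 0xFE) {
        *byteorder = -1;          /* bytes FF FE: U+FEFF in little endian */
        return 2;
    }
    if (s[0] == 0xFE && s[1] == 0xFF) {
        *byteorder = 1;           /* bytes FE FF: U+FEFF in big endian */
        return 2;
    }
    return 0;                     /* no BOM: *byteorder stays 0 (native order) */
}
```

<p>When the caller passes <code class="docutils literal notranslate"><span class="pre">-1</span></code> or <code class="docutils literal notranslate"><span class="pre">1</span></code>, the sketch skips nothing, which mirrors the statement that a BOM is then copied to the output as an ordinary character.</p>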
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_DecodeUTF16Stateful">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_DecodeUTF16Stateful</code><span class="sig-paren">(</span>const char<em> *s</em>, Py_ssize_t<em> size</em>, const char<em> *errors</em>, int<em> *byteorder</em>, Py_ssize_t<em> *consumed</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_DecodeUTF16Stateful" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>If <em>consumed</em> is <em>NULL</em>, behave like <a class="reference internal" href="#c.PyUnicode_DecodeUTF16" title="PyUnicode_DecodeUTF16"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_DecodeUTF16()</span></code></a>. If
|
||
<em>consumed</em> is not <em>NULL</em>, <a class="reference internal" href="#c.PyUnicode_DecodeUTF16Stateful" title="PyUnicode_DecodeUTF16Stateful"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_DecodeUTF16Stateful()</span></code></a> will not treat
|
||
trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a
|
||
split surrogate pair) as an error. Those bytes will not be decoded and the
|
||
number of bytes that have been decoded will be stored in <em>consumed</em>.</p>
|
||
</dd></dl>
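<p>As a rough illustration of what <em>consumed</em> reports, the following standalone sketch computes how many leading bytes of a little-endian UTF-16 buffer form complete characters. It assumes the input is otherwise well formed and does not reproduce the decoder's error handling; the helper name <code class="docutils literal notranslate"><span class="pre">utf16le_complete_prefix</span></code> is hypothetical.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: for little-endian UTF-16 input that is otherwise
   well formed, return how many leading bytes form complete characters.
   A dangling odd byte, or a final high surrogate whose low surrogate has
   not arrived yet, is left for the next call. */
static size_t utf16le_complete_prefix(const unsigned char *s, size_t size)
{
    assert(s != NULL || size == 0);
    size_t n = size - (size % 2);              /* drop a dangling odd byte */
    if (n >= 2) {
        unsigned int last = (unsigned int)s[n - 2]
                          | ((unsigned int)s[n - 1] << 8);
        if (last >= 0xD800 && last <= 0xDBFF)  /* high surrogate without its pair */
            n -= 2;
    }
    return n;
}
```

<p>A caller reading from a stream would typically keep the bytes past the returned offset and prepend them to the next chunk.</p>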
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_AsUTF16String">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_AsUTF16String</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *unicode</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_AsUTF16String" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Return a Python bytes object using the UTF-16 encoding in native byte
|
||
order. The string always starts with a BOM mark. Error handling is “strict”.
|
||
Return <em>NULL</em> if an exception was raised by the codec.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_EncodeUTF16">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_EncodeUTF16</code><span class="sig-paren">(</span>const <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> *s</em>, Py_ssize_t<em> size</em>, const char<em> *errors</em>, int<em> byteorder</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_EncodeUTF16" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Return a Python bytes object holding the UTF-16 encoded value of the Unicode
|
||
data in <em>s</em>. Output is written according to the following byte order:</p>
|
||
<div class="highlight-c notranslate"><div class="highlight"><pre><span></span><span class="n">byteorder</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="o">:</span> <span class="n">little</span> <span class="n">endian</span>
|
||
<span class="n">byteorder</span> <span class="o">==</span> <span class="mi">0</span><span class="o">:</span> <span class="n">native</span> <span class="n">byte</span> <span class="n">order</span> <span class="p">(</span><span class="n">writes</span> <span class="n">a</span> <span class="n">BOM</span> <span class="n">mark</span><span class="p">)</span>
|
||
<span class="n">byteorder</span> <span class="o">==</span> <span class="mi">1</span><span class="o">:</span> <span class="n">big</span> <span class="n">endian</span>
|
||
</pre></div>
|
||
</div>
|
||
<p>If <em>byteorder</em> is <code class="docutils literal notranslate"><span class="pre">0</span></code>, the output string will always start with the Unicode BOM
|
||
mark (U+FEFF). In the other two modes, no BOM mark is prepended.</p>
|
||
<p>If <em>Py_UNICODE_WIDE</em> is defined, a single <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> value may get
|
||
represented as a surrogate pair. If it is not defined, each <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a>
|
||
value is interpreted as a UCS-2 character.</p>
|
||
<p>Return <em>NULL</em> if an exception was raised by the codec.</p>
|
||
<div class="deprecated">
|
||
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0: </span>Part of the old-style <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> API; please migrate to using
|
||
<a class="reference internal" href="#c.PyUnicode_AsUTF16String" title="PyUnicode_AsUTF16String"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_AsUTF16String()</span></code></a> or <a class="reference internal" href="#c.PyUnicode_AsEncodedString" title="PyUnicode_AsEncodedString"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_AsEncodedString()</span></code></a>.</p>
|
||
</div>
|
||
</dd></dl>
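<p>The surrogate pair representation mentioned for wide <code class="docutils literal notranslate"><span class="pre">Py_UNICODE</span></code> values follows the standard UTF-16 arithmetic. A minimal standalone sketch, using the hypothetical helper name <code class="docutils literal notranslate"><span class="pre">utf16_units</span></code> and assuming a valid code point as input:</p>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: split one code point into UTF-16 code units.
   Returns 1 for code points up to U+FFFF and 2 for a surrogate pair. */
static int utf16_units(uint32_t cp, uint16_t out[2])
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
    return 2;
}
```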
|
||
|
||
</div>
|
||
<div class="section" id="utf-7-codecs">
|
||
<h3>UTF-7 Codecs<a class="headerlink" href="#utf-7-codecs" title="Permalink to this headline">¶</a></h3>
|
||
<p>These are the UTF-7 codec APIs:</p>
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_DecodeUTF7">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_DecodeUTF7</code><span class="sig-paren">(</span>const char<em> *s</em>, Py_ssize_t<em> size</em>, const char<em> *errors</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_DecodeUTF7" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Create a Unicode object by decoding <em>size</em> bytes of the UTF-7 encoded string
|
||
<em>s</em>. Return <em>NULL</em> if an exception was raised by the codec.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_DecodeUTF7Stateful">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_DecodeUTF7Stateful</code><span class="sig-paren">(</span>const char<em> *s</em>, Py_ssize_t<em> size</em>, const char<em> *errors</em>, Py_ssize_t<em> *consumed</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_DecodeUTF7Stateful" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>If <em>consumed</em> is <em>NULL</em>, behave like <a class="reference internal" href="#c.PyUnicode_DecodeUTF7" title="PyUnicode_DecodeUTF7"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_DecodeUTF7()</span></code></a>. If
|
||
<em>consumed</em> is not <em>NULL</em>, trailing incomplete UTF-7 base-64 sections will not
|
||
be treated as an error. Those bytes will not be decoded and the number of
|
||
bytes that have been decoded will be stored in <em>consumed</em>.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_EncodeUTF7">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_EncodeUTF7</code><span class="sig-paren">(</span>const <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> *s</em>, Py_ssize_t<em> size</em>, int<em> base64SetO</em>, int<em> base64WhiteSpace</em>, const char<em> *errors</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_EncodeUTF7" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Encode the <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> buffer of the given size using UTF-7 and
|
||
return a Python bytes object. Return <em>NULL</em> if an exception was raised by
|
||
the codec.</p>
|
||
<p>If <em>base64SetO</em> is nonzero, “Set O” (punctuation that otherwise has no
|
||
special meaning) will be encoded in base-64. If <em>base64WhiteSpace</em> is
|
||
nonzero, whitespace will be encoded in base-64. Both are set to zero for the
|
||
Python “utf-7” codec.</p>
|
||
<div class="deprecated">
|
||
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0: </span>Part of the old-style <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> API; please migrate to using
|
||
<a class="reference internal" href="#c.PyUnicode_AsEncodedString" title="PyUnicode_AsEncodedString"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_AsEncodedString()</span></code></a>.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
</div>
|
||
<div class="section" id="unicode-escape-codecs">
|
||
<h3>Unicode-Escape Codecs<a class="headerlink" href="#unicode-escape-codecs" title="Permalink to this headline">¶</a></h3>
|
||
<p>These are the “Unicode Escape” codec APIs:</p>
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_DecodeUnicodeEscape">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_DecodeUnicodeEscape</code><span class="sig-paren">(</span>const char<em> *s</em>, Py_ssize_t<em> size</em>, const char<em> *errors</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_DecodeUnicodeEscape" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Create a Unicode object by decoding <em>size</em> bytes of the Unicode-Escape encoded
|
||
string <em>s</em>. Return <em>NULL</em> if an exception was raised by the codec.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_AsUnicodeEscapeString">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_AsUnicodeEscapeString</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *unicode</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_AsUnicodeEscapeString" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Encode a Unicode object using Unicode-Escape and return the result as a
|
||
bytes object. Error handling is “strict”. Return <em>NULL</em> if an exception was
|
||
raised by the codec.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_EncodeUnicodeEscape">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_EncodeUnicodeEscape</code><span class="sig-paren">(</span>const <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> *s</em>, Py_ssize_t<em> size</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_EncodeUnicodeEscape" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Encode the <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> buffer of the given <em>size</em> using Unicode-Escape and
|
||
return a bytes object. Return <em>NULL</em> if an exception was raised by the codec.</p>
|
||
<div class="deprecated">
|
||
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0: </span>Part of the old-style <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> API; please migrate to using
|
||
<a class="reference internal" href="#c.PyUnicode_AsUnicodeEscapeString" title="PyUnicode_AsUnicodeEscapeString"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_AsUnicodeEscapeString()</span></code></a>.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
</div>
|
||
<div class="section" id="raw-unicode-escape-codecs">
|
||
<h3>Raw-Unicode-Escape Codecs<a class="headerlink" href="#raw-unicode-escape-codecs" title="Permalink to this headline">¶</a></h3>
|
||
<p>These are the “Raw Unicode Escape” codec APIs:</p>
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_DecodeRawUnicodeEscape">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_DecodeRawUnicodeEscape</code><span class="sig-paren">(</span>const char<em> *s</em>, Py_ssize_t<em> size</em>, const char<em> *errors</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_DecodeRawUnicodeEscape" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Create a Unicode object by decoding <em>size</em> bytes of the Raw-Unicode-Escape
|
||
encoded string <em>s</em>. Return <em>NULL</em> if an exception was raised by the codec.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_AsRawUnicodeEscapeString">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_AsRawUnicodeEscapeString</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *unicode</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_AsRawUnicodeEscapeString" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Encode a Unicode object using Raw-Unicode-Escape and return the result as
|
||
a bytes object. Error handling is “strict”. Return <em>NULL</em> if an exception
|
||
was raised by the codec.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_EncodeRawUnicodeEscape">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_EncodeRawUnicodeEscape</code><span class="sig-paren">(</span>const <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> *s</em>, Py_ssize_t<em> size</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_EncodeRawUnicodeEscape" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Encode the <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> buffer of the given <em>size</em> using Raw-Unicode-Escape
|
||
and return a bytes object. Return <em>NULL</em> if an exception was raised by the codec.</p>
|
||
<div class="deprecated">
|
||
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0: </span>Part of the old-style <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> API; please migrate to using
|
||
<a class="reference internal" href="#c.PyUnicode_AsRawUnicodeEscapeString" title="PyUnicode_AsRawUnicodeEscapeString"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_AsRawUnicodeEscapeString()</span></code></a> or
|
||
<a class="reference internal" href="#c.PyUnicode_AsEncodedString" title="PyUnicode_AsEncodedString"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_AsEncodedString()</span></code></a>.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
</div>
|
||
<div class="section" id="latin-1-codecs">
|
||
<h3>Latin-1 Codecs<a class="headerlink" href="#latin-1-codecs" title="Permalink to this headline">¶</a></h3>
|
||
<p>These are the Latin-1 codec APIs. Latin-1 corresponds to the first 256 Unicode
|
||
ordinals and only these are accepted by the codecs during encoding.</p>
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_DecodeLatin1">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_DecodeLatin1</code><span class="sig-paren">(</span>const char<em> *s</em>, Py_ssize_t<em> size</em>, const char<em> *errors</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_DecodeLatin1" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Create a Unicode object by decoding <em>size</em> bytes of the Latin-1 encoded string
|
||
<em>s</em>. Return <em>NULL</em> if an exception was raised by the codec.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_AsLatin1String">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_AsLatin1String</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *unicode</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_AsLatin1String" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Encode a Unicode object using Latin-1 and return the result as a Python bytes
|
||
object. Error handling is “strict”. Return <em>NULL</em> if an exception was
|
||
raised by the codec.</p>
|
||
</dd></dl>
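<p>The restriction to the first 256 Unicode ordinals can be shown with a short standalone sketch of "strict" Latin-1 encoding. It is an illustration of the rule, not the CPython implementation, and the helper name <code class="docutils literal notranslate"><span class="pre">latin1_encode_strict</span></code> is hypothetical.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode n code points as Latin-1 with "strict"
   handling. Returns n on success, or -1 if any ordinal is above 255. */
static long latin1_encode_strict(const uint32_t *cps, size_t n, unsigned char *out)
{
    assert(cps != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (cps[i] > 0xFF)
            return -1;              /* outside the first 256 ordinals */
        out[i] = (unsigned char)cps[i];
    }
    return (long)n;
}
```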
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_EncodeLatin1">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_EncodeLatin1</code><span class="sig-paren">(</span>const <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> *s</em>, Py_ssize_t<em> size</em>, const char<em> *errors</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_EncodeLatin1" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Encode the <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> buffer of the given <em>size</em> using Latin-1 and
|
||
return a Python bytes object. Return <em>NULL</em> if an exception was raised by
|
||
the codec.</p>
|
||
<div class="deprecated">
|
||
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0: </span>Part of the old-style <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> API; please migrate to using
|
||
<a class="reference internal" href="#c.PyUnicode_AsLatin1String" title="PyUnicode_AsLatin1String"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_AsLatin1String()</span></code></a> or
|
||
<a class="reference internal" href="#c.PyUnicode_AsEncodedString" title="PyUnicode_AsEncodedString"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_AsEncodedString()</span></code></a>.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
</div>
|
||
<div class="section" id="ascii-codecs">
|
||
<h3>ASCII Codecs<a class="headerlink" href="#ascii-codecs" title="Permalink to this headline">¶</a></h3>
|
||
<p>These are the ASCII codec APIs. Only 7-bit ASCII data is accepted. All other
|
||
codes generate errors.</p>
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_DecodeASCII">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_DecodeASCII</code><span class="sig-paren">(</span>const char<em> *s</em>, Py_ssize_t<em> size</em>, const char<em> *errors</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_DecodeASCII" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Create a Unicode object by decoding <em>size</em> bytes of the ASCII encoded string
|
||
<em>s</em>. Return <em>NULL</em> if an exception was raised by the codec.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_AsASCIIString">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_AsASCIIString</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *unicode</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_AsASCIIString" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Encode a Unicode object using ASCII and return the result as a Python bytes
|
||
object. Error handling is “strict”. Return <em>NULL</em> if an exception was
|
||
raised by the codec.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_EncodeASCII">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_EncodeASCII</code><span class="sig-paren">(</span>const <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> *s</em>, Py_ssize_t<em> size</em>, const char<em> *errors</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_EncodeASCII" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Encode the <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> buffer of the given <em>size</em> using ASCII and
|
||
return a Python bytes object. Return <em>NULL</em> if an exception was raised by
|
||
the codec.</p>
|
||
<div class="deprecated">
|
||
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0: </span>Part of the old-style <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> API; please migrate to using
|
||
<a class="reference internal" href="#c.PyUnicode_AsASCIIString" title="PyUnicode_AsASCIIString"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_AsASCIIString()</span></code></a> or
|
||
<a class="reference internal" href="#c.PyUnicode_AsEncodedString" title="PyUnicode_AsEncodedString"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_AsEncodedString()</span></code></a>.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
</div>
|
||
<div class="section" id="character-map-codecs">
|
||
<h3>Character Map Codecs<a class="headerlink" href="#character-map-codecs" title="Permalink to this headline">¶</a></h3>
|
||
<p>This codec is special in that it can be used to implement many different codecs
|
||
(and this is in fact what was done to obtain most of the standard codecs
|
||
included in the <code class="xref py py-mod docutils literal notranslate"><span class="pre">encodings</span></code> package). The codec uses mappings to encode and
|
||
decode characters. The mapping objects provided must support the
|
||
<a class="reference internal" href="../reference/datamodel.html#object.__getitem__" title="object.__getitem__"><code class="xref py py-meth docutils literal notranslate"><span class="pre">__getitem__()</span></code></a> mapping interface; dictionaries and sequences work well.</p>
|
||
<p>These are the mapping codec APIs:</p>
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_DecodeCharmap">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_DecodeCharmap</code><span class="sig-paren">(</span>const char<em> *data</em>, Py_ssize_t<em> size</em>, <a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *mapping</em>, const char<em> *errors</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_DecodeCharmap" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Create a Unicode object by decoding <em>size</em> bytes of the encoded string <em>data</em>
|
||
using the given <em>mapping</em> object. Return <em>NULL</em> if an exception was raised
|
||
by the codec.</p>
|
||
<p>If <em>mapping</em> is <em>NULL</em>, Latin-1 decoding will be applied. Else
|
||
<em>mapping</em> must map bytes ordinals (integers in the range from 0 to 255)
|
||
to Unicode strings, integers (which are then interpreted as Unicode
|
||
ordinals) or <code class="docutils literal notranslate"><span class="pre">None</span></code>. Unmapped data bytes – ones which cause a
|
||
<a class="reference internal" href="../library/exceptions.html#LookupError" title="LookupError"><code class="xref py py-exc docutils literal notranslate"><span class="pre">LookupError</span></code></a>, as well as ones which get mapped to <code class="docutils literal notranslate"><span class="pre">None</span></code>,
|
||
<code class="docutils literal notranslate"><span class="pre">0xFFFE</span></code> or <code class="docutils literal notranslate"><span class="pre">'\ufffe'</span></code>, are treated as undefined mappings and cause
|
||
an error.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_AsCharmapString">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_AsCharmapString</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *unicode</em>, <a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *mapping</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_AsCharmapString" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Encode a Unicode object using the given <em>mapping</em> object and return the
|
||
result as a bytes object. Error handling is “strict”. Return <em>NULL</em> if an
|
||
exception was raised by the codec.</p>
|
||
<p>The <em>mapping</em> object must map Unicode ordinal integers to bytes objects,
|
||
integers in the range from 0 to 255 or <code class="docutils literal notranslate"><span class="pre">None</span></code>. Unmapped character
|
||
ordinals (ones which cause a <a class="reference internal" href="../library/exceptions.html#LookupError" title="LookupError"><code class="xref py py-exc docutils literal notranslate"><span class="pre">LookupError</span></code></a>) as well as those mapped to
|
||
<code class="docutils literal notranslate"><span class="pre">None</span></code> are treated as “undefined mapping” and cause an error.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_EncodeCharmap">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_EncodeCharmap</code><span class="sig-paren">(</span>const <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> *s</em>, Py_ssize_t<em> size</em>, <a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *mapping</em>, const char<em> *errors</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_EncodeCharmap" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Encode the <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> buffer of the given <em>size</em> using the given
|
||
<em>mapping</em> object and return the result as a bytes object. Return <em>NULL</em> if
|
||
an exception was raised by the codec.</p>
|
||
<div class="deprecated">
|
||
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0: </span>Part of the old-style <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> API; please migrate to using
|
||
<a class="reference internal" href="#c.PyUnicode_AsCharmapString" title="PyUnicode_AsCharmapString"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_AsCharmapString()</span></code></a> or
|
||
<a class="reference internal" href="#c.PyUnicode_AsEncodedString" title="PyUnicode_AsEncodedString"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_AsEncodedString()</span></code></a>.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<p>The following codec API is special in that it maps Unicode to Unicode.</p>
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_Translate">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_Translate</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *unicode</em>, <a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *mapping</em>, const char<em> *errors</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_Translate" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Translate a Unicode object using the given <em>mapping</em> object and return the
|
||
resulting Unicode object. Return <em>NULL</em> if an exception was raised by the
|
||
codec.</p>
|
||
<p>The <em>mapping</em> object must map Unicode ordinal integers to Unicode strings,
|
||
integers (which are then interpreted as Unicode ordinals) or <code class="docutils literal notranslate"><span class="pre">None</span></code>
|
||
(causing deletion of the character). Unmapped character ordinals (ones
|
||
which cause a <a class="reference internal" href="../library/exceptions.html#LookupError" title="LookupError"><code class="xref py py-exc docutils literal notranslate"><span class="pre">LookupError</span></code></a>) are left untouched and are copied as-is.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_TranslateCharmap">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_TranslateCharmap</code><span class="sig-paren">(</span>const <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> *s</em>, Py_ssize_t<em> size</em>, <a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *mapping</em>, const char<em> *errors</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_TranslateCharmap" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Translate a <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> buffer of the given <em>size</em> by applying a
|
||
character <em>mapping</em> table to it and return the resulting Unicode object.
|
||
Return <em>NULL</em> when an exception was raised by the codec.</p>
|
||
<div class="deprecated">
|
||
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0: </span>Part of the old-style <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> API; please migrate to using
|
||
<a class="reference internal" href="#c.PyUnicode_Translate" title="PyUnicode_Translate"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_Translate()</span></code></a> or the <a class="reference internal" href="codec.html#codec-registry"><span class="std std-ref">generic codec based API</span></a>.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
</div>
|
||
<div class="section" id="mbcs-codecs-for-windows">
|
||
<h3>MBCS codecs for Windows<a class="headerlink" href="#mbcs-codecs-for-windows" title="Permalink to this headline">¶</a></h3>
|
||
<p>These are the MBCS codec APIs. They are currently only available on Windows and
|
||
use the Win32 MBCS converters to implement the conversions. Note that MBCS (or
|
||
DBCS) is a class of encodings, not just one. The target encoding is defined by
|
||
the user settings on the machine running the codec.</p>
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_DecodeMBCS">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_DecodeMBCS</code><span class="sig-paren">(</span>const char<em> *s</em>, Py_ssize_t<em> size</em>, const char<em> *errors</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_DecodeMBCS" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Create a Unicode object by decoding <em>size</em> bytes of the MBCS encoded string <em>s</em>.
|
||
Return <em>NULL</em> if an exception was raised by the codec.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_DecodeMBCSStateful">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_DecodeMBCSStateful</code><span class="sig-paren">(</span>const char<em> *s</em>, Py_ssize_t<em> size</em>, const char<em> *errors</em>, Py_ssize_t<em> *consumed</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_DecodeMBCSStateful" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>If <em>consumed</em> is <em>NULL</em>, behave like <a class="reference internal" href="#c.PyUnicode_DecodeMBCS" title="PyUnicode_DecodeMBCS"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_DecodeMBCS()</span></code></a>. If
|
||
<em>consumed</em> is not <em>NULL</em>, <a class="reference internal" href="#c.PyUnicode_DecodeMBCSStateful" title="PyUnicode_DecodeMBCSStateful"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_DecodeMBCSStateful()</span></code></a> will not decode
|
||
a trailing lead byte, and the number of bytes that have been decoded will be stored
|
||
in <em>consumed</em>.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_AsMBCSString">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_AsMBCSString</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *unicode</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_AsMBCSString" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Encode a Unicode object using MBCS and return the result as a Python bytes
|
||
object. Error handling is “strict”. Return <em>NULL</em> if an exception was
|
||
raised by the codec.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_EncodeCodePage">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_EncodeCodePage</code><span class="sig-paren">(</span>int<em> code_page</em>, <a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *unicode</em>, const char<em> *errors</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_EncodeCodePage" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Encode the Unicode object using the specified code page and return a Python
|
||
bytes object. Return <em>NULL</em> if an exception was raised by the codec. Use
|
||
<code class="xref c c-data docutils literal notranslate"><span class="pre">CP_ACP</span></code> code page to get the MBCS encoder.</p>
|
||
<div class="versionadded">
|
||
<p><span class="versionmodified added">New in version 3.3.</span></p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_EncodeMBCS">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_EncodeMBCS</code><span class="sig-paren">(</span>const <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE">Py_UNICODE</a><em> *s</em>, Py_ssize_t<em> size</em>, const char<em> *errors</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_EncodeMBCS" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Encode the <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> buffer of the given <em>size</em> using MBCS and return
|
||
a Python bytes object. Return <em>NULL</em> if an exception was raised by the
|
||
codec.</p>
|
||
<div class="deprecated">
|
||
<p><span class="versionmodified">Deprecated since version 3.3, will be removed in version 4.0: </span>Part of the old-style <a class="reference internal" href="#c.Py_UNICODE" title="Py_UNICODE"><code class="xref c c-type docutils literal notranslate"><span class="pre">Py_UNICODE</span></code></a> API; please migrate to using
|
||
<a class="reference internal" href="#c.PyUnicode_AsMBCSString" title="PyUnicode_AsMBCSString"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_AsMBCSString()</span></code></a>, <a class="reference internal" href="#c.PyUnicode_EncodeCodePage" title="PyUnicode_EncodeCodePage"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_EncodeCodePage()</span></code></a> or
|
||
<a class="reference internal" href="#c.PyUnicode_AsEncodedString" title="PyUnicode_AsEncodedString"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_AsEncodedString()</span></code></a>.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
</div>
|
||
<div class="section" id="methods-slots">
|
||
<h3>Methods &amp; Slots<a class="headerlink" href="#methods-slots" title="Permalink to this headline">¶</a></h3>
|
||
</div>
|
||
</div>
|
||
<div class="section" id="methods-and-slot-functions">
|
||
<span id="unicodemethodsandslots"></span><h2>Methods and Slot Functions<a class="headerlink" href="#methods-and-slot-functions" title="Permalink to this headline">¶</a></h2>
|
||
<p>The following APIs are capable of handling Unicode objects and strings on input
|
||
(we refer to them as strings in the descriptions) and return Unicode objects or
|
||
integers as appropriate.</p>
|
||
<p>They all return <em>NULL</em> or <code class="docutils literal notranslate"><span class="pre">-1</span></code> if an exception occurs.</p>
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_Concat">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_Concat</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *left</em>, <a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *right</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_Concat" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Concatenate two strings, giving a new Unicode string.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_Split">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_Split</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *s</em>, <a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *sep</em>, Py_ssize_t<em> maxsplit</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_Split" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Split a string giving a list of Unicode strings. If <em>sep</em> is <em>NULL</em>, splitting
|
||
will be done at all whitespace substrings. Otherwise, splits occur at the given
|
||
separator. At most <em>maxsplit</em> splits will be done. If negative, no limit is
|
||
set. Separators are not included in the resulting list.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_Splitlines">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_Splitlines</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *s</em>, int<em> keepend</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_Splitlines" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Split a Unicode string at line breaks, returning a list of Unicode strings.
|
||
CRLF is considered to be one line break. If <em>keepend</em> is <code class="docutils literal notranslate"><span class="pre">0</span></code>, the line break
|
||
characters are not included in the resulting strings.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt>
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_Translate</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *str</em>, <a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *table</em>, const char<em> *errors</em><span class="sig-paren">)</span></dt>
|
||
<dd><p>Translate a string by applying a character mapping table to it and return the
|
||
resulting Unicode object.</p>
|
||
<p>The mapping table must map Unicode ordinal integers to Unicode ordinal integers
|
||
or <code class="docutils literal notranslate"><span class="pre">None</span></code> (causing deletion of the character).</p>
|
||
<p>Mapping tables need only provide the <a class="reference internal" href="../reference/datamodel.html#object.__getitem__" title="object.__getitem__"><code class="xref py py-meth docutils literal notranslate"><span class="pre">__getitem__()</span></code></a> interface; dictionaries
|
||
and sequences work well. Unmapped character ordinals (ones which cause a
|
||
<a class="reference internal" href="../library/exceptions.html#LookupError" title="LookupError"><code class="xref py py-exc docutils literal notranslate"><span class="pre">LookupError</span></code></a>) are left untouched and are copied as-is.</p>
|
||
<p><em>errors</em> has the usual meaning for codecs. It may be <em>NULL</em> which indicates to
|
||
use the default error handling.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_Join">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_Join</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *separator</em>, <a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *seq</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_Join" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Join a sequence of strings using the given <em>separator</em> and return the resulting
|
||
Unicode string.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_Tailmatch">
|
||
Py_ssize_t <code class="descname">PyUnicode_Tailmatch</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *str</em>, <a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *substr</em>, Py_ssize_t<em> start</em>, Py_ssize_t<em> end</em>, int<em> direction</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_Tailmatch" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Return <code class="docutils literal notranslate"><span class="pre">1</span></code> if <em>substr</em> matches <code class="docutils literal notranslate"><span class="pre">str[start:end]</span></code> at the given tail end
|
||
(<em>direction</em> == <code class="docutils literal notranslate"><span class="pre">-1</span></code> means to do a prefix match, <em>direction</em> == <code class="docutils literal notranslate"><span class="pre">1</span></code> a suffix match),
|
||
<code class="docutils literal notranslate"><span class="pre">0</span></code> otherwise. Return <code class="docutils literal notranslate"><span class="pre">-1</span></code> if an error occurred.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_Find">
|
||
Py_ssize_t <code class="descname">PyUnicode_Find</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *str</em>, <a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *substr</em>, Py_ssize_t<em> start</em>, Py_ssize_t<em> end</em>, int<em> direction</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_Find" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Return the first position of <em>substr</em> in <code class="docutils literal notranslate"><span class="pre">str[start:end]</span></code> using the given
|
||
<em>direction</em> (<em>direction</em> == <code class="docutils literal notranslate"><span class="pre">1</span></code> means to do a forward search, <em>direction</em> == <code class="docutils literal notranslate"><span class="pre">-1</span></code> a
|
||
backward search). The return value is the index of the first match; a value of
|
||
<code class="docutils literal notranslate"><span class="pre">-1</span></code> indicates that no match was found, and <code class="docutils literal notranslate"><span class="pre">-2</span></code> indicates that an error
|
||
occurred and an exception has been set.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_FindChar">
|
||
Py_ssize_t <code class="descname">PyUnicode_FindChar</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *str</em>, <a class="reference internal" href="#c.Py_UCS4" title="Py_UCS4">Py_UCS4</a><em> ch</em>, Py_ssize_t<em> start</em>, Py_ssize_t<em> end</em>, int<em> direction</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_FindChar" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Return the first position of the character <em>ch</em> in <code class="docutils literal notranslate"><span class="pre">str[start:end]</span></code> using
|
||
the given <em>direction</em> (<em>direction</em> == <code class="docutils literal notranslate"><span class="pre">1</span></code> means to do a forward search,
|
||
<em>direction</em> == <code class="docutils literal notranslate"><span class="pre">-1</span></code> a backward search). The return value is the index of the
|
||
first match; a value of <code class="docutils literal notranslate"><span class="pre">-1</span></code> indicates that no match was found, and <code class="docutils literal notranslate"><span class="pre">-2</span></code>
|
||
indicates that an error occurred and an exception has been set.</p>
|
||
<div class="versionadded">
|
||
<p><span class="versionmodified added">New in version 3.3.</span></p>
|
||
</div>
|
||
<div class="versionchanged">
|
||
<p><span class="versionmodified changed">Changed in version 3.7: </span><em>start</em> and <em>end</em> are now adjusted to behave like <code class="docutils literal notranslate"><span class="pre">str[start:end]</span></code>.</p>
|
||
</div>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_Count">
|
||
Py_ssize_t <code class="descname">PyUnicode_Count</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *str</em>, <a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *substr</em>, Py_ssize_t<em> start</em>, Py_ssize_t<em> end</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_Count" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Return the number of non-overlapping occurrences of <em>substr</em> in
|
||
<code class="docutils literal notranslate"><span class="pre">str[start:end]</span></code>. Return <code class="docutils literal notranslate"><span class="pre">-1</span></code> if an error occurred.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_Replace">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_Replace</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *str</em>, <a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *substr</em>, <a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *replstr</em>, Py_ssize_t<em> maxcount</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_Replace" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Replace at most <em>maxcount</em> occurrences of <em>substr</em> in <em>str</em> with <em>replstr</em> and
|
||
return the resulting Unicode object. <em>maxcount</em> == <code class="docutils literal notranslate"><span class="pre">-1</span></code> means replace all
|
||
occurrences.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_Compare">
|
||
int <code class="descname">PyUnicode_Compare</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *left</em>, <a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *right</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_Compare" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Compare two strings and return <code class="docutils literal notranslate"><span class="pre">-1</span></code>, <code class="docutils literal notranslate"><span class="pre">0</span></code>, <code class="docutils literal notranslate"><span class="pre">1</span></code> for less than, equal, and greater than,
|
||
respectively.</p>
|
||
<p>This function returns <code class="docutils literal notranslate"><span class="pre">-1</span></code> upon failure, so one should call
|
||
<a class="reference internal" href="exceptions.html#c.PyErr_Occurred" title="PyErr_Occurred"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyErr_Occurred()</span></code></a> to check for errors.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_CompareWithASCIIString">
|
||
int <code class="descname">PyUnicode_CompareWithASCIIString</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *uni</em>, const char<em> *string</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_CompareWithASCIIString" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Compare a Unicode object, <em>uni</em>, with <em>string</em> and return <code class="docutils literal notranslate"><span class="pre">-1</span></code>, <code class="docutils literal notranslate"><span class="pre">0</span></code>, <code class="docutils literal notranslate"><span class="pre">1</span></code> for less
|
||
than, equal, and greater than, respectively. It is best to pass only
|
||
ASCII-encoded strings, but the function interprets the input string as
|
||
ISO-8859-1 if it contains non-ASCII characters.</p>
|
||
<p>This function does not raise exceptions.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_RichCompare">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_RichCompare</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *left</em>, <a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *right</em>, int<em> op</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_RichCompare" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Rich compare two Unicode strings and return one of the following:</p>
|
||
<ul class="simple">
|
||
<li><p><code class="docutils literal notranslate"><span class="pre">NULL</span></code> in case an exception was raised</p></li>
|
||
<li><p><code class="xref py py-const docutils literal notranslate"><span class="pre">Py_True</span></code> or <code class="xref py py-const docutils literal notranslate"><span class="pre">Py_False</span></code> for successful comparisons</p></li>
|
||
<li><p><code class="xref py py-const docutils literal notranslate"><span class="pre">Py_NotImplemented</span></code> in case the type combination is unknown</p></li>
|
||
</ul>
|
||
<p>Possible values for <em>op</em> are <code class="xref py py-const docutils literal notranslate"><span class="pre">Py_GT</span></code>, <code class="xref py py-const docutils literal notranslate"><span class="pre">Py_GE</span></code>, <code class="xref py py-const docutils literal notranslate"><span class="pre">Py_EQ</span></code>,
|
||
<code class="xref py py-const docutils literal notranslate"><span class="pre">Py_NE</span></code>, <code class="xref py py-const docutils literal notranslate"><span class="pre">Py_LT</span></code>, and <code class="xref py py-const docutils literal notranslate"><span class="pre">Py_LE</span></code>.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_Format">
|
||
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_Format</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *format</em>, <a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *args</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_Format" title="Permalink to this definition">¶</a></dt>
|
||
<dd><em class="refcount">Return value: New reference.</em><p>Return a new string object from <em>format</em> and <em>args</em>; this is analogous to
|
||
<code class="docutils literal notranslate"><span class="pre">format</span> <span class="pre">%</span> <span class="pre">args</span></code>.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_Contains">
|
||
int <code class="descname">PyUnicode_Contains</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *container</em>, <a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> *element</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_Contains" title="Permalink to this definition">¶</a></dt>
|
||
<dd><p>Check whether <em>element</em> is contained in <em>container</em> and return true or false
|
||
accordingly.</p>
|
||
<p><em>element</em> must be a Unicode string. <code class="docutils literal notranslate"><span class="pre">-1</span></code> is returned
|
||
if there was an error.</p>
|
||
</dd></dl>
|
||
|
||
<dl class="function">
|
||
<dt id="c.PyUnicode_InternInPlace">
|
||
void <code class="descname">PyUnicode_InternInPlace</code><span class="sig-paren">(</span><a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a><em> **string</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_InternInPlace" title="Permalink to this definition">¶</a></dt>
<dd><p>Intern the argument <em>*string</em> in place. The argument must be the address of a
pointer variable pointing to a Python Unicode string object. If an interned string
equal to <em>*string</em> already exists, <em>*string</em> is set to it (the reference count
of the old string object is decremented and that of the interned string object is
incremented); otherwise <em>*string</em> is left alone and is itself interned
(incrementing its reference count). Although this description mentions reference
counts several times, think of this function as reference-count-neutral: you own
the object after the call if and only if you owned it before the call.</p>
</dd></dl>

<dl class="function">
<dt id="c.PyUnicode_InternFromString">
<a class="reference internal" href="structures.html#c.PyObject" title="PyObject">PyObject</a>* <code class="descname">PyUnicode_InternFromString</code><span class="sig-paren">(</span>const char<em> *v</em><span class="sig-paren">)</span><a class="headerlink" href="#c.PyUnicode_InternFromString" title="Permalink to this definition">¶</a></dt>
<dd><em class="refcount">Return value: New reference.</em><p>A combination of <a class="reference internal" href="#c.PyUnicode_FromString" title="PyUnicode_FromString"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_FromString()</span></code></a> and
<a class="reference internal" href="#c.PyUnicode_InternInPlace" title="PyUnicode_InternInPlace"><code class="xref c c-func docutils literal notranslate"><span class="pre">PyUnicode_InternInPlace()</span></code></a>: the null-terminated UTF-8 string <em>v</em> is converted to a
Unicode string object, which is then interned. The result is either a new Unicode string
object that has been interned, or a new (“owned”) reference to an earlier
interned string object with the same value.</p>
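<p>The effect of interning is that equal strings end up as one shared object. The sketch below shows this under the assumption of CPython, calling the function through <code class="docutils literal notranslate"><span class="pre">ctypes.pythonapi</span></code>; the wrapper name <code class="docutils literal notranslate"><span class="pre">c_intern</span></code> and the key <code class="docutils literal notranslate"><span class="pre">example_key</span></code> are hypothetical.</p>

```python
import ctypes
import sys

# Assumption: CPython, where ctypes.pythonapi gives access to the C API.
_intern_from_string = ctypes.pythonapi.PyUnicode_InternFromString
_intern_from_string.argtypes = [ctypes.c_char_p]  # null-terminated UTF-8 bytes
_intern_from_string.restype = ctypes.py_object


def c_intern(data):
    """Hypothetical helper: call PyUnicode_InternFromString on UTF-8 bytes."""
    return _intern_from_string(data)


first = c_intern(b"example_key")
second = c_intern(b"example_key")

# Build an equal string at run time, then intern it from Python.
third = sys.intern("".join(["example_", "key"]))

print(first == "example_key")  # the bytes were decoded into a str
print(first is second)         # the second call found the earlier interned object
print(third is first)          # sys.intern() resolves to the same object
```

<p>Because every call yields the one interned object, such strings can be compared by identity, which is the usual reason for interning attribute and dictionary key names.</p>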
</dd></dl>

</div>
</div>


</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="../contents.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Unicode Objects and Codecs</a><ul>
<li><a class="reference internal" href="#unicode-objects">Unicode Objects</a><ul>
<li><a class="reference internal" href="#unicode-type">Unicode Type</a></li>
<li><a class="reference internal" href="#unicode-character-properties">Unicode Character Properties</a></li>
<li><a class="reference internal" href="#creating-and-accessing-unicode-strings">Creating and accessing Unicode strings</a></li>
<li><a class="reference internal" href="#deprecated-py-unicode-apis">Deprecated Py_UNICODE APIs</a></li>
<li><a class="reference internal" href="#locale-encoding">Locale Encoding</a></li>
<li><a class="reference internal" href="#file-system-encoding">File System Encoding</a></li>
<li><a class="reference internal" href="#wchar-t-support">wchar_t Support</a></li>
</ul>
</li>
<li><a class="reference internal" href="#built-in-codecs">Built-in Codecs</a><ul>
<li><a class="reference internal" href="#generic-codecs">Generic Codecs</a></li>
<li><a class="reference internal" href="#utf-8-codecs">UTF-8 Codecs</a></li>
<li><a class="reference internal" href="#utf-32-codecs">UTF-32 Codecs</a></li>
<li><a class="reference internal" href="#utf-16-codecs">UTF-16 Codecs</a></li>
<li><a class="reference internal" href="#utf-7-codecs">UTF-7 Codecs</a></li>
<li><a class="reference internal" href="#unicode-escape-codecs">Unicode-Escape Codecs</a></li>
<li><a class="reference internal" href="#raw-unicode-escape-codecs">Raw-Unicode-Escape Codecs</a></li>
<li><a class="reference internal" href="#latin-1-codecs">Latin-1 Codecs</a></li>
<li><a class="reference internal" href="#ascii-codecs">ASCII Codecs</a></li>
<li><a class="reference internal" href="#character-map-codecs">Character Map Codecs</a></li>
<li><a class="reference internal" href="#mbcs-codecs-for-windows">MBCS codecs for Windows</a></li>
<li><a class="reference internal" href="#methods-slots">Methods &amp; Slots</a></li>
</ul>
</li>
<li><a class="reference internal" href="#methods-and-slot-functions">Methods and Slot Functions</a></li>
</ul>
</li>
</ul>

<h4>Previous topic</h4>
<p class="topless"><a href="bytearray.html"
title="previous chapter">Byte Array Objects</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="tuple.html"
title="next chapter">Tuple Objects</a></p>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="../bugs.html">Report a Bug</a></li>
<li>
<a href="https://github.com/python/cpython/blob/3.7/Doc/c-api/unicode.rst"
rel="nofollow">Show Source
</a>
</li>
</ul>
</div>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="footer">
© <a href="../copyright.html">Copyright</a> 2001-2019, Python Software Foundation.
<br />
The Python Software Foundation is a non-profit corporation.
<a href="https://www.python.org/psf/donations/">Please donate.</a>
<br />
Last updated on Jul 13, 2019.
<a href="../bugs.html">Found a bug</a>?
<br />
Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 2.0.1.
</div>

</body>
</html>