<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title>xml.dom.pulldom — Support for building partial DOM trees &#8212; Python 3.7.4 documentation</title>
<link rel="stylesheet" href="../_static/pydoctheme.css" type="text/css" />
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
<script type="text/javascript" id="documentation_options" data-url_root="../" src="../_static/documentation_options.js"></script>
<script type="text/javascript" src="../_static/jquery.js"></script>
<script type="text/javascript" src="../_static/underscore.js"></script>
<script type="text/javascript" src="../_static/doctools.js"></script>
<script type="text/javascript" src="../_static/language_data.js"></script>
<script type="text/javascript" src="../_static/sidebar.js"></script>
<link rel="search" type="application/opensearchdescription+xml"
title="Search within Python 3.7.4 documentation"
href="../_static/opensearch.xml"/>
<link rel="author" title="About these documents" href="../about.html" />
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="copyright" title="Copyright" href="../copyright.html" />
<link rel="next" title="xml.sax — Support for SAX2 parsers" href="xml.sax.html" />
<link rel="prev" title="xml.dom.minidom — Minimal DOM implementation" href="xml.dom.minidom.html" />
<link rel="shortcut icon" type="image/png" href="../_static/py.png" />
<link rel="canonical" href="https://docs.python.org/3/library/xml.dom.pulldom.html" />
<script type="text/javascript" src="../_static/copybutton.js"></script>
<script type="text/javascript" src="../_static/switchers.js"></script>
<style>
@media only screen {
table.full-width-table {
width: 100%;
}
}
</style>
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="../genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="../py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="xml.sax.html" title="xml.sax — Support for SAX2 parsers"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="xml.dom.minidom.html" title="xml.dom.minidom — Minimal DOM implementation"
accesskey="P">previous</a> |</li>
<li><img src="../_static/py.png" alt=""
style="vertical-align: middle; margin-top: -1px"/></li>
<li><a href="https://www.python.org/">Python</a> &#187;</li>
<li>
<span class="language_switcher_placeholder">en</span>
<span class="version_switcher_placeholder">3.7.4</span>
<a href="../index.html">Documentation </a> &#187;
</li>
<li class="nav-item nav-item-1"><a href="index.html" >The Python Standard Library</a> &#187;</li>
<li class="nav-item nav-item-2"><a href="markup.html" accesskey="U">Structured Markup Processing Tools</a> &#187;</li>
<li class="right">
<div class="inline-search" style="display: none" role="search">
<form class="inline-search" action="../search.html" method="get">
<input placeholder="Quick search" type="text" name="q" />
<input type="submit" value="Go" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
<script type="text/javascript">$('.inline-search').show(0);</script>
|
</li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="module-xml.dom.pulldom">
<span id="xml-dom-pulldom-support-for-building-partial-dom-trees"></span><h1><a class="reference internal" href="#module-xml.dom.pulldom" title="xml.dom.pulldom: Support for building partial DOM trees from SAX events."><code class="xref py py-mod docutils literal notranslate"><span class="pre">xml.dom.pulldom</span></code></a> — Support for building partial DOM trees<a class="headerlink" href="#module-xml.dom.pulldom" title="Permalink to this headline"></a></h1>
<p><strong>Source code:</strong> <a class="reference external" href="https://github.com/python/cpython/tree/3.7/Lib/xml/dom/pulldom.py">Lib/xml/dom/pulldom.py</a></p>
<hr class="docutils" />
<p>The <a class="reference internal" href="#module-xml.dom.pulldom" title="xml.dom.pulldom: Support for building partial DOM trees from SAX events."><code class="xref py py-mod docutils literal notranslate"><span class="pre">xml.dom.pulldom</span></code></a> module provides a “pull parser” which can also be
asked to produce DOM-accessible fragments of the document where necessary. The
basic concept involves pulling “events” from a stream of incoming XML and
processing them. Unlike SAX, which also uses an event-driven processing model
but delivers events through callbacks, a pull parser makes the user
responsible for explicitly pulling events from the stream and looping over them
until either processing is finished or an error condition occurs.</p>
<div class="admonition warning">
<p class="admonition-title">Warning</p>
<p>The <a class="reference internal" href="#module-xml.dom.pulldom" title="xml.dom.pulldom: Support for building partial DOM trees from SAX events."><code class="xref py py-mod docutils literal notranslate"><span class="pre">xml.dom.pulldom</span></code></a> module is not secure against
maliciously constructed data. If you need to parse untrusted or
unauthenticated data, see <a class="reference internal" href="xml.html#xml-vulnerabilities"><span class="std std-ref">XML vulnerabilities</span></a>.</p>
</div>
<div class="versionchanged">
<p><span class="versionmodified changed">Changed in version 3.7.1: </span>To increase security, the SAX parser no longer processes general external
entities by default. To enable processing of external entities,
pass in a custom parser instance:</p>
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">xml.dom.pulldom</span> <span class="k">import</span> <span class="n">parse</span>
<span class="kn">from</span> <span class="nn">xml.sax</span> <span class="k">import</span> <span class="n">make_parser</span>
<span class="kn">from</span> <span class="nn">xml.sax.handler</span> <span class="k">import</span> <span class="n">feature_external_ges</span>
<span class="n">parser</span> <span class="o">=</span> <span class="n">make_parser</span><span class="p">()</span>
<span class="n">parser</span><span class="o">.</span><span class="n">setFeature</span><span class="p">(</span><span class="n">feature_external_ges</span><span class="p">,</span> <span class="kc">True</span><span class="p">)</span>
<span class="n">parse</span><span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="n">parser</span><span class="o">=</span><span class="n">parser</span><span class="p">)</span>
</pre></div>
</div>
</div>
<p>Example:</p>
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">xml.dom</span> <span class="k">import</span> <span class="n">pulldom</span>
<span class="n">doc</span> <span class="o">=</span> <span class="n">pulldom</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="s1">&#39;sales_items.xml&#39;</span><span class="p">)</span>
<span class="k">for</span> <span class="n">event</span><span class="p">,</span> <span class="n">node</span> <span class="ow">in</span> <span class="n">doc</span><span class="p">:</span>
    <span class="k">if</span> <span class="n">event</span> <span class="o">==</span> <span class="n">pulldom</span><span class="o">.</span><span class="n">START_ELEMENT</span> <span class="ow">and</span> <span class="n">node</span><span class="o">.</span><span class="n">tagName</span> <span class="o">==</span> <span class="s1">&#39;item&#39;</span><span class="p">:</span>
        <span class="k">if</span> <span class="nb">int</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">getAttribute</span><span class="p">(</span><span class="s1">&#39;price&#39;</span><span class="p">))</span> <span class="o">&gt;</span> <span class="mi">50</span><span class="p">:</span>
            <span class="n">doc</span><span class="o">.</span><span class="n">expandNode</span><span class="p">(</span><span class="n">node</span><span class="p">)</span>
            <span class="nb">print</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">toxml</span><span class="p">())</span>
</pre></div>
</div>
<p><code class="docutils literal notranslate"><span class="pre">event</span></code> is a constant and can be one of:</p>
<ul class="simple">
<li><p><code class="xref py py-data docutils literal notranslate"><span class="pre">START_ELEMENT</span></code></p></li>
<li><p><code class="xref py py-data docutils literal notranslate"><span class="pre">END_ELEMENT</span></code></p></li>
<li><p><code class="xref py py-data docutils literal notranslate"><span class="pre">COMMENT</span></code></p></li>
<li><p><code class="xref py py-data docutils literal notranslate"><span class="pre">START_DOCUMENT</span></code></p></li>
<li><p><code class="xref py py-data docutils literal notranslate"><span class="pre">END_DOCUMENT</span></code></p></li>
<li><p><code class="xref py py-data docutils literal notranslate"><span class="pre">CHARACTERS</span></code></p></li>
<li><p><code class="xref py py-data docutils literal notranslate"><span class="pre">PROCESSING_INSTRUCTION</span></code></p></li>
<li><p><code class="xref py py-data docutils literal notranslate"><span class="pre">IGNORABLE_WHITESPACE</span></code></p></li>
</ul>
<p><code class="docutils literal notranslate"><span class="pre">node</span></code> is an object of type <code class="xref py py-class docutils literal notranslate"><span class="pre">xml.dom.minidom.Document</span></code>,
<code class="xref py py-class docutils literal notranslate"><span class="pre">xml.dom.minidom.Element</span></code> or <code class="xref py py-class docutils literal notranslate"><span class="pre">xml.dom.minidom.Text</span></code>.</p>
<p>Since the document is treated as a “flat” stream of events, the document “tree”
is implicitly traversed and the desired elements are found regardless of their
depth in the tree. In other words, one does not need to consider hierarchical
issues such as recursive searching of the document nodes, although if the
context of elements were important, one would either need to maintain some
context-related state (i.e. remembering where one is in the document at any
given point) or to make use of the <a class="reference internal" href="#xml.dom.pulldom.DOMEventStream.expandNode" title="xml.dom.pulldom.DOMEventStream.expandNode"><code class="xref py py-func docutils literal notranslate"><span class="pre">DOMEventStream.expandNode()</span></code></a> method
and switch to DOM-related processing.</p>
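<p>As a minimal sketch of the first option (maintaining context-related state by hand), the loop below keeps a stack of enclosing section names while pulling events. The sample document, its element names and the helper <code class="docutils literal notranslate"><span class="pre">item_sections()</span></code> are hypothetical and exist only for illustration.</p>

```python
from xml.dom import pulldom

# Hypothetical sample data; the element and attribute names are made up.
SAMPLE = (
    "<catalog>"
    "<section name='tools'><item id='a1'/></section>"
    "<section name='toys'><item id='b2'/></section>"
    "</catalog>"
)

def item_sections(xml_text):
    """Return (section name, item id) pairs, tracking context manually."""
    sections = []  # stack of the names of the currently open <section> elements
    found = []
    for event, node in pulldom.parseString(xml_text):
        if event == pulldom.START_ELEMENT:
            if node.tagName == "section":
                sections.append(node.getAttribute("name"))
            elif node.tagName == "item":
                # The innermost open section is the context of this item.
                found.append((sections[-1], node.getAttribute("id")))
        elif event == pulldom.END_ELEMENT and node.tagName == "section":
            sections.pop()
    return found

print(item_sections(SAMPLE))
```

<p>The stack is needed because the event stream itself carries no parent information; each unexpanded node only describes the element that produced the event.</p>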
<dl class="class">
<dt id="xml.dom.pulldom.PullDom">
<em class="property">class </em><code class="descclassname">xml.dom.pulldom.</code><code class="descname">PullDOM</code><span class="sig-paren">(</span><em>documentFactory=None</em><span class="sig-paren">)</span><a class="headerlink" href="#xml.dom.pulldom.PullDom" title="Permalink to this definition"></a></dt>
<dd><p>Subclass of <a class="reference internal" href="xml.sax.handler.html#xml.sax.handler.ContentHandler" title="xml.sax.handler.ContentHandler"><code class="xref py py-class docutils literal notranslate"><span class="pre">xml.sax.handler.ContentHandler</span></code></a>.</p>
</dd></dl>
<dl class="class">
<dt id="xml.dom.pulldom.SAX2DOM">
<em class="property">class </em><code class="descclassname">xml.dom.pulldom.</code><code class="descname">SAX2DOM</code><span class="sig-paren">(</span><em>documentFactory=None</em><span class="sig-paren">)</span><a class="headerlink" href="#xml.dom.pulldom.SAX2DOM" title="Permalink to this definition"></a></dt>
<dd><p>Subclass of <a class="reference internal" href="xml.sax.handler.html#xml.sax.handler.ContentHandler" title="xml.sax.handler.ContentHandler"><code class="xref py py-class docutils literal notranslate"><span class="pre">xml.sax.handler.ContentHandler</span></code></a>.</p>
</dd></dl>
<dl class="function">
<dt id="xml.dom.pulldom.parse">
<code class="descclassname">xml.dom.pulldom.</code><code class="descname">parse</code><span class="sig-paren">(</span><em>stream_or_string</em>, <em>parser=None</em>, <em>bufsize=None</em><span class="sig-paren">)</span><a class="headerlink" href="#xml.dom.pulldom.parse" title="Permalink to this definition"></a></dt>
<dd><p>Return a <a class="reference internal" href="#xml.dom.pulldom.DOMEventStream" title="xml.dom.pulldom.DOMEventStream"><code class="xref py py-class docutils literal notranslate"><span class="pre">DOMEventStream</span></code></a> from the given input. <em>stream_or_string</em> may be
either a file name or a file-like object. <em>parser</em>, if given, must be an
<a class="reference internal" href="xml.sax.reader.html#xml.sax.xmlreader.XMLReader" title="xml.sax.xmlreader.XMLReader"><code class="xref py py-class docutils literal notranslate"><span class="pre">XMLReader</span></code></a> object. This function will change the
document handler of the parser and activate namespace support; any other
parser configuration (such as setting an entity resolver) must have been
done in advance.</p>
</dd></dl>
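<p>The following sketch passes a file-like object rather than a file name and sets a small <em>bufsize</em> so that the input is read in several chunks. The in-memory document and its element names are hypothetical stand-ins for a real file.</p>

```python
import io
from xml.dom import pulldom

# Hypothetical in-memory document standing in for an open binary file.
data = b"<log><entry level='info'/><entry level='error'/></log>"
stream = io.BytesIO(data)

# A small bufsize makes the event stream read the input in several chunks.
events = pulldom.parse(stream, bufsize=16)
levels = [
    node.getAttribute("level")
    for event, node in events
    if event == pulldom.START_ELEMENT and node.tagName == "entry"
]
print(levels)
```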
<p>If you have XML in a string, you can use the <a class="reference internal" href="#xml.dom.pulldom.parseString" title="xml.dom.pulldom.parseString"><code class="xref py py-func docutils literal notranslate"><span class="pre">parseString()</span></code></a> function instead:</p>
<dl class="function">
<dt id="xml.dom.pulldom.parseString">
<code class="descclassname">xml.dom.pulldom.</code><code class="descname">parseString</code><span class="sig-paren">(</span><em>string</em>, <em>parser=None</em><span class="sig-paren">)</span><a class="headerlink" href="#xml.dom.pulldom.parseString" title="Permalink to this definition"></a></dt>
<dd><p>Return a <a class="reference internal" href="#xml.dom.pulldom.DOMEventStream" title="xml.dom.pulldom.DOMEventStream"><code class="xref py py-class docutils literal notranslate"><span class="pre">DOMEventStream</span></code></a> that represents the (Unicode) <em>string</em>.</p>
</dd></dl>
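<p>A short sketch of <code class="docutils literal notranslate"><span class="pre">parseString()</span></code> in use: it gathers the character data of a small, made-up document. Text may arrive as more than one <code class="docutils literal notranslate"><span class="pre">CHARACTERS</span></code> event, so the pieces are joined at the end.</p>

```python
from xml.dom import pulldom

text_parts = []
for event, node in pulldom.parseString("<note>Hello, <b>pull</b> parser</note>"):
    if event == pulldom.CHARACTERS:
        # node is a Text node; its data attribute holds this run of characters.
        text_parts.append(node.data)

text = "".join(text_parts)
print(text)
```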
<dl class="data">
<dt id="xml.dom.pulldom.default_bufsize">
<code class="descclassname">xml.dom.pulldom.</code><code class="descname">default_bufsize</code><a class="headerlink" href="#xml.dom.pulldom.default_bufsize" title="Permalink to this definition"></a></dt>
<dd><p>Default value for the <em>bufsize</em> parameter to <a class="reference internal" href="#xml.dom.pulldom.parse" title="xml.dom.pulldom.parse"><code class="xref py py-func docutils literal notranslate"><span class="pre">parse()</span></code></a>.</p>
<p>The value of this variable can be changed before calling <a class="reference internal" href="#xml.dom.pulldom.parse" title="xml.dom.pulldom.parse"><code class="xref py py-func docutils literal notranslate"><span class="pre">parse()</span></code></a> and
the new value will take effect.</p>
</dd></dl>
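<p>One way to observe this is sketched below. <code class="docutils literal notranslate"><span class="pre">RecordingStream</span></code> is a hypothetical helper, not part of the module, that records the size requested by each read; the example assumes the event stream reads its input through the stream's <code class="docutils literal notranslate"><span class="pre">read()</span></code> method.</p>

```python
import io
from xml.dom import pulldom

class RecordingStream(io.BytesIO):
    """Hypothetical helper that records the size passed to each read()."""

    def __init__(self, data):
        super().__init__(data)
        self.read_sizes = []

    def read(self, size=-1):
        self.read_sizes.append(size)
        return super().read(size)

original = pulldom.default_bufsize
pulldom.default_bufsize = 32  # applies to parse() calls made from now on
try:
    stream = RecordingStream(b"<root>" + b"<x/>" * 20 + b"</root>")
    for _event, _node in pulldom.parse(stream):
        pass
finally:
    pulldom.default_bufsize = original  # restore the module-wide default

print(set(stream.read_sizes))
```

<p>Because the variable is module-wide, passing <em>bufsize</em> to <code class="docutils literal notranslate"><span class="pre">parse()</span></code> directly is usually the less intrusive choice.</p>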
<div class="section" id="domeventstream-objects">
<span id="id1"></span><h2>DOMEventStream Objects<a class="headerlink" href="#domeventstream-objects" title="Permalink to this headline"></a></h2>
<dl class="class">
<dt id="xml.dom.pulldom.DOMEventStream">
<em class="property">class </em><code class="descclassname">xml.dom.pulldom.</code><code class="descname">DOMEventStream</code><span class="sig-paren">(</span><em>stream</em>, <em>parser</em>, <em>bufsize</em><span class="sig-paren">)</span><a class="headerlink" href="#xml.dom.pulldom.DOMEventStream" title="Permalink to this definition"></a></dt>
<dd><dl class="method">
<dt id="xml.dom.pulldom.DOMEventStream.getEvent">
<code class="descname">getEvent</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#xml.dom.pulldom.DOMEventStream.getEvent" title="Permalink to this definition"></a></dt>
<dd><p>Return a tuple containing <em>event</em> and the current <em>node</em>. The node is an
<code class="xref py py-class docutils literal notranslate"><span class="pre">xml.dom.minidom.Document</span></code> if <em>event</em> equals <code class="xref py py-data docutils literal notranslate"><span class="pre">START_DOCUMENT</span></code>, an
<code class="xref py py-class docutils literal notranslate"><span class="pre">xml.dom.minidom.Element</span></code> if <em>event</em> equals <code class="xref py py-data docutils literal notranslate"><span class="pre">START_ELEMENT</span></code> or
<code class="xref py py-data docutils literal notranslate"><span class="pre">END_ELEMENT</span></code>, or an <code class="xref py py-class docutils literal notranslate"><span class="pre">xml.dom.minidom.Text</span></code> if <em>event</em> equals
<code class="xref py py-data docutils literal notranslate"><span class="pre">CHARACTERS</span></code>.
The current node does not contain information about its children unless
<a class="reference internal" href="#xml.dom.pulldom.DOMEventStream.expandNode" title="xml.dom.pulldom.DOMEventStream.expandNode"><code class="xref py py-func docutils literal notranslate"><span class="pre">expandNode()</span></code></a> is called.</p>
</dd></dl>
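<p>Events can also be pulled one at a time instead of iterating over the stream. The sketch below assumes that <code class="docutils literal notranslate"><span class="pre">getEvent()</span></code> returns <code class="docutils literal notranslate"><span class="pre">None</span></code> once the input is exhausted; this matches the behaviour of the CPython implementation but is not stated in the text above.</p>

```python
from xml.dom import pulldom

stream = pulldom.parseString("<a><b/></a>")
started = []
while True:
    pair = stream.getEvent()
    if pair is None:  # assumption: None signals the end of the input
        break
    event, node = pair
    if event == pulldom.START_ELEMENT:
        started.append(node.tagName)

print(started)
```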
<dl class="method">
<dt id="xml.dom.pulldom.DOMEventStream.expandNode">
<code class="descname">expandNode</code><span class="sig-paren">(</span><em>node</em><span class="sig-paren">)</span><a class="headerlink" href="#xml.dom.pulldom.DOMEventStream.expandNode" title="Permalink to this definition"></a></dt>
<dd><p>Expands all children of <em>node</em> into <em>node</em>. Example:</p>
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">xml.dom</span> <span class="k">import</span> <span class="n">pulldom</span>
<span class="n">xml</span> <span class="o">=</span> <span class="s1">&#39;&lt;html&gt;&lt;title&gt;Foo&lt;/title&gt; &lt;p&gt;Some text &lt;div&gt;and more&lt;/div&gt;&lt;/p&gt; &lt;/html&gt;&#39;</span>
<span class="n">doc</span> <span class="o">=</span> <span class="n">pulldom</span><span class="o">.</span><span class="n">parseString</span><span class="p">(</span><span class="n">xml</span><span class="p">)</span>
<span class="k">for</span> <span class="n">event</span><span class="p">,</span> <span class="n">node</span> <span class="ow">in</span> <span class="n">doc</span><span class="p">:</span>
    <span class="k">if</span> <span class="n">event</span> <span class="o">==</span> <span class="n">pulldom</span><span class="o">.</span><span class="n">START_ELEMENT</span> <span class="ow">and</span> <span class="n">node</span><span class="o">.</span><span class="n">tagName</span> <span class="o">==</span> <span class="s1">&#39;p&#39;</span><span class="p">:</span>
        <span class="c1"># Following statement only prints &#39;&lt;p/&gt;&#39;</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">toxml</span><span class="p">())</span>
        <span class="n">doc</span><span class="o">.</span><span class="n">expandNode</span><span class="p">(</span><span class="n">node</span><span class="p">)</span>
        <span class="c1"># Following statement prints node with all its children &#39;&lt;p&gt;Some text &lt;div&gt;and more&lt;/div&gt;&lt;/p&gt;&#39;</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">toxml</span><span class="p">())</span>
</pre></div>
</div>
</dd></dl>
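<p>Once a node has been expanded, ordinary DOM methods can be used on it. The sketch below, built on a made-up order document, expands each <code class="docutils literal notranslate"><span class="pre">order</span></code> element and then queries its children with <code class="docutils literal notranslate"><span class="pre">getElementsByTagName()</span></code>.</p>

```python
from xml.dom import pulldom

# Hypothetical sample data; the element and attribute names are made up.
SAMPLE = (
    "<orders>"
    "<order id='1'><sku>A</sku><sku>B</sku></order>"
    "<order id='2'><sku>C</sku></order>"
    "</orders>"
)

skus_by_order = {}
stream = pulldom.parseString(SAMPLE)
for event, node in stream:
    if event == pulldom.START_ELEMENT and node.tagName == "order":
        # Pull events up to the matching end tag and attach them as children.
        stream.expandNode(node)
        skus_by_order[node.getAttribute("id")] = [
            sku.firstChild.data for sku in node.getElementsByTagName("sku")
        ]

print(skus_by_order)
```

<p>Expanding a node consumes its descendants' events, so the enclosing loop resumes after the expanded element and never sees the nested <code class="docutils literal notranslate"><span class="pre">sku</span></code> elements.</p>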
<dl class="method">
<dt id="xml.dom.pulldom.DOMEventStream.reset">
<code class="descname">reset</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#xml.dom.pulldom.DOMEventStream.reset" title="Permalink to this definition"></a></dt>
<dd></dd></dl>
</dd></dl>
</div>
</div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="../contents.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#"><code class="xref py py-mod docutils literal notranslate"><span class="pre">xml.dom.pulldom</span></code> — Support for building partial DOM trees</a><ul>
<li><a class="reference internal" href="#domeventstream-objects">DOMEventStream Objects</a></li>
</ul>
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="xml.dom.minidom.html"
title="previous chapter"><code class="xref py py-mod docutils literal notranslate"><span class="pre">xml.dom.minidom</span></code> — Minimal DOM implementation</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="xml.sax.html"
title="next chapter"><code class="xref py py-mod docutils literal notranslate"><span class="pre">xml.sax</span></code> — Support for SAX2 parsers</a></p>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="../bugs.html">Report a Bug</a></li>
<li>
<a href="https://github.com/python/cpython/blob/3.7/Doc/library/xml.dom.pulldom.rst"
rel="nofollow">Show Source
</a>
</li>
</ul>
</div>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="footer">
&copy; <a href="../copyright.html">Copyright</a> 2001-2019, Python Software Foundation.
<br />
The Python Software Foundation is a non-profit corporation.
<a href="https://www.python.org/psf/donations/">Please donate.</a>
<br />
Last updated on Jul 13, 2019.
<a href="../bugs.html">Found a bug</a>?
<br />
Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 2.0.1.
</div>
</body>
</html>