<!DOCTYPE html> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta charset="utf-8" /> <title>xml.dom.pulldom — Support for building partial DOM trees — Python 3.7.4 documentation</title> <link rel="stylesheet" href="../_static/pydoctheme.css" type="text/css" /> <link rel="stylesheet" href="../_static/pygments.css" type="text/css" /> <script type="text/javascript" id="documentation_options" data-url_root="../" src="../_static/documentation_options.js"></script> <script type="text/javascript" src="../_static/jquery.js"></script> <script type="text/javascript" src="../_static/underscore.js"></script> <script type="text/javascript" src="../_static/doctools.js"></script> <script type="text/javascript" src="../_static/language_data.js"></script> <script type="text/javascript" src="../_static/sidebar.js"></script> <link rel="search" type="application/opensearchdescription+xml" title="Search within Python 3.7.4 documentation" href="../_static/opensearch.xml"/> <link rel="author" title="About these documents" href="../about.html" /> <link rel="index" title="Index" href="../genindex.html" /> <link rel="search" title="Search" href="../search.html" /> <link rel="copyright" title="Copyright" href="../copyright.html" /> <link rel="next" title="xml.sax — Support for SAX2 parsers" href="xml.sax.html" /> <link rel="prev" title="xml.dom.minidom — Minimal DOM implementation" href="xml.dom.minidom.html" /> <link rel="shortcut icon" type="image/png" href="../_static/py.png" /> <link rel="canonical" href="https://docs.python.org/3/library/xml.dom.pulldom.html" /> <script type="text/javascript" src="../_static/copybutton.js"></script> <script type="text/javascript" src="../_static/switchers.js"></script> <style> @media only screen { table.full-width-table { width: 100%; } } </style> </head><body> <div class="related" role="navigation" aria-label="related navigation"> <h3>Navigation</h3> <ul> <li class="right" style="margin-right: 10px"> <a href="../genindex.html" title="General 
Index" accesskey="I">index</a></li> <li class="right" > <a href="../py-modindex.html" title="Python Module Index" >modules</a> |</li> <li class="right" > <a href="xml.sax.html" title="xml.sax — Support for SAX2 parsers" accesskey="N">next</a> |</li> <li class="right" > <a href="xml.dom.minidom.html" title="xml.dom.minidom — Minimal DOM implementation" accesskey="P">previous</a> |</li> <li><img src="../_static/py.png" alt="" style="vertical-align: middle; margin-top: -1px"/></li> <li><a href="https://www.python.org/">Python</a> »</li> <li> <span class="language_switcher_placeholder">en</span> <span class="version_switcher_placeholder">3.7.4</span> <a href="../index.html">Documentation </a> » </li> <li class="nav-item nav-item-1"><a href="index.html" >The Python Standard Library</a> »</li> <li class="nav-item nav-item-2"><a href="markup.html" accesskey="U">Structured Markup Processing Tools</a> »</li> <li class="right"> <div class="inline-search" style="display: none" role="search"> <form class="inline-search" action="../search.html" method="get"> <input placeholder="Quick search" type="text" name="q" /> <input type="submit" value="Go" /> <input type="hidden" name="check_keywords" value="yes" /> <input type="hidden" name="area" value="default" /> </form> </div> <script type="text/javascript">$('.inline-search').show(0);</script> | </li> </ul> </div> <div class="document"> <div class="documentwrapper"> <div class="bodywrapper"> <div class="body" role="main"> <div class="section" id="module-xml.dom.pulldom"> <span id="xml-dom-pulldom-support-for-building-partial-dom-trees"></span><h1><a class="reference internal" href="#module-xml.dom.pulldom" title="xml.dom.pulldom: Support for building partial DOM trees from SAX events."><code class="xref py py-mod docutils literal notranslate"><span class="pre">xml.dom.pulldom</span></code></a> — Support for building partial DOM trees<a class="headerlink" href="#module-xml.dom.pulldom" title="Permalink to this headline">¶</a></h1> 
<p><strong>Source code:</strong> <a class="reference external" href="https://github.com/python/cpython/tree/3.7/Lib/xml/dom/pulldom.py">Lib/xml/dom/pulldom.py</a></p> <hr class="docutils" /> <p>The <a class="reference internal" href="#module-xml.dom.pulldom" title="xml.dom.pulldom: Support for building partial DOM trees from SAX events."><code class="xref py py-mod docutils literal notranslate"><span class="pre">xml.dom.pulldom</span></code></a> module provides a “pull parser” that can also be asked to produce DOM-accessible fragments of the document where necessary. The basic concept involves pulling “events” from a stream of incoming XML and processing them. In contrast to SAX, which also employs an event-driven processing model together with callbacks, the user of a pull parser is responsible for explicitly pulling events from the stream, looping over those events until either processing is finished or an error condition occurs.</p> <div class="admonition warning"> <p class="admonition-title">Warning</p> <p>The <a class="reference internal" href="#module-xml.dom.pulldom" title="xml.dom.pulldom: Support for building partial DOM trees from SAX events."><code class="xref py py-mod docutils literal notranslate"><span class="pre">xml.dom.pulldom</span></code></a> module is not secure against maliciously constructed data. If you need to parse untrusted or unauthenticated data, see <a class="reference internal" href="xml.html#xml-vulnerabilities"><span class="std std-ref">XML vulnerabilities</span></a>.</p> </div> <div class="versionchanged"> <p><span class="versionmodified changed">Changed in version 3.7.1: </span>To increase security, the SAX parser no longer processes general external entities by default. 
To enable processing of external entities, pass a custom parser instance in:</p> <div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">xml.dom.pulldom</span> <span class="k">import</span> <span class="n">parse</span>
<span class="kn">from</span> <span class="nn">xml.sax</span> <span class="k">import</span> <span class="n">make_parser</span>
<span class="kn">from</span> <span class="nn">xml.sax.handler</span> <span class="k">import</span> <span class="n">feature_external_ges</span>

<span class="n">parser</span> <span class="o">=</span> <span class="n">make_parser</span><span class="p">()</span>
<span class="n">parser</span><span class="o">.</span><span class="n">setFeature</span><span class="p">(</span><span class="n">feature_external_ges</span><span class="p">,</span> <span class="kc">True</span><span class="p">)</span>
<span class="n">parse</span><span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="n">parser</span><span class="o">=</span><span class="n">parser</span><span class="p">)</span>
</pre></div> </div> </div> <p>Example:</p> <div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">xml.dom</span> <span class="k">import</span> <span class="n">pulldom</span>

<span class="n">doc</span> <span class="o">=</span> <span class="n">pulldom</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="s1">'sales_items.xml'</span><span class="p">)</span>
<span class="k">for</span> <span class="n">event</span><span class="p">,</span> <span class="n">node</span> <span class="ow">in</span> <span class="n">doc</span><span class="p">:</span>
    <span class="k">if</span> <span class="n">event</span> <span class="o">==</span> <span class="n">pulldom</span><span class="o">.</span><span class="n">START_ELEMENT</span> <span class="ow">and</span> <span class="n">node</span><span class="o">.</span><span class="n">tagName</span> <span class="o">==</span> <span class="s1">'item'</span><span class="p">:</span>
        <span class="k">if</span> <span class="nb">int</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">getAttribute</span><span class="p">(</span><span class="s1">'price'</span><span class="p">))</span> <span class="o">&gt;</span> <span class="mi">50</span><span class="p">:</span>
            <span class="n">doc</span><span class="o">.</span><span class="n">expandNode</span><span class="p">(</span><span class="n">node</span><span class="p">)</span>
            <span class="nb">print</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">toxml</span><span class="p">())</span>
</pre></div> </div> <p><code class="docutils literal notranslate"><span class="pre">event</span></code> is a constant and can be one of:</p> <ul class="simple"> <li><p><code class="xref py py-data docutils literal notranslate"><span class="pre">START_ELEMENT</span></code></p></li> <li><p><code class="xref py py-data docutils literal notranslate"><span class="pre">END_ELEMENT</span></code></p></li> <li><p><code class="xref py py-data docutils literal notranslate"><span class="pre">COMMENT</span></code></p></li> <li><p><code class="xref py py-data docutils literal notranslate"><span class="pre">START_DOCUMENT</span></code></p></li> <li><p><code class="xref py py-data docutils literal notranslate"><span class="pre">END_DOCUMENT</span></code></p></li> <li><p><code class="xref py py-data docutils literal notranslate"><span class="pre">CHARACTERS</span></code></p></li> <li><p><code class="xref py py-data docutils literal notranslate"><span class="pre">PROCESSING_INSTRUCTION</span></code></p></li> <li><p><code class="xref py py-data docutils literal notranslate"><span class="pre">IGNORABLE_WHITESPACE</span></code></p></li> </ul> <p><code class="docutils literal notranslate"><span 
class="pre">node</span></code> is an object of type <code class="xref py py-class docutils literal notranslate"><span class="pre">xml.dom.minidom.Document</span></code>, <code class="xref py py-class docutils literal notranslate"><span class="pre">xml.dom.minidom.Element</span></code> or <code class="xref py py-class docutils literal notranslate"><span class="pre">xml.dom.minidom.Text</span></code>.</p> <p>Since the document is treated as a “flat” stream of events, the document “tree” is implicitly traversed and the desired elements are found regardless of their depth in the tree. In other words, one does not need to consider hierarchical issues such as recursive searching of the document nodes, although if the context of elements were important, one would either need to maintain some context-related state (i.e. remembering where one is in the document at any given point) or to make use of the <a class="reference internal" href="#xml.dom.pulldom.DOMEventStream.expandNode" title="xml.dom.pulldom.DOMEventStream.expandNode"><code class="xref py py-func docutils literal notranslate"><span class="pre">DOMEventStream.expandNode()</span></code></a> method and switch to DOM-related processing.</p> <dl class="class"> <dt id="xml.dom.pulldom.PullDom"> <em class="property">class </em><code class="descclassname">xml.dom.pulldom.</code><code class="descname">PullDom</code><span class="sig-paren">(</span><em>documentFactory=None</em><span class="sig-paren">)</span><a class="headerlink" href="#xml.dom.pulldom.PullDom" title="Permalink to this definition">¶</a></dt> <dd><p>Subclass of <a class="reference internal" href="xml.sax.handler.html#xml.sax.handler.ContentHandler" title="xml.sax.handler.ContentHandler"><code class="xref py py-class docutils literal notranslate"><span class="pre">xml.sax.handler.ContentHandler</span></code></a>.</p> </dd></dl> <dl class="class"> <dt id="xml.dom.pulldom.SAX2DOM"> <em class="property">class </em><code 
class="descclassname">xml.dom.pulldom.</code><code class="descname">SAX2DOM</code><span class="sig-paren">(</span><em>documentFactory=None</em><span class="sig-paren">)</span><a class="headerlink" href="#xml.dom.pulldom.SAX2DOM" title="Permalink to this definition">¶</a></dt> <dd><p>Subclass of <a class="reference internal" href="xml.sax.handler.html#xml.sax.handler.ContentHandler" title="xml.sax.handler.ContentHandler"><code class="xref py py-class docutils literal notranslate"><span class="pre">xml.sax.handler.ContentHandler</span></code></a>.</p> </dd></dl> <dl class="function"> <dt id="xml.dom.pulldom.parse"> <code class="descclassname">xml.dom.pulldom.</code><code class="descname">parse</code><span class="sig-paren">(</span><em>stream_or_string</em>, <em>parser=None</em>, <em>bufsize=None</em><span class="sig-paren">)</span><a class="headerlink" href="#xml.dom.pulldom.parse" title="Permalink to this definition">¶</a></dt> <dd><p>Return a <a class="reference internal" href="#xml.dom.pulldom.DOMEventStream" title="xml.dom.pulldom.DOMEventStream"><code class="xref py py-class docutils literal notranslate"><span class="pre">DOMEventStream</span></code></a> from the given input. <em>stream_or_string</em> may be either a file name or a file-like object. <em>parser</em>, if given, must be an <a class="reference internal" href="xml.sax.reader.html#xml.sax.xmlreader.XMLReader" title="xml.sax.xmlreader.XMLReader"><code class="xref py py-class docutils literal notranslate"><span class="pre">XMLReader</span></code></a> object. 
This function will change the document handler of the parser and activate namespace support; other parser configuration (like setting an entity resolver) must have been done in advance.</p> </dd></dl> <p>If you have XML in a string, you can use the <a class="reference internal" href="#xml.dom.pulldom.parseString" title="xml.dom.pulldom.parseString"><code class="xref py py-func docutils literal notranslate"><span class="pre">parseString()</span></code></a> function instead:</p> <dl class="function"> <dt id="xml.dom.pulldom.parseString"> <code class="descclassname">xml.dom.pulldom.</code><code class="descname">parseString</code><span class="sig-paren">(</span><em>string</em>, <em>parser=None</em><span class="sig-paren">)</span><a class="headerlink" href="#xml.dom.pulldom.parseString" title="Permalink to this definition">¶</a></dt> <dd><p>Return a <a class="reference internal" href="#xml.dom.pulldom.DOMEventStream" title="xml.dom.pulldom.DOMEventStream"><code class="xref py py-class docutils literal notranslate"><span class="pre">DOMEventStream</span></code></a> that represents the (Unicode) <em>string</em>.</p> </dd></dl> <dl class="data"> <dt id="xml.dom.pulldom.default_bufsize"> <code class="descclassname">xml.dom.pulldom.</code><code class="descname">default_bufsize</code><a class="headerlink" href="#xml.dom.pulldom.default_bufsize" title="Permalink to this definition">¶</a></dt> <dd><p>Default value for the <em>bufsize</em> parameter to <a class="reference internal" href="#xml.dom.pulldom.parse" title="xml.dom.pulldom.parse"><code class="xref py py-func docutils literal notranslate"><span class="pre">parse()</span></code></a>.</p> <p>The value of this variable can be changed before calling <a class="reference internal" href="#xml.dom.pulldom.parse" title="xml.dom.pulldom.parse"><code class="xref py py-func docutils literal notranslate"><span class="pre">parse()</span></code></a> and the new value will take effect.</p> </dd></dl> <div class="section" 
id="domeventstream-objects"> <span id="id1"></span><h2>DOMEventStream Objects<a class="headerlink" href="#domeventstream-objects" title="Permalink to this headline">¶</a></h2> <dl class="class"> <dt id="xml.dom.pulldom.DOMEventStream"> <em class="property">class </em><code class="descclassname">xml.dom.pulldom.</code><code class="descname">DOMEventStream</code><span class="sig-paren">(</span><em>stream</em>, <em>parser</em>, <em>bufsize</em><span class="sig-paren">)</span><a class="headerlink" href="#xml.dom.pulldom.DOMEventStream" title="Permalink to this definition">¶</a></dt> <dd><dl class="method"> <dt id="xml.dom.pulldom.DOMEventStream.getEvent"> <code class="descname">getEvent</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#xml.dom.pulldom.DOMEventStream.getEvent" title="Permalink to this definition">¶</a></dt> <dd><p>Return a tuple containing <em>event</em> and the current <em>node</em> as <code class="xref py py-class docutils literal notranslate"><span class="pre">xml.dom.minidom.Document</span></code> if event equals <code class="xref py py-data docutils literal notranslate"><span class="pre">START_DOCUMENT</span></code>, <code class="xref py py-class docutils literal notranslate"><span class="pre">xml.dom.minidom.Element</span></code> if event equals <code class="xref py py-data docutils literal notranslate"><span class="pre">START_ELEMENT</span></code> or <code class="xref py py-data docutils literal notranslate"><span class="pre">END_ELEMENT</span></code> or <code class="xref py py-class docutils literal notranslate"><span class="pre">xml.dom.minidom.Text</span></code> if event equals <code class="xref py py-data docutils literal notranslate"><span class="pre">CHARACTERS</span></code>. 
The current node does not contain information about its children, unless <a class="reference internal" href="#xml.dom.pulldom.DOMEventStream.expandNode" title="xml.dom.pulldom.DOMEventStream.expandNode"><code class="xref py py-func docutils literal notranslate"><span class="pre">expandNode()</span></code></a> is called.</p> </dd></dl> <dl class="method"> <dt id="xml.dom.pulldom.DOMEventStream.expandNode"> <code class="descname">expandNode</code><span class="sig-paren">(</span><em>node</em><span class="sig-paren">)</span><a class="headerlink" href="#xml.dom.pulldom.DOMEventStream.expandNode" title="Permalink to this definition">¶</a></dt> <dd><p>Expands all children of <em>node</em> into <em>node</em>. Example:</p> <div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">xml.dom</span> <span class="k">import</span> <span class="n">pulldom</span>

<span class="n">xml</span> <span class="o">=</span> <span class="s1">'&lt;html&gt;&lt;title&gt;Foo&lt;/title&gt; &lt;p&gt;Some text &lt;div&gt;and more&lt;/div&gt;&lt;/p&gt; &lt;/html&gt;'</span>
<span class="n">doc</span> <span class="o">=</span> <span class="n">pulldom</span><span class="o">.</span><span class="n">parseString</span><span class="p">(</span><span class="n">xml</span><span class="p">)</span>
<span class="k">for</span> <span class="n">event</span><span class="p">,</span> <span class="n">node</span> <span class="ow">in</span> <span class="n">doc</span><span class="p">:</span>
    <span class="k">if</span> <span class="n">event</span> <span class="o">==</span> <span class="n">pulldom</span><span class="o">.</span><span class="n">START_ELEMENT</span> <span class="ow">and</span> <span class="n">node</span><span class="o">.</span><span class="n">tagName</span> <span class="o">==</span> <span class="s1">'p'</span><span class="p">:</span>
        <span class="c1"># Following statement only prints '&lt;p/&gt;'</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">toxml</span><span class="p">())</span>
        <span class="n">doc</span><span class="o">.</span><span class="n">expandNode</span><span class="p">(</span><span class="n">node</span><span class="p">)</span>
        <span class="c1"># Following statement prints node with all its children '&lt;p&gt;Some text &lt;div&gt;and more&lt;/div&gt;&lt;/p&gt;'</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">toxml</span><span class="p">())</span>
</pre></div> </div> </dd></dl> <dl class="method"> <dt id="xml.dom.pulldom.DOMEventStream.reset"> <code class="descname">reset</code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#xml.dom.pulldom.DOMEventStream.reset" title="Permalink to this definition">¶</a></dt> <dd></dd></dl> </dd></dl> </div> </div> </div> </div> </div> <div class="sphinxsidebar" role="navigation" aria-label="main navigation"> <div class="sphinxsidebarwrapper"> <h3><a href="../contents.html">Table of Contents</a></h3> <ul> <li><a class="reference internal" href="#"><code class="xref py py-mod docutils literal notranslate"><span class="pre">xml.dom.pulldom</span></code> — Support for building partial DOM trees</a><ul> <li><a class="reference internal" href="#domeventstream-objects">DOMEventStream Objects</a></li> </ul> </li> </ul> <h4>Previous topic</h4> <p class="topless"><a href="xml.dom.minidom.html" title="previous chapter"><code class="xref py py-mod docutils literal notranslate"><span class="pre">xml.dom.minidom</span></code> — Minimal DOM implementation</a></p> <h4>Next topic</h4> <p class="topless"><a href="xml.sax.html" title="next chapter"><code class="xref py py-mod docutils literal notranslate"><span class="pre">xml.sax</span></code> — Support for SAX2 parsers</a></p> <div role="note" aria-label="source link"> <h3>This Page</h3> <ul class="this-page-menu"> <li><a href="../bugs.html">Report a Bug</a></li> <li> <a 
href="https://github.com/python/cpython/blob/3.7/Doc/library/xml.dom.pulldom.rst" rel="nofollow">Show Source </a> </li> </ul> </div> </div> </div> <div class="clearer"></div> </div> <div class="footer"> © <a href="../copyright.html">Copyright</a> 2001-2019, Python Software Foundation. <br /> The Python Software Foundation is a non-profit corporation. <a href="https://www.python.org/psf/donations/">Please donate.</a> <br /> Last updated on Jul 13, 2019. <a href="../bugs.html">Found a bug</a>? 
<br /> Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 2.0.1. </div> </body> </html>