Opening and reading large XML files using iterparse incremental parsing

Sometimes we don’t want to load an entire XML file into memory just to get the information we need. In these cases it is useful to load the relevant sections incrementally and discard them once we are finished. The iterparse function builds the element tree while it parses, and we can modify that tree (for example, by clearing elements) as parsing proceeds.

Import the ElementTree module:

import xml.etree.ElementTree as ET

Open the .xml file and iterate over all the elements:

for event, elem in ET.iterparse("yourXMLfile.xml"):
    pass  # ... do something ...
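As a minimal sketch of the loop above, the following counts how often each tag occurs. The sample document, the tag names and the count_tags helper are made up for illustration; an in-memory io.BytesIO object stands in for the file, since iterparse accepts a file object as well as a filename.

```python
import io
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical sample document, kept in memory instead of on disk.
SAMPLE = b"<library><book><title>A</title></book><book><title>B</title></book></library>"

def count_tags(source):
    """Count how many times each tag is closed while parsing source."""
    counts = Counter()
    # No events argument: only "end" events are reported.
    for event, elem in ET.iterparse(source):
        counts[elem.tag] += 1
    return counts

tag_counts = count_tags(io.BytesIO(SAMPLE))
print(dict(tag_counts))
```

Because only "end" events arrive by default, each element is complete (text and children included) by the time the loop body sees it.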

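Alternatively, we can look only for specific events, such as start/end tags or namespace declarations. If the events option is omitted (as above), only “end” events are returned. Note that for “start-ns” events the second item of each pair is a (prefix, uri) tuple rather than an element, and for “end-ns” events it is None: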

events = ("start", "end", "start-ns", "end-ns")
for event, elem in ET.iterparse("yourXMLfile.xml", events=events):
    pass  # ... do something ...
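To make the four event types concrete, here is a small sketch that records what iterparse reports for each of them. The sample document, the example.com namespace URI and the collect_events helper are assumptions for illustration only.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample with one namespace declaration on the root element.
SAMPLE = b'<root xmlns:x="http://example.com/x"><x:item>one</x:item></root>'

def collect_events(xml_bytes):
    """Return (event, payload) pairs describing what iterparse reported."""
    seen = []
    events = ("start", "end", "start-ns", "end-ns")
    for event, payload in ET.iterparse(io.BytesIO(xml_bytes), events=events):
        if event in ("start", "end"):
            # payload is an Element; keep only its (namespace-qualified) tag
            seen.append((event, payload.tag))
        else:
            # "start-ns": payload is a (prefix, uri) tuple; "end-ns": payload is None
            seen.append((event, payload))
    return seen

event_log = collect_events(SAMPLE)
for entry in event_log:
    print(entry)
```

Tags inside a namespace are reported in the {uri}localname form, so comparisons such as elem.tag == "item" will not match a namespaced element.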

Here is a complete example showing how to clear elements from the in-memory tree once we are finished with them:

for event, elem in ET.iterparse("yourXMLfile.xml", events=("start", "end")):
    if elem.tag == "record_tag" and event == "end":
        print(elem.text)
        # Drop the element's text, attributes and children to free memory
        elem.clear()
    # ... do something else ...
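One caveat is that elem.clear() empties an element but leaves the emptied element attached to its parent, so a very long file still accumulates empty children on the root. A common refinement, which is not part of the example above, is to keep a reference to the root and clear it after each record. The sketch below assumes the records are direct children of the root; the sample data and the iter_record_texts helper are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: two records and one unrelated element under the root.
SAMPLE = b"<records><record_tag>a</record_tag><record_tag>b</record_tag><other>x</other></records>"

def iter_record_texts(source, tag="record_tag"):
    """Yield the text of each matching element, clearing the root as we go."""
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is always the "start" of the root element.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem.text
            # Detach all finished children from the root to release their memory.
            # Assumes records sit directly under the root.
            root.clear()

record_texts = list(iter_record_texts(io.BytesIO(SAMPLE)))
print(record_texts)
```

Writing the loop as a generator keeps the parsing and clearing logic in one place while the caller decides what to do with each record.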
