https://github.com/dcwatson/drill
A small Python library for quickly traversing XML data.
- Host: GitHub
- URL: https://github.com/dcwatson/drill
- Owner: dcwatson
- License: bsd-2-clause
- Created: 2012-07-03T15:30:09.000Z (almost 14 years ago)
- Default Branch: master
- Last Pushed: 2018-12-03T00:30:54.000Z (over 7 years ago)
- Last Synced: 2025-03-21T19:04:56.947Z (about 1 year ago)
- Language: Python
- Size: 40 KB
- Stars: 2
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## Basic Usage
    import drill

    doc = drill.parse(path_or_url_or_string)

    # Drill down to a specific element.
    print(str(doc.book.title))

    # Iterate through all "title" tags in the document.
    for t in doc.iter('title'):
        print(t.attrs, t.data)

    # Find all "bar" nodes with a "baz" child and a "foo" parent.
    q = doc.find('//foo/bar[baz]')

    # Easily access the first and last elements of matching results.
    print(q.first(), q.last())

    # Iterate over results.
    for e in q:
        do_something(e)

    # Parse only elements matching some path.
    for e in drill.iterparse(filelike, xpath='root/*/something'):
        print(e.tagname, e.data)

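
To make the `//foo/bar[baz]` query above concrete, here is a minimal sketch of the same selection ("bar" elements with a "baz" child and a "foo" parent) written against the standard library's ElementTree rather than drill. The sample document and the helper name `bars_with_baz_under_foo` are made up for illustration, and ElementTree's path syntax (`.//foo/bar[baz]`) is only assumed to correspond to the drill query shown above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration.
SAMPLE = """
<root>
  <foo>
    <bar id="1"><baz/></bar>
    <bar id="2"/>
  </foo>
  <other>
    <bar id="3"><baz/></bar>
  </other>
</root>
"""

def bars_with_baz_under_foo(xml_text):
    """Return every <bar> that has a <baz> child and a <foo> parent."""
    root = ET.fromstring(xml_text)
    # ".//foo" finds <foo> at any depth below the root, "/bar" takes its
    # direct <bar> children, and "[baz]" keeps only those with a <baz> child.
    return root.findall(".//foo/bar[baz]")

matches = bars_with_baz_under_foo(SAMPLE)
print([m.get("id") for m in matches])  # prints ['1']
```

Only the first `bar` qualifies: the second has no `baz` child, and the third has a parent that is not `foo`.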
## Features
* Runnable test suite
* Python 3 support
## Advantages
* Pure Python
* Faster, more efficient parsing than ElementTree
    * Using ElementTree, a ~150 MB XML file (~3,000,000 elements) took ~46 seconds to parse, consuming ~1.3 GB of RAM
    * Parsing the same file using drill took ~24 seconds and consumed ~950 MB of RAM
    * Very unscientific benchmarks performed on a Core i5 @ 2.8 GHz, running Windows 7. YMMV.
* Lots of convenience methods for accessing elements quickly:
    * `doc.response.resultCode.data`
    * `root[2].child['attr']`
    * `first`/`last`/`prev`/`next` methods for traversing siblings
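
The timing and memory figures above are informal, so it may help to repeat the comparison on your own data. The following is a rough sketch of one way to do that with the standard library only; it is not the harness used for the numbers above. The helpers `make_sample_xml` and `measure` are hypothetical names introduced here, and `tracemalloc` reports only memory allocated through Python's allocators, so its peak value approximates rather than equals process RAM.

```python
import time
import tracemalloc
import xml.etree.ElementTree as ET

def make_sample_xml(n_items):
    """Build a synthetic XML string with n_items <item> elements (illustrative data)."""
    parts = ["<root>"]
    for i in range(n_items):
        parts.append('<item id="%d"><name>item %d</name></item>' % (i, i))
    parts.append("</root>")
    return "".join(parts)

def measure(parse_func, xml_text):
    """Call parse_func(xml_text) once and return (result, seconds, peak_bytes)."""
    tracemalloc.start()
    started = time.perf_counter()
    result = parse_func(xml_text)
    elapsed = time.perf_counter() - started
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak

xml_text = make_sample_xml(10000)
root, seconds, peak = measure(ET.fromstring, xml_text)
print("ElementTree: %.3f s, peak %.1f MB traced" % (seconds, peak / 1e6))
```

With drill installed, passing `drill.parse` in place of `ET.fromstring` should give a comparable measurement, since `drill.parse` is documented above as accepting a string. Tracing adds overhead of its own, so treat the results as relative rather than absolute.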