https://github.com/dcwatson/drill

A small python library for quickly traversing XML data.

Host: GitHub
URL: https://github.com/dcwatson/drill
Owner: dcwatson
License: bsd-2-clause
Created: 2012-07-03T15:30:09.000Z (about 14 years ago)
Default Branch: master
Last Pushed: 2018-12-03T00:30:54.000Z (over 7 years ago)
Last Synced: 2025-03-21T19:04:56.947Z (over 1 year ago)
Language: Python
Size: 40 KB
Stars: 2
Watchers: 2
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

## Basic Usage

    import drill
    doc = drill.parse(path_or_url_or_string)

    # Drill down to a specific element.
    print unicode(doc.book.title)

    # Iterate through all "title" tags in the document.
    for t in doc.iter('title'):
        print t.attrs, t.data

    # Find all "bar" nodes with a "baz" child and a "foo" parent.
    q = doc.find('//foo/bar[baz]')
    # Easily access the first and last elements of matching results.
    print q.first(), q.last()
    # Iterate over results.
    for e in q:
        do_something(e)

    # Parse only elements matching some path.
    for e in drill.iterparse(filelike, xpath='root/*/something'):
        print e.tagname, e.data
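For comparison, path-filtered streaming can be sketched with the standard library's ElementTree. This is not how drill implements `iterparse`; it only illustrates the general idea of yielding elements as soon as they close instead of building the whole tree first. The helper name `iter_matching` and its simplified path syntax (slash-separated tag names, with `*` matching any tag) are assumptions made for this sketch.

```python
import io
import xml.etree.ElementTree as ET


def iter_matching(source, path):
    """Yield elements whose tag path from the root matches `path`.

    `path` is a slash-separated list of tag names; "*" matches any tag.
    This is a simplified, hypothetical path syntax, not drill's.
    """
    pattern = path.split("/")
    stack = []  # tag names from the root down to the current element
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
            continue
        # "end" event: the element and its subtree are fully parsed.
        matched = len(stack) == len(pattern) and all(
            p == "*" or p == tag for p, tag in zip(pattern, stack)
        )
        stack.pop()
        if matched:
            yield elem
            # Runs when the caller asks for the next element: drop the
            # subtree's text and children so they can be garbage collected.
            elem.clear()


sample = b"<root><a><something>1</something></a><b><something>2</something></b></root>"
for e in iter_matching(io.BytesIO(sample), "root/*/something"):
    print(e.tag, e.text)
```

Only matched elements are cleared here, so the emptied elements remain attached to their parents; a long-running parser would also need to detach them.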

## Features

* Runnable test suite
* Python 3 support

## Advantages

* Pure Python
* Faster, more efficient parsing than ElementTree
    * Using ElementTree, a ~150 MB XML file (~3,000,000 elements) took ~46 seconds to parse, consuming ~1.3 GB of RAM
    * Parsing the same file using drill took ~24 seconds and consumed ~950 MB of RAM
    * Very unscientific benchmarks performed on a Core i5 @ 2.8 GHz, running Windows 7. YMMV.
* Lots of convenience methods for accessing elements quickly:
    * `doc.response.resultCode.data`
    * `root[2].child['attr']`
    * `first`/`last`/`prev`/`next` methods for traversing siblings
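To repeat a comparison like the one above on your own data, a small harness along these lines may help. It is only a sketch: the `measure` helper is a name invented here, and `tracemalloc` reports memory allocated through Python's allocators rather than the process-level RAM quoted above, so the numbers will not be directly comparable.

```python
import time
import tracemalloc
import xml.etree.ElementTree as ET


def measure(parse, path):
    """Return (seconds, peak_bytes) for parsing `path` with `parse`.

    The file is parsed twice: once for timing, and once under tracemalloc,
    because tracing slows the parser down noticeably.
    """
    started = time.perf_counter()
    parse(path)
    elapsed = time.perf_counter() - started

    tracemalloc.start()
    doc = parse(path)  # keep a reference so the tree is alive at the peak
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del doc
    return elapsed, peak
```

Calling `measure(ET.parse, "big.xml")` and then `measure(drill.parse, "big.xml")`, where `big.xml` is a placeholder for your own file, gives a rough side-by-side on the same machine.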
