https://github.com/sopherapps/xml_stream
A simple XML file and string reader that is able to read big XML files and strings by using streams (iterators), with an option to convert to dictionaries
- Host: GitHub
- URL: https://github.com/sopherapps/xml_stream
- Owner: sopherapps
- License: mit
- Created: 2020-09-26T02:18:41.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2022-10-08T17:36:11.000Z (over 3 years ago)
- Last Synced: 2025-04-13T05:37:01.322Z (10 months ago)
- Language: Python
- Size: 59.6 KB
- Stars: 9
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# xml_stream
[PyPI version](https://badge.fury.io/py/xml-stream)  
A simple XML file and string reader that can read large XML files and strings as streams (iterators),
with an option to convert each record to a dictionary.
## Description
`xml_stream` comprises two helper functions:
### read_xml_file
Given a path to a file and the name of the tag that holds the relevant data, it returns an iterator
over the matching records, as `xml.etree.ElementTree.Element` objects by default or as dicts when the `to_dict` argument is `True`.
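
To illustrate the streaming idea, here is a minimal sketch built on the standard library's `xml.etree.ElementTree.iterparse`. It is not necessarily how `xml_stream` is implemented; the helper name `iter_records` and the sample data are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def iter_records(file_path, records_tag):
    """Yield each element named `records_tag`, one at a time (hypothetical helper)."""
    # iterparse reads the file incrementally instead of loading it whole
    for event, element in ET.iterparse(file_path, events=("end",)):
        if element.tag == records_tag:
            yield element
            # once the consumer asks for the next record, free this one's contents
            element.clear()


# Hypothetical sample data written to a temporary file for illustration
sample = (
    "<company>"
    "<staff><name>John</name></staff>"
    "<staff><name>Jane</name></staff>"
    "</company>"
)
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as handle:
    handle.write(sample)

names = [record.findtext("name") for record in iter_records(handle.name, "staff")]
os.remove(handle.name)
print(names)  # ['John', 'Jane']
```

Because each record is cleared after it has been consumed, memory use stays roughly proportional to the size of one record rather than the whole file.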
### read_xml_string
Given an XML string and the name of the tag that holds the relevant data, it returns an iterator
over the matching records, as `xml.etree.ElementTree.Element` objects by default or as dicts when the `to_dict` argument is `True`.
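
As a rough illustration of what converting an element to a dictionary can involve, the sketch below merges attributes into the dictionary, collects repeated child tags into lists, and stores text next to attributes under `_value`. The function `element_to_dict`, the `person` tag, and the exact output shape are assumptions made for this example; the dictionaries produced by `xml_stream` may be shaped differently.

```python
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Convert an Element to plain Python data (illustrative shape only)."""
    result = dict(element.attrib)  # attributes become top-level keys
    text = (element.text or "").strip()
    for child in element:
        value = element_to_dict(child)
        if child.tag in result:
            # repeated tags are collected into a list
            existing = result[child.tag]
            if not isinstance(existing, list):
                result[child.tag] = [existing]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    if text:
        if not result:
            return text  # a leaf with only text collapses to a string
        result["_value"] = text
    return result


# Hypothetical sample record
sample = (
    "<staff>"
    "<team>Marketing</team>"
    '<person first_name="John" last_name="Doe">John Doe</person>'
    '<person first_name="Jane" last_name="Doe">Jane Doe</person>'
    "</staff>"
)
record = element_to_dict(ET.fromstring(sample))
print(record["team"])  # Marketing
print(record["person"][1]["_value"])  # Jane Doe
```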
## Main Dependencies
- [Python 3.6+](https://www.python.org)
## Getting Started
- Install the package
```bash
pip install xml_stream
```
- Import the `read_xml_file` and `read_xml_string` functions and use them as needed
```python
from xml_stream import read_xml_file, read_xml_string
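# NOTE: the element names in this sample are inferred from the printed dictionary
# further down; `company` and `bio` in particular are assumed names.
xml_string = """
<company>
    <staff>
        <operations_department>
            <employees>
                <team>Marketing</team>
                <location name="head office" address="Kampala, Uganda"/>
                <bio first_name="John" last_name="Doe">John Doe</bio>
                <bio first_name="Jane" last_name="Doe">Jane Doe</bio>
                <bio first_name="Peter" last_name="Doe">Peter Doe</bio>
            </employees>
            <employees>
                <team>Customer Service</team>
                <location name="Kampala branch" address="Kampala, Uganda"/>
                <bio first_name="Mary" last_name="Doe">Mary Doe</bio>
                <bio first_name="Harry" last_name="Doe">Harry Doe</bio>
                <bio first_name="Paul" last_name="Doe">Paul Doe</bio>
            </employees>
        </operations_department>
    </staff>
</company>
"""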
file_path = '...' # path to your XML file
# For XML strings, use read_xml_string which returns an iterator
for element in read_xml_string(xml_string, records_tag='staff'):
    # returns the element as xml.etree.ElementTree.Element by default
    # ...do something with the element
    print(element)
# Note: if a tag is namespaced, e.g. `prefix:tag` with xmlns:prefix="https://example",
# the records_tag for that tag is '{https://example}tag'
for element_as_dict in read_xml_string(xml_string, records_tag='staff', to_dict=True):
    # returns the element as a dictionary
    # ...do something with the element dictionary
    print(element_as_dict)
# will print
"""
{
    'operations_department': {
        'employees': [
            [
                {
                    'team': 'Marketing',
                    'location': {
                        'name': 'head office',
                        'address': 'Kampala, Uganda'
                    },
                    'first_name': 'John',
                    'last_name': 'Doe',
                    '_value': 'John Doe'
                },
                {
                    'team': 'Marketing',
                    'location': {
                        'name': 'head office',
                        'address': 'Kampala, Uganda'
                    },
                    'first_name': 'Jane',
                    'last_name': 'Doe',
                    '_value': 'Jane Doe'
                },
                {
                    'team': 'Marketing',
                    'location': {
                        'name': 'head office',
                        'address': 'Kampala, Uganda'
                    },
                    'first_name': 'Peter',
                    'last_name': 'Doe',
                    '_value': 'Peter Doe'
                },
            ],
            [
                {
                    'team': 'Customer Service',
                    'location': {
                        'name': 'Kampala branch',
                        'address': 'Kampala, Uganda'
                    },
                    'first_name': 'Mary',
                    'last_name': 'Doe',
                    '_value': 'Mary Doe'
                },
                {
                    'team': 'Customer Service',
                    'location': {
                        'name': 'Kampala branch',
                        'address': 'Kampala, Uganda'
                    },
                    'first_name': 'Harry',
                    'last_name': 'Doe',
                    '_value': 'Harry Doe'
                },
                {
                    'team': 'Customer Service',
                    'location': {
                        'name': 'Kampala branch',
                        'address': 'Kampala, Uganda'
                    },
                    'first_name': 'Paul',
                    'last_name': 'Doe',
                    '_value': 'Paul Doe'
                }
            ],
        ]
    }
}
"""
# For XML files (even really large ones), use read_xml_file which also returns an iterator
for element in read_xml_file(file_path, records_tag='staff'):
    # returns the element as xml.etree.ElementTree.Element by default
    # ...do something with the element
    print(element)

for element_as_dict in read_xml_file(file_path, records_tag='staff', to_dict=True):
    # returns the element as a dictionary
    # ...do something with the element dictionary
    print(element_as_dict)
    # see the print output for read_xml_string
```
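
The note about namespaced tags in the example above follows the way Python's `xml.etree.ElementTree` names elements: a prefixed tag is expanded to `{namespace-uri}localname`. The short sketch below uses only the standard library to show the expanded name you would pass as `records_tag`. The sample XML and the `hr` prefix are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaced sample: the `hr` prefix maps to https://example
namespaced_xml = """
<company xmlns:hr="https://example">
    <hr:staff>John Doe</hr:staff>
    <hr:staff>Jane Doe</hr:staff>
</company>
"""

root = ET.fromstring(namespaced_xml)

# ElementTree stores the tag with the namespace URI in braces, not the prefix
child_tags = [child.tag for child in root]
print(child_tags[0])  # {https://example}staff

# The same expanded name is what selects these elements
records_tag = "{https://example}staff"
staff_names = [element.text for element in root.iter(records_tag)]
print(staff_names)  # ['John Doe', 'Jane Doe']
```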
## How to test
- Clone the repo and enter its root folder
```bash
git clone https://github.com/sopherapps/xml_stream.git && cd xml_stream
```
- Create a virtual environment and activate it
```bash
virtualenv -p /usr/bin/python3.6 env && source env/bin/activate
```
- Install the dependencies
```bash
pip install -r requirements.txt
```
- Download a huge XML file for test purposes and save it in the `test` folder as `huge_mock.xml`
```sh
wget http://aiweb.cs.washington.edu/research/projects/xmltk/xmldata/data/SwissProt/SwissProt.xml && mv SwissProt.xml test/huge_mock.xml
```
- Run the test command
```bash
python -m unittest
```
## Acknowledgements
- This [Stack Overflow Answer](https://stackoverflow.com/questions/2148119/how-to-convert-an-xml-string-to-a-dictionary#answer-5807028) about converting XML to dict was very helpful.
- This [Real Python tutorial on publishing packages](https://realpython.com/pypi-publish-python-package/) was very helpful.
## License
Copyright (c) 2020 [Martin Ahindura](https://github.com/Tinitto). Licensed under the [MIT License](./LICENSE).
## Gratitude
All glory be to God
