https://github.com/sopherapps/xml_stream

A simple XML file and string reader that is able to read big XML files and strings by using streams (iterators), with an option to convert to dictionaries

Host: GitHub
URL: https://github.com/sopherapps/xml_stream
Owner: sopherapps
License: mit
Created: 2020-09-26T02:18:41.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2022-10-08T17:36:11.000Z (over 3 years ago)
Last Synced: 2025-04-13T05:37:01.322Z (10 months ago)
Language: Python
Size: 59.6 KB
Stars: 9
Watchers: 2
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# xml_stream

[![PyPI version](https://badge.fury.io/py/xml-stream.svg)](https://badge.fury.io/py/xml-stream) ![CI](https://github.com/sopherapps/xml_stream/actions/workflows/ci.yml/badge.svg) ![CD](https://github.com/sopherapps/xml_stream/actions/workflows/cd.yml/badge.svg)

A simple XML file and string reader that can read large XML files and strings by streaming them through iterators,
with an option to convert each record to a dictionary.

## Description

`xml_stream` comprises two helper functions:

### read_xml_file

Given a path to a file and the name of the tag that holds the relevant data, it returns an iterator
over the matching elements. Each item is an `xml.etree.ElementTree.Element` object by default, or a dict when the `to_dict` argument is `True`.
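
To illustrate the general idea of streaming records out of a file, here is a minimal sketch built on the standard library's `xml.etree.ElementTree.iterparse`. It is not `xml_stream`'s actual implementation; the helper name `iter_records` and the demo file contents are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def iter_records(path, records_tag):
    """Yield each element named `records_tag`, one at a time (hypothetical helper)."""
    # iterparse reads the file incrementally instead of loading it whole;
    # an "end" event fires once an element and its children are complete.
    for _event, element in ET.iterparse(path, events=("end",)):
        if element.tag == records_tag:
            yield element
            # Once the caller asks for the next record, drop this one's
            # children, text and attributes so they can be reclaimed.
            element.clear()


# A tiny demo file standing in for a large one.
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as handle:
    handle.write("<items><item id='1'>a</item><item id='2'>b</item></items>")
    demo_path = handle.name

ids = [element.get("id") for element in iter_records(demo_path, "item")]
os.remove(demo_path)
print(ids)  # ['1', '2']
```

Because the function is a generator, each record is only parsed when the loop asks for it, which is what keeps memory use low on large files.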

### read_xml_string

Given an XML string and the name of the tag that holds the relevant data, it returns an iterator
over the matching elements. Each item is an `xml.etree.ElementTree.Element` object by default, or a dict when the `to_dict` argument is `True`.
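
The same idea can be sketched for strings, together with a simplified element-to-dict conversion. This is an assumption-laden illustration using only the standard library: `iter_records_from_string` and `element_to_dict` are hypothetical names, and the dictionary shape produced here is simpler than, and not guaranteed to match, what `xml_stream` returns with `to_dict=True`. Only the `_value` key for element text is borrowed from the library's documented output.

```python
import io
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Convert an element to a plain dict (simplified, assumed shape)."""
    # Attributes become top-level keys.
    result = dict(element.attrib)
    for child in element:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated child tags are collected into a list.
            existing = result[child.tag]
            if not isinstance(existing, list):
                result[child.tag] = [existing]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    # Non-empty text is stored under '_value'.
    text = (element.text or "").strip()
    if text:
        result["_value"] = text
    return result


def iter_records_from_string(xml_string, records_tag, to_dict=False):
    """Yield matching elements (or dicts) from an XML string, one at a time."""
    stream = io.BytesIO(xml_string.encode("utf-8"))
    for _event, element in ET.iterparse(stream, events=("end",)):
        if element.tag == records_tag:
            yield element_to_dict(element) if to_dict else element
            element.clear()


demo_xml = (
    "<people>"
    "<person first_name='John'>John Doe</person>"
    "<person first_name='Jane'>Jane Doe</person>"
    "</people>"
)
records = list(iter_records_from_string(demo_xml, "person", to_dict=True))
print(records)
# [{'first_name': 'John', '_value': 'John Doe'},
#  {'first_name': 'Jane', '_value': 'Jane Doe'}]
```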

## Main Dependencies

- [Python 3.6+](https://www.python.org)

## Getting Started

- Install the package

```bash
pip install xml_stream
```

- Import the `read_xml_file` and `read_xml_string` functions and use them as needed

```python
from xml_stream import read_xml_file, read_xml_string

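# Assumed reconstruction: the root tag `company` and the per-person tag `bio`
# are illustrative names; the other tags and attributes follow the dictionary
# output shown further down.
xml_string = """
<company>
    <staff>
        <operations_department>
            <employees>
                <team>Marketing</team>
                <location name="head office" address="Kampala, Uganda" />
                <bio first_name="John" last_name="Doe">John Doe</bio>
                <bio first_name="Jane" last_name="Doe">Jane Doe</bio>
                <bio first_name="Peter" last_name="Doe">Peter Doe</bio>
            </employees>
            <employees>
                <team>Customer Service</team>
                <location name="Kampala branch" address="Kampala, Uganda" />
                <bio first_name="Mary" last_name="Doe">Mary Doe</bio>
                <bio first_name="Harry" last_name="Doe">Harry Doe</bio>
                <bio first_name="Paul" last_name="Doe">Paul Doe</bio>
            </employees>
        </operations_department>
    </staff>
</company>
"""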

file_path = '...' # path to your XML file

# For XML strings, use read_xml_string, which returns an iterator
for element in read_xml_string(xml_string, records_tag='staff'):
    # each element is an xml.etree.ElementTree.Element by default
    # ...do something with the element
    print(element)

# Note: if a tag is namespaced, e.g. prefix:tag with xmlns:prefix="https://example",
# the records_tag for that tag is '{https://example}tag'
for element_as_dict in read_xml_string(xml_string, records_tag='staff', to_dict=True):
    # each element is returned as a dictionary
    # ...do something with the element dictionary
    print(element_as_dict)
    # will print
    """
    {
        'operations_department': {
            'employees': [
                [
                    {
                        'team': 'Marketing',
                        'location': {
                            'name': 'head office',
                            'address': 'Kampala, Uganda'
                        },
                        'first_name': 'John',
                        'last_name': 'Doe',
                        '_value': 'John Doe'
                    },
                    {
                        'team': 'Marketing',
                        'location': {
                            'name': 'head office',
                            'address': 'Kampala, Uganda'
                        },
                        'first_name': 'Jane',
                        'last_name': 'Doe',
                        '_value': 'Jane Doe'
                    },
                    {
                        'team': 'Marketing',
                        'location': {
                            'name': 'head office',
                            'address': 'Kampala, Uganda'
                        },
                        'first_name': 'Peter',
                        'last_name': 'Doe',
                        '_value': 'Peter Doe'
                    }
                ],
                [
                    {
                        'team': 'Customer Service',
                        'location': {
                            'name': 'Kampala branch',
                            'address': 'Kampala, Uganda'
                        },
                        'first_name': 'Mary',
                        'last_name': 'Doe',
                        '_value': 'Mary Doe'
                    },
                    {
                        'team': 'Customer Service',
                        'location': {
                            'name': 'Kampala branch',
                            'address': 'Kampala, Uganda'
                        },
                        'first_name': 'Harry',
                        'last_name': 'Doe',
                        '_value': 'Harry Doe'
                    },
                    {
                        'team': 'Customer Service',
                        'location': {
                            'name': 'Kampala branch',
                            'address': 'Kampala, Uganda'
                        },
                        'first_name': 'Paul',
                        'last_name': 'Doe',
                        '_value': 'Paul Doe'
                    }
                ]
            ]
        }
    }
    """

# For XML files (even really large ones), use read_xml_file, which also returns an iterator
for element in read_xml_file(file_path, records_tag='staff'):
    # each element is an xml.etree.ElementTree.Element by default
    # ...do something with the element
    print(element)

for element_as_dict in read_xml_file(file_path, records_tag='staff', to_dict=True):
    # each element is returned as a dictionary
    # ...do something with the element dictionary
    print(element_as_dict)
    # see the print output for read_xml_string
```
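
The namespace note in the example above follows the convention used by `xml.etree.ElementTree`, where a namespaced tag is exposed as `{namespace-uri}localname`. The short sketch below shows that convention with the standard library alone; the XML snippet and the `https://example` URI are illustrative, taken from the comment above.

```python
import xml.etree.ElementTree as ET

namespaced_xml = (
    '<root xmlns:prefix="https://example">'
    "<prefix:tag>first</prefix:tag>"
    "<prefix:tag>second</prefix:tag>"
    "</root>"
)

root = ET.fromstring(namespaced_xml)

# ElementTree replaces the prefix with the full namespace URI in braces.
tags = [child.tag for child in root]
print(tags)  # ['{https://example}tag', '{https://example}tag']

# So this is the form to use when matching a namespaced tag.
records_tag = "{https://example}tag"
matches = [element.text for element in root.iter(records_tag)]
print(matches)  # ['first', 'second']
```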

## How to test

- Clone the repo and enter its root folder

```bash
git clone https://github.com/sopherapps/xml_stream.git && cd xml_stream
```

- Create a virtual environment and activate it

```bash
virtualenv -p /usr/bin/python3.6 env && source env/bin/activate
```

- Install the dependencies

```bash
pip install -r requirements.txt
```

- Download a large XML file for testing and save it in the `test/` folder as `huge_mock.xml`

```sh
wget http://aiweb.cs.washington.edu/research/projects/xmltk/xmldata/data/SwissProt/SwissProt.xml && mv SwissProt.xml test/huge_mock.xml
```

- Run the test command

```bash
python -m unittest
```

## Acknowledgements

- This [Stack Overflow answer](https://stackoverflow.com/questions/2148119/how-to-convert-an-xml-string-to-a-dictionary#answer-5807028) about converting XML to a dict was very helpful.
- This [Real Python tutorial on publishing packages](https://realpython.com/pypi-publish-python-package/) was very helpful.

## License

Licensed under the MIT License; see the `LICENSE` file in the repository.

## Gratitude

All glory be to God
