{"id":18830257,"url":"https://github.com/sopherapps/xml_stream","last_synced_at":"2025-04-14T03:42:17.949Z","repository":{"id":47076101,"uuid":"298720455","full_name":"sopherapps/xml_stream","owner":"sopherapps","description":"A simple XML file and string reader that is able to read big XML files and strings by using streams (iterators), with an option to convert to dictionaries","archived":false,"fork":false,"pushed_at":"2022-10-08T17:36:11.000Z","size":61,"stargazers_count":9,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-13T05:37:01.322Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sopherapps.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-09-26T02:18:41.000Z","updated_at":"2025-04-10T11:32:28.000Z","dependencies_parsed_at":"2022-09-17T05:01:30.356Z","dependency_job_id":null,"html_url":"https://github.com/sopherapps/xml_stream","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sopherapps%2Fxml_stream","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sopherapps%2Fxml_stream/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sopherapps%2Fxml_stream/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sopherapps%2Fxml_stream/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sopherapps","download_url":"https://codeload.github.com/sophe
rapps/xml_stream/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248818760,"owners_count":21166468,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T01:48:17.750Z","updated_at":"2025-04-14T03:42:17.923Z","avatar_url":"https://github.com/sopherapps.png","language":"Python","funding_links":["https://www.buymeacoffee.com/martinahinJ"],"categories":[],"sub_categories":[],"readme":"# xml_stream\n\n[![PyPI version](https://badge.fury.io/py/xml-stream.svg)](https://badge.fury.io/py/xml-stream) ![CI](https://github.com/sopherapps/xml_stream/actions/workflows/ci.yml/badge.svg) ![CD](https://github.com/sopherapps/xml_stream/actions/workflows/cd.yml/badge.svg)\n\nA simple XML file and string reader that is able to read big XML files and strings by using streams (iterators),\nwith an option to convert to dictionaries\n\n## Description\n\n`xml_stream` comprises two helper functions:\n\n### read_xml_file\n\nWhen given a path to a file and the name of the tag that holds the relevant data, it returns an iterator\nof the data as `xml.etree.ElementTree.Element` object by default, or as dicts when `to_dict` argument is `True`\n\n### read_xml_string\n\nWhen given an XML string and the name of the tag that holds the relevant data, it returns an iterator\nof the data as `xml.etree.ElementTree.Element` object by default, or as dicts when `to_dict` argument is `True`\n\n## Main Dependencies\n\n- [Python +3.6](https://www.python.org)\n\n## Getting Started\n\n- Install the package\n\n  ```bash\n  pip install 
xml_stream\n  ```\n\n- Import the `read_xml_file` and `read_xml_string` functions and use them accordingly\n\n  ```python\n  from xml_stream import read_xml_file, read_xml_string\n  \n  xml_string = \"\"\"\n  \u003ccompany\u003e\n        \u003cstaff\u003e\n            \u003coperations_department\u003e\n                \u003cemployees\u003e\n                    \u003cteam\u003eMarketing\u003c/team\u003e\n                    \u003clocation name=\"head office\" address=\"Kampala, Uganda\" /\u003e\n                    \u003cbio first_name=\"John\" last_name=\"Doe\"\u003eJohn Doe\u003c/bio\u003e\n                    \u003cbio first_name=\"Jane\" last_name=\"Doe\"\u003eJane Doe\u003c/bio\u003e\n                    \u003cbio first_name=\"Peter\" last_name=\"Doe\"\u003ePeter Doe\u003c/bio\u003e\n                \u003c/employees\u003e\n                \u003cemployees\u003e\n                    \u003cteam\u003eCustomer Service\u003c/team\u003e\n                    \u003clocation name=\"Kampala branch\" address=\"Kampala, Uganda\" /\u003e\n                    \u003cbio first_name=\"Mary\" last_name=\"Doe\"\u003eMary Doe\u003c/bio\u003e\n                    \u003cbio first_name=\"Harry\" last_name=\"Doe\"\u003eHarry Doe\u003c/bio\u003e\n                    \u003cbio first_name=\"Paul\" last_name=\"Doe\"\u003ePaul Doe\u003c/bio\u003e\n                \u003c/employees\u003e\n            \u003c/operations_department\u003e\n        \u003c/staff\u003e\n  \u003c/company\u003e\n  \"\"\"\n  \n  file_path = '...' 
# path to your XML file\n  \n  # For XML strings, use read_xml_string which returns an iterator  \n  for element in read_xml_string(xml_string, records_tag='staff'):\n      # returns the element as xml.etree.ElementTree.Element by default\n      # ...do something with the element\n      print(element)\n  \n  # Note that if a tag is namespaced with say _prefix:tag_ and domain is _xmlns:prefix=\"https://example\",\n  # the records_tag from that tag will be '{https://example}tag'\n  for element_as_dict in read_xml_string(xml_string, records_tag='staff', to_dict=True):\n      # returns the element as dictionary\n      # ...do something with the element dictionary\n      print(element_as_dict)\n      # will print\n      \"\"\"\n      {\n            'operations_department': {\n                'employees': [\n                    [\n                        {\n                            'team': 'Marketing',\n                            'location': {\n                                'name': 'head office',\n                                'address': 'Kampala, Uganda'\n                            },\n                            'first_name': 'John',\n                            'last_name': 'Doe',\n                            '_value': 'John Doe'\n\n                        },\n                        {\n                            'team': 'Marketing',\n                            'location': {\n                                'name': 'head office',\n                                'address': 'Kampala, Uganda'\n                            },\n                            'first_name': 'Jane',\n                            'last_name': 'Doe',\n                            '_value': 'Jane Doe'\n\n                        },\n                        {\n                            'team': 'Marketing',\n                            'location': {\n                                'name': 'head office',\n                                'address': 'Kampala, Uganda'\n                         
   },\n                            'first_name': 'Peter',\n                            'last_name': 'Doe',\n                            '_value': 'Peter Doe'\n\n                        }, ],\n                    [\n                        {\n                            'team': 'Customer Service',\n                            'location': {\n                                'name': 'Kampala branch',\n                                'address': 'Kampala, Uganda'\n                            },\n                            'first_name': 'Mary',\n                            'last_name': 'Doe',\n                            '_value': 'Mary Doe'\n\n                        },\n                        {\n                            'team': 'Customer Service',\n                            'location': {\n                                'name': 'Kampala branch',\n                                'address': 'Kampala, Uganda'\n                            },\n                            'first_name': 'Harry',\n                            'last_name': 'Doe',\n                            '_value': 'Harry Doe'\n\n                        },\n                        {\n                            'team': 'Customer Service',\n                            'location': {\n                                'name': 'Kampala branch',\n                                'address': 'Kampala, Uganda'\n                            },\n                            'first_name': 'Paul',\n                            'last_name': 'Doe',\n                            '_value': 'Paul Doe'\n\n                        }\n                    ],\n                ]\n            }\n      }\n      \"\"\"\n  \n  # For XML files (even really large ones), use read_xml_file which also returns an iterator  \n  for element in read_xml_file(file_path, records_tag='staff'):\n      # returns the element as xml.etree.ElementTree.Element by default\n      # ...do something with the element\n      print(element)\n  \n  for 
element_as_dict in read_xml_file(file_path, records_tag='staff', to_dict=True):\n      # returns the element as a dictionary\n      # ...do something with the element dictionary\n      print(element_as_dict)\n      # see the print output for read_xml_string\n  ```\n\n## How to test\n\n- Clone the repo and enter its root folder\n\n  ```bash\n  git clone https://github.com/sopherapps/xml_stream.git \u0026\u0026 cd xml_stream\n  ```\n\n- Create a virtual environment and activate it\n\n  ```bash\n  virtualenv -p /usr/bin/python3.6 env \u0026\u0026 source env/bin/activate\n  ```\n\n- Install the dependencies\n\n  ```bash\n  pip install -r requirements.txt\n  ```\n  \n- Download a huge XML file for test purposes and save it in the `test` folder as `huge_mock.xml`\n\n  ```sh\n  wget http://aiweb.cs.washington.edu/research/projects/xmltk/xmldata/data/SwissProt/SwissProt.xml \u0026\u0026 mv SwissProt.xml test/huge_mock.xml\n  ```\n\n- Run the test command\n\n  ```bash\n  python -m unittest\n  ```\n\n## Acknowledgements\n\n- This [Stack Overflow Answer](https://stackoverflow.com/questions/2148119/how-to-convert-an-xml-string-to-a-dictionary#answer-5807028) about converting XML to dict was very helpful.\n- This [Real Python tutorial on publishing packages](https://realpython.com/pypi-publish-python-package/) was very helpful.\n\n## License\n\nCopyright (c) 2020 [Martin Ahindura](https://github.com/Tinitto). Licensed under the [MIT License](./LICENSE)\n\n## Gratitude\n\nAll glory be to God\n\n\u003ca href=\"https://www.buymeacoffee.com/martinahinJ\" target=\"_blank\"\u003e\u003cimg src=\"https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png\" alt=\"Buy Me A Coffee\" style=\"height: 60px !important;width: 217px !important;\" 
\u003e\u003c/a\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsopherapps%2Fxml_stream","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsopherapps%2Fxml_stream","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsopherapps%2Fxml_stream/lists"}