{"id":29858275,"url":"https://github.com/mysterious-ben/xmlrecords","last_synced_at":"2025-07-30T01:41:12.432Z","repository":{"id":48590216,"uuid":"268072982","full_name":"mysterious-ben/xmlrecords","owner":"mysterious-ben","description":"Utilities to extract tabular data from XML","archived":false,"fork":false,"pushed_at":"2023-06-15T12:17:00.000Z","size":43,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2023-06-15T13:26:23.182Z","etag":null,"topics":["parser","xml"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mysterious-ben.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-05-30T12:14:21.000Z","updated_at":"2023-06-15T11:47:48.000Z","dependencies_parsed_at":"2022-08-27T11:12:19.819Z","dependency_job_id":null,"html_url":"https://github.com/mysterious-ben/xmlrecords","commit_stats":null,"previous_names":[],"tags_count":null,"template":null,"template_full_name":null,"purl":"pkg:github/mysterious-ben/xmlrecords","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mysterious-ben%2Fxmlrecords","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mysterious-ben%2Fxmlrecords/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mysterious-ben%2Fxmlrecords/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mysterious-ben%2Fxmlrecords/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mysterious-ben","download_url":"https://codeload.github.com/mysterious
-ben/xmlrecords/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mysterious-ben%2Fxmlrecords/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267794154,"owners_count":24145161,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-29T02:00:12.549Z","response_time":2574,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["parser","xml"],"created_at":"2025-07-30T01:40:46.876Z","updated_at":"2025-07-30T01:41:12.407Z","avatar_url":"https://github.com/mysterious-ben.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# XML Records\n\n`xmlrecords` is a user-friendly wrapper of `lxml` package for extraction of tabular data from XML files.\n\n\u003e\u003e\u003e This data provider sends all his data in... XML. You know nothing about XML, except that it looks kind of weird and you would *definitely* never use it for tabular data. How could you just transform all this XML nightmare into a sensible tabular format, like a DataFrame? 
Don't worry: you are in the right place!\n\n\n# Installation\n\n```shell script\npip install xmlrecords\n```\n\nThe package requires `python 3.7+` and one external dependency `lxml`.\n\n# Usage\n\n## Basic example\n\nUsually, you only need to specify path to table rows; optionally, you can specify paths to any extra data you'd like to add to your table:\n\n```python\n# XML object\nxml_bytes = b\"\"\"\\\n\u003c?xml version=\"1.0\" encoding=\"utf-8\"?\u003e\n\u003cCatalog\u003e\n    \u003cLibrary\u003e\n        \u003cName\u003eVirtual Shore\u003c/Name\u003e\n    \u003c/Library\u003e\n    \u003cShelf\u003e\n        \u003cTimestamp\u003e2020-02-02T05:12:22\u003c/Timestamp\u003e\n        \u003cBook\u003e\n            \u003cTitle\u003eSunny Night\u003c/Title\u003e\n            \u003cAuthor alive=\"no\" name=\"Mysterious Mark\"/\u003e\n            \u003cYear\u003e2017\u003c/Year\u003e\n            \u003cPrice\u003e112.34\u003c/Price\u003e\n        \u003c/Book\u003e\n        \u003cBook\u003e\n            \u003cTitle\u003eBabel-17\u003c/Title\u003e\n            \u003cAuthor alive=\"yes\" name=\"Samuel R. Delany\"/\u003e\n            \u003cYear\u003e1963\u003c/Year\u003e\n            \u003cPrice\u003e10\u003c/Price\u003e\n        \u003c/Book\u003e\n    \u003c/Shelf\u003e\n\u003c/Catalog\u003e\n\"\"\"\n\n# Transform XML to records (= a list of key-value pairs)\nimport xmlrecords\nrecords = xmlrecords.parse(\n    xml=xml_bytes, \n    records_path=['Shelf', 'Book'],  # The rows are XML nodes with the repeating tag \u003cBook\u003e\n    meta_paths=[['Library', 'Name'], ['Shelf', 'Timestamp']],  # Add additional \"meta\" nodes\n)\nfor r in records:\n    print(r)\n\n# Output:\n# {'Name': 'Virtual Shore', 'Timestamp': '2020-02-02T05:12:22', 'Title': 'Sunny Night', 'alive': 'no', 'name': 'Mysterious Mark', 'Year': '2017', 'Price': '112.34'}\n# {'Name': 'Virtual Shore', 'Timestamp': '2020-02-02T05:12:22', 'Title': 'Babel-17', 'alive': 'yes', 'name': 'Samuel R. 
Delany', 'Year': '1963', 'Price': '10'}\n\n# Validate record keys\nxmlrecords.validate(\n    records, \n    expected_keys=['Name', 'Timestamp', 'Title', 'alive', 'name', 'Year', 'Price'],\n)\n``` \n\n## With Pandas\n\nYou can easily transform records to a pandas DataFrame:\n\n```python\nimport pandas as pd\ndf = pd.DataFrame(records)\n```\n\n## With SQL\n\nYou can use records directly with INSERT statements if your SQL database is [PEP 249 compliant](https://www.python.org/dev/peps/pep-0249/). Most SQL databases are.\n\nSQLite is an exception. There, you'll have to transform records (= a list of dictionaries) into a list of lists:\n\n```python\nimport sqlite3\nwith sqlite3.connect('maindev.db') as conn:\n    c = conn.cursor()\n    c.execute(\"\"\"\\\n        CREATE TABLE BOOKS (\n           LIBRARY_NAME TEXT,\n           SHELF_TIMESTAMP TEXT,\n           TITLE TEXT,\n           AUTHOR_ALIVE TEXT,\n           AUTHOR_NAME TEXT,\n           YEAR INT,\n           PRICE FLOAT,\n           PRIMARY KEY (TITLE, AUTHOR_NAME)\n        )\n        \"\"\"\n    )\n    c.executemany(\n        \"\"\"INSERT INTO BOOKS VALUES (?,?,?,?,?,?,?)\"\"\",\n        [list(x.values()) for x in records],\n    )\n    conn.commit()\n```\n\n\n# FAQ\n\n1. **Why not `xmltodict`?** `xmltodict` can convert arbitrary XML to a Python dict. However, it is 2-3 times slower than `xmlrecords` and does not support some features specific to tabular data.\n\n2. **Why not `xml` or `lxml`?** `xmlrecords` uses `lxml` under the hood. Using `xml` or `lxml` directly is a viable option too if this package doesn't cover your particular use case.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmysterious-ben%2Fxmlrecords","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmysterious-ben%2Fxmlrecords","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmysterious-ben%2Fxmlrecords/lists"}