{"id":21499222,"url":"https://github.com/cre-dev/xml2db","last_synced_at":"2025-07-15T21:31:20.130Z","repository":{"id":237036017,"uuid":"775377185","full_name":"cre-dev/xml2db","owner":"cre-dev","description":"A Python package to load complex XML files into a relational database","archived":false,"fork":false,"pushed_at":"2024-10-04T13:27:52.000Z","size":488,"stargazers_count":6,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-11-23T10:59:46.077Z","etag":null,"topics":["data-engineering","data-loader","database","duckdb","elt","etl","lxml","mssql","mysql","postgresql","python","relational-databases","sqlalchemy","xml","xmlschema","xsd"],"latest_commit_sha":null,"homepage":"https://cre-dev.github.io/xml2db/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cre-dev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-21T09:24:29.000Z","updated_at":"2024-10-29T17:28:45.000Z","dependencies_parsed_at":"2024-04-29T18:06:06.358Z","dependency_job_id":"6061bc1d-aeff-4e0e-a33d-df9755a1060b","html_url":"https://github.com/cre-dev/xml2db","commit_stats":null,"previous_names":["cre-dev/xml2db"],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cre-dev%2Fxml2db","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cre-dev%2Fxml2db/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cre-dev%2Fxml2db/releases","manifests_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/repositories/cre-dev%2Fxml2db/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cre-dev","download_url":"https://codeload.github.com/cre-dev/xml2db/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":226071047,"owners_count":17569103,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-engineering","data-loader","database","duckdb","elt","etl","lxml","mssql","mysql","postgresql","python","relational-databases","sqlalchemy","xml","xmlschema","xsd"],"created_at":"2024-11-23T17:14:59.260Z","updated_at":"2024-11-23T17:14:59.884Z","avatar_url":"https://github.com/cre-dev.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Loading XML files into a relational database\n\n`xml2db` is a Python package which allows parsing and loading XML files into a relational database. It handles complex \nXML files which cannot be denormalized to flat tables, and works out of the box, without any custom mapping rules.\n\nIt can be used within an [Extract, Load, Transform](https://docs.getdbt.com/terms/elt) data pipeline pattern as it \nallows loading XML files into a relational data model which is very close to the source data, yet easy to work with.\n\nStarting from an XSD schema which represents a given XML structure, `xml2db` builds a data model, i.e. a set of database \ntables linked to each other by foreign key relationships. 
Then, it allows parsing and loading XML files into the \ndatabase, and getting them back from the database into XML format if needed.\n\nLoading XML files into a relational database with `xml2db` can be as simple as:\n\n```python\nfrom xml2db import DataModel\n\n# Create a data model of tables with relations based on the XSD file\ndata_model = DataModel(\n    xsd_file=\"path/to/file.xsd\", \n    connection_string=\"postgresql+psycopg2://testuser:testuser@localhost:5432/testdb\",\n)\n# Parse an XML file based on this XSD\ndocument = data_model.parse_xml(\n    xml_file=\"path/to/file.xml\"\n)\n# Insert the document content into the database\ndocument.insert_into_target_tables()\n```\n\nThe data model created by `xml2db` will be close to the XSD schema. However, `xml2db` will perform a few systematic \nsimplifications aimed at limiting the complexity of the resulting data model and the storage footprint. The resulting \ndata model can be configured, but the above code will work out of the box, with reasonable defaults.\n\nThe raw data loaded into the database can then be processed if need be, using for instance [DBT](https://www.getdbt.com/),\nSQL views or stored procedures aimed at extracting, correcting and formatting the data into more user-friendly tables.\n\nThis package uses `sqlalchemy` to interact with the database, so it should work with different database backends. \nAutomated integration tests run against PostgreSQL, MySQL, MS SQL Server and DuckDB. You may have to install additional \npackages to connect to your database (e.g. 
`psycopg2` for PostgreSQL, `pymysql` for MySQL, `pyodbc` for MS SQL Server or\n`duckdb_engine` for DuckDB).\n\n**Please read the [package documentation website](https://cre-dev.github.io/xml2db) for all the details!**\n\n## Installation\n\nThe package can be installed, preferably in a virtual environment, using `pip`:\n\n```bash\npip install xml2db\n```\n\n## Testing\n\nRunning the tests requires additional development dependencies. After cloning the repo, install them with:\n\n```bash\npip install -e \".[tests,docs]\"\n```\n\nRun all tests with the following command:\n\n```bash\npython -m pytest\n```\n\nIntegration tests require write access to a PostgreSQL or MS SQL Server database; the connection string is provided as an\nenvironment variable `DB_STRING`. To run only the conversion tests, which do not require a database, use:\n\n```bash\npytest -m \"not dbtest\"\n```\n\n## Contributing\n\n`xml2db` is developed and used at the [French energy regulation authority (CRE)](https://www.cre.fr/) to process complex \nXML data.\n\nContributions and bug reports are welcome; start on the project's \n[issue page](https://github.com/cre-dev/xml2db/issues).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcre-dev%2Fxml2db","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcre-dev%2Fxml2db","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcre-dev%2Fxml2db/lists"}