{"id":29258042,"url":"https://github.com/tobywf/xml_dataclasses","last_synced_at":"2025-07-04T05:09:46.751Z","repository":{"id":40328425,"uuid":"240862949","full_name":"tobywf/xml_dataclasses","owner":"tobywf","description":"UNSUPPORTED (De)serialize XML documents into Python dataclasses","archived":false,"fork":false,"pushed_at":"2022-05-14T18:45:57.000Z","size":71,"stargazers_count":19,"open_issues_count":1,"forks_count":4,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-06-01T22:53:20.649Z","etag":null,"topics":["dataclass","dataclasses","python","python-dataclasses","python3","serde","xml","xml-dataclasses"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tobywf.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-02-16T09:19:06.000Z","updated_at":"2024-01-21T21:16:03.000Z","dependencies_parsed_at":"2022-08-09T17:21:01.829Z","dependency_job_id":null,"html_url":"https://github.com/tobywf/xml_dataclasses","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/tobywf/xml_dataclasses","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tobywf%2Fxml_dataclasses","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tobywf%2Fxml_dataclasses/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tobywf%2Fxml_dataclasses/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tobywf%2Fxml_dataclasses/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tobywf","d
ownload_url":"https://codeload.github.com/tobywf/xml_dataclasses/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tobywf%2Fxml_dataclasses/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263222460,"owners_count":23433026,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataclass","dataclasses","python","python-dataclasses","python3","serde","xml","xml-dataclasses"],"created_at":"2025-07-04T05:09:43.847Z","updated_at":"2025-07-04T05:09:46.724Z","avatar_url":"https://github.com/tobywf.png","language":"Python","readme":"# XML dataclasses\n\n[![License: MPL 2.0](https://img.shields.io/badge/License-MPL%202.0-brightgreen.svg)](https://opensource.org/licenses/MPL-2.0) ![Build](https://github.com/tobywf/xml_dataclasses/workflows/Build/badge.svg?branch=master)\n\n[XML dataclasses on PyPI](https://pypi.org/project/xml-dataclasses/)\n\nThis library maps XML to and from Python dataclasses. It builds on normal dataclasses from the standard library and uses [`lxml`](https://pypi.org/project/lxml/) for parsing/generating XML.\n\nIt's currently in alpha. 
It isn't ready for production if you aren't willing to do your own evaluation/quality assurance.\n\nRequires Python 3.7 or higher.\n\n## Features\n\n* Convert XML documents to well-defined dataclasses, which work with Mypy or IDE auto-completion\n* XML dataclasses are dataclasses\n* Full control of parsing and generating XML via `lxml`\n* Loading and dumping of attributes, child elements, and text content\n* Required and optional attributes/child elements\n* Lists of child elements are supported, as are unions and lists of unions\n* Inheritance does work, but has the same limitations as dataclasses. Inheriting from base classes with required fields and declaring optional fields doesn't work due to field order. This isn't recommended\n* Namespace support is decent as long as namespaces are correctly declared. I've tried it on several real-world examples, although they were known to be valid. `lxml` does a great job at expanding namespace information when loading and simplifying it when saving\n* Post-load validation hook `xml_validate`\n* Fields not required in the constructor are ignored by this library (via `ignored()` or `init=False`)\n\n## Limitations\n\n* Whitespace and comments aren't supported in the data model. They must be stripped when loading the XML\n* So far, I haven't found any examples where XML can't be mapped to a dataclass, but it's likely possible given how complex XML is\n* No typing/type conversions. Since XML is untyped, only string values are currently allowed. Type conversions are tricky to implement in a type-safe and extensible manner\n* Dataclasses must be written by hand; no tools are provided to generate these from DTDs, XML schema definitions, or RELAX NG schemas\n\n## Security\n\nThe caveats concerning untrusted content are roughly the same as with `lxml`, since that does the parsing. This is good, since `lxml`'s behaviour under XML attacks is well understood. 
This library recursively resolves data structures, which may have memory implications for unbounded payloads. Because loading is driven from the dataclass definitions, it shouldn't be possible to execute arbitrary Python code (not a guarantee, see license). If you must deal with untrusted content, a workaround is to [use `lxml` to validate](https://lxml.de/validation.html) untrusted content with a strict schema, which you may already be doing.\n\n## Patterns\n\n### Defining attributes\n\nAttributes can be either `str` or `Optional[str]`. Using any other type won't work. Attributes can be renamed or have their namespace modified via the `rename` function. It can be used either on its own, or with an existing field definition:\n\n```python\n@xml_dataclass\nclass Foo:\n    __ns__ = None\n    required: str\n    optional: Optional[str] = None\n    renamed_with_default: str = rename(default=None, name=\"renamed-with-default\")\n    namespaced: str = rename(ns=\"http://www.w3.org/XML/1998/namespace\")\n    existing_field: str = rename(field(...), name=\"existing-field\")\n```\n\nFor now, you can work around this limitation with properties that do the conversion, and perform post-load validation.\n\nBy default, unknown attributes raise an error. This can be disabled by passing `Options` to `load` with `ignore_unknown_attributes`.\n\n### Defining text\n\nLike attributes, text can be either `str` or `Optional[str]`. You must declare text content with the `text` function. Similar to `rename`, this function can use an existing field definition, or take the `default` argument. Text cannot be renamed or namespaced. Every class can only have one field defining text content. 
If a class has text content, it cannot have any children.\n\n```python\n@xml_dataclass\nclass Foo:\n    __ns__ = None\n    value: str = text()\n\n@xml_dataclass\nclass Foo:\n    __ns__ = None\n    content: Optional[str] = text(default=None)\n\n@xml_dataclass\nclass Foo:\n    __ns__ = None\n    uuid: str = text(field(default_factory=lambda: str(uuid4())))\n```\n\n### Defining children/child elements\n\nChildren must ultimately be other XML dataclasses. However, they can also be `Optional`, `List`, and `Union` types:\n\n* `Optional` must be at the top level. Valid: `Optional[List[XmlDataclass]]`. Invalid: `List[Optional[XmlDataclass]]`\n* Next, `List` should be defined (if multiple child elements are allowed). Valid: `List[Union[XmlDataclass1, XmlDataclass2]]`. Invalid: `Union[List[XmlDataclass1], XmlDataclass2]`\n* Finally, if `Optional` or `List` were used, a union type should be the inner-most (again, if needed)\n\nIf a class has children, it cannot have text content.\n\nChildren can be renamed via the `rename` function. However, attempting to set a namespace is invalid, since the namespace is provided by the child type's XML dataclass. Also, unions of XML dataclasses must have the same namespace (you can use different fields with renaming if they have different namespaces, since the XML names will be resolved as a combination of namespace and name).\n\nBy default, unknown children raise an error. This can be disabled by passing `Options` to `load` with `ignore_unknown_children`.\n\n### Defining post-load validation\n\nSimply implement an instance method called `xml_validate` with no parameters, and no return value (if you're using type hints):\n\n```python\ndef xml_validate(self) -\u003e None:\n    pass\n```\n\nIf defined, the `load` function will call it after all values have been loaded and assigned to the XML dataclass. You can validate the fields you want inside this method. 
Return values are ignored; instead raise and catch exceptions.\n\n### Ignored fields\n\nFields not required in the constructor are ignored by this library (new in version 0.0.6). This is useful if you want to populate a field via post-load validation.\n\nYou can simply set `init=False`, although you may also want to exclude the field from comparisons. The `ignored` function does this, and can also be used.\n\nThe name doesn't matter, but it might be useful to use the `_` prefix as a convention.\n\n## Example (fully type hinted)\n\n(This is a simplified real world example - the container can also include optional `links` child elements.)\n\n```xml\n\u003c?xml version=\"1.0\"?\u003e\n\u003ccontainer version=\"1.0\" xmlns=\"urn:oasis:names:tc:opendocument:xmlns:container\"\u003e\n  \u003crootfiles\u003e\n    \u003crootfile full-path=\"OEBPS/content.opf\" media-type=\"application/oebps-package+xml\" /\u003e\n  \u003c/rootfiles\u003e\n\u003c/container\u003e\n```\n\n```python\nfrom dataclasses import dataclass\nfrom typing import List\nfrom lxml import etree  # type: ignore\nfrom xml_dataclasses import xml_dataclass, rename, load, dump, NsMap, XmlDataclass\n\nCONTAINER_NS = \"urn:oasis:names:tc:opendocument:xmlns:container\"\n\n\n@xml_dataclass\n@dataclass\nclass RootFile:\n    __ns__ = CONTAINER_NS\n    full_path: str = rename(name=\"full-path\")\n    media_type: str = rename(name=\"media-type\")\n\n\n@xml_dataclass\n@dataclass\nclass RootFiles:\n    __ns__ = CONTAINER_NS\n    rootfile: List[RootFile]\n\n\n# see Gotchas, this workaround is required for type hinting\n@xml_dataclass\n@dataclass\nclass Container(XmlDataclass):\n    __ns__ = CONTAINER_NS\n    version: str\n    rootfiles: RootFiles\n    # WARNING: this is an incomplete implementation of an OPF container\n\n    def xml_validate(self) -\u003e None:\n        if self.version != \"1.0\":\n            raise ValueError(f\"Unknown container version '{self.version}'\")\n\n\nif __name__ == \"__main__\":\n    nsmap: 
NsMap = {None: CONTAINER_NS}\n    # see Gotchas, stripping whitespace and comments is highly recommended\n    parser = etree.XMLParser(remove_blank_text=True, remove_comments=True)\n    lxml_el_in = etree.parse(\"container.xml\", parser).getroot()\n    container = load(Container, lxml_el_in, \"container\")\n    lxml_el_out = dump(container, \"container\", nsmap)\n    print(etree.tostring(lxml_el_out, encoding=\"unicode\", pretty_print=True))\n```\n\n## Gotchas\n\n### Type hinting\n\nThis can be a real pain to get right. Unfortunately, if you need this, you may have to resort to:\n\n```python\n@xml_dataclass\n@dataclass\nclass Child:\n    __ns__ = None\n    pass\n\n@xml_dataclass\n@dataclass\nclass Parent(XmlDataclass):\n    __ns__ = None\n    children: Child\n```\n\nIt's important that `@dataclass` be the *last* decorator, i.e. the closest to the class definition (and so the first to be applied). Luckily, only the root class you intend to pass to `load`/`dump` has to inherit from `XmlDataclass`, but all classes should have the `@dataclass` decorator applied.\n\n### Whitespace and comments\n\nIf you are able to, it is strongly recommended you strip whitespace and comments from the input via `lxml`:\n\n```python\nparser = etree.XMLParser(remove_blank_text=True, remove_comments=True)\n```\n\nBy default, `lxml` preserves whitespace. This can cause a problem when checking if elements have no text. The library does attempt to strip these; literally via Python's `strip()`. But `lxml` is likely faster and more robust.\n\nSimilarly, comments are included by default, and because loading is strict, they will be considered as nodes that the dataclass has not declared. It is recommended to omit them during parsing.\n\n### Optional vs required\n\nOn dataclasses, optional fields also usually have a default value to be useful. But this isn't required; `Optional` is just a type hint to say `None` is allowed. This would occur e.g. 
if an element has no children.\n\nFor loading XML dataclasses, whether or not a field is required is determined by whether it has a `default`/`default_factory` defined. If so, and it's missing, that default is used. Otherwise, an error is raised.\n\nFor dumping, the default isn't considered. Instead, if a value is marked as `Optional` and the value is `None`, it isn't written.\n\nThis makes sense in many cases, but possibly not every case.\n\n## Changelog\n\n### [Unreleased] - 2022-05-14\n\n* Stringified type hints and postponed annotations should now resolve correctly.\n* Allow `NsMap` to be `None`/optional - thanks [bphunter1972](https://github.com/bphunter1972)!\n\n### [0.0.9] - 2022-02-10\n\n* Fix issue passing options when loading children - thanks [tim-lansen](https://github.com/tim-lansen)!\n\n### [0.0.7] and [0.0.8] - 2021-04-08\n\n* Warn if comments are found/don't treat comments as child elements in error messages\n* Allow lenient loading of undeclared attributes or children\n\n### [0.0.6] - 2020-03-25\n\n* Allow ignored fields via `init=False` or the `ignored` function\n\n### [0.0.5] - 2020-02-18\n\n* Fixed type hinting for consumers. While the library passed mypy validation, it was hard to get XML dataclasses in a codebase to pass mypy validation\n\n### [0.0.4] - 2020-02-16\n\n* Improved type resolving. This led to easier field definitions, as `attr` and `child` are no longer needed because the type of the field is inferred\n\n### [0.0.3] - 2020-02-16\n\n* Added support for union types on children\n\n## Development\n\nThis project uses [pre-commit](https://pre-commit.com/) to run some linting hooks when committing. When you first clone the repo, please run:\n\n```\npre-commit install\n```\n\nYou may also run the hooks at any time:\n\n```\npre-commit run --all-files\n```\n\nDependencies are managed via [poetry](https://python-poetry.org/). 
To install all dependencies, use:\n\n```\npoetry install\n```\n\nThis will also install development dependencies such as `black`, `isort`, `pylint`, `mypy`, and `pytest`. Pre-defined tasks make it easy to run these, for example\n\n* `poetry run task lint` - this runs `black`, `isort`, `mypy`, and `pylint`\n* `poetry run task test` - this runs `pytest` with coverage\n\nFor a full list of tasks, see `poetry run task --list`.\n\n## License\n\nThis library is licensed under the Mozilla Public License Version 2.0. For more information, see `LICENSE`.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftobywf%2Fxml_dataclasses","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftobywf%2Fxml_dataclasses","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftobywf%2Fxml_dataclasses/lists"}