{"id":16220046,"url":"https://github.com/kuba2k2/datastruct","last_synced_at":"2025-09-17T10:32:38.296Z","repository":{"id":110303021,"uuid":"585872234","full_name":"kuba2k2/datastruct","owner":"kuba2k2","description":"Combination of struct and dataclasses for easy parsing of binary formats","archived":false,"fork":false,"pushed_at":"2024-10-13T16:58:47.000Z","size":492,"stargazers_count":6,"open_issues_count":2,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-12-30T03:42:10.636Z","etag":null,"topics":["binary","construct","dataclass","dataclasses","python","struct","structure"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kuba2k2.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-01-06T10:03:16.000Z","updated_at":"2024-10-13T16:58:45.000Z","dependencies_parsed_at":"2023-11-18T15:05:25.771Z","dependency_job_id":"a3987c2c-c7a9-43ba-8cde-ce13b1cb6db0","html_url":"https://github.com/kuba2k2/datastruct","commit_stats":null,"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kuba2k2%2Fdatastruct","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kuba2k2%2Fdatastruct/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kuba2k2%2Fdatastruct/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kuba2k2%2Fdatastruct/manifests","owner_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/owners/kuba2k2","download_url":"https://codeload.github.com/kuba2k2/datastruct/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":233371289,"owners_count":18666201,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["binary","construct","dataclass","dataclasses","python","struct","structure"],"created_at":"2024-10-10T11:56:57.063Z","updated_at":"2025-09-17T10:32:32.904Z","avatar_url":"https://github.com/kuba2k2.png","language":"Python","readme":"# py-datastruct\n\nThis is a (relatively) simple, **pure-Python, no dependency** library, aiming to simplify parsing and building binary data structures. It uses **[`dataclasses`](https://docs.python.org/3/library/dataclasses.html)** as its main container type, and **[`struct`](https://docs.python.org/3/library/struct.html)-compatible format specifiers** for writing field definitions.\n\nThe way of composing structures is somewhat similar to (and inspired by) [Construct](https://github.com/construct/construct). While probably not as powerful, it should give more flexibility and control over the data, as well as **full IDE type hinting**.\n\n## Installation\n\n```shell\npip install py-datastruct\n```\n\n**NOTE:** `pip install datastruct` installs a **different package** by the same name!\n\n## Breaking changes in v2.0.0\n\nIn DataStruct v2.0.0, the field type validation methods have been rewritten. 
They are now stricter, which means that the type hints will more closely represent the actual possible field values.\n\nThe new mechanism allows using **union types** (`int | float | bytes`), as well as **optional fields** (`MyStruct | None`), which wasn't previously possible. This is particularly useful for `cond()` and `switch()` fields.\n\nDue to this new logic, there are a few **breaking changes** in v2.0.0:\n\n\u003cdetails\u003e\n\n\u003csummary\u003e`cond()` field default `if_not=` value is now `None` (breaking)\u003c/summary\u003e\n\nPreviously, if the `cond()` field's condition evaluated to `False`, its value was set to the wrapped field's default value (unless otherwise specified using the `if_not=` argument). For `subfield()`, the structure was created using default values.\n\nStarting in v2.0.0, the field's value will be set to `None`. You can still use `if_not=` to change that (which you should do if you rely on that field's default value in any way). This means that the `cond()` field's type specification **must** now include `None` as one of its types.\n\nIf your structure was:\n\n```python\n@dataclass\nclass MyStruct(DataStruct):\n    var: int = cond(lambda ctx: ctx.my_condition)(field(\"I\"))\n```\n\nit must now be changed to either:\n\n```python\n@dataclass\nclass MyStruct(DataStruct):\n    var: int | None = cond(lambda ctx: ctx.my_condition)(field(\"I\"))\n```\n\nor, using `if_not=`:\n\n```python\n@dataclass\nclass MyStruct(DataStruct):\n    var: int = cond(lambda ctx: ctx.my_condition, if_not=0)(field(\"I\"))\n```\n\nThe same change applies to `subfield()` wrapped in `cond()`.\n\nNote that you **cannot** use `Any` for the `cond()` field, unless it wraps a `switch()` field (in which case the `cond()` field's type is transparently proxied to the `switch()` field).\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\n\u003csummary\u003e`switch()` field's type must now account for all possible cases (possibly breaking)\u003c/summary\u003e\n\nSince 
union types are now usable with `switch()` fields, it is required to include all possible cases in the union.\n\nThe following structure demonstrates various ways of using the `switch()` field correctly:\n\n```python\n@dataclass\nclass MyStruct(DataStruct):\n    var1: int = switch(False)(\n        false=(int, field(\"H\")),\n        true=(int, field(\"I\")),\n    )\n    var2: int | bool = switch(False)(\n        false=(int, field(\"H\")),\n        true=(bool, field(\"B\")),\n    )\n    var3: Any = switch(False)(\n        false=(int, field(\"H\")),\n        true=(bool, field(\"B\")),\n    )\n    var4: Any = switch(False)(\n        false=(..., padding(4)),\n        true=(int, field(\"I\")),\n    )\n    var5: ... = switch(False)(\n        false=(..., padding(4)),\n        true=(int, field(\"I\")),\n    )\n```\n\nNote that the usage of Ellipsis (`...`) is restricted to `switch()` fields that have at least one case using the `...` type.\n\nAs the examples above show, if you have a `switch()` field that uses union types, but doesn't list all possible cases, you should either add the missing types or change the type to `Any`.\n\nIf your `switch()` field uses `subfield()` cases, and you don't want to use the `Any` type, and you don't want to list all possible types, consider using a base class (this is now possible!), like this:\n\n```python\n@dataclass\nclass MyBase(DataStruct):\n    # you can optionally add fields here - they will be *before* any subclass' fields\n    pass\n\n@dataclass\nclass MyStruct1(MyBase):  # note - no DataStruct here!\n    pass\n\n@dataclass\nclass MyStruct2(MyBase):\n    pass\n\n@dataclass\nclass MySwitchStruct(MyBase):\n    var1: MyBase = switch(False)(\n        false=(MyStruct1, subfield()),\n        true=(MyStruct2, subfield()),\n    )\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\n\u003csummary\u003eThe minimum required Python version is now 3.8\u003c/summary\u003e\n\nWhile it *may* still work on 3.7, it is recommended to use 3.10 at 
least. It *should* work on 3.8, but I can't reliably test everything on old versions to make sure it's fine.\n\n\u003c/details\u003e\n\n## Examples\n\nBefore you read this \"documentation\", be aware that it is by no means complete, and will probably not be enough for you to understand everything you need.\n\nHere are a few projects that are using `datastruct`:\n\n- https://github.com/tuya-cloudcutter/cloudcutter-universal/blob/master/cloudcutter/modules/dhcp/structs.py\n- https://github.com/tuya-cloudcutter/bk7231tools/blob/main/bk7231tools/analysis/kvstorage.py\n- https://github.com/libretiny-eu/ltchiptool/blob/master/uf2tool/models/partition.py\n- https://github.com/libretiny-eu/ltchiptool/blob/master/ltchiptool/soc/ambz2/util/models/images.py\n\nIf you want your project on this list, feel free to submit a PR.\n\n## Usage\n\nThis simple example illustrates creating a 24-byte long structure, consisting of a 32-bit integer, an 8-byte 0xFF-filled padding, and a 12-byte `bytes` string.\n\n```python\nfrom hexdump import hexdump\nfrom dataclasses import dataclass\nfrom datastruct import DataStruct\nfrom datastruct.fields import field, padding\n\n@dataclass\nclass MyStruct(DataStruct):\n    my_number: int = field(\"I\", default=123)\n    _1: ... = padding(8)\n    my_binary: bytes = field(\"12s\")\n\nmy_object = MyStruct(my_binary=b\"Hello Python\")\nprint(my_object)\n# MyStruct(my_number=123, my_binary=b'Hello Python')\n\nmy_object = MyStruct(my_number=5, my_binary=b\"Hello World!\")\nprint(my_object)\n# MyStruct(my_number=5, my_binary=b'Hello World!')\n\npacked = my_object.pack()\nhexdump(packed)\n# 00000000: 05 00 00 00 FF FF FF FF  FF FF FF FF 48 65 6C 6C  ............Hell\n# 00000010: 6F 20 57 6F 72 6C 64 21                           o World!\n\nunpacked = MyStruct.unpack(packed)\nprint(unpacked)\n# MyStruct(my_number=5, my_binary=b'Hello World!')\nprint(my_object == unpacked)\n# True\n```\n\nYou might also pass a stream (file/BytesIO/etc.) 
to `pack()` and `unpack()`. Otherwise, `pack()` will create a BytesIO stream and return its contents after packing; `unpack()` will accept a `bytes` object as its parameter.\n\n`pack()` and `unpack()` also accept custom keyword-only arguments that are available in the Context throughout the entire operation.\n\n### Context\n\nUpon starting a pack/unpack operation, a `Context` object is created. The context is a container scoped to the currently processed structure. It's composed of the following main elements:\n\n- all values of the current structure - when packing; during unpacking, it contains all values of fields that were already processed (the context \"grows\")\n- all keyword arguments passed to `pack()`/`unpack()` (for the root context only)\n- all keyword arguments passed to `subfield()` (for child contexts only)\n- `_: Context` - reference to the parent object's context (only when nesting `DataStruct`s)\n- `self: Any` - the current datastruct - note that it's a `DataStruct` subclass when packing, and a `Container` when unpacking\n- `G` - global context - general-purpose container that is not scoped to the current structure (it's identical for nested structs)\n  - `io: IO[bytes]` - the stream being read from/written to\n  - `packing: bool` - whether the current operation is packing\n  - `unpacking: bool` - whether the current operation is unpacking\n  - `root: Context` - context of the topmost structure\n  - `tell: () -\u003e int` - function returning the current position in the stream\n  - `seek: (offset: int, whence: int) -\u003e int` - function that seeks to an absolute offset\n- `P` - local context - general-purpose container that is different for each nested struct\n  - `config: Config` - current DataStruct's config\n  - `tell: () -\u003e int` - function returning the current position in the current structure (in bytes)\n  - `seek: (offset: int, whence: int) -\u003e int` - function that seeks to an offset within the current structure\n  - 
`skip: (length: int) -\u003e int` - function that skips `length` bytes\n  - `i: int` - (for `repeat()` fields only) index of the current item of the list\n  - `item: Any` - (for `repeat()` fields, in `last=` lambda only) item processed right before evaluation\n  - `self: Any` - (packing only) value of the current field\n\nThe context is \"general-purpose\", meaning that the user can write custom values to it. All fields presented above can be accessed by lambda functions - see \"Parameter evaluation\".\n\n### Parameter evaluation\n\nMost field parameters support pack/unpack-time evaluation (which means they can e.g. depend on previously read fields). Lambda expressions are given the current context and are expected to return a simple value that would be statically valid for this parameter.\n\n```python\nfrom random import randint\n\nan_unpredictable_field: int = field(lambda ctx: \"I\" if randint(1, 10) % 2 == 0 else \"H\")\n```\n\n### Ellipsis - special value\n\nA special value of type `Ellipsis`/`...` is used in the library to indicate something not having a type or a value. **It's not the same as `None`**. `built()` fields, for example, have `...` as their value after creating the struct, but before packing it for the first time.\n\nSpecial fields (like `padding()`, which don't have any value) must have `...` as their type hint.\n\n### Variable-length fields\n\nThis is a simple example of using parameter evaluation to dynamically size a `bytes` string. Binary strings use the `\u003clen\u003es` specifier, which can be omitted (a simple `int` can be used instead).\n\n```python\n@dataclass\nclass MyStruct(DataStruct):\n    data_length: int = field(\"I\")\n    data: bytes = field(lambda ctx: ctx.data_length)\n```\n\nThe user is still responsible for adjusting `data_length` after changing `data`. 
The `built()` field comes in handy here:\n\n```python\n@dataclass\nclass MyStruct(DataStruct):\n    data_length: int = built(\"I\", lambda ctx: len(ctx.data))\n    data: bytes = field(lambda ctx: ctx.data_length)\n```\n\nWhen unpacking, the `data_length` field will be used to dynamically size the `data` field. When packing, `data_length` will always be recalculated based on what's in `data`.\n\n### Wrapper fields - storing a list\n\nLists are also iterables, like `bytes`, but they store a number of items of a specific type. Thus, the `repeat()` field **wrapper** has to be used.\n\n**Wrapper fields** are simply called first with any parameters, then with the \"base\" field.\n\n```python\n@dataclass\nclass MyStruct(DataStruct):\n    item_count: int = built(\"H\", lambda ctx: len(ctx.items))\n    # This creates a list of 16-bit integers.\n    # The list is empty by default.\n    items: List[int] = repeat(lambda ctx: ctx.item_count)(field(\"H\"))\n\nmy_object = MyStruct()\nmy_object.items = [0x5555, 0x4444, 0x3333, 0x2222]\nmy_object.item_count = 1  # this doesn't matter, as the field is rebuilt\npacked = my_object.pack()\nhexdump(packed)\n# 00000000: 04 00 55 55 44 44 33 33  22 22\n```\n\n### Conditional fields\n\nThey're also wrapper fields - if the condition is not met, they act as if the field didn't exist at all.\n\n```python\n@dataclass\nclass MyStruct(DataStruct):\n    has_text: bool = field(\"?\")\n    text: str = cond(lambda ctx: ctx.has_text)(field(\"8s\", default=\"\"))\n\nmy_object = MyStruct.unpack(b\"\\x01HELOWRLD\")\nprint(my_object)\n# MyStruct(has_text=True, text='HELOWRLD')\n\nmy_object = MyStruct.unpack(b\"\\x00\")\nprint(my_object)\n# MyStruct(has_text=False, text='')\n```\n\n### Switch fields\n\nSwitch fields are like more powerful conditional fields. The following example reads an 8/16/32-bit number, depending on the prefixing length byte. 
If the length is not supported, it reads the value as `bytes` instead.\n\n```python\nnumber_length: int = field(\"B\", default=1)\nnumber: Union[int, bytes] = switch(lambda ctx: ctx.number_length)(\n    _1=(int, field(\"B\")),\n    _2=(int, field(\"H\")),\n    _4=(int, field(\"I\")),\n    default=(bytes, field(lambda ctx: ctx.number_length)),\n)\n```\n\nThe values on the left (`_1`, `_2`, `_4`) are the **keys**. The key is picked depending on the key-lambda result (`ctx.number_length`). The value on the right is a tuple of the expected field type, and a `field()` specifier.\n\nSince it's not possible to pass just `1` as a keyword argument, integers are looked up prefixed with an underscore as well. Enums are additionally looked up by their name and value, and booleans are looked up by **lowercase** `true`/`false`.\n\nNote that you can pass (probably) any kind of field to the switch list.\n\n## To be continued\n\n## License\n\n```\nMIT License\n\nCopyright (c) 2023 Kuba Szczodrzyński\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkuba2k2%2Fdatastruct","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkuba2k2%2Fdatastruct","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkuba2k2%2Fdatastruct/lists"}