{"id":13737797,"url":"https://github.com/kachayev/dataclasses-tensor","last_synced_at":"2025-04-12T22:34:30.550Z","repository":{"id":141240741,"uuid":"349288996","full_name":"kachayev/dataclasses-tensor","owner":"kachayev","description":"Easily serialize dataclasses to and from tensors (PyTorch, NumPy)","archived":false,"fork":false,"pushed_at":"2021-04-10T05:25:31.000Z","size":36,"stargazers_count":18,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-26T16:38:56.070Z","etag":null,"topics":["ai","machine-learning","numpy","pytorch","tensor"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kachayev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2021-03-19T03:29:41.000Z","updated_at":"2024-04-03T09:29:20.000Z","dependencies_parsed_at":null,"dependency_job_id":"7866830e-5cfa-4dd1-9bce-f255a249b2b0","html_url":"https://github.com/kachayev/dataclasses-tensor","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kachayev%2Fdataclasses-tensor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kachayev%2Fdataclasses-tensor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kachayev%2Fdataclasses-tensor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kachayev%2Fdataclasses-tensor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kach
ayev","download_url":"https://codeload.github.com/kachayev/dataclasses-tensor/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248265074,"owners_count":21074883,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","machine-learning","numpy","pytorch","tensor"],"created_at":"2024-08-03T03:02:01.417Z","updated_at":"2025-04-12T22:34:30.497Z","avatar_url":"https://github.com/kachayev.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Dataclasses Tensor\n\nThe library provides a simple API for encoding and decoding Python [`dataclasses`](https://docs.python.org/3/library/dataclasses.html) to and from tensors (PyTorch tensors or NumPy arrays) based on `typing` annotations.\n\nHeavily inspired by [`dataclasses-json`](https://github.com/lidatong/dataclasses-json) package.\n\n## Install\n\n```shell\npip install dataclasses-tensor\n```\n\n## Quickstart\n\nTensor representation for a game state in Chess:\n\n```python\nfrom dataclasses import dataclass, field\nfrom enum import Enum\nfrom typing import Optional, List\n\nfrom dataclasses_tensor import dataclass_tensor, config\n\nclass Player(Enum):\n  WHITE = 0\n  BLACK = 1\n\nclass PieceType(Enum):\n  PAWN = 0\n  BISHOP = 1\n  KNIGHT = 2\n  ROOK = 3\n  QUEEN = 4\n  KING = 5\n\n@dataclass\nclass Piece:\n  piece_type: PieceType\n  owner: Player\n\n@dataclass_tensor\n@dataclass\nclass Chess:\n  num_moves: float\n  next_move: Player\n  board: List[Optional[Piece]] = field(metadata=config(shape=(64,)))\n```\n\nWorking with 
tensors:\n\n```python\n\u003e\u003e\u003e state = Chess(100., next_move=Player.WHITE, board=[Piece(PieceType.KING, Player.BLACK)])\n\u003e\u003e\u003e t1 = state.to_numpy()\n\u003e\u003e\u003e t1\narray([100.,   1.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   1.,   0.,\n         1.,   1.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   1.,\n...\n\u003e\u003e\u003e t1.shape\n(579,)\n\u003e\u003e\u003e Chess.from_numpy(t1)\nChess(num_moves=100., next_move=\u003cPlayer.WHITE: 0\u003e, board=[Piece(piece_type=\u003cPieceType.KING: 5\u003e, owner=\u003cPlayer.BLACK: 1\u003e), ...])\n```\n\n## Types\n\n### Data Classes\n\nThe library uses type annotations to determine the appropriate encoding layout. Data class member variables are serialized sequentially. The supported types are listed below.\n\n### Primitives (int, float, bool)\n\nThe library supports numerical primitives (`int`, `float`) and `bool`. Strings and byte arrays are not supported.\n\nWarning: be careful with the tensor `dtype`, as an implicit type conversion can lose information (for example, writing a `float` into an `int32` tensor and reading it back won't produce the expected result).\n\n### Enums\n\nPython [`Enums`](https://docs.python.org/3/library/enum.html) are encoded using one-hot encoding.\n\n```python\n\u003e\u003e\u003e from dataclasses_tensor import dataclass_tensor\n\u003e\u003e\u003e from dataclasses import dataclass\n\u003e\u003e\u003e from enum import Enum\n\u003e\u003e\u003e\n\u003e\u003e\u003e class Matrix(Enum):\n...     THE_MATRIX = 1\n...     RELOADED = 2\n...     REVOLUTIONS = 3\n...\n\u003e\u003e\u003e @dataclass_tensor\n... @dataclass\n... class WatchList:\n...     
matrix: Matrix\n...\n\u003e\u003e\u003e WatchList(Matrix.RELOADED).to_numpy()\narray([0., 1., 0.])\n\u003e\u003e\u003e WatchList.from_numpy(_)\nWatchList(matrix=\u003cMatrix.RELOADED: 2\u003e)\n```\n\n### Optional\n\nThe [`typing.Optional`](https://docs.python.org/3/library/typing.html#typing.Optional) type is encoded using one additional dimension placed before the main data type.\n\n```python\n\u003e\u003e\u003e from typing import Optional\n\u003e\u003e\u003e\n\u003e\u003e\u003e @dataclass_tensor\n... @dataclass\n... class MaybeWatchList:\n...     matrix: Optional[Matrix]\n\u003e\u003e\u003e\n\u003e\u003e\u003e MaybeWatchList(Matrix.RELOADED).to_numpy()\narray([0., 0., 1., 0.])\n\u003e\u003e\u003e MaybeWatchList.from_numpy([0., 0., 1., 0.])\nMaybeWatchList(matrix=\u003cMatrix.RELOADED: 2\u003e)\n\u003e\u003e\u003e MaybeWatchList.from_numpy([1., 0., 0., 0.])\nMaybeWatchList(matrix=None)\n```\n\nThe layout described for `Optional[Enum]` is consistent with having `None` as an additional option in the enumeration.\n\n### Arrays\n\nArrays, defined using either [`typing.List`](https://docs.python.org/3/library/typing.html#typing.List) or the built-in `list[...]` (supported in Python 3.9+), require the size to be provided statically. See the example:\n\n```python\n\u003e\u003e\u003e from dataclasses import field\n\u003e\u003e\u003e from typing import List\n\u003e\u003e\u003e from dataclasses_tensor import config\n\n\u003e\u003e\u003e @dataclass_tensor\n... @dataclass\n... class MultipleWatchList:\n...     matrices: List[Matrix] = field(metadata=config(shape=(2,)))\n\u003e\u003e\u003e\n\u003e\u003e\u003e MultipleWatchList([Matrix.THE_MATRIX, Matrix.RELOADED]).to_numpy()\narray([1., 0., 0., 0., 1., 0.])\n\u003e\u003e\u003e MultipleWatchList.from_numpy([1., 0., 0., 0., 1., 0.])\nMultipleWatchList(matrices=[\u003cMatrix.THE_MATRIX: 1\u003e, \u003cMatrix.RELOADED: 2\u003e])\n```\n\nNested lists are supported; note the multidimensional `shape` configuration:\n\n```python\n\u003e\u003e\u003e @dataclass_tensor\n... @dataclass\n... class MultipleWatchList:\n...     
matrices: List[List[Matrix]] = field(metadata=config(shape=(1,2)))\n\u003e\u003e\u003e\n\u003e\u003e\u003e MultipleWatchList([[Matrix.THE_MATRIX, Matrix.RELOADED]]).to_numpy()\narray([1., 0., 0., 0., 1., 0.])\n\u003e\u003e\u003e MultipleWatchList.from_numpy([1., 0., 0., 0., 1., 0.])\nMultipleWatchList(matrices=[[\u003cMatrix.THE_MATRIX: 1\u003e, \u003cMatrix.RELOADED: 2\u003e]])\n```\n\nIf the `List` element type is `Optional`, the list is automatically padded to the configured shape with `None`s.\n\n```python\n\u003e\u003e\u003e @dataclass_tensor\n... @dataclass\n... class MaybeMultipleWatchList:\n...     matrices: List[Optional[Matrix]] = field(metadata=config(shape=(3,)))\n\u003e\u003e\u003e\n\u003e\u003e\u003e MaybeMultipleWatchList([Matrix.THE_MATRIX, Matrix.RELOADED]).to_numpy()\narray([0., 1., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0.])\n\u003e\u003e\u003e MaybeMultipleWatchList.from_numpy([0., 1., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0.])\nMaybeMultipleWatchList(matrices=[\u003cMatrix.THE_MATRIX: 1\u003e, \u003cMatrix.RELOADED: 2\u003e, None])\n```\n\n### Union\n\n[`typing.Union`](https://docs.python.org/3/library/typing.html#typing.Union) is encoded by allocating a one-hot tensor that indicates which option from the union is given, followed by the corresponding layouts for all options.\n\n```python\n\u003e\u003e\u003e from typing import Union\n\u003e\u003e\u003e\n\u003e\u003e\u003e class Batman(Enum):\n...     BEGINS = 1\n...     DARK_KNIGHT = 2\n...     DARK_KNIGHT_RISES = 3\n...\n\u003e\u003e\u003e @dataclass_tensor\n... @dataclass\n... class WatchList:\n...     
next_movie: Union[Matrix, Batman]\n...\n\u003e\u003e\u003e WatchList(Matrix.RELOADED).to_numpy()\narray([1., 0., 0., 1., 0., 0., 0., 0.])\n\u003e\u003e\u003e WatchList.from_numpy(_)\nWatchList(next_movie=\u003cMatrix.RELOADED: 2\u003e)\n\u003e\u003e\u003e WatchList(Batman.DARK_KNIGHT).to_numpy()\narray([0., 1., 0., 0., 0., 0., 1., 0.])\n\u003e\u003e\u003e WatchList.from_numpy(_)\nWatchList(next_movie=\u003cBatman.DARK_KNIGHT: 2\u003e)\n```\n\nDecoding is a fairly straightforward process, though encoding can be problematic: Python's `typing` is not designed to provide separation-by-construction for union types. The library uses simple `isinstance` checks to test a given value against each of the provided types, and the first match is used. The library does not traverse generics, origins, supertypes, etc. So be diligent when defining a `Union`.\n\n### Recursive Definitions\n\nRecursive definitions, such as linked lists, trees, and graphs, are **not supported**. From a usability and performance point of view, it's crucial for the encoder/decoder to be able to statically evaluate the output tensor size.\n\n## Targets\n\nThe library supports the following containers as tensors:\n\n* [NumPy ndarray](https://numpy.org/doc/stable/reference/generated/numpy.array.html) with `to_numpy`/`from_numpy`\n* [PyTorch tensors](https://pytorch.org/docs/stable/tensors.html) with `to_torch`/`from_torch`\n\nThe best way to work with [TensorFlow tensors](https://www.tensorflow.org/api_docs/python/tf/Tensor) is to use NumPy ndarrays and convert the result with `tensorflow.convert_to_tensor` (as the tensor is stored in memory as an ndarray anyway).\n\nNote that the dependencies (TensorFlow, PyTorch or NumPy) are not installed with the library itself and should be provided at runtime.\n\n## Performance\n\nThe tensor layout is not cached and is computed for each operation. When performing many operations with an unchanged class definition, it makes sense to reuse the layout. 
For example:\n\n```python\n\u003e\u003e\u003e class Matrix(Enum):\n...     THE_MATRIX = 1\n...     RELOADED = 2\n...     REVOLUTIONS = 3\n...\n\u003e\u003e\u003e @dataclass_tensor\n... @dataclass\n... class WatchList:\n...     matrix: Matrix\n...\n\u003e\u003e\u003e layout = WatchList.tensor_layout()\n\u003e\u003e\u003e WatchList(Matrix.RELOADED).to_numpy(tensor_layout=layout)\narray([0., 1., 0.])\n\u003e\u003e\u003e WatchList.from_numpy(_, tensor_layout=layout)\nWatchList(matrix=\u003cMatrix.RELOADED: 2\u003e)\n```\n\n## Advanced Features\n\n### Dtype\n\nThe library supports float and integer (long) tensors. The data type can be specified either as a parameter to the `dataclass_tensor` decorator (applied to all operations) or independently as an argument to an individual conversion call such as `to_numpy`. See the examples below.\n\nThe `dtype` argument is passed to the corresponding target library, e.g. NumPy ([docs](https://numpy.org/doc/stable/reference/arrays.dtypes.html)), PyTorch ([docs](https://pytorch.org/docs/stable/tensor_attributes.html#torch.torch.dtype)) or TensorFlow.\n\n```python\n\u003e\u003e\u003e class Matrix(Enum):\n...     THE_MATRIX = 1\n...     RELOADED = 2\n...     REVOLUTIONS = 3\n...\n\u003e\u003e\u003e @dataclass_tensor\n... @dataclass\n... class WatchList:\n...     matrix: Matrix\n...\n\u003e\u003e\u003e WatchList(Matrix.RELOADED).to_numpy()\narray([0., 1., 0.], dtype=float32)\n\u003e\u003e\u003e WatchList(Matrix.RELOADED).to_numpy(dtype=\"int32\")\narray([0, 1, 0], dtype=int32)\n```\n\nor with the default set in the decorator:\n\n```python\n\u003e\u003e\u003e class Matrix(Enum):\n...     THE_MATRIX = 1\n...     RELOADED = 2\n...     REVOLUTIONS = 3\n...\n\u003e\u003e\u003e @dataclass_tensor(dtype=\"int32\")\n... @dataclass\n... class WatchList:\n...     matrix: Matrix\n...\n\u003e\u003e\u003e WatchList(Matrix.RELOADED).to_numpy()\narray([0, 1, 0], dtype=int32)\n```\n\n### Batch\n\nTo create a batch, use the `batch=True` parameter. 
See the examples:\n\n```python\n\u003e\u003e\u003e class Matrix(Enum):\n...     THE_MATRIX = 1\n...     RELOADED = 2\n...     REVOLUTIONS = 3\n...\n\u003e\u003e\u003e @dataclass_tensor\n... @dataclass\n... class WatchList:\n...     matrix: Matrix\n...\n\u003e\u003e\u003e WatchList.to_numpy([\n...     WatchList(Matrix.THE_MATRIX),\n...     WatchList(Matrix.RELOADED),\n... ], batch=True)\narray([[1., 0., 0.],\n       [0., 1., 0.]], dtype=float32)\n\u003e\u003e\u003e WatchList.from_numpy(_, batch=True)\n[WatchList(matrix=\u003cMatrix.THE_MATRIX: 1\u003e),\n WatchList(matrix=\u003cMatrix.RELOADED: 2\u003e)]\n```\n\n`batch_size` can be used to provide a length hint (to ensure good performance when working with generators):\n\n```python\n\u003e\u003e\u003e WatchList.to_numpy((\n...     WatchList(Matrix.THE_MATRIX),\n...     WatchList(Matrix.RELOADED),\n... ), batch_size=2)\narray([[1., 0., 0.],\n       [0., 1., 0.]], dtype=float32)\n\u003e\u003e\u003e WatchList.from_numpy(_, batch_size=2)\n[WatchList(matrix=\u003cMatrix.THE_MATRIX: 1\u003e),\n WatchList(matrix=\u003cMatrix.RELOADED: 2\u003e)]\n```\n\n### Custom Attribute Resolver\n\nTBD\n\n## TODO\n\n- [ ] Field configuration to pack `int` into a categorical variable\n- [ ] Custom attribute resolver (e.g. 
from a dict instead of a class instance)\n- [ ] Pretty-print for tensor layout object\n\n## Contributing\n\n* Check for open issues or open a fresh issue to start a discussion around a feature idea or a bug.\n* Fork the repository on GitHub \u0026 branch from `main` to `feature-*` to start making your changes.\n* Write a test which shows that the bug was fixed or that the feature works as expected.\n\nor simply...\n\n* Use it.\n* Enjoy it.\n* Spread the word.\n\n## Citing dataclasses-tensor\n\nPlease use this BibTeX entry if you want to cite this repository in your publications:\n\n```\n@misc{dataclassestensor,\n    author = {Oleksii Kachaiev},\n    title = {{dataclasses-tensor - Library to serialize Python dataclasses to and from tensors (PyTorch, NumPy)}},\n    year = {2021},\n    publisher = {GitHub},\n    journal = {GitHub repository},\n    howpublished = {\\url{https://github.com/kachayev/dataclasses-tensor}},\n}\n```\n\n## License\n\nCopyright © 2021, Oleksii Kachaiev.\n\n`dataclasses-tensor` is licensed under the MIT license, available at MIT and also in the LICENSE file.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkachayev%2Fdataclasses-tensor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkachayev%2Fdataclasses-tensor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkachayev%2Fdataclasses-tensor/lists"}