# pandas-dataclasses

[![Release](https://img.shields.io/pypi/v/pandas-dataclasses?label=Release&color=cornflowerblue&style=flat-square)](https://pypi.org/project/pandas-dataclasses/)
[![Python](https://img.shields.io/pypi/pyversions/pandas-dataclasses?label=Python&color=cornflowerblue&style=flat-square)](https://pypi.org/project/pandas-dataclasses/)
[![Downloads](https://img.shields.io/pypi/dm/pandas-dataclasses?label=Downloads&color=cornflowerblue&style=flat-square)](https://pepy.tech/project/pandas-dataclasses)
[![DOI](https://img.shields.io/badge/DOI-10.5281/zenodo.6127352-cornflowerblue?style=flat-square)](https://doi.org/10.5281/zenodo.6127352)
[![Tests](https://img.shields.io/github/actions/workflow/status/astropenguin/pandas-dataclasses/tests.yml?label=Tests&style=flat-square)](https://github.com/astropenguin/pandas-dataclasses/actions)

pandas data creation by data classes

## Overview

pandas-dataclasses makes it easy to create [pandas] data (DataFrame and Series) by specifying their data types, attributes, and names using Python's [dataclass]:

<details>
<summary>Click to see all imports</summary>

```python
from dataclasses import dataclass
from pandas_dataclasses import AsFrame, Data, Index
```
</details>

```python
@dataclass
class Weather(AsFrame):
    """Weather information."""

    year: Index[int]
    month: Index[int]
    temp: Data[float]
    wind: Data[float]


df = Weather.new(
    [2020, 2020, 2021, 2021, 2022],
    [1, 7, 1, 7, 1],
    [7.1, 24.3, 5.4, 25.9, 4.9],
    [2.4, 3.1, 2.3, 2.4, 2.6],
)
```

where `df` will become a DataFrame object like:

```
            temp  wind
year month
2020 1       7.1   2.4
     7      24.3   3.1
2021 1       5.4   2.3
     7      25.9   2.4
2022 1       4.9   2.6
```

### Features

- Specifying data types and names of each element in pandas data
- Specifying metadata stored in pandas data attributes (attrs)
- Support for hierarchical index and columns
- Support for custom factories for data creation
- Support for full [dataclass] features
- Support for static type checking with [mypy] and [Pyright] ([Pylance])

### Installation

```bash
pip install pandas-dataclasses
```

## How it works

pandas-dataclasses provides the following features:

- Type hints for dataclass fields (`Attr`, `Data`, `Index`) to specify the data type and name of each element in pandas data
- Mix-in classes for dataclasses (`As`, `AsFrame`, `AsSeries`) to create pandas data by a classmethod (`new`) that takes the same arguments as dataclass initialization

When you call `new`, it first creates a dataclass object and then creates a Series or DataFrame object from that dataclass object according to its type hints and values.
In the example above, `df = Weather.new(...)` is thus equivalent to:

<details>
<summary>Click to see all imports</summary>

```python
from pandas_dataclasses import asframe
```
</details>

```python
obj = Weather([2020, ...], [1, ...], [7.1, ...], [2.4, ...])
df = asframe(obj)
```

where `asframe` is a conversion function.
pandas-dataclasses does not interfere with the creation of the dataclass object itself, so you can fully customize your dataclass with standard dataclass features (`field`, `__post_init__`, ...) before conversion.

## Basic usage

### DataFrame creation

As shown in the example above, a dataclass that has the `AsFrame` (or its alias `AsDataFrame`) mix-in will create DataFrame objects:

<details>
<summary>Click to see all imports</summary>

```python
from dataclasses import dataclass
from pandas_dataclasses import AsFrame, Data, Index
```
</details>

```python
@dataclass
class Weather(AsFrame):
    """Weather information."""

    year: Index[int]
    month: Index[int]
    temp: Data[float]
    wind: Data[float]


df = Weather.new(...)
```

where fields typed by `Index` are *index fields*, each value of which will become an index or a level of a hierarchical index of a DataFrame object.
Fields typed by `Data` are *data fields*, each value of which will become a data column of a DataFrame object.
Fields typed by any other type are ignored in the DataFrame creation.

Each data or index will be cast to the data type specified in its type hint, such as `Index[int]`.
Use `Any` or `None` (like `Index[Any]`) if you do not want type casting.
See also [data typing rules](#data-typing-rules) for more examples.

By default, the field name (i.e., the argument name) is used as the name of the corresponding data or index.
See also [custom naming](#custom-naming) and [naming rules](#naming-rules) if you want to customize it.

### Series creation

A dataclass that has the `AsSeries` mix-in will create Series objects:

<details>
<summary>Click to see all imports</summary>

```python
from dataclasses import dataclass
from pandas_dataclasses import AsSeries, Data, Index
```
</details>

```python
@dataclass
class Weather(AsSeries):
    """Weather information."""

    year: Index[int]
    month: Index[int]
    temp: Data[float]


ser = Weather.new(...)
```

Unlike `AsFrame`, any second and subsequent data fields are ignored in the Series creation.
Other rules are the same as for the DataFrame creation.

## Advanced usage

### Metadata storing

Fields typed by `Attr` are *attribute fields*, each value of which will become an item in the attributes (`attrs`) of a DataFrame or Series object:

<details>
<summary>Click to see all imports</summary>

```python
from dataclasses import dataclass
from pandas_dataclasses import AsFrame, Attr, Data, Index
```
</details>

```python
@dataclass
class Weather(AsFrame):
    """Weather information."""

    year: Index[int]
    month: Index[int]
    temp: Data[float]
    wind: Data[float]
    loc: Attr[str] = "Tokyo"
    lon: Attr[float] = 139.69167
    lat: Attr[float] = 35.68944


df = Weather.new(...)
```

where `df.attrs` will look like:

```python
{"loc": "Tokyo", "lon": 139.69167, "lat": 35.68944}
```

### Custom naming

The name of an attribute, data, or index can be specified explicitly by adding a hashable annotation to the corresponding type:

<details>
<summary>Click to see all imports</summary>

```python
from dataclasses import dataclass
from typing import Annotated as Ann
from pandas_dataclasses import AsFrame, Attr, Data, Index
```
</details>

```python
@dataclass
class Weather(AsFrame):
    """Weather information."""

    year: Ann[Index[int], "Year"]
    month: Ann[Index[int], "Month"]
    temp: Ann[Data[float], "Temperature (deg C)"]
    wind: Ann[Data[float], "Wind speed (m/s)"]
    loc: Ann[Attr[str], "Location"] = "Tokyo"
    lon: Ann[Attr[float], "Longitude (deg)"] = 139.69167
    lat: Ann[Attr[float], "Latitude (deg)"] = 35.68944


df = Weather.new(...)
```

where `df` and `df.attrs` will look like:

```
            Temperature (deg C)  Wind speed (m/s)
Year Month
2020 1                      7.1               2.4
     7                     24.3               3.1
2021 1                      5.4               2.3
     7                     25.9               2.4
2022 1                      4.9               2.6
```

```python
{"Location": "Tokyo", "Longitude (deg)": 139.69167, "Latitude (deg)": 35.68944}
```

If an annotation is a [format string], it will be formatted with the dataclass object before the data creation:

<details>
<summary>Click to see all imports</summary>

```python
from dataclasses import dataclass
from typing import Annotated as Ann
from pandas_dataclasses import AsFrame, Data, Index
```
</details>

```python
@dataclass
class Weather(AsFrame):
    """Weather information."""

    year: Ann[Index[int], "Year"]
    month: Ann[Index[int], "Month"]
    temp: Ann[Data[float], "Temperature ({.temp_unit})"]
    wind: Ann[Data[float], "Wind speed ({.wind_unit})"]
    temp_unit: str = "deg C"
    wind_unit: str = "m/s"


df = Weather.new(..., temp_unit="deg F", wind_unit="km/h")
```

where the temperature and wind speed units in the column names will be updated dynamically (see also [naming rules](#naming-rules)).

### Hierarchical columns

Adding tuple annotations to data fields will create DataFrame objects with hierarchical columns:

<details>
<summary>Click to see all imports</summary>

```python
from dataclasses import dataclass
from typing import Annotated as Ann
from pandas_dataclasses import AsFrame, Data, Index
```
</details>

```python
@dataclass
class Weather(AsFrame):
    """Weather information."""

    year: Ann[Index[int], "Year"]
    month: Ann[Index[int], "Month"]
    temp_avg: Ann[Data[float], ("Temperature (deg C)", "Average")]
    temp_max: Ann[Data[float], ("Temperature (deg C)", "Maximum")]
    wind_avg: Ann[Data[float], ("Wind speed (m/s)", "Average")]
    wind_max: Ann[Data[float], ("Wind speed (m/s)", "Maximum")]


df = Weather.new(...)
```

where `df` will look like:

```
           Temperature (deg C)         Wind speed (m/s)
                       Average Maximum          Average Maximum
Year Month
2020 1                     7.1    11.1              2.4     8.8
     7                    24.3    27.7              3.1    10.2
2021 1                     5.4    10.3              2.3    10.7
     7                    25.9    30.3              2.4     9.0
2022 1                     4.9     9.4              2.6     8.8
```

The names of the column levels can be specified explicitly by dictionary annotations, whose keys become the level names and whose values become the column labels:

<details>
<summary>Click to see all imports</summary>

```python
from dataclasses import dataclass
from typing import Annotated as Ann
from pandas_dataclasses import AsFrame, Data, Index
```
</details>

```python
def name(meas: str, stat: str) -> dict[str, str]:
    """Create a dictionary annotation for a column name."""
    return {"Measurement": meas, "Statistic": stat}


@dataclass
class Weather(AsFrame):
    """Weather information."""

    year: Ann[Index[int], "Year"]
    month: Ann[Index[int], "Month"]
    temp_avg: Ann[Data[float], name("Temperature (deg C)", "Average")]
    temp_max: Ann[Data[float], name("Temperature (deg C)", "Maximum")]
    wind_avg: Ann[Data[float], name("Wind speed (m/s)", "Average")]
    wind_max: Ann[Data[float], name("Wind speed (m/s)", "Maximum")]


df = Weather.new(...)
```

where `df` will look like:

```
Measurement Temperature (deg C)         Wind speed (m/s)
Statistic               Average Maximum          Average Maximum
Year Month
2020 1                      7.1    11.1              2.4     8.8
     7                     24.3    27.7              3.1    10.2
2021 1                      5.4    10.3              2.3    10.7
     7                     25.9    30.3              2.4     9.0
2022 1                      4.9     9.4              2.6     8.8
```

If a tuple or dictionary annotation contains [format string]s, they will also be formatted with the dataclass object (see also [naming rules](#naming-rules)).

### Multiple-item fields

Multiple (and possibly extra) attributes, data, or indices can be added by fields whose type hints are wrapped by `Multiple`.
Each such field receives a dictionary that maps names to values:

<details>
<summary>Click to see all imports</summary>

```python
from dataclasses import dataclass
from pandas_dataclasses import AsFrame, Data, Index, Multiple
```
</details>

```python
@dataclass
class Weather(AsFrame):
    """Weather information."""

    year: Index[int]
    month: Index[int]
    temp: Data[float]
    wind: Data[float]
    extra_index: Multiple[Index[int]]
    extra_data: Multiple[Data[float]]


df = Weather.new(
    [2020, 2020, 2021, 2021, 2022],
    [1, 7, 1, 7, 1],
    [7.1, 24.3, 5.4, 25.9, 4.9],
    [2.4, 3.1, 2.3, 2.4, 2.6],
    extra_index={
        "day": [1, 1, 1, 1, 1],
        "week": [2, 2, 4, 3, 5],
    },
    extra_data={
        "humid": [65, 89, 57, 83, 52],
        "press": [1013.8, 1006.2, 1014.1, 1007.7, 1012.7],
    },
)
```

where `df` will look like:

```
                     temp  wind  humid   press
year month day week
2020 1     1   2      7.1   2.4   65.0  1013.8
     7     1   2     24.3   3.1   89.0  1006.2
2021 1     1   4      5.4   2.3   57.0  1014.1
     7     1   3     25.9   2.4   83.0  1007.7
2022 1     1   5      4.9   2.6   52.0  1012.7
```

If multiple items have the same name, the last-defined one is used.
For example, if the `extra_index` field contains `"month": [2, 8, 2, 8, 2]`, the values given by the `month` field will be overwritten.

### Custom pandas factory

A custom class can be specified as a factory for the Series or DataFrame creation with `As`, the generic version of `AsFrame` and `AsSeries`.
Note that the custom class must be a subclass of either `pandas.Series` or `pandas.DataFrame`:

<details>
<summary>Click to see all imports</summary>

```python
import pandas as pd
from dataclasses import dataclass
from pandas_dataclasses import As, Data, Index
```
</details>

```python
class CustomSeries(pd.Series):
    """Custom pandas Series."""

    pass


@dataclass
class Temperature(As[CustomSeries]):
    """Temperature information."""

    year: Index[int]
    month: Index[int]
    temp: Data[float]


ser = Temperature.new(...)
```

where `ser` is statically regarded as `CustomSeries` and will become a `CustomSeries` object.

The generic Series type (`Series[T]`) is also supported; however, in current pandas versions it is only usable for static type checking.
In such cases, you can additionally pass a factory that works at runtime as a class argument:

<details>
<summary>Click to see all imports</summary>

```python
import pandas as pd
from dataclasses import dataclass
from pandas_dataclasses import As, Data, Index
```
</details>

```python
@dataclass
class Temperature(As["pd.Series[float]"], factory=pd.Series):
    """Temperature information."""

    year: Index[int]
    month: Index[int]
    temp: Data[float]


ser = Temperature.new(...)
```

where `ser` is statically regarded as `Series[float]` but will become a `Series` object at runtime.

## Appendix

### Data typing rules

The data type (dtype) of data or an index is determined from the first `Data` or `Index` type of the corresponding field, respectively.
The following table shows how the data type is inferred:

<details>
<summary>Click to see all imports</summary>

```python
from typing import Any, Annotated as Ann, Literal as L
from pandas_dataclasses import Data
```
</details>

Type hint | Inferred data type
--- | ---
`Data[Any]` | `None` (no type casting)
`Data[None]` | `None` (no type casting)
`Data[int]` | `numpy.int64`
`Data[int \| str]` | `numpy.int64`
`Data[numpy.int32]` | `numpy.int32`
`Data[L["datetime64[ns]"]]` | `numpy.dtype("<M8[ns]")`
`Data[L["category"]]` | `pandas.CategoricalDtype()`
`Data[int] \| str` | `numpy.int64`
`Data[int] \| Data[float]` | `numpy.int64`
`Ann[Data[int], "spam"]` | `numpy.int64`
`Data[Ann[int, "spam"]]` | `numpy.int64`

### Naming rules

The name of an attribute, data, or index is determined from the first annotation of the first `Attr`, `Data`, or `Index` type of the corresponding field, respectively.
If the annotation is a [format string] or a tuple that contains [format string]s, it (or they) will be formatted with the dataclass object before the data creation.
If there is no such annotation, or the annotation is `...`, the field name (i.e., the argument name) will be used.
The following table shows how the name is inferred:

<details>
<summary>Click to see all imports</summary>

```python
from typing import Any, Annotated as Ann
from pandas_dataclasses import Data
```
</details>

Type hint | Inferred name
--- | ---
`Data[Any]` | (field name)
`Ann[Data[Any], ..., "spam"]` | (field name)
`Ann[Data[Any], "spam"]` | `"spam"`
`Ann[Data[Any], "spam", "ham"]` | `"spam"`
`Ann[Data[Any], "spam"] \| Ann[str, "ham"]` | `"spam"`
`Ann[Data[Any], "spam"] \| Ann[Data[float], "ham"]` | `"spam"`
`Ann[Data[Any], "{.name}"]` | `"{.name}".format(obj)`
`Ann[Data[Any], ("spam", "ham")]` | `("spam", "ham")`
`Ann[Data[Any], ("{.name}", "ham")]` | `("{.name}".format(obj), "ham")`

where `obj` is the dataclass object, which is expected to have an attribute `name`.

### Development roadmap

Release version | Features
--- | ---
v0.5 | Support for dynamic naming
v0.6 | Support for extension array and dtype
v0.7 | Support for hierarchical columns
v0.8 | Support for mypy and callable pandas factory
v0.9 | Support for Ellipsis (`...`) as an alias of field name
v0.10 | Support for union type in type hints
v0.11 | Support for Python 3.11 and drop support for Python 3.7
v0.12 | Support for multiple items received in a single field
v1.0 | Initial major release (freezing public features until v2.0)

<!-- References -->
[dataclass]: https://docs.python.org/3/library/dataclasses.html
[format string]: https://docs.python.org/3/library/string.html#format-string-syntax
[mypy]: http://www.mypy-lang.org
[NumPy]: https://numpy.org
[pandas]: https://pandas.pydata.org
[Pylance]: https://github.com/microsoft/pylance-release
[Pyright]: https://github.com/microsoft/pyright
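The two steps described in **How it works** can be illustrated without pandas. First a dataclass object is created, then it is converted according to the type hint of each field. The following is a toy sketch, not the library's implementation. It assumes that the `Attr`/`Data`/`Index` markers can be modeled as `typing.Annotated` aliases carrying a role tag, and every name in it (`Role`, `role_of`, `toy_asframe`, `ToyAsFrame`) is hypothetical. It returns plain dictionaries instead of a DataFrame and skips type casting.

```python
# Toy sketch for illustration only; NOT pandas-dataclasses' implementation.
# Role, role_of, toy_asframe, and ToyAsFrame are hypothetical names.
from dataclasses import dataclass, fields
from enum import Enum
from typing import Annotated, Any, Optional, TypeVar, get_args, get_origin, get_type_hints

T = TypeVar("T")


class Role(Enum):
    ATTR = "attr"
    DATA = "data"
    INDEX = "index"


# Generic aliases: Index[int] becomes Annotated[int, Role.INDEX], and so on.
Attr = Annotated[T, Role.ATTR]
Data = Annotated[T, Role.DATA]
Index = Annotated[T, Role.INDEX]


def role_of(hint: Any) -> Optional[Role]:
    """Return the role tag stored in an Annotated hint, if any."""
    if get_origin(hint) is Annotated:
        for meta in get_args(hint)[1:]:
            if isinstance(meta, Role):
                return meta
    return None


def toy_asframe(obj: Any) -> dict:
    """Step 2: convert a dataclass object into a plain-dict 'frame'."""
    hints = get_type_hints(type(obj), include_extras=True)
    index, columns, attrs = {}, {}, {}
    for f in fields(obj):
        role = role_of(hints[f.name])
        value = getattr(obj, f.name)
        if role is Role.INDEX:
            index[f.name] = value
        elif role is Role.DATA:
            columns[f.name] = value
        elif role is Role.ATTR:
            attrs[f.name] = value
        # Fields with any other type hint are ignored.
    return {
        "index_names": list(index),
        "index": list(zip(*index.values())),  # one tuple per row
        "columns": columns,
        "attrs": attrs,
    }


class ToyAsFrame:
    @classmethod
    def new(cls, *args: Any, **kwargs: Any) -> dict:
        # Step 1: ordinary dataclass construction; step 2: conversion.
        return toy_asframe(cls(*args, **kwargs))


@dataclass
class Weather(ToyAsFrame):
    year: Index[int]
    month: Index[int]
    temp: Data[float]
    wind: Data[float]
    loc: Attr[str] = "Tokyo"


frame = Weather.new([2020, 2020], [1, 7], [7.1, 24.3], [2.4, 3.1])
print(frame["index"])  # [(2020, 1), (2020, 7)]
print(frame["attrs"])  # {'loc': 'Tokyo'}
```

Because step 1 is an ordinary dataclass call, `field` defaults and `__post_init__` have already run by the time `toy_asframe` sees the object. The README relies on this property when it says the dataclass can be customized before conversion.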
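The **Naming rules** table can be read as a small lookup over `typing.Annotated` metadata. The sketch below is a hypothetical toy, not pandas-dataclasses' code. It assumes `Data` is modeled as `Annotated[T, DATA]` with a made-up marker `DATA`. It takes the first annotation after that marker as the name and falls back to the field name when there is none or when it is `...`. It then formats a string, or each string inside a tuple, with the dataclass object via `str.format`. Union hints and dictionary annotations are not handled.

```python
# Toy naming resolver for illustration only; NOT pandas-dataclasses' code.
# DATA, Data, format_name, and resolve_name are hypothetical names.
from dataclasses import dataclass
from typing import Annotated, Any, TypeVar, get_args, get_origin

T = TypeVar("T")
DATA = object()  # hypothetical role marker
Data = Annotated[T, DATA]


def format_name(name: Any, obj: Any) -> Any:
    """Format a string, or each string in a tuple, with the dataclass object."""
    if isinstance(name, str):
        return name.format(obj)  # "{.temp_unit}" reads obj.temp_unit
    if isinstance(name, tuple):
        return tuple(format_name(part, obj) for part in name)
    return name


def resolve_name(field_name: str, hint: Any, obj: Any) -> Any:
    """Return the first annotation after the role marker, else the field name."""
    if get_origin(hint) is Annotated:
        # Nested Annotated types are flattened, so Ann[Data[X], "spam"]
        # has the metadata (DATA, "spam").
        metadata = get_args(hint)[1:]
        for i, meta in enumerate(metadata):
            if meta is DATA:
                rest = metadata[i + 1:]
                # No annotation, or Ellipsis, means "use the field name".
                if rest and rest[0] is not Ellipsis:
                    return format_name(rest[0], obj)
                break
    return field_name


@dataclass
class Weather:
    temp_unit: str = "deg C"


obj = Weather(temp_unit="deg F")
print(resolve_name("temp", Data[Any], obj))  # temp
print(resolve_name("temp", Annotated[Data[float], "Temperature ({.temp_unit})"], obj))
# Temperature (deg F)
```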
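The **Data typing rules** table can similarly be approximated by a toy that finds the first `Data` type in a hint and inspects its type argument. This is a hypothetical sketch under the same assumption as above (`Data` modeled as `Annotated[T, DATA]`). It stops at the Python-level type or literal string. It does not reproduce the final conversion to NumPy or pandas dtypes shown in the table, such as `int` to `numpy.int64`.

```python
# Toy dtype-source resolver for illustration only; NOT pandas-dataclasses' code.
# DATA, Data, first_data, resolve, and dtype_source are hypothetical names.
import types
from typing import Annotated, Any, Literal, TypeVar, Union, get_args, get_origin

T = TypeVar("T")
DATA = object()  # hypothetical role marker
Data = Annotated[T, DATA]

# Origins of typing.Union[...] and (Python 3.10+) X | Y unions.
UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def first_data(hint: Any) -> Any:
    """Return the first Data[...] type found in a hint, or None."""
    if get_origin(hint) is Annotated:
        if any(meta is DATA for meta in get_args(hint)[1:]):
            return hint
        return first_data(get_args(hint)[0])
    if get_origin(hint) in UNION_ORIGINS:
        for arg in get_args(hint):
            found = first_data(arg)
            if found is not None:
                return found
    return None


def resolve(tp: Any) -> Any:
    """Map the T of Data[T] to what would be handed on to NumPy/pandas."""
    if tp is Any or tp is None or tp is type(None):
        return None  # no type casting
    if get_origin(tp) is Literal:
        return get_args(tp)[0]  # e.g. "category" or "datetime64[ns]"
    if get_origin(tp) in UNION_ORIGINS:
        return resolve(get_args(tp)[0])  # the first union member wins
    return tp


def dtype_source(hint: Any) -> Any:
    data = first_data(hint)
    if data is None:
        raise TypeError(f"no Data type in {hint!r}")
    # Nested Annotated types are flattened, so the first arg is the bare T.
    return resolve(get_args(data)[0])


print(dtype_source(Union[Data[int], Data[float]]))  # <class 'int'>
```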
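The last-defined-wins rule in **Multiple-item fields** behaves like merging dictionaries in field order. The toy below only illustrates that rule with plain dicts and the `month` example from that section. It is not how pandas-dataclasses merges items, and it does not show where the overwritten level ends up in the resulting index.

```python
# Toy merge for illustration only; NOT pandas-dataclasses' code.
# Single index fields come first, then the items of a
# Multiple[Index[int]] field, following field-definition order.
single_fields = {
    "year": [2020, 2020, 2021, 2021, 2022],
    "month": [1, 7, 1, 7, 1],
}
extra_index = {
    "day": [1, 1, 1, 1, 1],
    "month": [2, 8, 2, 8, 2],  # same name as the `month` field
}

# With dict unpacking, a later key overwrites the value of an earlier one.
merged = {**single_fields, **extra_index}
print(merged["month"])  # [2, 8, 2, 8, 2]
```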
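In the dictionary-annotation example of **Hierarchical columns**, the keys of each dictionary appear as column level names (`Measurement`, `Statistic`) and the values appear as column labels. The hypothetical helper `split_names` below makes that mapping explicit with plain Python. Requiring identical keys across all annotations is an assumption of this toy, not a documented rule of the library.

```python
# Toy helper for illustration only; NOT pandas-dataclasses' code.
# split_names is a hypothetical name; name() mirrors the README helper.


def name(meas: str, stat: str) -> dict[str, str]:
    """Create a dictionary annotation for a column name (as in the README)."""
    return {"Measurement": meas, "Statistic": stat}


def split_names(
    annotations: list[dict[str, str]],
) -> tuple[tuple[str, ...], list[tuple[str, ...]]]:
    """Split dictionary annotations into (level names, column keys).

    Assumption of this toy: every annotation has the same keys in the same order.
    """
    level_names = tuple(annotations[0])
    for ann in annotations:
        if tuple(ann) != level_names:
            raise ValueError("dictionary annotations must share the same keys")
    keys = [tuple(ann.values()) for ann in annotations]
    return level_names, keys


levels, keys = split_names([
    name("Temperature (deg C)", "Average"),
    name("Temperature (deg C)", "Maximum"),
    name("Wind speed (m/s)", "Average"),
    name("Wind speed (m/s)", "Maximum"),
])
print(levels)   # ('Measurement', 'Statistic')
print(keys[1])  # ('Temperature (deg C)', 'Maximum')
```

With pandas installed, `pandas.MultiIndex.from_tuples(keys, names=levels)` would turn these two values into a column index with the same two levels as the output shown in that section.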