{"id":15065407,"url":"https://github.com/p2p-ld/numpydantic","last_synced_at":"2025-10-06T00:48:32.503Z","repository":{"id":220482114,"uuid":"751633806","full_name":"p2p-ld/numpydantic","owner":"p2p-ld","description":"Type annotations for specifying, validating, and serializing arrays with arbitrary backends in Pydantic (and beyond)","archived":false,"fork":false,"pushed_at":"2025-08-16T04:41:00.000Z","size":764,"stargazers_count":125,"open_issues_count":19,"forks_count":3,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-09-20T18:19:05.576Z","etag":null,"topics":["arrays","dask","data-modeling","hdf5","numpy","pydantic","pydantic-numpy","serialization","validation","zarr"],"latest_commit_sha":null,"homepage":"https://numpydantic.readthedocs.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/p2p-ld.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"docs/contributing/coc.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-02-02T01:50:57.000Z","updated_at":"2025-08-29T21:07:17.000Z","dependencies_parsed_at":"2024-02-08T23:25:00.370Z","dependency_job_id":"7dd13218-9b99-4170-a7e1-500f6ea4971f","html_url":"https://github.com/p2p-ld/numpydantic","commit_stats":null,"previous_names":["p2p-ld/numpydantic"],"tags_count":29,"template":false,"template_full_name":null,"purl":"pkg:github/p2p-ld/numpydantic","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/p2p-ld%2Fnumpydantic","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/p2p-ld%2Fnumpydantic/tags","releases_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/repositories/p2p-ld%2Fnumpydantic/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/p2p-ld%2Fnumpydantic/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/p2p-ld","download_url":"https://codeload.github.com/p2p-ld/numpydantic/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/p2p-ld%2Fnumpydantic/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278542684,"owners_count":26004061,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-05T02:00:06.059Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arrays","dask","data-modeling","hdf5","numpy","pydantic","pydantic-numpy","serialization","validation","zarr"],"created_at":"2024-09-25T00:38:02.005Z","updated_at":"2025-10-06T00:48:32.486Z","avatar_url":"https://github.com/p2p-ld.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# numpydantic\n\n[![PyPI - Version](https://img.shields.io/pypi/v/numpydantic)](https://pypi.org/project/numpydantic)\n[![Documentation Status](https://readthedocs.org/projects/numpydantic/badge/?version=latest)](https://numpydantic.readthedocs.io/en/latest/?badge=latest)\n[![Coverage 
Status](https://coveralls.io/repos/github/p2p-ld/numpydantic/badge.svg)](https://coveralls.io/github/p2p-ld/numpydantic)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n\nA Python package for specifying, validating, and serializing arrays with arbitrary backends in pydantic.\n\n**Problem:** \n1) Pydantic is great for modeling data. \n2) Arrays are one of a few elemental types in computing,\n\nbut ...\n\n3) Typical type annotations would only work for a single array library implementation\n4) They wouldn’t allow you to specify array shapes and dtypes, and\n5) If you try to specify an array in pydantic, this happens:\n\n```python\n\u003e\u003e\u003e from pydantic import BaseModel\n\u003e\u003e\u003e import numpy as np\n\n\u003e\u003e\u003e class MyModel(BaseModel):\n...     array: np.ndarray\npydantic.errors.PydanticSchemaGenerationError: \nUnable to generate pydantic-core schema for \u003cclass 'numpy.ndarray'\u003e. 
\nSet `arbitrary_types_allowed=True` in the model_config to ignore this error \nor implement `__get_pydantic_core_schema__` on your type to fully support it.\n```\n\n**Solution**\n\nNumpydantic allows you to do this:\n\n```python\nfrom pydantic import BaseModel\nfrom numpydantic import NDArray, Shape\n\nclass MyModel(BaseModel):\n    array: NDArray[Shape[\"3 x, 4 y, * z\"], int]\n```\n\nAnd use it with your favorite array library:\n\n```python\nimport numpy as np\nimport dask.array as da\nimport zarr\n\n# numpy\nmodel = MyModel(array=np.zeros((3, 4, 5), dtype=int))\n# dask\nmodel = MyModel(array=da.zeros((3, 4, 5), dtype=int))\n# hdf5 datasets\nmodel = MyModel(array=('data.h5', '/nested/dataset'))\n# zarr arrays\nmodel = MyModel(array=zarr.zeros((3,4,5), dtype=int))\nmodel = MyModel(array='data.zarr')\nmodel = MyModel(array=('data.zarr', '/nested/dataset'))\n# video files\nmodel = MyModel(array=\"data.mp4\")\n```\n\n`numpydantic` supports pydantic but none of its behavior is dependent on it!\nUse the `NDArray` type annotation like a regular type outside\nof pydantic -- e.g., 
to validate an array anywhere, use `isinstance`:\n\n```python\narray_type = NDArray[Shape[\"1, 2, 3\"], int]\nisinstance(np.zeros((1,2,3), dtype=int), array_type)\n# True\nisinstance(zarr.zeros((1,2,3), dtype=int), array_type)\n# True\nisinstance(np.zeros((4,5,6), dtype=int), array_type)\n# False\nisinstance(np.zeros((1,2,3), dtype=float), array_type)\n# False\n```\n\nOr use it as a convenient callable shorthand for validating and working with\narray types that usually don't have an array-like API.\n\n```python\n\u003e\u003e\u003e rgb_video_type = NDArray[Shape[\"* t, 1920 x, 1080 y, 3 rgb\"], np.uint8]\n\u003e\u003e\u003e video = rgb_video_type('data.mp4')\n\u003e\u003e\u003e video.shape\n(10, 1920, 1080, 3)\n\u003e\u003e\u003e video[0, 0:3, 0:3, 0]\narray([[0, 0, 0],\n       [0, 0, 0],\n       [0, 0, 0]], dtype=uint8)\n```\n\n\n## Features:\n- **Types** - Annotations (based on [nptyping](https://github.com/ramonhagenaars/nptyping))\n  for specifying arrays in pydantic models\n- **Validation** - Shape, dtype, and other array validations\n- **Interfaces** - Works with [`numpy`](https://numpydantic.readthedocs.io/en/latest/api/interface/numpy.html), \n  [`dask`](https://numpydantic.readthedocs.io/en/latest/api/interface/dask.html), \n  [`hdf5`](https://numpydantic.readthedocs.io/en/latest/api/interface/hdf5.html),\n  [`video`](https://numpydantic.readthedocs.io/en/latest/api/interface/video.html), \n  [`zarr`](https://numpydantic.readthedocs.io/en/latest/api/interface/zarr.html),\n  and a simple extension system to make it work with whatever else you want!\n- **Serialization** - Dump an array as a JSON-compatible array-of-arrays with enough metadata to be able to \n  recreate the model in the native format\n- **Schema Generation** - Correct JSON Schema for arrays, complete with shape and dtype constraints, to\n  make your models interoperable \n- **Fast** - The validation codepath is careful to take quick exits and not perform unnecessary work,\n  and interfaces 
use whatever tools are available to validate against array metadata and lazy load to avoid\n  expensive i/o operations. Our goal is to make numpydantic a tool you don't ever need to think about.\n\nComing soon:\n- **Metadata** - This package was built to be used with [linkml arrays](https://linkml.io/linkml/schemas/arrays.html),\n  so we will be extending it to include arbitrary metadata from the type annotation object in the JSON schema representation.\n- **Extensible Specification** - for v1, we are implementing the existing nptyping syntax, but \n  for v2 we will be updating that to an extensible specification syntax to allow interfaces to validate additional\n  constraints like chunk sizes, as well as make array specifications more introspectable and friendly to runtime usage.\n- **Advanced dtype handling** - handling dtypes that only exist in some array backends, allowing\n  minimum and maximum precision ranges, and so on as type maps provided by interface classes :)\n- (see [todo](https://numpydantic.readthedocs.io/en/latest/todo.html))\n\n## Installation\n\nnumpydantic tries to keep dependencies minimal, so by default it only comes with \ndependencies to use the numpy interface. Add the extra relevant to your favorite\narray library to be able to use it!\n\n```shell\npip install numpydantic\n# dask\npip install 'numpydantic[dask]'\n# hdf5\npip install 'numpydantic[hdf5]'\n# video\npip install 'numpydantic[video]'\n# zarr\npip install 'numpydantic[zarr]'\n# all array formats\npip install 'numpydantic[array]'\n```\n\n## Usage\n\n\u003e [!TIP]\n\u003e The README is just a sample! 
See the full documentation at \n\u003e https://numpydantic.readthedocs.io\n\nSpecify an array using [nptyping syntax](https://github.com/ramonhagenaars/nptyping/blob/master/USERDOCS.md)\nand use it with your favorite array library :)\n\nUse the `NDArray` class like you would any other python type,\ncombine it with `Union`, make it `Optional`, etc.\n\nFor example, to specify a very special type of image that can either be\n- a 2D float array where the axes can be any size, or \n- a 3D uint8 array where the third axis must be size 3\n- a 1080p video \n\n```python\nfrom typing import Union\nfrom pydantic import BaseModel\nimport numpy as np\n\nfrom numpydantic import NDArray, Shape\n\nclass Image(BaseModel):\n    array: Union[\n        NDArray[Shape[\"* x, * y\"], float],\n        NDArray[Shape[\"* x, * y, 3 rgb\"], np.uint8],\n        NDArray[Shape[\"* t, 1080 y, 1920 x, 3 rgb\"], np.uint8]\n    ]\n```\n\nAnd then use that as a transparent interface to your favorite array library!\n\n### Interfaces\n\n#### Numpy\n\nThe Coca-Cola of array libraries\n\n```python\nimport numpy as np\n# works\nframe_gray = Image(array=np.ones((1280, 720), dtype=float))\nframe_rgb  = Image(array=np.ones((1280, 720, 3), dtype=np.uint8))\n\n# fails\nwrong_n_dimensions = Image(array=np.ones((1280,), dtype=float))\nwrong_shape = Image(array=np.ones((1280,720,10), dtype=np.uint8))\n\n# shapes and types are checked together, so this also fails\nwrong_shape_dtype_combo = Image(array=np.ones((1280, 720, 3), dtype=float))\n```\n\n#### Dask\n\nHigh performance chunked arrays! The backend for many new array libraries! 
\n\nWorks exactly the same as numpy arrays\n\n```python\nimport dask.array as da\nimport numpy as np\n\n# validate a humongous image without having to load it into memory\nvideo_array = da.zeros(shape=(100_000, 100_000, 3), dtype=np.uint8)\ndask_video = Image(array=video_array)\n```\n\n#### HDF5\n\nArray workloads increasingly can't fit in memory, but dealing with arrays on disk \ncan become a pain in concurrent applications. Numpydantic allows you to \nspecify the location of an array within an hdf5 file on disk and use it just like\nany other array!\n\nE.g., make an array on disk...\n\n```python\nfrom pathlib import Path\nimport h5py\nfrom numpydantic.interface.hdf5 import H5ArrayPath\n\nh5f_file = Path('my_file.h5')\narray_path = \"/nested/array\"\n\n# make an HDF5 array\nh5f = h5py.File(h5f_file, \"w\")\narray = np.zeros((1920, 1080, 3), dtype=np.uint8)\nh5f.create_dataset(array_path, data=array)\nh5f.close()\n```\n\nThen use it in your model! numpydantic only keeps the file open as long as it's needed\n\n```python\n\u003e\u003e\u003e h5f_image = Image(array=H5ArrayPath(file=h5f_file, path=array_path))\n\u003e\u003e\u003e h5f_image.array[0:5,0:5,0]\narray([[0, 0, 0, 0, 0],\n       [0, 0, 0, 0, 0],\n       [0, 0, 0, 0, 0],\n       [0, 0, 0, 0, 0],\n       [0, 0, 0, 0, 0]], dtype=uint8)\n\u003e\u003e\u003e h5f_image.array[0:2,0:2,0] = 1\n\u003e\u003e\u003e h5f_image.array[0:5,0:5,0]\narray([[1, 1, 0, 0, 0],\n       [1, 1, 0, 0, 0],\n       [0, 0, 0, 0, 0],\n       [0, 0, 0, 0, 0],\n       [0, 0, 0, 0, 0]], dtype=uint8)\n```\n\nNumpydantic tries to be a smart but transparent proxy, exposing the methods and attributes\nof the source type even when we aren't directly using them, like when dealing with on-disk HDF5 arrays.\n\nIf you want, you can take full control and directly interact with the underlying `h5py.Dataset`\nobject and leave the file open between calls:\n\n```python\n\u003e\u003e\u003e dataset = h5f_image.array.open()\n\u003e\u003e\u003e # do some stuff that requires the 
dataset to be held open\n\u003e\u003e\u003e h5f_image.array.close()\n```\n\n#### Video\n\nVideos are just arrays with fancy encoding! Numpydantic can validate shape and dtype\nas well as lazy load chunks of frames with array-like syntax!\n\nSay we have some video `data.mp4` ...\n\n```python\nvideo = Image(array='data.mp4')\n# get a single frame\nvideo.array[5]\n# or a range of frames!\nvideo.array[5:10]\n# or whatever slicing you want to do!\nvideo.array[5:50:5, 0:10, 50:70]\n```\n\nAs elsewhere, a proxy class is a transparent pass-through interface to the underlying\nOpenCV class, so we can get the rest of the video properties ...\n\n```python\nimport cv2\n\n# get the total frame count from OpenCV\nvideo.array.get(cv2.CAP_PROP_FRAME_COUNT)\n# the proxy class also provides a convenience property\nvideo.array.n_frames\n```\n\n#### Zarr\n\nZarr works similarly!\n\nUse it with any of Zarr's backends (Nested, Zipfile, S3): it's all the same!\n\nE.g., create a nested zarr array on disk and use it...\n\n```python\nimport zarr\nfrom numpydantic.interface.zarr import ZarrArrayPath\n\narray_file = 'data/array.zarr'\nnested_path = 'data/sets/here'\n\nroot = zarr.open(array_file, mode='w')\nnested_array = root.zeros(\n    nested_path, \n    shape=(1000, 1080, 1920, 3), \n    dtype=np.uint8\n)\n\n# validates just fine!\nzarr_video = Image(array=ZarrArrayPath(array_file, nested_path))\n# or just pass a tuple, the interface can discover it's a zarr array\nzarr_video = Image(array=(array_file, nested_path))\n```\n\n### JSON Schema\n\nNumpydantic generates JSON Schema for all its array specifications, so for the above\nmodel, we get a schema for each of the possible array types that properly handles\nthe shape and dtype constraints and includes the original numpy type as a `dtype` annotation.\n\n```python\nImage.model_json_schema()\n```\n\n```json\n{\n  \"properties\": {\n    \"array\": {\n      \"anyOf\": [\n        {\n          \"items\": {\"items\": {\"type\": \"number\"}, \"type\": 
\"array\"},\n          \"type\": \"array\"\n        },\n        {\n          \"dtype\": \"numpy.uint8\",\n          \"items\": {\n            \"items\": {\n              \"items\": {\n                \"maximum\": 255,\n                \"minimum\": 0,\n                \"type\": \"integer\"\n              },\n              \"maxItems\": 3,\n              \"minItems\": 3,\n              \"type\": \"array\"\n            },\n            \"type\": \"array\"\n          },\n          \"type\": \"array\"\n        },\n        {\n          \"dtype\": \"numpy.uint8\",\n          \"items\": {\n            \"items\": {\n              \"items\": {\n                \"items\": {\n                  \"maximum\": 255,\n                  \"minimum\": 0,\n                  \"type\": \"integer\"\n                },\n                \"maxItems\": 3,\n                \"minItems\": 3,\n                \"type\": \"array\"\n              },\n              \"maxItems\": 1920,\n              \"minItems\": 1920,\n              \"type\": \"array\"\n            },\n            \"maxItems\": 1080,\n            \"minItems\": 1080,\n            \"type\": \"array\"\n          },\n          \"type\": \"array\"\n        }\n      ],\n      \"title\": \"Array\"\n    }\n  },\n  \"required\": [\"array\"],\n  \"title\": \"Image\",\n  \"type\": \"object\"\n}\n```\n\nnumpydantic can even handle shapes with unbounded numbers of dimensions by using\nrecursive JSON schema!!!\n\nSo the any-shaped array (using nptyping's ellipsis notation):\n\n```python\nclass AnyShape(BaseModel):\n    array: NDArray[Shape[\"*, ...\"], np.uint8]\n```\n\nis rendered to JSON-Schema like this:\n\n```json\n{\n  \"$defs\": {\n    \"any-shape-array-9b5d89838a990d79\": {\n      \"anyOf\": [\n        {\n          \"items\": {\n            \"$ref\": \"#/$defs/any-shape-array-9b5d89838a990d79\"\n          },\n          \"type\": \"array\"\n        },\n        {\"maximum\": 255, \"minimum\": 0, \"type\": \"integer\"}\n      ]\n    }\n  },\n  
\"properties\": {\n    \"array\": {\n      \"dtype\": \"numpy.uint8\",\n      \"items\": {\"$ref\": \"#/$defs/any-shape-array-9b5d89838a990d79\"},\n      \"title\": \"Array\",\n      \"type\": \"array\"\n    }\n  },\n  \"required\": [\"array\"],\n  \"title\": \"AnyShape\",\n  \"type\": \"object\"\n}\n```\n\nwhere the key `\"any-shape-array-9b5d89838a990d79\"` uses a (blake2b) hash of the\ninner dtype specification so that multiple any-shaped arrays in a single \nmodel schema are deduplicated without conflicts.\n\n### Dumping\n\nOne of the main reasons to use chunked array libraries like zarr is to avoid\nneeding to load the entire array into memory. When dumping data to JSON, numpydantic \ntries to mirror this behavior, by default only dumping the metadata that is\nnecessary to identify the array.\n\nFor example, with zarr:\n\n```python\narray = zarr.array([[1,2,3],[4,5,6],[7,8,9]], dtype=float)\ninstance = Image(array=array)\ndumped = instance.model_dump_json()\n```\n\n```json\n{\n  \"array\":\n  {\n    \"Chunk shape\": \"(3, 3)\",\n    \"Chunks initialized\": \"1/1\",\n    \"Compressor\": \"Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)\",\n    \"Data type\": \"float64\",\n    \"No. bytes\": \"72\",\n    \"No. 
bytes stored\": \"421\",\n    \"Order\": \"C\",\n    \"Read-only\": \"False\",\n    \"Shape\": \"(3, 3)\",\n    \"Storage ratio\": \"0.2\",\n    \"Store type\": \"zarr.storage.KVStore\",\n    \"Type\": \"zarr.core.Array\",\n    \"hexdigest\": \"c51604eace325fe42bbebf39146c0956bd2ed13c\"\n  }\n}\n```\n\nTo print the whole array, we use pydantic's serialization contexts:\n\n```python\ndumped = instance.model_dump_json(context={'zarr_dump_array': True})\n```\n```json\n{\n  \"array\":\n  {\n    \"same thing,\": \"except also...\",\n    \"array\": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],\n    \"hexdigest\": \"c51604eace325fe42bbebf39146c0956bd2ed13c\"\n  }\n}\n```\n\n## Vendored Dependencies\n\nWe have vendored dependencies in the `src/numpydantic/vendor` package,\nand reproduced their licenses in the `licenses` directory.\n\n- [nptyping](https://github.com/ramonhagenaars/nptyping) - `numpydantic.vendor.nptyping` - `/licenses/nptyping.txt`","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fp2p-ld%2Fnumpydantic","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fp2p-ld%2Fnumpydantic","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fp2p-ld%2Fnumpydantic/lists"}