{"id":13468894,"url":"https://github.com/unionai-oss/pandera","last_synced_at":"2026-04-14T03:02:17.944Z","repository":{"id":37582848,"uuid":"155649699","full_name":"unionai-oss/pandera","owner":"unionai-oss","description":"A light-weight, flexible, and expressive statistical data testing library","archived":false,"fork":false,"pushed_at":"2024-10-15T02:46:42.000Z","size":4273,"stargazers_count":3343,"open_issues_count":392,"forks_count":308,"subscribers_count":20,"default_branch":"main","last_synced_at":"2024-10-29T22:56:47.185Z","etag":null,"topics":["assertions","data-assertions","data-check","data-cleaning","data-processing","data-validation","data-verification","dataframe-schema","dataframes","hypothesis-testing","pandas","pandas-dataframe","pandas-validation","pandas-validator","schema","testing","testing-tools","validation"],"latest_commit_sha":null,"homepage":"https://www.union.ai/pandera","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/unionai-oss.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":".github/CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE.txt","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["cosmicBboy"],"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":null}},"created_at":"2018-11-01T02:18:34.000Z","updated_at":"2024-10-28T20:54:56.000Z","dependencies_parsed_at":"2022-07-14T23:46:09.240Z","dependency_job_id":"5289f6ee-70bb-41be-abfb-e8de94d5df4b","html_url":"https://github.com/unionai-oss/pandera","commit_stats":{"total_commits":777,"tota
l_committers":148,"mean_commits":5.25,"dds":0.4015444015444015,"last_synced_commit":"ea4538d2f71795bba09e602d568d673798c92b35"},"previous_names":["pandera-dev/pandera"],"tags_count":99,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unionai-oss%2Fpandera","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unionai-oss%2Fpandera/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unionai-oss%2Fpandera/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unionai-oss%2Fpandera/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/unionai-oss","download_url":"https://codeload.github.com/unionai-oss/pandera/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245597341,"owners_count":20641866,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["assertions","data-assertions","data-check","data-cleaning","data-processing","data-validation","data-verification","dataframe-schema","dataframes","hypothesis-testing","pandas","pandas-dataframe","pandas-validation","pandas-validator","schema","testing","testing-tools","validation"],"created_at":"2024-07-31T15:01:21.457Z","updated_at":"2026-04-14T03:02:17.938Z","avatar_url":"https://github.com/unionai-oss.png","language":"Python","readme":"\u003cbr\u003e\n\u003cdiv align=\"center\"\u003e\u003ca href=\"https://www.union.ai/pandera\"\u003e\u003cimg src=\"docs/source/_static/pandera-banner.png\" 
width=\"400\"\u003e\u003c/a\u003e\u003c/div\u003e\n\n\u003ch1 align=\"center\"\u003e\n  The Open-source Framework for Dataset Validation\n\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  📊 🔎 ✅\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ci\u003eData validation for scientists, engineers, and analysts seeking correctness.\u003c/i\u003e\n\u003c/p\u003e\n\n\u003cbr\u003e\n\n\n[![CI Build](https://img.shields.io/github/actions/workflow/status/unionai-oss/pandera/ci-tests.yml?branch=main\u0026label=tests\u0026style=for-the-badge)](https://github.com/unionai-oss/pandera/actions/workflows/ci-tests.yml?query=branch%3Amain)\n[![Documentation Status](https://readthedocs.org/projects/pandera/badge/?version=stable\u0026style=for-the-badge)](https://pandera.readthedocs.io/en/stable/?badge=stable)\n[![PyPI version shields.io](https://img.shields.io/pypi/v/pandera.svg?style=for-the-badge)](https://pypi.org/project/pandera/)\n[![PyPI license](https://img.shields.io/pypi/l/pandera.svg?style=for-the-badge)](https://pypi.python.org/pypi/)\n[![pyOpenSci](https://go.union.ai/pandera-pyopensci-badge)](https://github.com/pyOpenSci/software-review/issues/12)\n[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://img.shields.io/badge/repo%20status-Active-Green?style=for-the-badge)](https://www.repostatus.org/#active)\n[![Documentation Status](https://readthedocs.org/projects/pandera/badge/?version=latest\u0026style=for-the-badge)](https://pandera.readthedocs.io/en/latest/?badge=latest)\n[![codecov](https://img.shields.io/codecov/c/github/unionai-oss/pandera?style=for-the-badge)](https://codecov.io/gh/unionai-oss/pandera)\n[![PyPI 
pyversions](https://img.shields.io/pypi/pyversions/pandera.svg?style=for-the-badge)](https://pypi.python.org/pypi/pandera/)\n[![DOI](https://img.shields.io/badge/DOI-10.5281/zenodo.3385265-blue?style=for-the-badge)](https://doi.org/10.5281/zenodo.3385265)\n[![asv](http://img.shields.io/badge/benchmarked%20by-asv-green.svg?style=for-the-badge)](https://pandera-dev.github.io/pandera-asv-logs/)\n[![Total Downloads](https://img.shields.io/pepy/dt/pandera?style=for-the-badge\u0026color=blue)](https://pepy.tech/project/pandera)\n[![Conda Downloads](https://img.shields.io/conda/dn/conda-forge/pandera?style=for-the-badge)](https://anaconda.org/conda-forge/pandera)\n[![Slack](https://img.shields.io/badge/Slack-4A154B?logo=slack\u0026logoColor=fff\u0026style=for-the-badge)](https://flyte-org.slack.com/archives/C08FDTY2X3L)\n\nPandera is a [Union.ai](https://union.ai/blog-post/pandera-joins-union-ai) open\nsource project that provides a flexible and expressive API for performing data\nvalidation on dataframe-like objects. The goal of Pandera is to make data\nprocessing pipelines more readable and robust with statistically typed\ndataframes.\n\n## Install\n\nPandera supports [multiple dataframe libraries](https://pandera.readthedocs.io/en/stable/supported_libraries.html), including [pandas](http://pandas.pydata.org), [polars](https://docs.pola.rs/), [pyspark](https://spark.apache.org/docs/latest/api/python/index.html), and more. 
To validate `pandas` DataFrames, install Pandera with the `pandas` extra:\n\n**With `pip`:**\n\n```\npip install 'pandera[pandas]'\n```\n\n**With `uv`:**\n\n```\nuv pip install 'pandera[pandas]'\n```\n\n**With `conda`:**\n\n```\nconda install -c conda-forge pandera-pandas\n```\n\n## Get started\n\nFirst, create a dataframe:\n\n```python\nimport pandas as pd\nimport pandera.pandas as pa\n\n# data to validate\ndf = pd.DataFrame({\n    \"column1\": [1, 2, 3],\n    \"column2\": [1.1, 1.2, 1.3],\n    \"column3\": [\"a\", \"b\", \"c\"],\n})\n```\n\nValidate the data using the object-based API:\n\n```python\n# define a schema\nschema = pa.DataFrameSchema({\n    \"column1\": pa.Column(int, pa.Check.ge(0)),\n    \"column2\": pa.Column(float, pa.Check.lt(10)),\n    \"column3\": pa.Column(\n        str,\n        [\n            pa.Check.isin([*\"abc\"]),\n            pa.Check(lambda series: series.str.len() == 1),\n        ]\n    ),\n})\n\nprint(schema.validate(df))\n#    column1  column2 column3\n# 0        1      1.1       a\n# 1        2      1.2       b\n# 2        3      1.3       c\n```\n\nOr validate the data using the class-based API:\n\n```python\n# define a schema\nclass Schema(pa.DataFrameModel):\n    column1: int = pa.Field(ge=0)\n    column2: float = pa.Field(lt=10)\n    column3: str = pa.Field(isin=[*\"abc\"])\n\n    @pa.check(\"column3\")\n    def custom_check(cls, series: pd.Series) -\u003e pd.Series:\n        return series.str.len() == 1\n\nprint(Schema.validate(df))\n#    column1  column2 column3\n# 0        1      1.1       a\n# 1        2      1.2       b\n# 2        3      1.3       c\n```\n\n\n\u003e [!WARNING]\n\u003e Pandera `v0.24.0` introduces the `pandera.pandas` module, which is now the\n\u003e (highly) recommended way of defining `DataFrameSchema`s and `DataFrameModel`s\n\u003e for `pandas` data structures like `DataFrame`s. 
Defining a dataframe schema from\n\u003e the top-level `pandera` module will produce a `FutureWarning`:\n\u003e\n\u003e ```python\n\u003e import pandera as pa\n\u003e\n\u003e schema = pa.DataFrameSchema({\"col\": pa.Column(str)})\n\u003e ```\n\u003e\n\u003e Update your import to:\n\u003e\n\u003e ```python\n\u003e import pandera.pandas as pa\n\u003e ```\n\u003e\n\u003e After that, the rest of your pandera code should work. Using the top-level\n\u003e `pandera` module to access `DataFrameSchema` and the other pandera classes\n\u003e or functions will be deprecated in version `0.29.0`.\n\n\n## Next steps\n\nSee the [official documentation](https://pandera.readthedocs.io) to learn more.\n","funding_links":["https://github.com/sponsors/cosmicBboy"],"categories":["Python","Data Validation","Data Processing","testing","🐍 Python","\u003ca id=\"tools\"\u003e\u003c/a\u003e🛠️ Tools","📋 Contents","Data Containers \u0026 Dataframes"],"sub_categories":["Synthetic Data","Data Management","Useful Python Tools for Data Analysis","NLP","🧬 1. Core Frameworks \u0026 Libraries"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funionai-oss%2Fpandera","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Funionai-oss%2Fpandera","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funionai-oss%2Fpandera/lists"}