{"id":30862325,"url":"https://github.com/luanee/pandera-report","last_synced_at":"2025-09-07T17:14:55.683Z","repository":{"id":196070955,"uuid":"694275767","full_name":"Luanee/pandera-report","owner":"Luanee","description":"Pandera Report for row-based reporting by using the power of pandera.","archived":false,"fork":false,"pushed_at":"2024-09-27T22:41:02.000Z","size":111,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-31T09:33:50.621Z","etag":null,"topics":["pandera","reporting"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Luanee.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-09-20T17:09:46.000Z","updated_at":"2025-01-19T10:53:37.000Z","dependencies_parsed_at":"2023-12-19T08:10:19.369Z","dependency_job_id":"67a28e69-8f88-4574-89aa-5c98517923d7","html_url":"https://github.com/Luanee/pandera-report","commit_stats":{"total_commits":41,"total_committers":1,"mean_commits":41.0,"dds":0.0,"last_synced_commit":"1b92013e3c3b5ae1e1e531b287d9ce0b857e1dc4"},"previous_names":["luanee/pandera-report"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/Luanee/pandera-report","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Luanee%2Fpandera-report","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Luanee%2Fpandera-report/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Luanee%2Fpandera-report/releases","manifests_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/repositories/Luanee%2Fpandera-report/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Luanee","download_url":"https://codeload.github.com/Luanee/pandera-report/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Luanee%2Fpandera-report/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274066376,"owners_count":25216447,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-07T02:00:09.463Z","response_time":67,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["pandera","reporting"],"created_at":"2025-09-07T17:14:52.095Z","updated_at":"2025-09-07T17:14:55.673Z","avatar_url":"https://github.com/Luanee.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\" style=\"color: #a3cef1\"\u003e\n  Pandera Extension for row-based reporting\n\u003c/h1\u003e\n\u003cp align=\"center\"\u003e\n    \u003cdiv align=\"center\"\u003e\n        \u003c!-- Line 1 --\u003e\n        \u003ca href=\"https://python.org\"\u003e\n            \u003cimg src=\"https://img.shields.io/badge/python-v3.9+-white.svg?logo=python\u0026logoColor=a3cef1\u0026label=python\u0026color=a3cef1\" alt=\"Python version\"\u003e\n        \u003c/a\u003e\n        \u003ca href=\"https://www.union.ai/pandera\"\u003e\n            \u003cimg 
src=\"https://img.shields.io/badge/Pandera-v0.17.0+%20-white.svg?logo=pandera\u0026style=flat\u0026color=a3cef1\u0026label=pandera\" alt=\"Pandera Version\"\u003e\n        \u003c/a\u003e\n        \u003ca href=\"https://pypi.org/project/pandera-report\" target=\"_blank\"\u003e\n            \u003cimg src=\"https://img.shields.io/pypi/v/pandera-report?style=flat\u0026color=a3cef1\u0026label=pypi\" alt=\"Package version\"\u003e\n        \u003c/a\u003e\n        \u003ca href=\"https://github.com/pre-commit/pre-commit\"\u003e\n            \u003cimg src=\"https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit\u0026logoColor=a3cef1\u0026color=a3cef1\" alt=\"Pre-commit\"\u003e\n        \u003c/a\u003e\n        \u003ca href=\"https://github.com/psf/black\"\u003e\n            \u003cimg src=\"https://img.shields.io/badge/code%20style-black-000000.svg?color=a3cef1\" alt=\"Black\"\u003e\n        \u003c/a\u003e\n        \u003ca href=\"https://pycqa.github.io/isort/\"\u003e\n            \u003cimg src=\"https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat\u0026color=a3cef1\" alt=\"isort\"\u003e\n        \u003c/a\u003e\n        \u003ca href=\"https://github.com/luanee/pandera-report/actions/workflows/pipeline.yml?event=push\u0026query=branch%3Amain\" target=\"_blank\"\u003e\n            \u003cimg src=\"https://img.shields.io/github/actions/workflow/status/luanee/pandera-report/pipeline.yml?branch=main\u0026label=tests\u0026style=flat\u0026color=a3cef1\" alt=\"Test\"\u003e\n        \u003c/a\u003e\n    \u003c/div\u003e\n\u003c/p\u003e\n\n---\n\n## 🚀 Description\n\n\u003e [pandera](https://github.com/unionai-oss/pandera) provides a flexible and expressive API for performing data\n\u003e validation on dataframe-like objects to make data processing pipelines more\n\u003e readable and robust.\n\nIf you need to report the data quality issues found when validating a dataframe with `pandera`, then `pandera-report` is your friend. 
Based on the validation issues that pandera reports, your original dataframe is extended with these issues on a row-level basis.\n\nWith `pandera-report`, you can:\n\n- Seamlessly integrate with the `pandera` library to get enhanced data validation capabilities without interfering with pandera's own functionality.\n- Conveniently enrich your data with information about why specific rows failed validation.\n\n## ⚡ Setup\n\nUsing pip:\n\n```bash\npip install pandera-report\n```\n\nUsing poetry:\n\n```bash\npoetry add pandera-report\n```\n\n## Quick start\n\nThe following example is taken from the `pandera` documentation. It defines a `DataFrameSchema` against which the provided dataframe validates successfully.\n\n```Python\nimport pandas as pd\nimport pandera as pa\n\n\n# data to validate\ndf = pd.DataFrame({\n    \"column1\": [1, 4, 0, 10, 9],\n    \"column2\": [-1.3, -1.4, -2.9, -10.1, -20.4],\n    \"column3\": [\"value_1\", \"value_2\", \"value_3\", \"value_2\", \"value_1\"]\n})\n\n# define schema\nschema = pa.DataFrameSchema({\n    \"column1\": pa.Column(int, checks=pa.Check.le(10)),\n    \"column2\": pa.Column(float, checks=pa.Check.lt(-1.2)),\n    \"column3\": pa.Column(str, checks=[\n        pa.Check.str_startswith(\"value_\"),\n        # define custom checks as functions that take a series as input and\n        # output a boolean or boolean Series\n        pa.Check(lambda s: s.str.split(\"_\", expand=True).shape[1] == 2)\n    ]),\n})\n\nvalidated_df = schema(df)\nprint(validated_df)\n\n#     column1  column2  column3\n#  0        1     -1.3  value_1\n#  1        4     -1.4  value_2\n#  2        0     -2.9  value_3\n#  3       10    -10.1  value_2\n#  4        9    -20.4  value_1\n```\n\nTo use `pandera-report` with the same schema and dataframe, run the validation through a `DataFrameValidator`:\n\n```Python\nfrom pandera_report import DataFrameValidator\n\nvalidator = DataFrameValidator()  # default is quality_report=True, 
lazy=True\nprint(validator.validate(schema, df))\n\n#     column1  column2  column3 quality_issues quality_status\n#  0        1     -1.3  value_1           None          Valid\n#  1        4     -1.4  value_2           None          Valid\n#  2        0     -2.9  value_3           None          Valid\n#  3       10    -10.1  value_2           None          Valid\n#  4        9    -20.4  value_1           None          Valid\n```\n\nAs you can see, the result is the same, extended by two columns stating that every row passed validation. This reporting can also be deactivated for the case where everything is valid.\n\nBut what if the dataframe contains data quality issues? `pandera` raises a `SchemaErrors` or `SchemaError` exception (depending on the `lazy` setting). Let's see what `pandera-report` does if we change the last value of `column3` so that it violates the schema definition:\n\n```Python\n# data to validate\ndf = pd.DataFrame({\n    \"column1\": [1, 4, 0, 10, 9],\n    \"column2\": [-1.3, -1.4, -2.9, -10.1, -20.4],\n    \"column3\": [\"value_1\", \"value_2\", \"value_3\", \"value_2\", \"value1\"]\n})\n\nvalidator = DataFrameValidator()\nprint(validator.validate(schema, df))\n\n#     column1  column2  column3                              quality_issues quality_status\n#  0        1     -1.3  value_1                                        None          Valid\n#  1        4     -1.4  value_2                                        None          Valid\n#  2        0     -2.9  value_3                                        None          Valid\n#  3       10    -10.1  value_2                                        None          Valid\n#  4        9    -20.4   value1  Column \u003ccolumn3\u003e: str_startswith('value_')        Invalid\n```\n\nWhy is this useful? 
It becomes particularly valuable when you are not the one who prepares the input file: instead of a bare exception, you get a row-level report of what has to be fixed before the file can be processed into a valid DataFrame.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fluanee%2Fpandera-report","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fluanee%2Fpandera-report","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fluanee%2Fpandera-report/lists"}