{"id":22903997,"url":"https://github.com/json2d/validframe","last_synced_at":"2026-05-16T22:04:18.452Z","repository":{"id":62586967,"uuid":"240632155","full_name":"json2d/validframe","owner":"json2d","description":"a validation library for Pandas dataframes","archived":false,"fork":false,"pushed_at":"2020-03-21T02:40:45.000Z","size":82,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-06-10T20:01:25.469Z","etag":null,"topics":["pandas-dataframes","validation-library"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/json2d.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-02-15T02:23:14.000Z","updated_at":"2021-06-10T04:22:12.000Z","dependencies_parsed_at":"2022-11-03T22:16:08.970Z","dependency_job_id":null,"html_url":"https://github.com/json2d/validframe","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/json2d/validframe","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/json2d%2Fvalidframe","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/json2d%2Fvalidframe/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/json2d%2Fvalidframe/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/json2d%2Fvalidframe/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/json2d","download_url":"https://codeload.github.com/json2d/validframe/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/json2d%2Fvalidframe/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266604009,"owners_count":23954725,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-23T02:00:09.312Z","response_time":66,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["pandas-dataframes","validation-library"],"created_at":"2024-12-14T02:39:42.967Z","updated_at":"2026-05-16T22:04:13.419Z","avatar_url":"https://github.com/json2d.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🖼 validframe\n[![PyPI version](https://badge.fury.io/py/validframe.svg)](https://badge.fury.io/py/validframe)\n[![Build Status](https://travis-ci.com/json2d/validframe.svg?branch=master)](https://travis-ci.com/json2d/validframe) [![Coverage Status](https://coveralls.io/repos/github/json2d/validframe/badge.svg?branch=master)](https://coveralls.io/github/json2d/validframe?branch=master)\n\n[`validium`](https://github.com/json2d/validium) validators for pandas dataframes\n\n## Quick install\n```bash\npip install validframe\n```\n\n## Basic usage\n\nNeed some faith in those frames? 
Let's dive in.\n\n### Predefined validators\n\nOut of the box, you get a set of validator factories that cover the most common ways to validate dataframes:\n\n```py\nimport pandas as pd\nimport numpy as np\nimport validframe as vf\n\ndf = pd.DataFrame(\n  columns = ['like_counts','comment'], # headers\n  data = [\n    [42, 'hello world'], # row 0\n    [100000, '😆'], # row 1\n    [123456, 'lol'], # row 2\n    [987, \"you're the baz\"] # row 3\n  ])\n\nvalidators = [\n  vf.frame.not_empty(), # frame must not be empty\n  vf.frame.empty(), # frame must be empty\n  vf.frame.rows(4), # frame must have 4 rows\n  vf.frame.rows(100), # frame must have 100 rows\n  vf.frame.cols(2), # frame must have 2 cols\n\n  vf.rows.uniq(), # rows must be unique\n\n  vf.cells.all_is(str, cols=['comment']), # all cells must be instances of \u003cstr\u003e\n  vf.cells.all_eq(1, cols=['like_counts']), # all cells must equal 1\n  vf.cells.all_gt(0, cols=['like_counts']), # all cells must be greater than 0\n  vf.cells.all_lt(0, cols=['like_counts']), # all cells must be less than 0\n  vf.cells.all_gte(0, cols=['like_counts']), # all cells must be greater than or equal to 0\n  vf.cells.all_lte(0, cols=['like_counts']), # all cells must be less than or equal to 0\n\n  vf.cells.some_eq(42, cols=['like_counts']), # some cells must equal 42\n  vf.cells.some_is(np.nan, cols=['comment']), # some cells must be instances of \u003cnumpy.nan\u003e\n  vf.cells.some_gt(100000, cols=['like_counts']), # some cells must be greater than 100000\n  vf.cells.some_lt(987, cols=['like_counts']), # some cells must be less than 987\n  vf.cells.some_gte(100000, cols=['like_counts']), # some cells must be greater than or equal to 100000\n  vf.cells.some_lte(987, cols=['like_counts']), # some cells must be less than or equal to 987\n\n  vf.cells.none_eq(0, cols=['like_counts']), # no cells must equal 0\n  vf.cells.none_is(str, cols=['like_counts']), # no cells must be instances of \u003cstr\u003e\n  vf.cells.none_gt(100000, 
cols=['like_counts']), # no cells must be greater than 100000\n  vf.cells.none_lt(42, cols=['like_counts']), # no cells must be less than 42\n  vf.cells.none_gte(100000, cols=['like_counts']), # no cells must be greater than or equal to 100000\n  vf.cells.none_lte(42, cols=['like_counts']), # no cells must be less than or equal to 42   \n\n  vf.cells.some_or_none_is(str, cols=['comment']), # some or no cells must be instances of \u003cstr\u003e\n  vf.cells.some_or_none_eq(0, cols=['like_counts']), # some or no cells must equal 0\n  vf.cells.some_or_none_gt(0, cols=['like_counts']), # some or no cells must be greater than 0\n  vf.cells.some_or_none_lt(0, cols=['like_counts']), # some or no cells must be less than 0\n  vf.cells.some_or_none_gte(0, cols=['like_counts']), # some or no cells must be greater than or equal to 0\n  vf.cells.some_or_none_lte(0, cols=['like_counts']), # some or no cells must be less than or equal to 0\n\n  vf.cells.all_or_none_is(str, cols=['comment']), # all or no cells must be instances of \u003cstr\u003e\n  vf.cells.all_or_none_eq(42, cols=['like_counts']), # all or no cells must equal 42\n  vf.cells.all_or_none_gt(100000, cols=['like_counts']), # all or no cells must be greater than 100000\n  vf.cells.all_or_none_lt(987, cols=['like_counts']), # all or no cells must be less than 987\n  vf.cells.all_or_none_gte(100000, cols=['like_counts']), # all or no cells must be greater than or equal to 100000\n  vf.cells.all_or_none_lte(987, cols=['like_counts']), # all or no cells must be less than or equal to 987\n\n  vf.cells.all_or_some_is(str, cols=['comment']), # all or some cells must be instances of \u003cstr\u003e\n  vf.cells.all_or_some_eq(0, cols=['like_counts']), # all or some cells must equal 0\n  vf.cells.all_or_some_gt(100000, cols=['like_counts']), # all or some cells must be greater than 100000\n  vf.cells.all_or_some_lt(42, cols=['like_counts']), # all or some cells must be less than 42\n  vf.cells.all_or_some_gte(100000, 
cols=['like_counts']), # all or some cells must be greater than or equal to 100000\n  vf.cells.all_or_some_lte(42, cols=['like_counts']), # all or some cells must be less than or equal to 42\n\n  vf.cells.sum_eq(-1, cols=['like_counts']), # all cells summed must equal -1\n  vf.cells.sum_gt(0, cols=['like_counts']), # all cells summed must be greater than 0\n  vf.cells.sum_lt(0, cols=['like_counts']), # all cells summed must be less than 0\n  vf.cells.sum_gte(0, cols=['like_counts']), # all cells summed must be greater than or equal to 0\n  vf.cells.sum_lte(0, cols=['like_counts']), # all cells summed must be less than or equal to 0\n\n  vf.cells.uniq(cols=['comment']) # all cells must be unique\n]\n\nfor v in validators:\n  try:\n    v.validate(df)\n  except AssertionError as err:\n    print('AssertionError:', err)\n\n# AssertionError: frame must be empty\n# AssertionError: frame must have 100 rows\n# AssertionError: (cols=['like_counts']) all cells must equal 1\n# AssertionError: (cols=['like_counts']) all cells must be less than 0\n# AssertionError: (cols=['like_counts']) all cells must be less than or equal to 0\n# AssertionError: (cols=['comment']) some cells must be instances of \u003cnumpy.nan\u003e\n# AssertionError: (cols=['like_counts']) no cells must be greater than 100000\n# AssertionError: (cols=['like_counts']) no cells must be greater than or equal to 100000\n# AssertionError: (cols=['like_counts']) no cells must be less than or equal to 42\n# AssertionError: (cols=['comment']) some or no cells must be instances of \u003cstr\u003e\n# AssertionError: (cols=['like_counts']) some or no cells must be greater than 0\n# AssertionError: (cols=['like_counts']) some or no cells must be greater than or equal to 0\n# AssertionError: (cols=['like_counts']) all or no cells must equal 42\n# AssertionError: (cols=['like_counts']) all or no cells must be greater than 100000\n# AssertionError: (cols=['like_counts']) all or no cells must be less than 987\n# AssertionError: (cols=['like_counts']) all or no cells must be greater than or equal to 100000\n# AssertionError: 
(cols=['like_counts']) all or no cells must be less than or equal to 987\n# AssertionError: (cols=['like_counts']) all or some cells must equal 0\n# AssertionError: (cols=['like_counts']) all or some cells must be less than 42\n# AssertionError: (cols=['like_counts']) all cells summed must equal -1\n# AssertionError: (cols=['like_counts']) all cells summed must be less than 0\n# AssertionError: (cols=['like_counts']) all cells summed must be less than or equal to 0\n```\n\nNot quite exhaustive, but enough to cover basic use.\n\n\u003e Think there are some other common validators that are missing here? Proposals via issues and PRs are welcome 👍\n\n## More advanced usage\n\n### Custom validators\n\nWhen none of the predefined validators can do the trick, it's time to roll up your sleeves and create your own validator.\n\nFor starters, you can create a `CellsValidator` to validate dataframes by their cells:\n\n```py\nimport pandas as pd\nimport validframe as vf\n\ndf = pd.DataFrame(\n  columns = ['like_counts','comment'], # headers\n  data = [\n    [42, 'hello world'], # row 0\n    [100000, '😆'], # row 1\n    [123456, 'lol'], # row 2\n    [987, 'earth is definitely flat'] # row 3\n  ])\n\nalotta_likes_validator = vf.CellsValidator(\n  lambda xs: all([x \u003e= 1000 for x in xs]),\n  'all like counts must be at least 1000',\n  cols=['like_counts']\n)\n\nalotta_likes_validator.validate(df) # AssertionError: all like counts must be at least 1000\n```\n\nYou can also create a `RowsValidator` to validate dataframes by their rows:\n\n```py\ndf = pd.DataFrame(\n  columns = ['date', 'total', 'subtotal', 'tax'], # headers\n  data = [\n    ['2020-01-11', 108.25, 100, 8.25],\n    ['2010-01-11', 106, 100, 6],\n    ['2009-01-11', 104.50, 100, 4.50]\n  ])\n\ntotal_validator = vf.RowsValidator(\n  lambda rows: all([row['total'] == row['subtotal'] + row['tax'] for row in rows]),\n  'all rows must have a total equal to the subtotal plus tax',\n  cols=['total', 'subtotal', 'tax']\n)\n\ntotal_validator.validate(df) # pass\n```\n\nIf you really enjoy `pandas`, you might prefer to 
create a `FrameValidator` and write the validation logic directly with `pandas` and `numpy`:\n\n```py\nimport pandas as pd\nimport numpy as np\nimport validframe as vf\n\nledger_df = pd.DataFrame(\n  columns = ['company', 'balance'],\n  data = [\n    ['Google', 100000],\n    ['Google', -90000],\n    ['Netflix', -10000], # will be unbalanced\n    ['Amazon', 0],\n    ['Google', -10000],\n  ]\n)\n\ndef is_balanced_by_company(df):\n  # total balance per company; every total must be exactly 0\n  totals = df.groupby('company')['balance'].sum()\n  return bool(np.all(totals.to_numpy() == 0))\n\nbalanced_validator = vf.FrameValidator(\n  is_balanced_by_company,\n  'the balances of every company must sum to 0'\n)\n\nbalanced_validator.validate(ledger_df) # AssertionError: the balances of every company must sum to 0\n```\n\n### Go functional\n\nAs with [`validium`](https://github.com/json2d/validium) validators in general, a functional programming library like `ramda` can make your validation logic shorter and easier to read.\n\n```py\nimport ramda as R\nimport validframe as vf\n\n# checks the same thing as vf.cells.all_gt(0, cols=['a'])\nall_gt_zero_validator = vf.CellsValidator(\n  R.all(lambda x: x \u003e 0),\n  'all cells must be greater than 0',\n  cols=['a']\n)\n```\n\nThis is especially true as your validation logic becomes more complex:\n\n```py\nfrom numbers import Number\n\nimport ramda as R\nimport validframe as vf\n\nsum_numbers_eq_zero_validator = vf.CellsValidator(\n  R.compose(R.equals(0), R.sum, R.filter(lambda x: isinstance(x, Number))),\n  'all cells that are numbers must sum to 0',\n  cols=['credit', 'debit']\n)\n```\n\n### Max flexibility\n\nAnother recommendation is to use a regular function instead of a `lambda` when your validation logic can't comfortably be expressed as a one-liner, e.g. when 
your logic involves making a request to a web API:\n\n```py\nimport pandas as pd\nimport requests\nimport validframe as vf\n\nREMOTE_CHECKSUM_URL = 'https://example.com/checksums.json' # hypothetical endpoint returning a JSON list of checksums\n\ndef match_remote_checksums(df):\n  checksums = requests.get(REMOTE_CHECKSUM_URL).json()\n  remote_df = pd.DataFrame({'checksum': checksums})\n  return df.equals(remote_df)\n\n# as a one-liner:\n# match_remote_checksums = lambda df: pd.DataFrame({'checksum': requests.get(REMOTE_CHECKSUM_URL).json()}).equals(df)\n\nvalidator = vf.FrameValidator(\n  match_remote_checksums,\n  'checksums must match the set from the server',\n  cols=['checksum']\n)\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjson2d%2Fvalidframe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjson2d%2Fvalidframe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjson2d%2Fvalidframe/lists"}