{"id":13655972,"url":"https://github.com/multimeric/PandasSchema","last_synced_at":"2025-04-23T17:30:52.159Z","repository":{"id":43891210,"uuid":"75599503","full_name":"multimeric/PandasSchema","owner":"multimeric","description":"A validation library for Pandas data frames using user-friendly schemas","archived":false,"fork":false,"pushed_at":"2023-03-24T11:48:47.000Z","size":785,"stargazers_count":191,"open_issues_count":38,"forks_count":36,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-04-11T21:03:22.237Z","etag":null,"topics":["data-science","pandas","schema","validation"],"latest_commit_sha":null,"homepage":"https://multimeric.github.io/PandasSchema/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/multimeric.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-12-05T07:22:21.000Z","updated_at":"2025-03-07T12:08:07.000Z","dependencies_parsed_at":"2024-06-18T15:25:29.931Z","dependency_job_id":"86055cbd-ee14-4391-b4e0-a479e0713dbd","html_url":"https://github.com/multimeric/PandasSchema","commit_stats":{"total_commits":89,"total_committers":9,"mean_commits":9.88888888888889,"dds":0.348314606741573,"last_synced_commit":"a38895ccd5b4e6d5c0279ca20389387128cdca4d"},"previous_names":["tmiguelt/pandasschema"],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/multimeric%2FPandasSchema","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/multimeric%2FPandasSchema/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/multimeric%2FPandasSchema/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/multimeric%2FPandasSchema/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/multimeric","download_url":"https://codeload.github.com/multimeric/PandasSchema/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250480333,"owners_count":21437524,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","pandas","schema","validation"],"created_at":"2024-08-02T04:00:44.168Z","updated_at":"2025-04-23T17:30:51.173Z","avatar_url":"https://github.com/multimeric.png","language":"Python","readme":"\nPandasSchema\n************\n\nFor the full documentation, refer to the `Github Pages Website\n\u003chttps://tmiguelt.github.io/PandasSchema/\u003e`_.\n\n======================================================================\n\nPandasSchema is a module for validating tabulated data, such as CSVs\n(Comma Separated Value files), and TSVs (Tab Separated Value files).\nIt uses the incredibly powerful data analysis tool Pandas to do so\nquickly and efficiently.\n\nFor example, say your code expects a CSV that looks a bit like this:\n\n.. code:: default\n\n   Given Name,Family Name,Age,Sex,Customer ID\n   Gerald,Hampton,82,Male,2582GABK\n   Yuuwa,Miyake,27,Male,7951WVLW\n   Edyta,Majewska,50,Female,7758NSID\n\nNow you want to be able to ensure that the data in your CSV is in the\ncorrect format:\n\n.. code:: python\n\n   import pandas as pd\n   from io import StringIO\n   from pandas_schema import Column, Schema\n   from pandas_schema.validation import LeadingWhitespaceValidation, TrailingWhitespaceValidation, CanConvertValidation, MatchesPatternValidation, InRangeValidation, InListValidation\n\n   schema = Schema([\n       Column('Given Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),\n       Column('Family Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),\n       Column('Age', [InRangeValidation(0, 120)]),\n       Column('Sex', [InListValidation(['Male', 'Female', 'Other'])]),\n       Column('Customer ID', [MatchesPatternValidation(r'\\d{4}[A-Z]{4}')])\n   ])\n\n   test_data = pd.read_csv(StringIO('''Given Name,Family Name,Age,Sex,Customer ID\n   Gerald ,Hampton,82,Male,2582GABK\n   Yuuwa,Miyake,270,male,7951WVLW\n   Edyta,Majewska ,50,Female,775ANSID\n   '''))\n\n   errors = schema.validate(test_data)\n\n   for error in errors:\n       print(error)\n\nPandasSchema would then output\n\n.. code:: text\n\n   {row: 0, column: \"Given Name\"}: \"Gerald \" contains trailing whitespace\n   {row: 1, column: \"Age\"}: \"270\" was not in the range [0, 120)\n   {row: 1, column: \"Sex\"}: \"male\" is not in the list of legal options (Male, Female, Other)\n   {row: 2, column: \"Family Name\"}: \"Majewska \" contains trailing whitespace\n   {row: 2, column: \"Customer ID\"}: \"775ANSID\" does not match the pattern \"\\d{4}[A-Z]{4}\"\n","funding_links":[],"categories":["Uncategorized"],"sub_categories":["Uncategorized"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmultimeric%2FPandasSchema","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmultimeric%2FPandasSchema","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmultimeric%2FPandasSchema/lists"}