{"id":13905726,"url":"https://github.com/shawnbrown/datatest","last_synced_at":"2026-01-18T00:39:04.146Z","repository":{"id":45607723,"uuid":"58643049","full_name":"shawnbrown/datatest","owner":"shawnbrown","description":"Tools for test driven data-wrangling and data validation.","archived":false,"fork":false,"pushed_at":"2021-12-05T17:44:33.000Z","size":3434,"stargazers_count":294,"open_issues_count":15,"forks_count":13,"subscribers_count":12,"default_branch":"master","last_synced_at":"2024-11-09T15:20:20.343Z","etag":null,"topics":["data-wrangling","pytest-plugin","python","quality-assurance","testing","unittest"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shawnbrown.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGELOG","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-05-12T13:16:27.000Z","updated_at":"2024-09-30T01:36:59.000Z","dependencies_parsed_at":"2022-08-28T06:22:03.518Z","dependency_job_id":null,"html_url":"https://github.com/shawnbrown/datatest","commit_stats":null,"previous_names":[],"tags_count":16,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shawnbrown%2Fdatatest","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shawnbrown%2Fdatatest/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shawnbrown%2Fdatatest/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shawnbrown%2Fdatatest/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shawnbrown","download_url":"https://codeload.github.com/shawnbrown/datatest
/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":226336657,"owners_count":17608868,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-wrangling","pytest-plugin","python","quality-assurance","testing","unittest"],"created_at":"2024-08-06T23:01:22.315Z","updated_at":"2026-01-18T00:39:04.139Z","avatar_url":"https://github.com/shawnbrown.png","language":"Python","readme":"\n********************************************************\ndatatest: Test driven data-wrangling and data validation\n********************************************************\n\n|licensebadge| |pythonbadge| |requiresbadge|\n|repobadge| |buildbadge| |statusbadge| |stabledocsbadge| |latestdocsbadge|\n\nDatatest helps to speed up and formalize data-wrangling and data\nvalidation tasks. It implements a system of validation methods,\ndifference classes, and acceptance managers. Datatest can help you:\n\n* Clean and wrangle data faster and more accurately.\n* Maintain a record of checks and decisions regarding important data sets.\n* Distinguish between ideal criteria and acceptable deviation.\n* Validate the input and output of data pipeline components.\n* Measure progress of data preparation tasks.\n* On-board new team members with an explicit and structured process.\n\nDatatest can be used directly in your own projects or as part of a testing\nframework like pytest_ or unittest_. It has no hard dependencies; it's\ntested on Python 2.6, 2.7, 3.2 through 3.10, PyPy, and PyPy3; and is freely\navailable under the Apache License, version 2.\n\n.. 
_pytest: https://pytest.org\n.. _unittest: https://docs.python.org/library/unittest.html\n\n\n:Documentation:\n    | https://datatest.readthedocs.io/ (stable)\n    | https://datatest.readthedocs.io/en/latest/ (latest)\n\n:Official:\n    | https://pypi.org/project/datatest/\n\n\nCode Examples\n=============\n\nValidating a Dictionary of Lists\n--------------------------------\n\n.. code-block:: python\n\n    from datatest import validate, accepted, Invalid\n\n\n    data = {\n        'A': [1, 2, 3, 4],\n        'B': ['x', 'y', 'x', 'x'],\n        'C': ['foo', 'bar', 'baz', 'EMPTY']\n    }\n\n    validate(data.keys(), {'A', 'B', 'C'})\n\n    validate(data['A'], int)\n\n    validate(data['B'], {'x', 'y'})\n\n    with accepted(Invalid('EMPTY')):\n        validate(data['C'], str.islower)\n\n\nValidating a Pandas DataFrame\n-----------------------------\n\n.. code-block:: python\n\n    import pandas as pd\n    from datatest import register_accessors, accepted, Invalid\n\n\n    register_accessors()\n    df = pd.read_csv('data.csv')\n\n    df.columns.validate({'A', 'B', 'C'})\n\n    df['A'].validate(int)\n\n    df['B'].validate({'x', 'y'})\n\n    with accepted(Invalid('EMPTY')):\n        df['C'].validate(str.islower)\n\n\nInstallation\n============\n\n.. start-inclusion-marker-install\n\nThe easiest way to install datatest is to use `pip \u003chttps://pip.pypa.io\u003e`_:\n\n.. code-block:: console\n\n    pip install datatest\n\nIf you are upgrading from version 0.11.0 or newer, use the ``--upgrade``\noption:\n\n.. code-block:: console\n\n    pip install --upgrade datatest\n\n\nUpgrading From Version 0.9.6\n----------------------------\n\nIf you have an existing codebase of older datatest scripts, you should\nupgrade using the following steps:\n\n* Install datatest 0.10.0 first:\n\n  .. 
code-block:: console\n\n      pip install --force-reinstall datatest==0.10.0\n\n* Run your existing code and check for DeprecationWarnings.\n\n* Update the parts of your code that use deprecated features.\n\n* Once your code is running without DeprecationWarnings,\n  install the latest version of datatest:\n\n  .. code-block:: console\n\n      pip install --upgrade datatest\n\n\nStuntman Mike\n-------------\n\nIf you need bug-fixes or features that are not available\nin the current stable release, you can \"pip install\" the\ndevelopment version directly from GitHub:\n\n.. code-block:: console\n\n    pip install --upgrade https://github.com/shawnbrown/datatest/archive/master.zip\n\nAll of the usual caveats for a development install should\napply---only use this version if you can risk some instability\nor if you know exactly what you're doing. While care is taken\nto never break the build, it can happen.\n\n\nSafety-first Clyde\n------------------\n\nIf you need to review and test packages before installing, you can\ninstall datatest manually.\n\nDownload the latest **source** distribution from the Python Package\nIndex (PyPI):\n\n    https://pypi.org/project/datatest/#files\n\nUnpack the file (replacing X.Y.Z with the appropriate version number)\nand review the source code:\n\n.. code-block:: console\n\n    tar xvfz datatest-X.Y.Z.tar.gz\n\nChange to the unpacked directory and run the tests:\n\n.. code-block:: console\n\n    cd datatest-X.Y.Z\n    python setup.py test\n\nDon't worry if some of the tests are skipped. Tests for optional data\nsources (like pandas DataFrames or NumPy arrays) are skipped when the\nrelated third-party packages are not installed.\n\nIf the source code and test results are satisfactory, install the\npackage:\n\n.. code-block:: console\n\n    python setup.py install\n\n.. 
end-inclusion-marker-install\n\n\nSupported Versions\n==================\n\nTested on Python 2.6, 2.7, 3.2 through 3.10, PyPy, and PyPy3.\nDatatest is pure Python and may also run on other implementations\n(check using \"setup.py test\" before installing).\n\n\nBackward Compatibility\n======================\n\nIf you have existing tests that use API features which have\nchanged since 0.9.0, you can still run your old code by\nadding the following import to the beginning of each file:\n\n.. code-block:: python\n\n    from datatest.__past__ import api09\n\nTo maintain existing test code, this project makes a best-effort\nattempt to provide backward compatibility support for older\nfeatures. The API will be improved in the future but only in\nmeasured and sustainable ways.\n\nAll of the data used at the `National Committee for an Effective\nCongress \u003chttp://www.ncec.org/about\u003e`_ has been checked with\ndatatest for several years, so there is already a large and\ngrowing codebase that relies on current features and must be\nmaintained into the future.\n\n\nSoft Dependencies\n=================\n\nDatatest has no hard, third-party dependencies. But if you want\nto interface with pandas DataFrames, NumPy arrays, or other\noptional data sources, you will need to install the relevant\npackages (``pandas``, ``numpy``, etc.).\n\n\nDevelopment Repository\n======================\n\nThe development repository for ``datatest`` is hosted on\n`GitHub \u003chttps://github.com/shawnbrown/datatest\u003e`_.\n\n\n----------\n\nFreely licensed under the Apache License, Version 2.0\n\nCopyright 2014 - 2021 National Committee for an Effective Congress, et al.\n\n\n.. start-inclusion-marker-badge-substitutions\n\n.. |buildbadge| image:: https://img.shields.io/travis/shawnbrown/datatest?logo=travis-ci\u0026logoColor=white\u0026style=flat-square\n    :target: https://travis-ci.org/shawnbrown/datatest\n    :alt: Current Build Status\n\n.. 
|pypibadge| image:: https://img.shields.io/pypi/v/datatest?logo=pypi\u0026logoColor=white\u0026style=flat-square\n    :target: https://pypi.org/project/datatest/\n    :alt: Current PyPI Version\n\n.. |commitsbadge| image:: https://img.shields.io/github/commits-since/shawnbrown/datatest/latest?color=informational\u0026logo=github\u0026logoColor=white\u0026style=flat-square\n    :target: https://github.com/shawnbrown/datatest/\n    :alt: Commits Since Last Release\n\n.. |statusbadge| image:: https://img.shields.io/pypi/status/datatest?label=PyPI%20status\u0026logo=pypi\u0026logoColor=white\u0026style=flat-square\n    :target: https://pypi.org/project/datatest/\n    :alt: Development Status\n\n.. |licensebadge| image:: https://img.shields.io/badge/license-Apache_2-informational?logo=open-source-initiative\u0026logoColor=white\u0026style=flat-square\n    :target: https://opensource.org/licenses/Apache-2.0\n    :alt: Apache 2.0 License\n\n.. |pythonbadge| image:: https://img.shields.io/badge/python-2.6_|_2.7_|_3.2_through_3.10_|_PyPy_|_PyPy3-informational?logo=python\u0026logoColor=white\u0026style=flat-square\n    :target: https://pypi.org/project/datatest/#supported-versions\n    :alt: Supported Python Versions\n\n.. |requiresbadge| image:: https://img.shields.io/badge/install_requires-no_dependencies-informational?logo=pypi\u0026logoColor=white\u0026style=flat-square\n    :target: https://pypi.org/project/datatest/#installation\n    :alt: Installation Requirements\n\n.. |repobadge| image:: https://img.shields.io/badge/repo-GitHub-informational?logo=github\u0026logoColor=white\u0026style=flat-square\n    :target: https://github.com/shawnbrown/datatest/\n    :alt: Development Repository\n\n.. |stabledocsbadge| image:: https://img.shields.io/badge/docs_(stable)-Read_the_Docs-informational?logo=read-the-docs\u0026logoColor=white\u0026style=flat-square\n    :target: https://datatest.readthedocs.io/en/stable/\n    :alt: Documentation (stable)\n\n.. 
|latestdocsbadge| image:: https://img.shields.io/badge/docs_(latest)-Read_the_Docs-informational?logo=read-the-docs\u0026logoColor=white\u0026style=flat-square\n    :target: https://datatest.readthedocs.io/en/latest/\n    :alt: Documentation (latest)\n\n.. |starsbadge| image:: https://img.shields.io/github/stars/shawnbrown/datatest.svg?logo=github\u0026logoColor=white\u0026style=flat-square\n    :target: https://github.com/shawnbrown/datatest/stargazers\n    :alt: GitHub users who have starred this project\n\n.. end-inclusion-marker-badge-substitutions\n","funding_links":[],"categories":["Python","数据读写与提取","(4) Packages"],"sub_categories":["(3.4) :blue_book: Books / papers"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshawnbrown%2Fdatatest","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshawnbrown%2Fdatatest","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshawnbrown%2Fdatatest/lists"}