{"id":37063742,"url":"https://github.com/mfcabrera/hooqu","last_synced_at":"2026-01-14T07:17:45.014Z","repository":{"id":37791571,"uuid":"226362221","full_name":"mfcabrera/hooqu","owner":"mfcabrera","description":"hooqu is a library built on top of Pandas-like Dataframes for defining \"unit tests for data\". This is a spiritual port of Apache Deequ to Python","archived":false,"fork":false,"pushed_at":"2024-12-09T08:05:09.000Z","size":214,"stargazers_count":29,"open_issues_count":23,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-10-26T21:33:18.313Z","etag":null,"topics":["data-quality","data-quality-checks","data-science"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mfcabrera.png","metadata":{"files":{"readme":"README.rst","changelog":"HISTORY.rst","contributing":"CONTRIBUTING.rst","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.rst","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-12-06T15:54:47.000Z","updated_at":"2025-03-19T20:34:35.000Z","dependencies_parsed_at":"2024-03-25T05:22:21.724Z","dependency_job_id":"1e10b051-d26a-4ff0-a9e9-4522927bff11","html_url":"https://github.com/mfcabrera/hooqu","commit_stats":{"total_commits":201,"total_committers":3,"mean_commits":67.0,"dds":"0.19402985074626866","last_synced_commit":"d0216e282186540eadfbae605ded62393a10332c"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/mfcabrera/hooqu","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mfcabrera%2Fhooqu","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mfcabrera%2
Fhooqu/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mfcabrera%2Fhooqu/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mfcabrera%2Fhooqu/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mfcabrera","download_url":"https://codeload.github.com/mfcabrera/hooqu/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mfcabrera%2Fhooqu/sbom","scorecard":{"id":638691,"data":{"date":"2025-08-11","repo":{"name":"github.com/mfcabrera/hooqu","commit":"d0216e282186540eadfbae605ded62393a10332c"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":2.5,"checks":[{"name":"Code-Review","score":2,"reason":"Found 7/29 approved changesets -- score normalized to 2","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively 
maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security 
policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"License","score":9,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Warn: project license file does not contain an FSF or OSI license."],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Vulnerabilities","score":4,"reason":"6 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: PYSEC-2020-73","Warn: Project is vulnerable to: PYSEC-2024-48 / GHSA-fj7x-q9j7-g6q6","Warn: Project is vulnerable to: PYSEC-2021-437 / GHSA-5xp3-jfq3-5q8x","Warn: Project is vulnerable to: PYSEC-2023-228 / GHSA-mq26-g339-26xf","Warn: Project is vulnerable to: PYSEC-2021-142 / GHSA-8q59-q68h-6hv4","Warn: Project is vulnerable to: 
PYSEC-2022-43017 / GHSA-qwmp-2cf2-g9g6"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 9 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-21T10:01:50.459Z","repository_id":37791571,"created_at":"2025-08-21T10:01:50.460Z","updated_at":"2025-08-21T10:01:50.460Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28412790,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T05:26:33.345Z","status":"ssl_error","status_checked_at":"2026-01-14T05:21:57.251Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-quality","data-quality-checks","data-science"],"created_at":"2026-01-14T07:17:44.376Z","updated_at":"2026-01-14T07:17:45.008Z","avatar_url":"https://github.com/mfcabrera.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"===============================\nHooqu - Unit Tests for Data\n===============================\n\n.. image:: https://img.shields.io/pypi/v/hooqu.svg\n        :target: https://pypi.python.org/pypi/hooqu\n.. image:: https://travis-ci.com/mfcabrera/hooqu.svg?token=pq89mpsBBBTg11hAgCHH\u0026branch=master\n        :target: https://travis-ci.com/mfcabrera/hooqu\n.. image:: https://readthedocs.org/projects/hooqu/badge/?version=latest\n        :target: https://hooqu.readthedocs.io/en/latest/?badge=latest\n        :alt: Documentation Status\n.. image:: https://codecov.io/gh/mfcabrera/hooqu/branch/master/graph/badge.svg\n  :target: https://codecov.io/gh/mfcabrera/hooqu\n.. image:: https://pyup.io/repos/github/mfcabrera/hooqu/shield.svg\n     :target: https://pyup.io/repos/github/mfcabrera/hooqu/\n     :alt: Updates\n\n----------\n\n**Documentation**: https://hooqu.readthedocs.io\n\n**Source Code**: https://github.com/mfcabrera/hooqu\n\n----------\n\nHooqu is a library built on top of Pandas dataframes for defining \"unit tests for data\",\nwhich measure data quality datasets.\n\nHooqu is a \"spiritual\" Python port of `Apache Deequ \u003chttps://github.com/awslabs/deequ/\u003e`_ and\nis currently in an experimental state. 
I am happy to receive feedback and contributions.\n\nThe main motivation of Hooqu is to enable data science projects to discover the quality of their input/output data using an API similar to the one found in Deequ, so that different teams can share\nthe same vocabulary of checks.\n\nInstall\n-------\n\nHooqu requires Pandas \u003e= 1.0 and Python \u003e= 3.7. To install via pip use:\n\n::\n\n   pip install hooqu\n\n\nQuick Start\n-----------\n\n\n.. code:: python\n\n   import pandas as pd\n\n   # data to validate\n   df = pd.DataFrame(\n          [\n              (1, \"Thingy A\", \"awesome thing.\", \"high\", 0),\n              (2, \"Thingy B\", \"available at http://thingb.com\", None, 0),\n              (3, None, None, \"low\", 5),\n              (4, \"Thingy D\", \"checkout https://thingd.ca\", \"low\", 10),\n              (5, \"Thingy E\", None, \"high\", 12),\n          ],\n          columns=[\"id\", \"productName\", \"description\", \"priority\", \"numViews\"]\n   )\n\nChecks we want to perform:\n\n- there are 5 rows in total\n- values of the id attribute are never null/None and are unique\n- values of the productName attribute are never null/None\n- the priority attribute can only contain \"high\" or \"low\" as value\n- numViews should not contain negative values\n- at least half of the values in description should contain a URL\n- the median of numViews should be less than or equal to 10\n\nIn code this looks as follows:\n\n.. 
code:: python\n\n    from hooqu.checks import Check, CheckLevel, CheckStatus\n    from hooqu.verification_suite import VerificationSuite\n    from hooqu.constraints import ConstraintStatus\n\n\n    verification_result = (\n          VerificationSuite()\n          .on_data(df)\n          .add_check(\n              Check(CheckLevel.ERROR, \"Basic Check\")\n              .has_size(lambda sz: sz == 5)  # we expect 5 rows\n              .is_complete(\"id\")  # should never be None/Null\n              .is_unique(\"id\")  # should not contain duplicates\n              .is_complete(\"productName\")  # should never be None/Null\n              .is_contained_in(\"priority\", (\"high\", \"low\"))\n              .is_non_negative(\"numViews\")\n              # at least half of the descriptions should contain a url\n              .contains_url(\"description\", lambda d: d \u003e= 0.5)\n              # half of the items should have less than 10 views\n              .has_quantile(\"numViews\", 0.5, lambda v: v \u003c= 10)\n          )\n          .run()\n    )\n\n\n\nAfter calling ``run``, hooqu will compute some metrics on the data. Afterwards it invokes your assertion functions\n(e.g., ``lambda sz: sz == 5`` for the size check) on these metrics to see if the constraints hold on the data.\n\nWe can inspect the `VerificationResult \u003chttps://github.com/mfcabrera/hooqu/blob/b2c522854c674db9496c89d540df3fe4bb30d882/hooqu/verification_suite.py#L17\u003e`_ to see if the test found errors:\n\n.. 
code:: python\n\n    if verification_result.status == CheckStatus.SUCCESS:\n          print(\"Alles klar: The data passed the test, everything is fine!\")\n    else:\n          print(\"We found errors in the data\")\n\n    for check_result in verification_result.check_results.values():\n          for cr in check_result.constraint_results:\n              if cr.status != ConstraintStatus.SUCCESS:\n                  print(f\"{cr.constraint}: {cr.message}\")\n\n\nIf we run the example, we get the following output:\n\n::\n\n   We found errors in the data\n   CompletenessConstraint(Completeness(productName)): Value 0.8 does not meet the constraint requirement.\n   PatternMatchConstraint(containsURL(description)): Value 0.4 does not meet the constraint requirement.\n\nThe test found that our assumptions are violated! Only 4 out of 5 (80%) of the values of the productName attribute are non-null and only 2 out of 5 (40%) values of the description attribute contained a url.\nFortunately, we ran a test and found the errors, somebody should immediately fix the data :)\n\n\nContributing\n------------\n\nAll contributions, bug reports, bug fixes, documentation improvements,\nenhancements and ideas are welcome.  
Please use `GitHub issues\n\u003chttps://github.com/mfcabrera/hooqu/issues\u003e`_ for bug reports,\nfeature requests, install issues, RFCs, thoughts, etc.\n\nSee the full `contributing guide \u003chttps://github.com/mfcabrera/hooqu/blob/master/CONTRIBUTING.rst\u003e`_ for more information.\n\n\nWhy Hooqu?\n----------\n\n- Easy-to-use declarative API to add data verification steps to your\n  data processing pipeline.\n- The ``VerificationResult`` tells you not only which checks failed\n  but also the values of the computed metrics, allowing for flexible\n  handling of issues with the data.\n- Incremental metric computation to compare how quality\n  metrics change over time (planned).\n- Support for storing and loading computed metrics (planned).\n\n\n\nReferences\n----------\n\nThis project is a \"spiritual\" port of `Apache Deequ \u003chttps://github.com/awslabs/deequ/\u003e`_ and thus tries to implement\nthe declarative API described in the paper \"`Automating large-scale data quality verification \u003chttp://www.vldb.org/pvldb/vol11/p1781-schelter.pdf\u003e`_\"\nwhile remaining as Pythonic as possible. This project does not use (py)Spark but rather\nPandas (and hopefully in the future it will support other compatible dataframe implementations).\n\n\nName\n----\n\nJukumari (pronounced hooqumari) is the Aymara name for the `spectacled bear \u003chttps://en.wikipedia.org/wiki/Spectacled_bear\u003e`_ (*Tremarctos ornatus*), also known as the Andean\nbear, Andean short-faced bear, or mountain bear.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmfcabrera%2Fhooqu","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmfcabrera%2Fhooqu","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmfcabrera%2Fhooqu/lists"}