https://github.com/kwanUm/awesome-data-quality
Curated list of tools and frameworks assisting in monitoring data quality
List: awesome-data-quality
Last synced: 16 days ago
- Host: GitHub
- URL: https://github.com/kwanUm/awesome-data-quality
- Owner: kwanUm
- License: apache-2.0
- Created: 2022-04-02T13:43:41.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-04-03T08:18:02.000Z (over 2 years ago)
- Last Synced: 2024-05-22T19:02:20.593Z (7 months ago)
- Size: 28.3 KB
- Stars: 9
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- ultimate-awesome - awesome-data-quality - Curated list of tools and frameworks assisting in monitoring data quality. (Other Lists / Monkey C Lists)
README
# awesome-data-quality
A curated list of awesome tools for testing and monitoring data quality, typically in the data warehouse/lake or within running data pipelines.
_If you want to contribute to this list (please do), send me a pull request or [contact me](https://mobile.twitter.com/orikabeli)._
## Table of Contents
TBD
### Frameworks and Libraries
#### Open source
* [elementary](https://github.com/elementary-data/elementary) - Data monitoring and observability tailored to dbt.
* [mobydq](https://github.com/ubisoft/mobydq) - Tool for data engineering teams to run and automate data quality checks on their data pipelines.
* [ydata-quality](https://github.com/ydataai/ydata-quality) - Python library for assessing data quality throughout the stages of data pipeline development.
* [great-expectations](https://github.com/great-expectations/great_expectations) - Tool for data testing, documentation, and profiling.
* [deequ](https://github.com/awslabs/python-deequ) - Library by Amazon for defining unit tests for data, with a focus on large datasets. Built on Apache Spark; this link points to PyDeequ, its Python API.
* [soda](https://github.com/sodadata/soda-core) - Enables data testing through extended SQL queries.
* [dqm](https://github.com/piotr-kalanski/data-quality-monitoring) - Another data quality monitoring tool, implemented with Spark.
* [owl-sanitizer](https://github.com/ronald-smith-angel/owl-data-sanitizer) - Yet another lightweight, Spark-based data validation framework.
* [griffin](https://github.com/apache/griffin) - Data quality solution for distributed data systems at any scale, in both streaming and batch contexts.
* [drunken-data-quality](https://github.com/FRosner/drunken-data-quality)
* [DataQuality for BigData](https://github.com/agile-lab-dev/DataQuality)
* [TopNotch](https://github.com/blackrock/TopNotch)
* [Phasor Data Quality Tracker](https://github.com/GridProtectionAlliance/pdqtracker)
* [DataCleaner](https://github.com/datacleaner/DataCleaner)
* [data-quality](https://github.com/Talend/data-quality)
##### Geared for ML
* [deepchecks](https://github.com/deepchecks/deepchecks) - Tool for validating machine learning models and data, with test suites tailored to ML models, datasets, and outputs.
* [evidently](https://github.com/evidentlyai/evidently) - Analyzes and tracks the quality of data and ML model outputs.
##### Pipelines with data quality included
* [dbt](https://docs.getdbt.com/docs/building-a-dbt-project/tests), [dataform](https://dataform.co/blog/data-assertions) - ELT tools that come with a handy utility for defining tests as SQL queries.
#### Paid
Offerings range from data testing to pipeline testing, with a focus on real-time monitoring, automated test creation and threshold setting, and additional enterprise features.
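Automated threshold setting generally means learning a "normal" range for a metric (row count, null rate, freshness) from its own history instead of hard-coding it. As a minimal sketch, assuming a daily row-count metric and a simple mean ± k·σ band with an arbitrarily chosen k = 3, the idea looks roughly like this. The helper names `learn_threshold` and `check_metric` are hypothetical, and this is not how any particular product listed here is known to implement it:

```python
from statistics import mean, stdev


def learn_threshold(history: list[float], k: float = 3.0) -> tuple[float, float]:
    """Derive a (lower, upper) band from historical metric values.

    Hypothetical helper: uses mean +/- k * sample standard deviation.
    Needs at least two historical points to estimate the spread.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation
    return mu - k * sigma, mu + k * sigma


def check_metric(value: float, history: list[float], k: float = 3.0) -> bool:
    """Return True if `value` falls inside the learned band (the check passes)."""
    lower, upper = learn_threshold(history, k)
    return lower <= value <= upper


if __name__ == "__main__":
    # Made-up daily row counts for a single table, for illustration only.
    row_counts = [1000, 1020, 990, 1010, 1005, 995, 1015]
    print(learn_threshold(row_counts))
    print(check_metric(1008, row_counts))  # → True (a typical day)
    print(check_metric(400, row_counts))   # → False (a sudden drop)
```

A fixed band like this ignores trends and seasonality such as weekend dips, so in practice it is only a starting point.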
* [Bigeye](https://bigeye.com)
* [Soda](https://soda.io)
* [Databand](https://databand.ai)
* [Monte Carlo](https://montecarlodata.com)
* [great expectations](https://greatexpectations.io)
* [Sifflet](https://siffletapp.com)
* [Validio](https://validio.io)
* [Lightup](https://lightup.ai)
* [Lantern](https://lantern.so)
* [Metaplane](https://metaplane.dev)
* [Datafold](https://datafold.com)
* [Acceldata](https://acceldata.io)
* [Anomalo](https://anomalo.com)
* [Marquez](https://marquezproject.github.io)
## TODOs
* Add tools for unstructured data (Arthur, Robust)