https://github.com/kwanUm/awesome-data-quality

Curated list of tools and frameworks assisting in monitoring data quality

List: awesome-data-quality

Host: GitHub
URL: https://github.com/kwanUm/awesome-data-quality
Owner: kwanUm
License: apache-2.0
Created: 2022-04-02T13:43:41.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2022-04-03T08:18:02.000Z (about 3 years ago)
Last Synced: 2024-05-22T19:02:20.593Z (about 1 year ago)
Size: 28.3 KB
Stars: 9
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

ultimate-awesome - awesome-data-quality - Curated list of tools and frameworks assisting in monitoring data quality. (Other Lists / Julia Lists)

README

# awesome-data-quality

A curated list of awesome tools for testing and monitoring data quality - typically at the data warehouse/lake or within running data pipelines.

_If you want to contribute to this list (please do), send me a pull request or [contact me](https://mobile.twitter.com/orikabeli)._

## Table of Contents

TBD

### Frameworks and Libraries

#### Open source

* [elementary](https://github.com/elementary-data/elementary) - Data monitoring and observability tailored to dbt.

* [mobydq](https://github.com/ubisoft/mobydq) - tool for data engineering teams to run and automate data quality checks on their data pipelines.

* [ydata-quality](https://github.com/ydataai/ydata-quality) - Python library for assessing data quality across the stages of data pipeline development.

* [great-expectations](https://github.com/great-expectations/great_expectations) - tool for data testing, documentation, and profiling.

* [deequ](https://github.com/awslabs/python-deequ) - library by Amazon for defining unit tests for data, with a focus on large datasets. Based on Apache Spark.

* [soda](https://github.com/sodadata/soda-core) - enables data testing through extended SQL queries.

* [dqm](https://github.com/piotr-kalanski/data-quality-monitoring) - another data quality monitoring tool implemented using Spark.

* [owl-sanitizer](https://github.com/ronald-smith-angel/owl-data-sanitizer) - yet another Spark-based lightweight data validation framework.

* [griffin](https://github.com/apache/griffin) - data quality solution for distributed data systems at any scale, in both streaming and batch contexts.

* [drunken-data-quality](https://github.com/FRosner/drunken-data-quality)

* [DataQuality for BigData](https://github.com/agile-lab-dev/DataQuality)

* [TopNotch](https://github.com/blackrock/TopNotch)

* [Phasor Data Quality Tracker](https://github.com/GridProtectionAlliance/pdqtracker)

* [DataCleaner](https://github.com/datacleaner/DataCleaner)

* [data-quality](https://github.com/Talend/data-quality) 
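
Several of the libraries above describe themselves as unit tests for data. As a rough, library-agnostic sketch of that idea, the snippet below computes two common checks (completeness and uniqueness) in plain Python. The function names `completeness` and `is_unique` and the sample `orders` rows are hypothetical and are not taken from the API of any tool in this list.

```python
from typing import Any, Iterable, Mapping

Row = Mapping[str, Any]


def completeness(rows: Iterable[Row], column: str) -> float:
    """Fraction of rows whose `column` value is not None."""
    rows = list(rows)
    if not rows:
        return 1.0  # assumption: an empty dataset counts as complete
    non_null = sum(1 for r in rows if r.get(column) is not None)
    return non_null / len(rows)


def is_unique(rows: Iterable[Row], column: str) -> bool:
    """True when no non-null value of `column` appears more than once."""
    seen = set()
    for r in rows:
        value = r.get(column)
        if value is None:
            continue
        if value in seen:
            return False
        seen.add(value)
    return True


# Hypothetical sample data, for illustration only.
orders = [
    {"order_id": 1, "email": "a@example.com"},
    {"order_id": 2, "email": None},
    {"order_id": 3, "email": "c@example.com"},
    {"order_id": 3, "email": "d@example.com"},
]

report = {
    "email_completeness": completeness(orders, "email"),
    "order_id_unique": is_unique(orders, "order_id"),
}
print(report)
```

The listed tools typically add what this sketch leaves out: declarative check definitions, execution at scale, and reporting.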

##### Geared for ML

* [deepchecks](https://github.com/deepchecks/deepchecks) - tool for validating your machine learning models and data. Includes test suites tailored to ML models, datasets, and outputs.

* [evidently](https://github.com/evidentlyai/evidently) - analyze and track data and ML model output quality.

##### Pipelines with data quality included

* [dbt](https://docs.getdbt.com/docs/building-a-dbt-project/tests), [dataform](https://dataform.co/blog/data-assertions) - ELT tools that come with a handy utility for defining tests as SQL queries.
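
To illustrate what "tests as SQL queries" can look like, here is a minimal sketch that assumes a common convention: the test query selects the rows that violate an expectation, and the test passes when it returns no rows. It uses Python's built-in `sqlite3` module rather than dbt or dataform, and the `payments` table, the `failing_rows` helper, and the query are all hypothetical.

```python
import sqlite3


def failing_rows(conn: sqlite3.Connection, test_sql: str) -> list:
    """Run a test query and return the rows that violate the expectation."""
    return conn.execute(test_sql).fetchall()


# In-memory database with a hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE payments (id INTEGER, amount REAL);
    INSERT INTO payments VALUES (1, 10.0), (2, -5.0), (3, 7.5);
    """
)

# The test selects rows that should NOT exist: payments with a negative amount.
NEGATIVE_AMOUNTS = "SELECT id, amount FROM payments WHERE amount < 0"

violations = failing_rows(conn, NEGATIVE_AMOUNTS)
status = "pass" if not violations else "fail"
print(status, violations)
```

Writing the test as a query for bad rows means a failure comes with the offending records attached, which tends to make debugging easier than a bare true/false result.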

#### Paid

Offerings range from data testing to pipeline testing, with a focus on real-time monitoring, automated test creation and threshold setting, and additional enterprise features.

* [Bigeye](https://bigeye.com)

* [Soda](https://soda.io)

* [Databand](https://databand.ai)

* [Monte Carlo](https://montecarlodata.com)

* [Great Expectations](https://greatexpectations.io)

* [Sifflet](https://siffletapp.com)

* [Validio](https://validio.io)

* [Lightup](https://lightup.ai)

* [Lantern](https://lantern.so)

* [Metaplane](https://metaplane.dev)

* [Datafold](https://datafold.com)

* [Acceldata](https://acceldata.io)

* [Anomalo](https://anomalo.com)

* [Marquez](https://marquezproject.github.io)

### TODOs

* Add tools for unstructured data (Arthur, Robust)
