https://github.com/sujjadshaik/awesome-data-quality
A curated list of resources for testing, monitoring, and improving data quality across various data environments.
- Host: GitHub
- URL: https://github.com/sujjadshaik/awesome-data-quality
- Owner: sujjadshaik
- License: mit
- Created: 2025-02-15T21:57:20.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-02-15T22:07:03.000Z (3 months ago)
- Last Synced: 2025-02-15T23:19:59.051Z (3 months ago)
- Topics: awesome, awesome-data, awesome-data-quality, awesome-list, awesome-readme
- Homepage: https://sujjad.tech/blog/tech/awesome-data-quality
- Size: 3.91 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Awesome Data Quality Resources
A curated list of resources for testing, monitoring, and improving data quality across various data environments.
## Table of Contents
- [Frameworks and Libraries](#frameworks-and-libraries)
  - [Open Source](#open-source)
  - [Commercial](#commercial)
- [Books and Methodologies](#books-and-methodologies)
- [Tools](#tools)
  - [Open Source Tools](#open-source-tools)
  - [Commercial Tools](#commercial-tools)
- [Articles and Guides](#articles-and-guides)

## Frameworks and Libraries
### Open Source
- **elementary** - Data monitoring and observability tailored to dbt. [GitHub](https://github.com/elementary-data/elementary)
- **mobydq** - Tool for data engineering teams to run and automate data quality checks on their data pipelines. [GitHub](https://github.com/mobydq/mobydq)
- **ydata-quality** - Python library for assessing data quality throughout the stages of data pipeline development. [GitHub](https://github.com/ydataai/ydata-quality)
- **great-expectations** - Tool for data testing, documentation, and profiling. [GitHub](https://github.com/great-expectations/great_expectations)
- **deequ** - Library by Amazon for defining unit tests for data with a focus on large datasets. Based on Apache Spark. [GitHub](https://github.com/awslabs/deequ)
- **soda** - Enables data testing through extended SQL queries. [GitHub](https://github.com/sodadata/soda)
- **dqm** - Another data quality monitoring tool implemented using Spark. [GitHub](https://github.com/linkedin/dqm)
- **owl-sanitizer** - Lightweight data validation framework based on Spark. [GitHub](https://github.com/linkedin/owl-sanitizer)
- **griffin** - Data quality solution for distributed data systems at any scale, in both streaming and batch contexts. [GitHub](https://github.com/apache/incubator-griffin)

### Commercial
- **Bigeye** - Continuous data quality monitoring and anomaly detection. [Website](https://www.bigeye.com/)
- **Soda** - Data testing and monitoring platform. [Website](https://www.soda.io/)
- **Databand** - Data pipeline observability and monitoring. [Website](https://www.databand.ai/)
- **Monte Carlo** - Data observability platform. [Website](https://www.montecarlodata.com/)
- **Sifflet** - Data quality monitoring and observability. [Website](https://www.sifflet.io/)
- **Validio** - Real-time data quality monitoring. [Website](https://www.validio.com/)
- **Lightup** - Data quality checks and monitoring. [Website](https://www.lightup.ai/)
- **Lantern** - Data quality and observability. [Website](https://www.lantern.io/)
- **Metaplane** - Data quality monitoring for data teams. [Website](https://www.metaplane.dev/)
- **Datafold** - Proactive data quality platform. [Website](https://www.datafold.com/)
- **Acceldata** - Data observability and quality management. [Website](https://www.acceldata.io/)
- **Anomalo** - Automated data quality monitoring. [Website](https://www.anomalo.com/)
- **Marquez** - Open-source (not commercial) metadata service for collecting, aggregating, and visualizing a data ecosystem's metadata. [GitHub](https://github.com/MarquezProject/marquez)

## Books and Methodologies
- **Complete Data Quality Methodology (CDQM)** - By Carlo Batini and Monica Scannapieco. [Book](https://www.amazon.com/Data-Quality-Concepts-Methodologies-Techniques/dp/3540331727)
- **Data Quality Assessment Framework** - By Arkady Maydanchik. [Book](https://www.amazon.com/Data-Quality-Assessment-Arkady-Maydanchik/dp/097714000X)
- **CIHI Information Quality Framework** - From the Canadian Institute for Health Information. [Resource](https://www.cihi.ca/en/data-quality)
- **Enterprise Knowledge Management** - By David Loshin. [Book](https://www.amazon.com/Enterprise-Knowledge-Management-Quality-Approach/dp/0123737245)
- **MIKE2.0** - Open Source initiative for Enterprise Information Management. [Website](https://mike2.openmethodology.org/)
- **Ten Steps to Quality Data and Trusted Information** - By Danette McGilvray. [Book](https://www.amazon.com/Ten-Steps-Quality-Trusted-Information/dp/0977140034)
- **Total Information Quality Management (TIQM)** - By Larry English. [Book](https://www.amazon.com/Improving-Data-Warehouse-Business-Information/dp/0471353835)

## Tools
### Open Source Tools
- **Deequ** - For defining unit tests for data. [GitHub](https://github.com/awslabs/deequ)
- **dbt Core** - Data transformation tool with built-in testing capabilities. [GitHub](https://github.com/dbt-labs/dbt-core)
- **MobyDQ** - Automates data quality checks. [GitHub](https://github.com/mobydq/mobydq)
- **Great Expectations** - Data validation and profiling. [GitHub](https://github.com/great-expectations/great_expectations)
- **Soda Core** - Python library for data reliability. [GitHub](https://github.com/sodadata/soda-core)
- **Cucumber** - Behavior-driven development tool for data quality testing. [GitHub](https://github.com/cucumber/cucumber)

### Commercial Tools
- **Ataccama** - Comprehensive data quality and catalog suite. [Website](https://www.ataccama.com/)
- **Informatica** - Data quality and observability platform. [Website](https://www.informatica.com/)
- **Talend** - Data quality solutions with real-time monitoring. [Website](https://www.talend.com/)
- **IBM InfoSphere QualityStage** - Data quality and governance. [Website](https://www.ibm.com/products/infosphere-information-server)
- **Precisely Trillium Quality** - Enterprise data quality tool. [Website](https://www.precisely.com/product/trillium-quality)
- **Adverity** - Marketing data integration with data quality management. [Website](https://www.adverity.com/)
- **Oracle Enterprise Data Quality** - Robust data profiling and cleansing. [Website](https://www.oracle.com/enterprise-data-quality/)

## Articles and Guides
- **A Guide to Data Quality Tools: The 4 Leading Solutions** - Zendata. [Article](https://www.zendata.dev/post/a-guide-to-data-quality-tools-the-4-leading-solutions)
- **Top Data Quality Management Tools to Choose in 2024** - Mad Devs. [Article](https://maddevs.io/blog/data-quality-management-tools/)
- **Data Quality Management: Tools, Pillars, and Best Practices** - lakeFS. [Article](https://lakefs.io/data-quality/data-quality-management/)
- **Best Data Quality Tools for 2024: Top 10 Choices** - Adverity. [Article](https://www.adverity.com/blog/data-quality-tools/)
- **The 8 Best Data Quality Management Tools and Software for 2025** - Solutions Review. [Article](https://solutionsreview.com/data-management/the-best-data-quality-management-tools/)
- **9 Best Tools for Data Quality in 2024** - Datafold. [Article](https://www.datafold.com/blog/9-best-tools-for-data-quality-in-2021)
- **Data Quality Management Best Practices: A Short Guide** - Zendata. [Article](https://www.zendata.dev/post/data-quality-management-best-practices-a-short-guide)