https://github.com/sujjadshaik/awesome-data-quality

# Awesome Data Quality Resources

A curated list of resources for testing, monitoring, and improving data quality across various data environments.

## Table of Contents

- [Frameworks and Libraries](#frameworks-and-libraries)
  - [Open Source](#open-source)
  - [Commercial](#commercial)
- [Books and Methodologies](#books-and-methodologies)
- [Tools](#tools)
  - [Open Source Tools](#open-source-tools)
  - [Commercial Tools](#commercial-tools)
- [Articles and Guides](#articles-and-guides)

## Frameworks and Libraries

### Open Source

- **elementary** - Data monitoring and observability tailored to dbt. [GitHub](https://github.com/elementary-data/elementary)
- **mobydq** - Tool for data engineering teams to run and automate data quality checks on their data pipelines. [GitHub](https://github.com/mobydq/mobydq)
- **ydata-quality** - Python library for assessing data quality across the stages of data pipeline development. [GitHub](https://github.com/ydataai/ydata-quality)
- **great-expectations** - Tool for data testing, documentation, and profiling. [GitHub](https://github.com/great-expectations/great_expectations)
- **deequ** - Library by Amazon for defining unit tests for data with a focus on large datasets. Based on Apache Spark. [GitHub](https://github.com/awslabs/deequ)
- **soda** - Enables data testing through extended SQL queries. [GitHub](https://github.com/sodadata/soda)
- **dqm** - Data quality monitoring tool implemented using Spark. [GitHub](https://github.com/linkedin/dqm)
- **owl-sanitizer** - Lightweight data validation framework based on Spark. [GitHub](https://github.com/linkedin/owl-sanitizer)
- **griffin** - Data quality solution for distributed data systems at any scale, in both streaming and batch contexts. [GitHub](https://github.com/apache/incubator-griffin)
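
Several of the libraries above describe themselves as "unit tests for data". As a rough illustration of that idea, here is a minimal sketch in plain Python. It does not use the API of any listed library; the `Check` class, the helper names (`is_complete`, `is_unique`, `in_range`, `run_checks`), and the sample `orders` rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict  # hypothetical row representation: column name -> value


@dataclass
class Check:
    """A named predicate evaluated over a whole dataset."""
    name: str
    predicate: Callable[[list[Row]], bool]


def is_complete(column: str) -> Check:
    """Every row has a non-null value in `column`."""
    return Check(
        f"is_complete({column})",
        lambda rows: all(r.get(column) is not None for r in rows),
    )


def is_unique(column: str) -> Check:
    """No value in `column` appears more than once."""
    def predicate(rows: list[Row]) -> bool:
        values = [r.get(column) for r in rows]
        return len(values) == len(set(values))
    return Check(f"is_unique({column})", predicate)


def in_range(column: str, low: float, high: float) -> Check:
    """Every non-null value in `column` lies within [low, high]."""
    return Check(
        f"in_range({column}, {low}, {high})",
        lambda rows: all(
            low <= r[column] <= high
            for r in rows
            if r.get(column) is not None
        ),
    )


def run_checks(rows: list[Row], checks: list[Check]) -> dict[str, bool]:
    """Run each check and report pass/fail by check name."""
    return {c.name: c.predicate(rows) for c in checks}


# Hypothetical sample data with a duplicate id and a missing amount.
orders = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": 5.00},
    {"order_id": 2, "amount": None},
]

results = run_checks(orders, [
    is_complete("order_id"),
    is_unique("order_id"),
    is_complete("amount"),
    in_range("amount", 0, 1000),
])

for name, passed in results.items():
    print(f"{'PASS' if passed else 'FAIL'}  {name}")
```

The Spark-based libraries listed here apply the same kind of declarative check to distributed datasets rather than in-memory lists, so treat this only as a picture of the concept.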

### Commercial

- **Bigeye** - Continuous data quality monitoring and anomaly detection. [Website](https://www.bigeye.com/)
- **Soda** - Data testing and monitoring platform. [Website](https://www.soda.io/)
- **Databand** - Data pipeline observability and monitoring. [Website](https://www.databand.ai/)
- **Monte Carlo** - Data observability platform. [Website](https://www.montecarlodata.com/)
- **Sifflet** - Data quality monitoring and observability. [Website](https://www.sifflet.io/)
- **Validio** - Real-time data quality monitoring. [Website](https://www.validio.com/)
- **Lightup** - Data quality checks and monitoring. [Website](https://www.lightup.ai/)
- **Lantern** - Data quality and observability. [Website](https://www.lantern.io/)
- **Metaplane** - Data quality monitoring for data teams. [Website](https://www.metaplane.dev/)
- **Datafold** - Proactive data quality platform. [Website](https://www.datafold.com/)
- **Acceldata** - Data observability and quality management. [Website](https://www.acceldata.io/)
- **Anomalo** - Automated data quality monitoring. [Website](https://www.anomalo.com/)
- **Marquez** - Open source metadata service for collecting, aggregating, and visualizing a data ecosystem's metadata. [GitHub](https://github.com/MarquezProject/marquez)
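
Many of the platforms above advertise anomaly detection on data quality metrics. The sketch below shows one simple way such a monitor could work, assuming a z-score test on daily row counts. It is not a description of how any listed product is implemented; the function name `is_anomalous`, the threshold of 3 standard deviations, and the sample counts are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` if it lies more than `threshold` sample standard
    deviations away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate the spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly constant history: any change counts as an anomaly.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Hypothetical daily row counts for one table over the past week.
daily_row_counts = [1000, 1020, 990, 1010, 1005, 995, 1015]

normal_day = is_anomalous(daily_row_counts, 1008)   # close to the usual volume
dropped_load = is_anomalous(daily_row_counts, 400)  # a sharp drop in volume

print(normal_day, dropped_load)
```

A fixed z-score threshold ignores seasonality and trends, so in practice it would likely need to be replaced or supplemented by a model that accounts for them.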

## Books and Methodologies

- **Complete Data Quality Methodology (CDQM)** - By Carlo Batini and Monica Scannapieco. [Book](https://www.amazon.com/Data-Quality-Concepts-Methodologies-Techniques/dp/3540331727)
- **Data Quality Assessment Framework** - By Arkady Maydanchik. [Book](https://www.amazon.com/Data-Quality-Assessment-Arkady-Maydanchik/dp/097714000X)
- **CIHI Information Quality Framework** - From the Canadian Institute for Health Information. [Resource](https://www.cihi.ca/en/data-quality)
- **Enterprise Knowledge Management** - By David Loshin. [Book](https://www.amazon.com/Enterprise-Knowledge-Management-Quality-Approach/dp/0123737245)
- **MIKE2.0** - Open source initiative for enterprise information management. [Website](https://mike2.openmethodology.org/)
- **Ten Steps to Quality Data and Trusted Information** - By Danette McGilvray. [Book](https://www.amazon.com/Ten-Steps-Quality-Trusted-Information/dp/0977140034)
- **Total Information Quality Management (TIQM)** - By Larry English. [Book](https://www.amazon.com/Improving-Data-Warehouse-Business-Information/dp/0471353835)

## Tools

### Open Source Tools

- **Deequ** - For defining unit tests for data. [GitHub](https://github.com/awslabs/deequ)
- **dbt Core** - Data transformation tool with built-in testing capabilities. [GitHub](https://github.com/dbt-labs/dbt-core)
- **MobyDQ** - Automates data quality checks. [GitHub](https://github.com/mobydq/mobydq)
- **Great Expectations** - Data validation and profiling. [GitHub](https://github.com/great-expectations/great_expectations)
- **Soda Core** - Python library for data reliability. [GitHub](https://github.com/sodadata/soda-core)
- **Cucumber** - General-purpose behavior-driven development (BDD) testing tool that can be used to express data quality checks as executable specifications. [GitHub](https://github.com/cucumber/cucumber)

### Commercial Tools

- **Ataccama** - Comprehensive data quality and catalog suite. [Website](https://www.ataccama.com/)
- **Informatica** - Data quality and observability platform. [Website](https://www.informatica.com/)
- **Talend** - Data quality solutions with real-time monitoring. [Website](https://www.talend.com/)
- **IBM InfoSphere QualityStage** - Data quality and governance. [Website](https://www.ibm.com/products/infosphere-information-server)
- **Precisely Trillium Quality** - Enterprise data quality tool. [Website](https://www.precisely.com/product/trillium-quality)
- **Adverity** - Marketing data integration with data quality management. [Website](https://www.adverity.com/)
- **Oracle Enterprise Data Quality** - Robust data profiling and cleansing. [Website](https://www.oracle.com/enterprise-data-quality/)

## Articles and Guides

- **A Guide to Data Quality Tools: The 4 Leading Solutions** - Zendata. [Article](https://www.zendata.dev/post/a-guide-to-data-quality-tools-the-4-leading-solutions)
- **Top Data Quality Management Tools to Choose in 2024** - Mad Devs. [Article](https://maddevs.io/blog/data-quality-management-tools/)
- **Data Quality Management: Tools, Pillars, and Best Practices** - lakeFS. [Article](https://lakefs.io/data-quality/data-quality-management/)
- **Best Data Quality Tools for 2024: Top 10 Choices** - Adverity. [Article](https://www.adverity.com/blog/data-quality-tools/)
- **The 8 Best Data Quality Management Tools and Software for 2025** - Solutions Review. [Article](https://solutionsreview.com/data-management/the-best-data-quality-management-tools/)
- **9 Best Tools for Data Quality in 2024** - Datafold. [Article](https://www.datafold.com/blog/9-best-tools-for-data-quality-in-2021)
- **Data Quality Management Best Practices: A Short Guide** - Zendata. [Article](https://www.zendata.dev/post/data-quality-management-best-practices-a-short-guide)