Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/kiwicom/contessa
Easy way to define, execute and store quality rules for your data.
data data-engineering data-quality framework mysql postgres python quality-assurance sqlite3
- Host: GitHub
- URL: https://github.com/kiwicom/contessa
- Owner: kiwicom
- License: mit
- Created: 2019-08-05T07:00:28.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-12-15T12:00:22.000Z (11 months ago)
- Last Synced: 2024-04-17T04:45:25.196Z (7 months ago)
- Topics: data, data-engineering, data-quality, framework, mysql, postgres, python, quality-assurance, sqlite3
- Language: Python
- Homepage:
- Size: 199 KB
- Stars: 18
- Watchers: 12
- Forks: 7
- Open Issues: 22
Metadata Files:
- Readme: README-MINIMAL.rst
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - kiwicom/contessa - Easy way to define, execute and store quality rules for your data. (Python)
README
Contessa
============================

Hello, welcome to Contessa!

Contessa is a **Data Quality** library that provides an easy way to define, execute and store quality rules for your data.

Instead of writing a lot of SQL queries that look almost exactly the same, we're aiming for a more pragmatic approach: define rules programmatically. This gives much more flexibility to the user and also to us as the creators of the lib.

We implement new Rules incrementally; they should reflect the Data Quality domain. To start with, these are simple rules such as NOT_NULL and GT (greater than). We want to build on these simple rules and provide more complex Data Quality checkers out-of-the-box.

**Goals:**
- be database agnostic (to a reasonable degree), so you define checks against any database (e.g. mysql vs. postgres) in the same way
- automate data quality results, e.g. from a postgres table to a Datadog dashboard
- programmatic approach to data-quality definition, which leads to:

  - dynamic composition of rules in a simple script using a db or any 3rd party tool, e.g. take all tables and create a NOT_NULL rule for each integer column in each of them
  - users can use special rules for their data if needed; if not, they can go with generic solutions
  - automatable, testable parts of definitions when needed
  - easier maintenance when the number of checks scales too fast :)
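
The "dynamic composition of rules" goal above can be illustrated with a minimal sketch. It does not use Contessa's API: it relies only on Python's standard ``sqlite3`` module, and the rule dictionaries, the helper names (``integer_columns``, ``build_not_null_rules``, ``run_rule``) and the sample tables are all hypothetical, chosen just to show the idea of generating a NOT_NULL check for every integer column instead of hand-writing near-identical queries.

```python
import sqlite3


def integer_columns(conn, table):
    """Return the names of columns declared with an INT* type in `table`."""
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in rows if row[2].upper().startswith("INT")]


def build_not_null_rules(conn):
    """Compose one hypothetical not_null rule per integer column of every table."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    return [
        {"name": "not_null", "table": table, "column": column}
        for table in tables
        for column in integer_columns(conn, table)
    ]


def run_rule(conn, rule):
    """Execute a rule and return it together with the number of failing rows."""
    query = (
        f'SELECT COUNT(*) FROM "{rule["table"]}" '
        f'WHERE "{rule["column"]}" IS NULL'
    )
    failed = conn.execute(query).fetchone()[0]
    return {**rule, "failed": failed}


# Sample data (made up for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE bookings (id INTEGER, price INTEGER, note TEXT);
    CREATE TABLE users (id INTEGER, name TEXT);
    INSERT INTO bookings VALUES (1, 100, 'ok'), (2, NULL, NULL);
    INSERT INTO users VALUES (1, 'a'), (NULL, 'b');
    """
)

rules = build_not_null_rules(conn)
results = [run_rule(conn, rule) for rule in rules]
for result in results:
    print(result)
```

Because the rules are plain data built in a loop, adding a table or an integer column changes the set of checks without anyone editing a query by hand, which is the maintenance benefit the list above describes.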
The full docs are available at https://contessa.readthedocs.io/en/latest