Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with data-quality

A curated list of projects in awesome lists tagged with data-quality.

https://github.com/kestra-io/kestra

Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.

data data-engineering data-integration data-orchestration data-orchestrator data-pipeline data-quality elt etl low-code orchestration pipeline reverse-etl scheduler workflow workflow-engine

Last synced: 29 Sep 2024

https://github.com/gokumohandas/mlops-course

Learn how to design, develop, deploy and iterate on production-grade ML applications.

data-engineering data-quality data-science deep-learning distributed-ml llms machine-learning mlops natural-language-processing python pytorch ray

Last synced: 30 Sep 2024

https://github.com/whylabs/whylogs

An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈

ai-pipelines analytics approximate-statistics calculate-statistics constraints data-constraints data-pipeline data-quality data-science dataops dataset logging machine-learning ml-pipelines mlops model-performance python statistical-properties

Last synced: 30 Sep 2024

https://github.com/featureform/featureform

The Virtual Feature Store. Turn your existing data infrastructure into a feature store.

data-quality data-science embeddings embeddings-similarity feature-engineering feature-store hacktoberfest machine-learning ml mlops python vector-database

Last synced: 30 Sep 2024

https://github.com/featureform/embeddinghub

The Virtual Feature Store. Turn your existing data infrastructure into a feature store.

data-quality data-science embeddings embeddings-similarity feature-engineering feature-store hacktoberfest machine-learning ml mlops python vector-database

Last synced: 31 Jul 2024

https://github.com/WeBankFinTech/Qualitis

Qualitis is a one-stop data quality management platform that supports quality verification, notification, and management for various data sources. It is used to solve various data quality problems caused by data processing.

compare data-quality data-quality-model datashperestudio dss linkis quality quality-check quality-improvement workflow

Last synced: 01 Aug 2024

https://github.com/alibaba/feathub

FeatHub - A stream-batch unified feature store for real-time machine learning

apache-flink data data-engineering data-quality data-science feature-engineering feature-store machine-learning mlops streaming

Last synced: 01 Aug 2024

https://github.com/ubisoft/mobydq

🐳 Tool to automate data quality checks on data pipelines

big-data data-pipeline data-quality data-quality-checks data-quality-monitoring data-warehouse

Last synced: 02 Aug 2024

https://github.com/OHDSI/DataQualityDashboard

A tool to help improve data quality standards in observational data science.

data-quality

Last synced: 08 Aug 2024

https://github.com/aai-institute/pyDVL

pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation

banzhaf-index data-centric-ai data-cleaning data-pruning data-quality data-valuation game-theory influence-functions least-core machine-learning robust-machine-learning shapley-value transferlab

Last synced: 03 Aug 2024

https://github.com/aws-samples/amazon-deequ-glue

Automated data quality suggestions and analysis with Deequ on AWS Glue

aws aws-glue data-quality deequ

Last synced: 13 Aug 2024

https://github.com/great-expectations/great_expectations_action

A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows.

actions continuous-integration data-integrity data-quality data-science mlops

Last synced: 01 Aug 2024

https://github.com/datakitchen/data-observability-installer

Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.

data data-engineering data-observability data-profiling data-quality data-reliability data-science datachecker datacleaner datacleaning dataops dataquality datatesting datavalidation mssql pipeline-tests postgresql redshift self-hosted snowflake

Last synced: 28 Sep 2024

https://github.com/dqops/dqo

Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observability. Configure data quality checks from the UI or in YAML files, let DQOps run the data quality checks daily to detect data quality issues.

data-observability data-ops data-profiling data-quality data-quality-checks data-quality-measurement data-quality-monitoring data-quality-report monitoring

Last synced: 04 Aug 2024

https://github.com/emilyriederer/convo

R package based on "Column Names as Contracts" blog post (https://emilyriederer.netlify.app/post/column-name-contracts/)

controlled-vocabulary data-quality data-validation r-package schema-design variable-names variable-naming

Last synced: 13 Aug 2024

https://github.com/semyonsinchenko/tsumugi-spark

SparkConnect Server plugin and protobuf messages for the Amazon Deequ Data Quality Engine.

data-quality deequ pyspark spark

Last synced: 26 Sep 2024

https://github.com/kiwicom/contessa

Easy way to define, execute and store quality rules for your data.

data data-engineering data-quality framework mysql postgres python quality-assurance sqlite3

Last synced: 13 Aug 2024

https://github.com/christianbors/OpenRefineQualityMetrics

MetricDoc is an interactive visual exploration environment for assessing data quality

data-profiling data-quality data-quality-checks data-wrangling interactive-visualizations quality-metrics visual-analytics

Last synced: 01 Aug 2024

https://github.com/adidas/lakehouse-engine-docs

The goal of this project is to provide documentation for the Lakehouse Engine framework.

big-data data-engineering data-quality databricks delta-lake framework great-expectations lakehouse lakehouse-engine spark

Last synced: 28 Sep 2024

https://github.com/BetweenTwoTests/between_dbs

DDL & test data for different databases for ETL data quality checks / data loading tests

data-quality database etl

Last synced: 13 Aug 2024