Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with data-quality
A curated list of projects in awesome lists tagged with data-quality.
https://github.com/gokumohandas/made-with-ml
Learn how to design, develop, deploy and iterate on production-grade ML applications.
data-engineering data-quality data-science deep-learning distributed-ml distributed-training llms machine-learning mlops natural-language-processing python pytorch ray
Last synced: 30 Dec 2024
https://github.com/GokuMohandas/Made-With-ML
Learn how to design, develop, deploy and iterate on production-grade ML applications.
data-engineering data-quality data-science deep-learning distributed-ml distributed-training llms machine-learning mlops natural-language-processing python pytorch ray
Last synced: 27 Oct 2024
https://github.com/eugeneyan/applied-ml
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
applied-data-science applied-machine-learning computer-vision data-discovery data-engineering data-quality data-science deep-learning machine-learning natural-language-processing production recsys reinforcement-learning search
Last synced: 23 Nov 2024
https://github.com/ydataai/ydata-profiling
1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
big-data-analytics data-analysis data-exploration data-profiling data-quality data-science deep-learning eda exploration exploratory-data-analysis hacktoberfest html-report jupyter jupyter-notebook machine-learning pandas pandas-dataframe pandas-profiling python statistics
Last synced: 30 Dec 2024
https://github.com/ydataai/pandas-profiling
1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
big-data-analytics data-analysis data-exploration data-profiling data-quality data-science deep-learning eda exploration exploratory-data-analysis hacktoberfest html-report jupyter jupyter-notebook machine-learning pandas pandas-dataframe pandas-profiling python statistics
Last synced: 14 Dec 2024
https://github.com/great-expectations/great_expectations
Always know what to expect from your data.
cleandata data-engineering data-profilers data-profiling data-quality data-science data-unit-tests datacleaner datacleaning dataquality dataunittest eda exploratory-analysis exploratory-data-analysis exploratorydataanalysis mlops pipeline pipeline-debt pipeline-testing pipeline-tests
Last synced: 30 Dec 2024
https://github.com/cleanlab/cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
active-learning annotation data-centric-ai data-cleaning data-curation data-labeling data-profiling data-quality data-science data-validation dataops dataquality datasets exploratory-data-analysis labeling llms noisy-labels out-of-distribution-detection outlier-detection weak-supervision
Last synced: 31 Dec 2024
https://github.com/voxel51/fiftyone
Refine high-quality datasets and visual AI models
active-learning artificial-intelligence computer-vision data-centric-ai data-cleaning data-curation data-quality data-science deep-learning developer-tools image-classification machine-learning object-detection python unstructured-data vector-search visualization
Last synced: 30 Oct 2024
https://github.com/feast-dev/feast
The Open Source Feature Store for Machine Learning
big-data data-engineering data-quality data-science feature-store features machine-learning ml mlops python
Last synced: 30 Dec 2024
https://github.com/open-metadata/openmetadata
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
data-catalog data-collaboration data-contracts data-discovery data-governance data-lineage data-observability data-profiling data-quality data-quality-checks data-science data-validation datadiscovery dataengineering dataquality dbt hacktoberfest metadata metadata-management snowflake
Last synced: 30 Dec 2024
https://github.com/evidentlyai/evidently
Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
data-drift data-quality data-science data-validation generative-ai hacktoberfest html-report jupyter-notebook llm llmops machine-learning mlops model-monitoring pandas-dataframe
Last synced: 30 Dec 2024
https://github.com/treeverse/lakefs
lakeFS - Data version control for your data lake | Git for data
apache-spark apache-sparksql aws-s3 azure-blob-storage azure-storage data-engineering data-lake data-quality data-version-control data-versioning datalake datalakes git-for-data go golang google-cloud-storage hadoop-filesystem lakefs object-storage
Last synced: 30 Dec 2024
https://github.com/open-metadata/OpenMetadata
Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
data-catalog data-collaboration data-contracts data-discovery data-governance data-lineage data-observability data-profiling data-quality data-quality-checks data-science data-validation datacatalog datadiscovery dataengineering dataquality dbt metadata metadata-management snowflake
Last synced: 27 Oct 2024
https://github.com/treeverse/lakeFS
lakeFS - Data version control for your data lake | Git for data
apache-spark apache-sparksql aws-s3 azure-blob-storage azure-storage data-engineering data-lake data-quality data-version-control data-versioning datalake datalakes git-for-data go golang google-cloud-storage hadoop-filesystem lakefs object-storage
Last synced: 27 Oct 2024
https://github.com/gokumohandas/mlops-course
Learn how to design, develop, deploy and iterate on production-grade ML applications.
data-engineering data-quality data-science deep-learning distributed-ml llms machine-learning mlops natural-language-processing python pytorch ray
Last synced: 01 Jan 2025
https://github.com/GokuMohandas/mlops-course
Learn how to design, develop, deploy and iterate on production-grade ML applications.
data-engineering data-quality data-science deep-learning distributed-ml llms machine-learning mlops natural-language-processing python pytorch ray
Last synced: 30 Oct 2024
https://github.com/datafold/data-diff
Compare tables within or across databases
data data-diffing data-engineering data-quality data-quality-monitoring data-science database databricks-sql dataengineering dataquality dbt mysql oracle-database postgres postgresql python rdbms snowflake sql trino
Last synced: 29 Oct 2024
https://github.com/whylabs/whylogs
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈
ai-pipelines analytics approximate-statistics calculate-statistics constraints data-constraints data-pipeline data-quality data-science dataops dataset logging machine-learning ml-pipelines mlops model-performance python statistical-properties
Last synced: 31 Dec 2024
https://github.com/feathr-ai/feathr
Feathr – A scalable, unified data and AI engineering platform for enterprise
apache-spark artificial-intelligence azure data-engineering data-quality data-science feature-engineering feature-governance feature-management feature-marketplace feature-metadata feature-platform feature-store machine-learning mlops
Last synced: 02 Jan 2025
https://github.com/sodadata/soda-core
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
data-contracts data-engineering data-governance data-monitoring data-observability data-profiling data-quality data-quality-checks data-quality-monitoring data-quality-testing data-reliability data-testing data-unit-tests data-validation dataquality datatesting dbt pipeline-testing python snowflake
Last synced: 02 Jan 2025
https://github.com/featureform/featureform
The Virtual Feature Store. Turn your existing data infrastructure into a feature store.
data-quality data-science embeddings embeddings-similarity feature-engineering feature-store hacktoberfest machine-learning ml mlops python vector-database
Last synced: 01 Jan 2025
https://github.com/re-data/re-data
re_data - fix data issues before your users & CEO discover them 😊
data-analysis data-monitoring data-observability data-quality data-quality-checks data-quality-monitoring data-reliability data-testing dataquality dbt dbt-packages open-source-tooling
Last synced: 03 Dec 2024
https://github.com/opendatadiscovery/odd-platform
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
alerting bigdata data-catalog data-discovery data-engineering data-exploration data-governance data-lineage data-observability data-pipelines data-platform data-profiling data-quality data-science datacatalog lineage metadata metadata-management observability oss
Last synced: 02 Jan 2025
https://github.com/daochenzha/data-centric-ai
A curated, but incomplete, list of data-centric AI resources.
ai artificial-intelligence data-centric data-centric-ai data-centric-machine-learning data-curation data-engineering data-quality data-science machine-learning
Last synced: 01 Dec 2024
https://github.com/daochenzha/data-centric-AI
A curated, but incomplete, list of data-centric AI resources.
ai artificial-intelligence data-centric data-centric-ai data-centric-machine-learning data-curation data-engineering data-quality data-science machine-learning
Last synced: 30 Oct 2024
https://github.com/cleanlab/cleanvision
Automatically find issues in image datasets and practice data-centric computer vision.
computer-vision data-centric-ai data-exploration data-profiling data-quality data-science data-validation deep-learning exploratory-data-analysis image-analysis image-classification image-generation image-quality image-segmentation
Last synced: 26 Oct 2024
https://github.com/rstudio/pointblank
Data quality assessment and metadata reporting for data frames and database tables
data-assertions data-checker data-dictionaries data-frames data-inference data-management data-profiler data-quality data-validation data-verification database-tables easy-to-understand reporting-tool schema-validation testing-tools yaml-configuration
Last synced: 30 Dec 2024
https://github.com/kennethleungty/failed-ml
Compilation of high-profile real-world examples of failed machine learning projects
ai artificial-intelligence classification computer-vision data-engineering data-quality data-science deep-learning failed-data-science failed-machine-learning failed-ml fml forecasting machine-learning ml natural-language-processing production recsys regression
Last synced: 22 Nov 2024
https://github.com/kennethleungty/Failed-ML
Compilation of high-profile real-world examples of failed machine learning projects
ai artificial-intelligence classification computer-vision data-engineering data-quality data-science deep-learning failed-data-science failed-machine-learning failed-ml fml forecasting machine-learning ml natural-language-processing production recsys regression
Last synced: 05 Nov 2024
https://github.com/WeBankFinTech/Qualitis
Qualitis is a one-stop data quality management platform that supports quality verification, notification, and management for various data sources. It is used to solve data quality problems caused by data processing.
compare data-quality data-quality-model datashperestudio dss linkis quality quality-check quality-improvement workflow
Last synced: 05 Nov 2024
https://github.com/NVIDIA/NeMo-Curator
Scalable data preprocessing and curation toolkit for LLMs
data data-curation data-prep data-preparation data-processing data-processing-pipelines data-quality datacuration datarecipes deduplication fast-data-processing fine-tuning large-language-models large-scale-data-processing llm llm-data-quality llmapps python semantic-deduplication
Last synced: 27 Nov 2024
https://github.com/nvidia/nemo-curator
Scalable data preprocessing and curation toolkit for LLMs
data data-curation data-prep data-preparation data-processing data-processing-pipelines data-quality datacuration datarecipes deduplication fast-data-processing fine-tuning large-language-models large-scale-data-processing llm llm-data-quality llmapps python semantic-deduplication
Last synced: 03 Jan 2025
https://github.com/polyaxon/traceml
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
dask data-exploration data-profiling data-quality data-quality-checks data-science data-visualization dataframes dataops explainable-ai matplotlib mlops pandas pandas-summary plotly pytorch spark statistics tensorflow tracking
Last synced: 02 Jan 2025
https://github.com/infuseai/piperider
Code review for data in dbt
code-review continuous-integration data-exploration data-observability data-pipeline data-profiler data-profiling data-quality data-reliability data-science data-testing data-visualization dbt dbt-metrics eda exploratory-data-analysis pull-requests python reporting
Last synced: 01 Jan 2025
https://github.com/InfuseAI/piperider
Code review for data in dbt
code-review continuous-integration data-exploration data-observability data-pipeline data-profiler data-profiling data-quality data-reliability data-science data-testing data-visualization dbt dbt-metrics eda exploratory-data-analysis pull-requests python reporting
Last synced: 09 Nov 2024
https://github.com/data-drift/data-drift
Metrics Observability & Troubleshooting
analytics bigquery context data-diffing data-governance data-lineage data-monitoring data-observability data-quality data-reliability data-version-control dbt dbt-metrics dbt-packages drill-down metrics reconciliation redshift semantic-layer snowflake
Last synced: 01 Jan 2025
https://github.com/alibaba/feathub
FeatHub - A stream-batch unified feature store for real-time machine learning
apache-flink data data-engineering data-quality data-science feature-engineering feature-store machine-learning mlops streaming
Last synced: 05 Nov 2024
https://github.com/ubisoft/mobydq
🐳 Tool to automate data quality checks on data pipelines
big-data data-pipeline data-quality data-quality-checks data-quality-monitoring data-warehouse
Last synced: 11 Nov 2024
https://github.com/adidas/lakehouse-engine
The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.
big-data configuration-driven data-engineering data-quality databricks delta-lake framework great-expectations lakehouse spark
Last synced: 03 Jan 2025
https://github.com/bitol-io/open-data-contract-standard
Home of the Open Data Contract Standard (ODCS).
data data-contract data-contracts data-engineering data-mesh data-quality
Last synced: 25 Nov 2024
https://github.com/whylabs/whylogs-java
Profile and monitor your ML data pipeline end-to-end
ai-pipelines aiops apache-spark approximate-statistics calculate-statistics data-quality dataset java mlops spark statistical-properties statistics whylogs
Last synced: 28 Sep 2024
https://github.com/gair-nlp/prox
Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale"
continual continual-pre-training data-centric-ai data-quality llama llm mistral neural-symbolic pre-training
Last synced: 04 Jan 2025
https://github.com/astronomer/airflow-provider-great-expectations
Great Expectations Airflow operator
airflow airflow-operators airflow-providers data-quality data-science data-testing
Last synced: 01 Jan 2025
https://github.com/AKSW/RDFUnit
An RDF Unit Testing Suite
data-quality data-quality-checks data-validation rdf schema schema-validation shacl unit-testing validation web-ontology-language
Last synced: 04 Nov 2024
https://github.com/ohdsi/dataqualitydashboard
A tool to help improve data quality standards in observational data science.
Last synced: 01 Jan 2025
https://github.com/OHDSI/DataQualityDashboard
A tool to help improve data quality standards in observational data science.
Last synced: 27 Nov 2024
https://github.com/dqops/dqo
Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observability. Configure data quality checks from the UI or in YAML files, let DQOps run the data quality checks daily to detect data quality issues.
data-observability data-ops data-profiling data-quality data-quality-checks data-quality-measurement data-quality-monitoring data-quality-report monitoring
Last synced: 19 Nov 2024
https://github.com/Seddryck/NBi
NBi is a testing framework (add-on to NUnit) for Business Intelligence and Data Access. The main goal of this framework is to let users create tests with a declarative approach based on an XML syntax. With NBi, you don't need to develop C# or Java code to specify your tests, nor do you need Visual Studio or Eclipse to compile your test suite. Just create an XML file and let the framework interpret it and run your tests. The framework is designed as an add-on to NUnit, but it can be ported easily to other testing frameworks.
business-intelligence cube data-quality data-quality-checks database etl nunit test-automation test-framework
Last synced: 13 Nov 2024
https://github.com/seddryck/nbi
NBi is a testing framework (add-on to NUnit) for Business Intelligence and Data Access. The main goal of this framework is to let users create tests with a declarative approach based on an XML syntax. With NBi, you don't need to develop C# or Java code to specify your tests, nor do you need Visual Studio or Eclipse to compile your test suite. Just create an XML file and let the framework interpret it and run your tests. The framework is designed as an add-on to NUnit, but it can be ported easily to other testing frameworks.
business-intelligence cube data-quality data-quality-checks database etl nunit test-automation test-framework
Last synced: 04 Jan 2025
https://github.com/datakitchen/data-observability-installer
Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.
data data-engineering data-observability data-profiling data-quality data-reliability data-science datachecker datacleaner datacleaning dataops dataquality datatesting datavalidation mssql pipeline-tests postgresql redshift self-hosted snowflake
Last synced: 04 Jan 2025
https://github.com/re-data/dbt-re-data
re_data - fix data issues before your users & CEO discover them 😊
data-monitoring data-observability data-quality data-testing dbt dbt-packages sql
Last synced: 06 Nov 2024
https://github.com/aai-institute/pyDVL
pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation
banzhaf-index data-centric-ai data-cleaning data-pruning data-quality data-valuation game-theory influence-functions least-core machine-learning robust-machine-learning shapley-value transferlab
Last synced: 17 Nov 2024
https://github.com/DataKitchen/data-observability-installer
Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.
data data-engineering data-observability data-profiling data-quality data-reliability data-science datachecker datacleaner datacleaning dataops dataquality datatesting datavalidation mssql pipeline-tests postgresql redshift self-hosted snowflake
Last synced: 13 Nov 2024
https://github.com/aws-samples/amazon-deequ-glue
Automated data quality suggestions and analysis with Deequ on AWS Glue
aws aws-glue data-quality deequ
Last synced: 04 Dec 2024
https://github.com/gclunies/reflekt
Define, govern, and model event data for warehouse-first product analytics.
avo customer-data-platform data-modeling data-quality data-warehouse dbt dbt-package events governance product-analytics schema-registry segment segment-protocols
Last synced: 28 Dec 2024
https://github.com/GClunies/Reflekt
Define, govern, and model event data for warehouse-first product analytics.
avo customer-data-platform data-modeling data-quality data-warehouse dbt dbt-package events governance product-analytics schema-registry segment segment-protocols
Last synced: 29 Dec 2024
https://github.com/great-expectations/great_expectations_action
A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows.
actions continuous-integration data-integrity data-quality data-science mlops
Last synced: 06 Nov 2024
https://github.com/impetus/jumbune
Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http://jumbune.com. More details of open source offering are at,
aiops apm cluster-monitoring data-analysis data-quality developer-tools devops-tools hadoop hadoop-cluster hadoop-monitor hadoop-monitoring monitoring-tool optimization-framework yarn yarn-hadoop-cluster
Last synced: 14 Nov 2024
https://github.com/kevinadhiguna/dqlab-career-track
A collection of scripts written to complete DQLab Data Analyst Career Track 📊
career-track data-analysis data-analyst data-manipulation data-quality data-visualization dqlab dqlab-career-track exploratory-data-analysis machine-learning python sql
Last synced: 28 Oct 2024
https://github.com/provectus/data-quality-gate
Data Quality Gate based on AWS
athena aws aws-lambda data-governance data-quality great-expectations redshift s3 terraform
Last synced: 19 Dec 2024
https://github.com/anerv/bikedna
BikeDNA: Bicycle Infrastructure Data & Network Assessment
bicycle-infrastructure bicycle-network data-quality geospatial-data openstreetmap sustainable-mobility urban-planning volunteered-geographic-information
Last synced: 28 Oct 2024
https://github.com/kenthsu/udacity-data-engineering-nanodgree
Udacity Data Engineering Nanodegree Program
apache-airflow apache-cassandra apache-spark aws-redshift aws-s3 data-engineering data-lake data-pipelines data-quality data-warehouses postgresql
Last synced: 12 Oct 2024
https://github.com/datakitchen/dataops-testgen
DataOps Data Quality TestGen is part of DataKitchen's Open Source Data Observability. DataOps TestGen delivers simple, fast data quality test generation and execution by data profiling, new dataset hygiene review, AI generation of data quality validation tests, ongoing testing of data refreshes, & continuous anomaly monitoring
data data-engineering data-observability data-quality data-science data-testing datachecker dataops dataprofiling dataquality datavalidation mssql postgresql python redshift self-hosted snowflake
Last synced: 01 Jan 2025
https://github.com/davidberenstein1957/dataset-viber
Dataset Viber is your chill repo for data collection, annotation and vibe checks.
data-collection data-quality evaluation human-feedback
Last synced: 01 Jan 2025
https://github.com/ammsa/dtcleaner
DTCleaner: data cleaning using multi-target decision trees.
data-cleaning data-mining data-preprocessing data-quality data-science data-wrangling
Last synced: 28 Oct 2024
https://github.com/ropensci/daiquiri
Data quality reporting for temporal datasets.
data-quality initial-data-analysis r r-package reproducible-research rstats temporal-data time-series
Last synced: 04 Dec 2024
https://github.com/giscience/ohsome-quality-api
Data quality estimations for OpenStreetMap
accuracy completeness data-quality heigit indicators ohsome openstreetmap openstreetmap-data osm osm-data reports
Last synced: 12 Nov 2024
https://github.com/emilyriederer/convo
R package based on "Column Names as Contracts" blog post (https://emilyriederer.netlify.app/post/column-name-contracts/)
controlled-vocabulary data-quality data-validation r-package schema-design variable-names variable-naming
Last synced: 04 Dec 2024
https://github.com/bolcom/hive_compared_bq
hive_compared_bq compares/validates two (SQL-like) tables and graphically shows the rows/columns that differ.
bigquery data-quality hive python validation
Last synced: 15 Dec 2024
https://github.com/timgent/data-flare
Data quality control tool built on Spark and Deequ
Last synced: 16 Nov 2024
https://github.com/semyonsinchenko/tsumugi-spark
SparkConnect Server plugin and protobuf messages for the Amazon Deequ Data Quality Engine.
data-quality deequ pyspark spark
Last synced: 10 Oct 2024
https://github.com/dp6/penguin-datalayer-collect
A data layer quality monitoring and validation module that is part of the Raft Suite ecosystem.
adobe-launch data-quality data-quality-monitoring datalayer dp6 dtm gtm gtm-server-side hacktoberfest marketing-automation monitoring penguin-datalayer raft-suite tealium
Last synced: 04 Dec 2024
https://github.com/kiwicom/contessa
Easy way to define, execute and store quality rules for your data.
data data-engineering data-quality framework mysql postgres python quality-assurance sqlite3
Last synced: 04 Dec 2024
https://github.com/hms-dbmi/EHRtemporalVariability
R package for delineating temporal dataset shifts in Electronic Health Records
biomedical-data-science biomedical-informatics data-quality data-quality-monitoring dataset-shifts electronic-health-records time variability visualization
Last synced: 19 Nov 2024
https://github.com/piotr-kalanski/data-quality-monitoring
Data Quality Monitoring Tool
data-quality monitoring scala spark
Last synced: 27 Oct 2024
https://github.com/aws-samples/monitoring-apache-iceberg-table-metadata-layer
Sample code to collect Apache Iceberg metrics for table monitoring
apache-iceberg apache-spark aws aws-cloudwatch aws-glue aws-lambda data-quality monitoring pyiceberg sam-cli
Last synced: 12 Oct 2024
https://github.com/ahmadassaf/roomba
A Node.js tool to examine the correctness of Open Data Metadata and build custom dataset profiles
ckan ckan-api data-profiling data-quality dataset dataset-catalog dataset-metadata node portal
Last synced: 13 Oct 2024
https://github.com/data-catering/data-caterer
Data generation and validation tool for any data source
data-generation data-quality data-test data-testing data-validation java scala testing-automation ui yaml
Last synced: 29 Dec 2024
https://github.com/cdcgov/cdh-lava-react
CDC Data Hub Lifecycle, Analysis & Visualization Accelerator (LAVA) REACT Components based on machine readable requirements.
agile-development azure data-analysis data-catalog data-governance data-quality data-science data-visualization databricks datavisualization devops excel-export metadata operations powerautomate powerbi pyspark security sql test-automation
Last synced: 08 Nov 2024
https://github.com/christianbors/OpenRefineQualityMetrics
MetricDoc is an interactive visual exploration environment for assessing data quality
data-profiling data-quality data-quality-checks data-wrangling interactive-visualizations quality-metrics visual-analytics
Last synced: 05 Nov 2024
https://github.com/dp6/penguin-datalayer
Assisted crawler for validating objects sent to the data layer
data-quality data-quality-checks datalayer dp6 gtm hacktoberfest json-schema nodejs raft-suite
Last synced: 04 Dec 2024
https://github.com/datarootsio/notion-dbs-data-quality
Using Great Expectations and Notion's API, this repo aims to provide data quality for our databases in Notion.
data-engineering-pipeline data-quality great-expectations notion notion-api notion-database
Last synced: 14 Nov 2024
https://github.com/adidas/lakehouse-engine-docs
The goal of this project is to provide documentation for the Lakehouse Engine framework.
big-data data-engineering data-quality databricks delta-lake framework great-expectations lakehouse lakehouse-engine spark
Last synced: 12 Oct 2024
https://github.com/open-risk/dataqualitytoolkit
Python toolkit for evaluating and visualizing the data quality of excel spreadsheets
data-quality data-quality-measurement data-science excel spreadsheet
Last synced: 13 Oct 2024
https://github.com/dp6/raft-suite-hub
The Hub is the solution responsible for centralizing data consolidation in BigQuery, the tool chosen to serve as the data warehouse for raft-suite.
bigquery data data-quality google-cloud google-cloud-functions hacktoberfest
Last synced: 04 Dec 2024
https://github.com/dp6/penguin-document-formatter
A document reader to extract Google Analytics planned events to use on the Raft Suite Data Quality
analytics data-quality google-cloud hacktoberfest monitoring pdf-converter
Last synced: 04 Dec 2024
https://github.com/giscience/ohsome-dashboard
Web Client for easy access to OSM History and Quality Analyses
data-quality openstreetmap openstreetmap-data openstreetmap-history osm osm-data
Last synced: 12 Nov 2024
https://github.com/dp6/penguin-datalayer-core
Validation core engine for the data layer of the Raft Suite ecosystem.
camada-de-dados data-quality data-quality-checks datalayer dp6 gtm hacktoberfest json-schema marketing-automation penguin-datalayer
Last synced: 04 Dec 2024
https://github.com/nationalparkservice/qckit
QCkit provides useful functions for data quality control and manipulation including updating data to DarwinCore standards, unit conversions, and data flagging.
darwin-core data-quality data-science npsdataverse quality-control r r-package rstats
Last synced: 08 Nov 2024
https://github.com/harpin-ai/toolkit-examples
Examples for trying out the harpin AI identity resolution and data quality toolkit
data-engineering data-quality dedupe deduplication entity-resolution identity identity-resolution spark
Last synced: 01 Nov 2024
https://github.com/absaoss/spark-data-standardization
A library for Spark that helps to standardize any input data (DataFrame) to adhere to the provided schema.
data-quality data-structures scala schema spark
Last synced: 07 Nov 2024
https://github.com/byteplant/address-validator-net
NodeJS wrapper for the address-validator.net API
address address-autocomplete address-cleaning address-matching address-validation address-verification autocomplete byteplant cleaning cleaning-data data-quality data-validation javascript node-js node-module typescript validation verification wrapper
Last synced: 16 Nov 2024
https://github.com/dylan-profiler/tangled-up-in-unicode
Access to the Unicode Character Database (UCD)
data-analysis data-quality exploration linguistic-analysis linguistics python unicode
Last synced: 16 Nov 2024
https://github.com/dev-ev/isobaric-inspection-jupyter
Inspecting the quality of isobaric labeling proteomic data in a Jupyter notebook. Data output from Proteome Discoverer.
data-quality data-visualization data-wrangling isobaric-labeling jupyterlab mass-spectrometry proteome-discoverer proteomics proteomics-data-analysis python quantitative-proteomics
Last synced: 19 Nov 2024
https://github.com/byteplant/email-validator-net
NodeJS wrapper for the email-validator.net API
byteplant cleaning cleaning-data data-quality data-validation email email-cleaning email-marketing email-validation email-verification javascript node-js node-module typescript validation verification
Last synced: 16 Nov 2024
https://github.com/data-drift/dbt-snapshot-analytics
Get insight from a dbt snapshot on your metric quality
analytics data-quality dbt monitoring snapshot
Last synced: 30 Dec 2024
https://github.com/maastrichtu-ids/dqa-pipeline
Large-scale RDF-based Data Quality Assessment Pipeline
data-quality docker fair-data rdf sparql
Last synced: 21 Dec 2024
https://github.com/maastrichtu-ids/fairsharing-metrics
📊 Fairsharing metrics implementation
bioinformatics data-quality docker python rdf rdfunit
Last synced: 21 Dec 2024
https://github.com/byteplant/jquery-address-validator-net
jQuery plugin for the address-validator.net API
address address-autocomplete address-validation data-cleaning data-quality data-validation form-validation form-validation-jquery javascript javascript-library jquery validation
Last synced: 16 Nov 2024
https://github.com/opendatadiscovery/odd-great-expectations
Integration for collecting metadata from Great Expectations
Last synced: 14 Nov 2024