Projects in Awesome Lists tagged with data-testing
A curated list of projects in awesome lists tagged with data-testing.
https://github.com/sodadata/soda-core
:zap: Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
data-contracts data-engineering data-governance data-monitoring data-observability data-profiling data-quality data-quality-checks data-quality-monitoring data-quality-testing data-reliability data-testing data-unit-tests data-validation dataquality datatesting dbt pipeline-testing python snowflake
Last synced: 14 May 2025
https://github.com/re-data/re-data
re_data - fix data issues before your users & CEO would discover them 😊
data-analysis data-monitoring data-observability data-quality data-quality-checks data-quality-monitoring data-reliability data-testing dataquality dbt dbt-packages open-source-tooling
Last synced: 14 May 2025
https://github.com/infuseai/piperider
Code review for data in dbt
code-review continuous-integration data-exploration data-observability data-pipeline data-profiler data-profiling data-quality data-reliability data-science data-testing data-visualization dbt dbt-metrics eda exploratory-data-analysis pull-requests python reporting
Last synced: 10 Apr 2025
https://github.com/InfuseAI/piperider
Code review for data in dbt
code-review continuous-integration data-exploration data-observability data-pipeline data-profiler data-profiling data-quality data-reliability data-science data-testing data-visualization dbt dbt-metrics eda exploratory-data-analysis pull-requests python reporting
Last synced: 18 Apr 2025
https://posit-dev.github.io/pointblank/
Data validation made beautiful and powerful
data-quality data-testing data-validation easy-to-understand tabular-data
Last synced: 22 Jun 2025
https://github.com/lukaszlapaj/software-testing-resource-pack
Various files useful for manual testing and test automation etc.
api-testing backend-testing bootstrapping data-testing e2e-testing e2e-tests front-end-testing manual-testing quality-assurance resource-pack resources software-quality software-testing test-automation test-data testing testing-tools ui-tests
Last synced: 01 Mar 2026
https://github.com/astronomer/airflow-provider-great-expectations
Great Expectations Airflow operator
airflow airflow-operators airflow-providers data-quality data-science data-testing
Last synced: 16 May 2025
https://github.com/re-data/dbt-re-data
re_data - fix data issues before your users & CEO would discover them 😊
data-monitoring data-observability data-quality data-testing dbt dbt-packages sql
Last synced: 07 Apr 2025
https://github.com/posit-dev/pointblank
Find out if your data is what you think it is
data-quality data-testing data-validation easy-to-understand tabular-data
Last synced: 04 Feb 2026
https://github.com/sodadata/soda-spark
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
data-engineering data-observability data-quality data-testing pyspark python soda-sql spark
Last synced: 26 Jul 2025
https://github.com/datakitchen/dataops-testgen
DataOps Data Quality TestGen is part of DataKitchen's Open Source Data Observability. DataOps TestGen delivers simple, fast data quality test generation and execution by data profiling, new dataset hygiene review, AI generation of data quality validation tests, ongoing testing of data refreshes, & continuous anomaly monitoring
data data-engineering data-observability data-quality data-science data-testing datachecker dataops dataprofiling dataquality datavalidation mssql postgresql python redshift self-hosted snowflake
Last synced: 25 Feb 2026
https://github.com/data-catering/data-caterer
Data generation and validation tool for any data source
data-generation data-quality data-test data-testing data-validation java scala testing-automation ui yaml
Last synced: 02 Sep 2025
https://github.com/serialbandicoot/great-assertions
This library is inspired by the Great Expectations library. It makes the various expectations found in Great Expectations available through Python's built-in unittest assertions.
data-science data-testing databricks great-expectations jupyter-notebook python python3 quality-assurance testing
Last synced: 28 Oct 2025
https://github.com/shridhar1504/sales-forecasting-datascience-project
Develop a data science project using historical sales data to build a regression model that accurately predicts future sales. Preprocess the dataset, conduct exploratory analysis, select relevant features, and employ regression algorithms for model development. Evaluate model performance, optimize hyperparameters, and provide actionable insights.
data-analytics data-cleaning data-science data-testing data-visualization forecasting-models machin model-evaluation model-fitting prediction predictive-modeling python3 regression-algorithms salesforecast sklearn-library supervised-learning
Last synced: 30 Oct 2025
https://github.com/ericmjl/software-testing-open-source-and-data-science
Software Testing in Open Source and Data Science: A talk delivered at the Data Umbrella speaker series
data-science data-testing machine-learning-testing software-testing testing
Last synced: 25 Feb 2026
https://github.com/jafeerr/spark-data-test
Spark Data Test - A PySpark-based automation testing utility to compare Spark DataFrames
apache-spark data-testing dataframe pyspark
Last synced: 04 Oct 2025
https://github.com/manoj9788/spark-etl-tests
A sample repository showcasing the implementation of testing for an ETL pipeline developed with Apache Spark
data-testing etl etl-automation scala
Last synced: 20 Jun 2025
https://github.com/balajimohan18/sales-forecasting-datascience-project
Develop a data science project using historical sales data to build a regression model that accurately predicts future sales. Preprocess the dataset, conduct exploratory analysis, select relevant features, and employ regression algorithms for model development. Evaluate model performance, optimize hyperparameters, and provide actionable insights.
data-analytics data-science data-testing data-visualization forecasting forecasting-models machine-learning model-evaluation predictive-modeling python regression-algorithms salesforecast scipy sklearn-library supervised-learning
Last synced: 04 Mar 2025
https://github.com/blleshi/credit_risk_classification
Credit Risk Classification
classification-report confusion-matrix credit-risk credit-risk-classification data-testing data-training imbalanced-learning lending loans logistic-regression logistic-regression-model pandas randomoversampler resampled-data target-classification
Last synced: 27 Feb 2025