Projects in Awesome Lists tagged with data-versioning
A curated list of projects in awesome lists tagged with data-versioning.
https://github.com/dolthub/dolt
Dolt – Git for Data
command-line data-version-control data-versioning database database-version-control database-versioning decentralized-database git git-database git-for-data git-for-databases git-sql golang immutable-database mariadb mysql sql version-controlled-database
Last synced: 12 Mar 2026
https://github.com/wandb/wandb
The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
ai collaboration data-science data-versioning deep-learning experiment-track hyperparameter-optimization hyperparameter-search hyperparameter-tuning jax keras machine-learning ml-platform mlops model-versioning pytorch reinforcement-learning reproducibility tensorflow
Last synced: 05 Feb 2026
https://github.com/treeverse/lakefs
lakeFS - Data version control for your data lake | Git for data
apache-spark apache-sparksql aws-s3 azure-blob-storage azure-storage data-engineering data-lake data-quality data-version-control data-versioning datalake datalakes git-for-data go golang google-cloud-storage hadoop-filesystem lakefs object-storage
Last synced: 18 Feb 2026
https://github.com/quiltdata/quilt
Quilt is a data mesh for connecting people with actionable data
data data-engineering data-version-control data-versioning parquet python serialization
Last synced: 13 May 2025
https://github.com/iusztinpaul/energy-forecasting
🌀 The Full Stack 7-Steps MLOps Framework | Learn MLE & MLOps for free by designing, building and deploying an end-to-end ML batch system ~ source code + 2.5 hours of reading & video materials
3-pipeline-design airflow batch-processing cicd data-versioning docker fastapi feature-store gcp github-actions great-expectations hopsworks ml-monitoring mlops model-registry poetry python sktime streamlit weights-and-biases
Last synced: 15 May 2025
https://github.com/koordinates/kart
Distributed version-control for geospatial and tabular data
data data-versioning geospatial geospatial-data gis version-control
Last synced: 01 May 2025
https://github.com/BemiHQ/bemi
Automatic data change tracking for PostgreSQL
audit-log audit-trail change-data-capture change-tracking data-versioning postgresql
Last synced: 16 Jul 2025
https://github.com/RecallGraph/RecallGraph
A versioning data store for time-variant graph data.
arangodb data-versioning dynamic-networks foxx-microservice streaming-graph-data temporal-graphs
Last synced: 31 Mar 2025
https://github.com/leeper/data-versioning
Collecting thoughts about data versioning
data data-citation data-versioning metadata unf version-control
Last synced: 15 Feb 2026
https://github.com/BemiHQ/bemi-prisma
Automatic data change tracking for Prisma
audit-log audit-trail data-versioning postgresql prisma
Last synced: 07 May 2025
https://github.com/GitDataAI/jiaozifs
A Git-like Version Control File System for AI & Data Product Management.
aiops data-collaboration data-lake data-lineage data-product data-version-control data-versioning dataops digital-twins federated-learning git git-filesystem git-for-data git-interface jiaozifs jzfs mlops version-controlled-filesystem
Last synced: 03 Mar 2025
https://github.com/GitDataAI/jzfs
A Git-like Version Control File System for AI & Data Product Management.
aiops data-collaboration data-lake data-lineage data-product data-version-control data-versioning dataops digital-twins federated-learning git git-filesystem git-for-data git-interface jiaozifs jzfs mlops version-controlled-filesystem
Last synced: 04 Apr 2025
https://github.com/layerai-archive/sdk
Metadata store for Production ML
collaboration data-science data-versioning deep-learning experiment-tracking hyperparameter-optimization hyperparameter-tuning keras machine-learning mlops model-versioning python pytorch reinforcement-learning sklearn tensorflow
Last synced: 30 Sep 2025
https://github.com/ropensci/gittargets
Data version control for reproducible analysis pipelines in R with {targets}.
data-science data-version-control data-versioning r r-package reproducibility reproducible-research rstats targets workflow
Last synced: 21 Aug 2025
https://github.com/wrgl/wrgl
Git-like data versioning.
data-version-control data-versioning git git-for-data go golang
Last synced: 16 Jan 2026
https://github.com/bemihq/bemi-typeorm
Automatic data change tracking for TypeORM
audit-trail data-versioning typeorm
Last synced: 06 Apr 2025
https://github.com/aws/amazon-finspace-examples
This repo contains sample code and sample notebooks to illustrate how to work with Amazon FinSpace
aws data-science data-versioning examples finspace timeseries-analysis
Last synced: 20 Oct 2025
https://github.com/pier4all/mongoose-versioned
Document versioning library for MongoDB using the mongoose package.
data-versioning mongo mongodb mongoose versioning
Last synced: 04 Jun 2026
https://github.com/ropensci/butterfly
Verification of continually updating timeseries data where we expect new values, but want to ensure previous data remains unchanged. Maintained by @thomaszwagerman
data-versioning qaqc r r-package rstats timeseries verification
Last synced: 20 Feb 2026
https://github.com/bemihq/bemi-supabase-js
Automatic data change tracking for Supabase JS
audit-log audit-trail data-versioning postgresql supabase supabase-db supabase-js
Last synced: 26 Oct 2025
https://github.com/bemihq/bemi-sqlalchemy
Automatic data change tracking for SQLAlchemy
audit-log audit-trail data-versioning postgresql sqlalchemy
Last synced: 05 Jul 2025
https://github.com/dolthub/kedro-dolt
Kedro-Dolt Hook Plugin
data data-versioning dolt kedro-plugin
Last synced: 16 Jun 2025
https://github.com/bemihq/bemi-mikro-orm
Automatic data change tracking for MikroORM
audit-log audit-trail data-versioning mikro-orm postgresql
Last synced: 28 Feb 2026
https://github.com/newronai/newron-sdk
Newron is a data-centric ML platform to easily build, manage, deploy and continuously improve models through data driven development.
data-science data-versioning deep-learning experiment-tracking hyperparameter-optimization hyperparameter-search hyperparameter-tuning keras machine-learning ml-platform mlops model-monitoring model-registry model-reproducibility model-versioning python3 pytorch tensorflow
Last synced: 02 Apr 2025
https://github.com/abeltavares/versioned-data-lakehouse
🌊 Git-like Version Control for Data with Nessie, Iceberg, and Spark
apache-iceberg apache-nessie apache-spark atomic-etl block-storage branch-based-development data-engineering data-lakehouse data-pipelines data-versioning dataops distributed-systems etl etl-pipeline git-for-data minio s3 spark-etl table-format time-travel
Last synced: 20 May 2026
https://github.com/amr-yasser226/customer-churn-prediction
End-to-end customer churn prediction project: dataset preparation, experiments with scikit-learn, CI/CD, and deployment examples.
churn-prediction classification data-versioning docker jupyter-notebook machine-learning mlflow mlops pytest python scikit-learn
Last synced: 13 Apr 2026
https://github.com/imamaaa/mlops-air-quality-prediction-pipeline
MLOps pipeline for real-time air quality monitoring and pollution prediction. Uses ARIMA & LSTM models, DVC for data versioning, Flask API for deployment, and Prometheus & Grafana for monitoring.
air-quality arima data-versioning dvc environmental-monitoring flask-api grafana lstm machine-learning mlops pollution-prediction prometheus time-series
Last synced: 13 Apr 2026
https://github.com/prakulhiremath/flashback
⏪ Git for DataFrames. Time-travel debugging, exact temporal lineage, and feature evolution tracking for Pandas and Polars.
data-lineage data-versioning mlops polars time-travel
Last synced: 14 Jun 2026
https://github.com/pier4all/data-versioning
Repository for evaluating the different approaches to data versioning
Last synced: 04 Jun 2026
https://github.com/ksm26/llmops
This course navigates through the LLMOps pipeline, enabling you to preprocess training data for supervised fine-tuning and deploy custom Large Language Models (LLMs).
data-versioning deeplearning-ai deployment fine-tuning foundational-models large-language-models llmops model-customization pipeline-deployment responsible-ai supervised-learning
Last synced: 28 Mar 2025
https://github.com/neptune-ai/project-tabular-data-version
Project with tabular data versioned with Artifacts.
data-versioning machine-learning mlops neptune neptune-ai
Last synced: 29 Dec 2025