https://github.com/codecuttech/dvc-demo
- Host: GitHub
- URL: https://github.com/codecuttech/dvc-demo
- Owner: CodeCutTech
- Created: 2025-05-09T17:11:45.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-05-21T19:44:54.000Z (7 months ago)
- Last Synced: 2025-06-30T05:35:52.965Z (6 months ago)
- Language: Python
- Size: 364 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
[](https://codecut.ai/introduction-to-dvc-data-version-control-tool-for-machine-learning-projects-2/)
# DVC Demo
A demonstration of Data Version Control (DVC) for managing ML pipelines and data versioning.
## What is DVC?
[DVC](https://dvc.org/) is an open-source version control system for machine learning projects. It helps you:
- Version control large files, data sets, machine learning models, and metrics
- Track ML experiments
- Create reproducible ML pipelines
- Collaborate with team members
## Project Structure
```
.
├── data/         # Raw and processed data files
│   └── raw.dvc   # DVC file for raw data
├── src/          # Source code for data processing and model training
├── config/       # Configuration files
├── .dvc/         # DVC internal files
├── dvc.yaml      # DVC pipeline definition
├── dvc.lock      # DVC lock file for reproducible pipelines
└── .dvcignore    # Files/directories to be ignored by DVC
```
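The roles of `dvc.yaml` and `dvc.lock` can be illustrated with a simplified sketch: the lock file records content hashes of each stage's dependencies, and a stage needs to be rerun when those hashes no longer match. The Python below is a toy model of that idea, not DVC's actual implementation. The function names, the `raw.csv` file, and the flat `{path: md5}` lock mapping are assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path: Path) -> str:
    """Return the MD5 hex digest of a file's contents."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def stage_is_stale(deps: list[Path], locked: dict[str, str]) -> bool:
    """Return True if any dependency's current hash differs from the locked one.

    `locked` is a hypothetical {path: md5} mapping standing in for the
    hashes that dvc.lock stores for a stage.
    """
    return any(locked.get(str(dep)) != file_md5(dep) for dep in deps)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw.csv"  # hypothetical dependency file
        raw.write_text("a,b\n1,2\n")
        locked = {str(raw): file_md5(raw)}  # snapshot, like writing dvc.lock

        print(stage_is_stale([raw], locked))  # input unchanged since the snapshot
        raw.write_text("a,b\n1,3\n")
        print(stage_is_stale([raw], locked))  # input edited after the snapshot
```

In this model, a stage whose inputs are unchanged is skipped, which is why committing `dvc.lock` alongside `dvc.yaml` matters for reproducibility.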
## Setup
1. Install project dependencies using uv:
```bash
uv sync
```
2. Pull the data from remote storage:
```bash
dvc pull
```
3. Run the pipeline to reproduce all stages:
```bash
dvc repro
```
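The three setup steps can also be scripted so that a failure in one step stops the rest. The following is a minimal sketch, assuming `uv` and `dvc` are on the `PATH` and the script is run from the repository root. `SETUP_STEPS` and `run_steps` are hypothetical names, not part of this project, and the install step is written as a plain `uv sync`.

```python
import subprocess
from typing import Callable, Sequence

# Commands from the setup steps, in the order they should run.
SETUP_STEPS: list[list[str]] = [
    ["uv", "sync"],    # install project dependencies
    ["dvc", "pull"],   # fetch tracked data from remote storage
    ["dvc", "repro"],  # reproduce all pipeline stages
]


def run_steps(
    steps: Sequence[Sequence[str]],
    runner: Callable[..., object] = subprocess.run,
) -> int:
    """Run each command in order and return how many were run.

    With the default runner, check=True makes subprocess.run raise
    CalledProcessError on the first non-zero exit code, so later
    steps are not attempted after a failure.
    """
    for cmd in steps:
        print("$", " ".join(cmd))
        runner(list(cmd), check=True)
    return len(steps)
```

Calling `run_steps(SETUP_STEPS)` executes the commands for real. The `runner` parameter exists so the sequence can be dry-run or tested by passing a function that only records its arguments.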
## Version Control
- Track data files: `dvc add <path>` (the file or directory to track)
- Push data to remote storage: `dvc push`
- Pull data from remote storage: `dvc pull`
- Check status: `dvc status`
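
To make the tracking step less abstract, here is a rough sketch of the kind of record `dvc add` leaves behind. As far as this example assumes, DVC stores the data itself in its cache and writes a small `.dvc` pointer file, containing a content hash, size, and path, that Git can version instead. The `pointer_record` function and the `raw.txt` file are hypothetical, and the exact fields may differ between DVC versions.

```python
import hashlib
import tempfile
from pathlib import Path


def pointer_record(path: Path, chunk_size: int = 1 << 20) -> dict:
    """Build a dict resembling the entry that a .dvc pointer file holds."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # Hash in chunks so large files need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "md5": digest.hexdigest(),
        "size": path.stat().st_size,
        "path": path.name,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "raw.txt"  # hypothetical data file
        data.write_bytes(b"hello")
        print(pointer_record(data))
```

Because the record depends only on file contents, two identical copies of a dataset produce the same hash, and any edit to the data changes the pointer that Git sees.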