https://github.com/blaz-cerpnjak/dvc-git-example
DVC - Data Version Control Basics
- Host: GitHub
- URL: https://github.com/blaz-cerpnjak/dvc-git-example
- Owner: blaz-cerpnjak
- Created: 2024-03-23T16:36:34.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-03-24T12:19:37.000Z (almost 2 years ago)
- Last Synced: 2025-01-23T06:45:00.150Z (about 1 year ago)
- Topics: data-version-control, dvc, dvc-google-drive
- Size: 4.88 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Data Version Control Example (DVC)
Source: [dvc.org](https://dvc.org/doc/start)
### Project Setup
#### Init Git
```bash
git init
```
#### Init DVC
```bash
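dvc init
# dvc init stages its new files (.dvc/, .dvcignore); review them before committing
git status
git commit -m "Initialize DVC"
# requires a Git remote to be configured (e.g. via git remote add)
git push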
```
#### Add some data to the project
```bash
mkdir data
# Copy a .csv file into data/ (the steps below assume data/data.csv)
```
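If no dataset is at hand, a small placeholder file is enough to follow the remaining steps. The sketch below writes one to `data/data.csv`, the path used by the later commands; the column names and values are made up for illustration.

```bash
# Create a tiny placeholder dataset (contents are hypothetical).
mkdir -p data
printf 'id,value\n1,10\n2,20\n3,30\n' > data/data.csv

# Show the result: one header line plus three rows.
cat data/data.csv
```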
### Configuring DVC remote storage
#### Add a DVC remote (Google Drive as an example)
```bash
# Copy the folder ID from the URL drive.google.com/drive/folders/[ID]
# You will need to authenticate and allow DVC to access that folder
dvc remote add -d storage gdrive://[ID]
git commit .dvc/config -m "Configure remote storage"
git push
```
### Adding and pushing data to dvc remote storage
#### Add data to dvc
```bash
dvc add data/data.csv
# DVC creates data/data.csv.dvc and automatically adds data.csv to data/.gitignore
```
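To confirm what `dvc add` did, one can check that the pointer file exists and that Git ignores the raw file. This is a minimal sketch: `check_tracked` is a hypothetical helper name, not a DVC command, and it relies only on `git check-ignore`.

```bash
# check_tracked FILE: succeed if FILE has a .dvc pointer file next to it
# and FILE itself is ignored by Git. (Hypothetical helper, not part of DVC.)
check_tracked() {
  file="$1"
  if [ ! -f "$file.dvc" ]; then
    echo "missing pointer file: $file.dvc" >&2
    return 1
  fi
  if ! git check-ignore -q "$file"; then
    echo "not ignored by Git: $file" >&2
    return 1
  fi
  echo "ok: $file"
}
```

Run it from the repository root as `check_tracked data/data.csv`.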
#### Push data to dvc
```bash
dvc push
```
#### Push changes to git
```bash
git add data/data.csv.dvc
git add data/.gitignore
git commit -m "Add raw data"
git push
```
### Pulling data from dvc storage
To simulate a fresh clone, remove the data file and the local DVC cache:
```bash
rm -f data/data.csv
rm -rf .dvc/cache
```
Then restore the file from the remote storage:
```bash
dvc pull
```
### Making local changes
```bash
mkdir tmp
cp data/data.csv tmp/data.csv
cat tmp/data.csv >> data/data.csv
ls -lh data
```
```bash
dvc add data/data.csv
git add data/data.csv.dvc
git commit -m "Dataset updates"
dvc push
```
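Note that appending a whole CSV with `cat` also appends its header row. If that is not wanted, a variant that skips the first line could look like this; `append_rows` is a hypothetical helper, and it assumes the source file has exactly one header line.

```bash
# append_rows SRC DST: append every line of SRC except the first (the header) to DST.
# (Hypothetical helper for illustration.)
append_rows() {
  tail -n +2 "$1" >> "$2"
}
```

It would be called as `append_rows tmp/data.csv data/data.csv` in place of the `cat` line.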
### Restoring changes
Check changes:
```bash
git log --oneline
```
Restore changes:
```bash
# Check out the pointer file data/data.csv.dvc, not data/data.csv (Git does not track the raw file)
git checkout HEAD^1 data/data.csv.dvc
dvc checkout
```
Verify:
```bash
ls -lh data
```
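`ls -lh` only shows that the size changed back. For a stricter check, the restored file's MD5 can be compared with the hash recorded in the pointer file. This is a sketch under assumptions: the `.dvc` file lists a single output with an `md5:` field, the recorded value is the plain MD5 of the file contents (as in recent DVC versions), and GNU `md5sum` is available. `verify_md5` is a hypothetical helper, not a DVC command.

```bash
# verify_md5 FILE: compare FILE's MD5 with the hash stored in FILE.dvc.
# (Hypothetical helper; assumes a single-output pointer file and GNU md5sum.)
verify_md5() {
  file="$1"
  # Take the first "md5:" entry in the pointer file, e.g. "- md5: <hash>".
  expected=$(awk '/md5:/ { print $NF; exit }' "$file.dvc")
  actual=$(md5sum "$file" | awk '{ print $1 }')
  if [ "$expected" = "$actual" ]; then
    echo "match: $file"
  else
    echo "mismatch: $file" >&2
    return 1
  fi
}
```

Call it as `verify_md5 data/data.csv`. DVC's own `dvc status` command also reports whether the workspace matches the pointer files.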