https://github.com/curiousily/reproducible-ml-with-dvc
Tutorial on experiment tracking and reproducibility for Machine Learning projects with DVC
- Host: GitHub
- URL: https://github.com/curiousily/reproducible-ml-with-dvc
- Owner: curiousily
- License: mit
- Created: 2020-05-18T19:34:34.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2022-12-08T09:57:09.000Z (about 2 years ago)
- Last Synced: 2024-08-14T07:07:40.286Z (5 months ago)
- Topics: deep-learning, dvc, experiment-tracking, linear-regression, machine-learning, metrics, python, random-forest, reproducibility, scikit-learn, tracking
- Language: Python
- Homepage: https://www.curiousily.com/posts/reproducible-machine-learning-and-experiment-tracking-pipiline-with-python-and-dvc/
- Size: 92.8 KB
- Stars: 17
- Watchers: 2
- Forks: 4
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## Setup
[Read the complete tutorial here](https://www.curiousily.com/posts/reproducible-machine-learning-and-experiment-tracking-pipiline-with-python-and-dvc/)
Clone the repository:
```
git clone git@github.com:curiousily/Reproducible-ML-with-DVC.git
cd Reproducible-ML-with-DVC
```

Install the dependencies:
```
pipenv install --dev
```

Check out `pre-dvc`:
```
git checkout pre-dvc
```

## DVC
Initialize DVC:
```
dvc init
```

Add remote storage (local in this case):
```
dvc remote add -d localremote /tmp/dvc-storage
```

Disable analytics (optional):
```
dvc config core.analytics false
```

## Experiment with Linear Regression
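Each `dvc run` command in this section declares dependencies (`-d`) and outputs (`-o`), and DVC records checksums of their contents so it can later tell whether a stage is out of date. The snippet below is a minimal sketch of that idea for a single file, assuming an MD5 content hash. The helper names `file_md5` and `has_changed` are hypothetical, and real DVC also handles directories and caching, which this sketch ignores.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path, recorded_md5):
    """True when the file's current contents differ from the recorded hash."""
    return file_md5(path) != recorded_md5


# Illustration with a throwaway file standing in for a dependency.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "train_model.py"
    script.write_text("model = 'linear-regression'\n")
    recorded = file_md5(script)            # hash stored when the stage ran
    print(has_changed(script, recorded))   # False: nothing edited yet
    script.write_text("model = 'random-forest'\n")
    print(has_changed(script, recorded))   # True: the stage is now stale
```

Comparing content hashes rather than timestamps means that checking out a branch or touching a file does not, by itself, mark a stage as stale.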
Build the dataset:
```
dvc run -f assets/data.dvc \
-d studentpredictor/create_dataset.py \
-o assets/data \
python studentpredictor/create_dataset.py
```

Create features:
```
dvc run -f assets/features.dvc \
-d studentpredictor/create_features.py \
-d assets/data \
-o assets/features \
python studentpredictor/create_features.py
```

Train the model:
```
dvc run -f assets/models.dvc \
-d studentpredictor/train_model.py \
-d assets/features \
-o assets/models \
python studentpredictor/train_model.py
```

Evaluate the model and save metrics (RMSE and R²):
```
dvc run -f assets/evaluate.dvc \
-d studentpredictor/evaluate_model.py \
-d assets/features \
-d assets/models \
-M assets/metrics.json \
python studentpredictor/evaluate_model.py
```

Check the metrics for your current model:
```sh
dvc metrics show -T
```

## Experiment with Random Forest
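The four `dvc run` commands in the previous section form a dependency chain: data, features, models, evaluate. When a stage's inputs change, that stage and everything downstream of it need to run again. The following is a rough sketch of that reasoning, not DVC's actual implementation. The short stage names stand in for the `.dvc` files, the helper names are hypothetical, and `graphlib` requires Python 3.9 or newer.

```python
from graphlib import TopologicalSorter

# Stage -> stages it depends on, read off the -d/-o flags of the
# `dvc run` commands (names are shorthand for the assets/*.dvc files).
PIPELINE = {
    "data": set(),
    "features": {"data"},
    "models": {"features"},
    "evaluate": {"features", "models"},
}


def execution_order(pipeline):
    """Return the stages in an order that respects their dependencies."""
    return list(TopologicalSorter(pipeline).static_order())


def stages_to_rerun(pipeline, changed):
    """Return `changed` plus every stage downstream of it, in run order."""
    order = execution_order(pipeline)
    stale = {changed}
    for stage in order:
        # A stage is stale if any stage it depends on is stale.
        if pipeline[stage] & stale:
            stale.add(stage)
    return [stage for stage in order if stage in stale]


print(execution_order(PIPELINE))            # ['data', 'features', 'models', 'evaluate']
print(stages_to_rerun(PIPELINE, "models"))  # ['models', 'evaluate']
```

Under the assumption that only the training script differs between experiments, the dataset and feature stages can be reused and only training and evaluation are repeated.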
Check out the Random Forest experiment:
```
git checkout rf-experiment
```

Reproduce everything with the RF model:
```
dvc repro assets/evaluate.dvc
```

Check the metrics for the Random Forest model compared to Linear Regression:
```sh
dvc metrics show -T
```

[Read the complete tutorial here](https://www.curiousily.com/posts/reproducible-machine-learning-and-experiment-tracking-pipiline-with-python-and-dvc/)
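
For reference, the two metrics compared above can be computed as in the sketch below. This is an illustration in plain Python, not the repository's `evaluate_model.py` (which is not shown here). The sample values are made up, and the JSON key names `rmse` and `r2` are assumptions about what a metrics file such as `assets/metrics.json` might contain.

```python
import json
import math


def rmse(y_true, y_pred):
    """Root mean squared error: the typical size of a prediction error."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up targets and predictions, for illustration only.
y_true = [10.0, 12.0, 14.0, 16.0]
y_pred = [11.0, 11.0, 15.0, 15.0]

metrics = {"rmse": rmse(y_true, y_pred), "r2": r2(y_true, y_pred)}
print(json.dumps(metrics))
```

With these sample values every prediction is off by one, so the RMSE is 1.0 and R² is about 0.8. When comparing two models, a lower RMSE and a higher R² indicate the better fit.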
## License
MIT