{"id":14064785,"url":"https://github.com/curiousily/Reproducible-ML-with-DVC","last_synced_at":"2025-07-29T19:31:19.034Z","repository":{"id":37656775,"uuid":"265042406","full_name":"curiousily/Reproducible-ML-with-DVC","owner":"curiousily","description":"Tutorial on experiment tracking and reproducibility for Machine Learning projects with DVC","archived":false,"fork":false,"pushed_at":"2022-12-08T09:57:09.000Z","size":95,"stargazers_count":18,"open_issues_count":5,"forks_count":6,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-12-03T03:42:04.544Z","etag":null,"topics":["deep-learning","dvc","experiment-tracking","linear-regression","machine-learning","metrics","python","random-forest","reproducibility","scikit-learn","tracking"],"latest_commit_sha":null,"homepage":"https://www.curiousily.com/posts/reproducible-machine-learning-and-experiment-tracking-pipiline-with-python-and-dvc/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/curiousily.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-05-18T19:34:34.000Z","updated_at":"2024-08-26T19:15:52.000Z","dependencies_parsed_at":"2023-01-25T06:15:47.087Z","dependency_job_id":null,"html_url":"https://github.com/curiousily/Reproducible-ML-with-DVC","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/curiousily%2FReproducible-ML-with-DVC","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/curiousily%2FReproducible-ML-with-DVC/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/rep
ositories/curiousily%2FReproducible-ML-with-DVC/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/curiousily%2FReproducible-ML-with-DVC/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/curiousily","download_url":"https://codeload.github.com/curiousily/Reproducible-ML-with-DVC/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":228040851,"owners_count":17860211,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","dvc","experiment-tracking","linear-regression","machine-learning","metrics","python","random-forest","reproducibility","scikit-learn","tracking"],"created_at":"2024-08-13T07:04:04.742Z","updated_at":"2024-12-04T03:31:31.031Z","avatar_url":"https://github.com/curiousily.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"## Setup\n\n[Read the complete tutorial here](https://www.curiousily.com/posts/reproducible-machine-learning-and-experiment-tracking-pipiline-with-python-and-dvc/)\n\n```\ngit clone git@github.com:curiousily/Reproducible-ML-with-DVC.git\n```\n\n```\npipenv install --dev\n```\n\n```\ngit checkout pre-dvc\n```\n\n## DVC\n\nInitialize DVC\n\n```\ndvc init\n```\n\nand add remote storage (local in this case)\n\n```\ndvc remote add -d localremote /tmp/dvc-storage\n```\n\ndisable analytics (optional)\n\n```\ndvc config core.analytics false\n```\n\n## Experiment with Linear Regression\n\nBuild Dataset\n\n```\ndvc run -f assets/data.dvc \\\n    -d 
studentpredictor/create_dataset.py \\\n    -o assets/data \\\n    python studentpredictor/create_dataset.py\n```\n\nCreate features\n\n```\ndvc run -f assets/features.dvc \\\n    -d studentpredictor/create_features.py \\\n    -d assets/data \\\n    -o assets/features \\\n    python studentpredictor/create_features.py\n```\n\nTrain model\n\n```\ndvc run -f assets/models.dvc \\\n    -d studentpredictor/train_model.py \\\n    -d assets/features \\\n    -o assets/models \\\n    python studentpredictor/train_model.py\n```\n\nEvaluate the model and save metrics (RMSE and R^2)\n\n```\ndvc run -f assets/evaluate.dvc \\\n    -d studentpredictor/evaluate_model.py \\\n    -d assets/features \\\n    -d assets/models \\\n    -M assets/metrics.json \\\n    python studentpredictor/evaluate_model.py\n```\n\nCheck the metrics for your current model:\n\n```sh\ndvc metrics show -T\n```\n\n## Experiment with Random Forest\n\nCheck out the Random Forest experiment:\n\n```\ngit checkout rf-experiment\n```\n\nReproduce the full pipeline with the Random Forest model:\n\n```\ndvc repro assets/evaluate.dvc\n```\n\nCompare the metrics of the Random Forest model with those of the Linear Regression model:\n\n```sh\ndvc metrics show -T\n```\n\n[Read the complete tutorial here](https://www.curiousily.com/posts/reproducible-machine-learning-and-experiment-tracking-pipiline-with-python-and-dvc/)\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcuriousily%2FReproducible-ML-with-DVC","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcuriousily%2FReproducible-ML-with-DVC","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcuriousily%2FReproducible-ML-with-DVC/lists"}