{"id":28794504,"url":"https://github.com/iterative/cml_dvc_case","last_synced_at":"2025-08-26T06:33:09.096Z","repository":{"id":38001104,"uuid":"270839127","full_name":"iterative/cml_dvc_case","owner":"iterative","description":null,"archived":false,"fork":false,"pushed_at":"2023-11-12T17:22:52.000Z","size":43,"stargazers_count":17,"open_issues_count":6,"forks_count":52,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-08-18T07:43:53.810Z","etag":null,"topics":["example"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/iterative.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-06-08T21:55:58.000Z","updated_at":"2024-09-09T20:08:46.000Z","dependencies_parsed_at":"2023-01-19T14:46:23.593Z","dependency_job_id":null,"html_url":"https://github.com/iterative/cml_dvc_case","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/iterative/cml_dvc_case","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iterative%2Fcml_dvc_case","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iterative%2Fcml_dvc_case/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iterative%2Fcml_dvc_case/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iterative%2Fcml_dvc_case/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/iterative","download_url":"https://codeload.github.com/iterative/cml_dvc_case/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iterative%2Fcml_dvc_case/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272186211,"owners_count":24888333,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-26T02:00:07.904Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["example"],"created_at":"2025-06-18T02:38:46.987Z","updated_at":"2025-08-26T06:33:09.017Z","avatar_url":"https://github.com/iterative.png","language":"Python","readme":"# CML with DVC use case\n\nThis repository contains a sample project that uses [CML](https://github.com/iterative/cml) with DVC to push and pull data from cloud storage and to track model metrics. When a pull request is opened in this repository, the following happens:\n- GitHub deploys a runner machine with the specified CML Docker environment.\n- DVC pulls the data from cloud storage.\n- The runner executes a workflow that trains an ML model (`python train.py`).\n- CML posts a visual report of the model's performance, including DVC metrics, as a comment on the pull request.\n\nThe key file enabling these actions is `.github/workflows/cml.yaml`.\n\n## Secrets and environment variables\nIn this example, `.github/workflows/cml.yaml` reads the environment variables listed below. `GITHUB_TOKEN` is provided automatically; the AWS credentials must be stored as repository secrets.\n\n| Secret | Description |\n|---|---|\n| `GITHUB_TOKEN` | Set by default in every GitHub repository. It does not need to be added manually. |\n| `AWS_ACCESS_KEY_ID` | AWS access key ID for S3 storage |\n| `AWS_SECRET_ACCESS_KEY` | AWS secret access key for S3 storage |\n| `AWS_SESSION_TOKEN` | Optional AWS session token for S3 storage (needed if MFA is enabled) |\n\nDVC works with many kinds of remote storage. To configure this example for a different cloud storage provider, see our [documentation on the CML repository](https://github.com/iterative/cml#using-cml-with-dvc).\n\n## Cloning this project\nIf you clone this project, you will need to configure your own DVC remote storage and credentials. We suggest the following procedure:\n\n1. Fork the repository and clone your fork to your local workstation.\n2. Run `python get_data.py` to generate your own copy of the dataset. After initializing DVC in the project directory and configuring your remote storage, run `dvc add data` and `dvc push` to push your dataset to remote storage.\n3. Use `git add`, `git commit`, and `git push` to push your DVC configuration to GitHub.\n4. Add your storage credentials as repository secrets.\n5. Copy the workflow file `.github/workflows/cml.yaml` from this repository to your fork if it is not already there. GitHub Actions workflows do not run in forks by default, so you may need to enable them in your fork's Actions tab. When you commit the workflow file to your repository, the first workflow run should start.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiterative%2Fcml_dvc_case","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fiterative%2Fcml_dvc_case","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiterative%2Fcml_dvc_case/lists"}