{"id":16926968,"url":"https://github.com/shcheklein/dvc-docker-example","last_synced_at":"2025-07-02T10:34:46.933Z","repository":{"id":174851225,"uuid":"652882806","full_name":"shcheklein/dvc-docker-example","owner":"shcheklein","description":"An example of DVC pipeline with a Docker-wrapped command","archived":false,"fork":false,"pushed_at":"2023-12-15T11:34:19.000Z","size":11,"stargazers_count":5,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-06T18:52:15.292Z","etag":null,"topics":["dvc","dvc-pipeline","example","machine-learning"],"latest_commit_sha":null,"homepage":"","language":"Dockerfile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shcheklein.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-06-13T01:46:40.000Z","updated_at":"2024-07-30T19:48:55.000Z","dependencies_parsed_at":"2025-01-25T22:34:30.928Z","dependency_job_id":"439fd0ae-5599-4d19-9434-f338aa9c7779","html_url":"https://github.com/shcheklein/dvc-docker-example","commit_stats":null,"previous_names":["shcheklein/dvc-docker-example"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/shcheklein/dvc-docker-example","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shcheklein%2Fdvc-docker-example","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shcheklein%2Fdvc-docker-example/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shcheklein%2Fdvc-docker-example/releases","manifests_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/shcheklein%2Fdvc-docker-example/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shcheklein","download_url":"https://codeload.github.com/shcheklein/dvc-docker-example/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shcheklein%2Fdvc-docker-example/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263121359,"owners_count":23416981,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dvc","dvc-pipeline","example","machine-learning"],"created_at":"2024-10-13T20:32:23.973Z","updated_at":"2025-07-02T10:34:46.901Z","avatar_url":"https://github.com/shcheklein.png","language":"Dockerfile","readme":"## DVC + Docker example\n\nA few different ways to run DVC with Docker.\n\n- `inside` - the whole DVC pipeline runs inside a container.\n- `outside` - `docker run` is one or more stages of a DVC\n  pipeline, while `dvc repro` or `dvc exp run` runs outside a container.\n\n\u003e Don't consider this a comprehensive tutorial. It's provided as an example of\ndifferent combinations. There are other ways to manage different aspects of this\nscenario; reach out to us or read the docs to get the right combination of\ncommands or the right way to package things into an image.\n\n### `inside/mount`\n\nMounts the current directory (workspace) as a volume into a running container. The\nworkflow to run it is listed below. It suits local runs well, since it picks up the\nchanges in the workspace and the local Git config. 
Also, it avoids clones and avoids duplicating\ndata and results to pass them between the container and the host.\n\n1. make any changes to the workspace, code, data, etc.\n2. run any DVC commands if needed (`dvc pull`, etc.) to get data\n3. run the DVC pipeline with:\n\n```cli\ndocker run -it \\\n           -v $(pwd)/../..:/app/workdir \\\n           -v ~/.gitconfig:/etc/gitconfig \\\n           dvc-docker-mount\n```\n\n\u003e On Apple Silicon Macs, add `--platform linux/amd64`, since the DVC deb package is not\navailable for that platform at the time of writing.\n\n### `inside/clone`\n\nSuits the scenario where we need to run an experiment or an ELT job\nremotely, on demand or on a schedule. We do a fresh `git clone`, pull data, run\nthe pipeline (can be `dvc repro` or `dvc exp run`) and push results back,\neither to be consumed within [Studio](https://studio.iterative.ai/) (it renders\n`dvc exp push`-ed experiments), or by `git clone`, `dvc pull`, `dvc exp pull`\ncommands on any other machine.\n\nTo run it:\n\n```cli\ndocker run -e STUDIO_GIT_TOKEN='*********' \\\n           -e AWS_ACCESS_KEY_ID='*******' \\\n           -e AWS_SECRET_ACCESS_KEY='***********' \\\n           -it dvc-docker-clone\n```\n\nWhere:\n\n- `STUDIO_GIT_TOKEN` - a secure way to give access to a specific repo for a\n  certain period of time. Read more\n  [here](https://github.blog/2022-10-18-introducing-fine-grained-personal-access-tokens-for-github/).\n  Other platforms might not provide this. In this case you can use an SSH key\n  and/or create a separate account with restricted access to make it secure.\n- `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` - an example of how to pass\n  AWS credentials to pull / push data if needed. Again, it depends on the\n  environment whether you need this or not. 
On an EC2 instance we would recommend\n  setting up an IAM role to avoid managing credentials manually.\n\n\u003e On Apple Silicon Macs, add `--platform linux/amd64`, since the DVC deb package is not\navailable for that platform at the time of writing.\n\n### `outside`\n\nRun with `dvc repro` or `dvc exp run`. It shows a way to package Dockerized\ncommands into a DVC pipeline. An additional benefit is that DVC handles the\nDocker image update as well.\n\n### Considerations\n\n1. Storage and Git credentials. Since we need to run the pipeline or certain\n   commands, we need to pass Git credentials and/or storage credentials, plus the Git\n   config. See the comments in the `inside/clone` section, which give a bit\n   more detail on this.\n2. Install DVC. We use the `deb` package and an Ubuntu base image in these\n   examples. A clear optimization is to create your own image that already\n   includes all the basic steps and can be reused in the projects' repos to\n   simplify their Dockerfiles.\n3. [Iterative Studio](https://studio.iterative.ai/) - gives an excellent way to\n   see the results, trigger runs, register and see all the models, and more.\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshcheklein%2Fdvc-docker-example","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshcheklein%2Fdvc-docker-example","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshcheklein%2Fdvc-docker-example/lists"}