{"id":25177782,"url":"https://github.com/cr21/docker-training-pytorch","last_synced_at":"2026-05-04T09:31:22.462Z","repository":{"id":258380527,"uuid":"857200407","full_name":"cr21/Docker-Training-Pytorch","owner":"cr21","description":"Docker-Compose Pytorch Training","archived":false,"fork":false,"pushed_at":"2024-09-21T08:36:51.000Z","size":2084,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-06T03:45:39.640Z","etag":null,"topics":["docker","docker-compose","mlops","mlops-workflow","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cr21.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-09-14T03:11:35.000Z","updated_at":"2024-10-16T18:32:02.000Z","dependencies_parsed_at":null,"dependency_job_id":"f5833a9d-e836-40bc-bb47-adfe19a5842c","html_url":"https://github.com/cr21/Docker-Training-Pytorch","commit_stats":null,"previous_names":["cr21/docker-training-pytorch"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cr21/Docker-Training-Pytorch","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cr21%2FDocker-Training-Pytorch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cr21%2FDocker-Training-Pytorch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cr21%2FDocker-Training-Pytorch/rele
ases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cr21%2FDocker-Training-Pytorch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cr21","download_url":"https://codeload.github.com/cr21/Docker-Training-Pytorch/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cr21%2FDocker-Training-Pytorch/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32601478,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-03T22:12:39.696Z","status":"online","status_checked_at":"2026-05-04T02:00:06.625Z","response_time":58,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","docker-compose","mlops","mlops-workflow","pytorch"],"created_at":"2025-02-09T14:49:32.091Z","updated_at":"2026-05-04T09:31:22.450Z","avatar_url":"https://github.com/cr21.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Docker-Training-Pytorch\n\n## Table Of Contents\n  1. [Why Docker?](#why-docker)\n  2. [Key Terms In Docker](#key-terms-in-docker)\n  3. [Docker In MLOps](#docker-in-mlops)\n  4. [Multi-Container Applications with Docker-compose : Pytorch+cpu Image classification](#multi-container-applications-with-docker-compose---pytorchcpu-image-classification)\n  5. [Build Container](#build-container)\n        1. [Run Train Service](#run-training-service)\n        2. 
[Run Evaluate service](#run-evaluate-service)\n        3. [Run Inference service](#run-inference-service)\n  6. [Results](#results)\n  7. [References](#references)\n\n  \n\n## Why Docker?\n- Every software engineer or machine learning engineer has faced the problem **\"It worked on my system, but it's not working on yours\"**.\n- Docker solves this problem by bundling the application and its dependencies into a container.\n- You can run the same container anywhere, which keeps code and dependencies consistent across environments.\n\n## Key terms in Docker\n\n1. **Dockerfile**\n- A file that contains the instructions to build a Docker image.\n- It specifies the base image, environment variables for the running container, build instructions, dependencies, and the commands needed to set up the application code in the container.\n- Each instruction is a build step whose result is cached (`RUN`, `COPY`, and `ADD` add filesystem layers), which makes builds reproducible and faster.\n\n2. **Image**\n- A read-only template used to create Docker containers. It is assembled layer by layer from the instructions in the Dockerfile.\n- Images can be shared, distributed, and reused. Docker Hub is a public registry where shared images can be accessed.\n- Images are stored in a Docker registry.\n\n3. **Container**\n- A runnable instance of a Docker image.\n- A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings.\n\n\n4. **Volume**\n- A data storage mechanism used to share data between a Docker container and the host machine, or between two running containers.\n\n5. **Network**\n- Provides networking so containers can communicate with each other and with external resources.\n\n6. 
**Docker Compose**\n- Docker Compose is a tool for managing multi-container Docker applications.\n- It uses a YAML file to configure the application's services.\n- It provides a way to run different services in isolated environments on a single host.\n- It preserves volume data when containers are created.\n- It only recreates containers that have changed.\n\n7. **Docker Compose workflow**\n- Docker Compose follows the three steps below:\n    A. **Define the app's environment using a Dockerfile.** This creates the Docker image of the application.\n    B. **Define the services required by the app in the docker-compose file.** This file specifies all the services that the application requires to function, such as databases, volumes, queues, caches, etc.\n    C. **Run docker-compose up, and Compose starts and runs the entire app.** If the images are not built yet, Compose builds them from the Dockerfiles, and it then creates (or recreates) the containers.\n\n\n## Docker in MLOps\n- In machine learning, we can package ML models, along with their dependencies and the ML runtime environment, into a container.\n- This facilitates seamless deployment of ML models on various platforms without worrying about infrastructure.\n- In MLOps, Docker plays an important role in scaling machine learning models.\n- Encapsulating the ML model and its runtime environment in containers supports automated deployment, version control, and reproducibility.\n\n## Multi-Container Applications with Docker-compose :  Pytorch+cpu Image classification \n\n- Train an image classification model using PyTorch. We use the CPU-only version of PyTorch with multiprocess CPU training ([mnist-example](https://github.com/pytorch/examples/tree/main/mnist_hogwild)).\n- Train the model on a clothing dataset that contains 5 classes\n    1. bottomwear\n    2. eyewear\n    3. footwear\n    4. handbag\n    5. topwear\n- Use Docker Compose to create 3 services\n    1. `Train service` for training the model\n    2. 
`Evaluate service` to evaluate the trained model\n    3. `Inference service` to test the trained model.\n- Each service has its own Dockerfile to build its container.\n- A shared volume is used to share data and the model across containers and to keep them in sync.\n\n\n![docker-workflow](readme_images/docker-Flowchart.png?raw=true \"docker workflow\")\n\ndocker-compose.yml\n\n```yaml\nservices:\n    train:\n        build:\n            context: .\n            dockerfile: Dockerfile.train\n        shm_size: \"2gb\"\n        volumes:\n          - ./model:/opt/mount/model\n          - ./data:/opt/mount/data\n          - docker_vol:/opt/mount\n\n    eval:\n        build:\n            context: .\n            dockerfile: Dockerfile.eval\n        volumes:\n          - ./model:/opt/mount/model\n          - ./data:/opt/mount/data\n          - docker_vol:/opt/mount\n    infer:\n        build:\n            context: .\n            dockerfile: Dockerfile.infer\n        volumes:\n          - docker_vol:/opt/mount\n          - ./model:/opt/mount/model\n          - ./data:/opt/mount/data\n          - ./responses:/opt/mount/responses\nvolumes:\n  docker_vol:\n```\n\nDockerfile.train\n```Dockerfile\nFROM python:3.9.19-slim AS stg1\n\n    \nCOPY requirements.txt .\n\nRUN apt-get update -y \u0026\u0026 apt install -y --no-install-recommends git\\\n\u0026\u0026 pip install --no-cache-dir -U pip \\ \n    \u0026\u0026 pip install --user --no-cache-dir https://download.pytorch.org/whl/cpu/torch-1.11.0%2Bcpu-cp39-cp39-linux_x86_64.whl \\\n    \u0026\u0026 pip install --user --no-cache-dir https://download.pytorch.org/whl/cpu/torchvision-0.12.0%2Bcpu-cp39-cp39-linux_x86_64.whl \\\n    \u0026\u0026 pip install --user --no-cache-dir -r requirements.txt \u0026\u0026 rm -rf /root/.cache/pip\n\n# Stage 2: run application code\nFROM python:3.9.19-slim\n\nCOPY --from=stg1 /root/.local /root/.local\nENV PATH=/root/.local/bin:$PATH\n\nWORKDIR  /opt/mount/\nCOPY . 
.\n\n#ENTRYPOINT [\"/bin/bash\"]\nCMD [\"python3\", \"train.py\"]\n\n```\n\nDockerfile.eval\n```Dockerfile\nFROM python:3.9.19-slim AS stg1\n\n    \nCOPY requirements.txt .\n\nRUN apt-get update -y \u0026\u0026 apt install -y --no-install-recommends git\\\n\u0026\u0026 pip install --no-cache-dir -U pip \\ \n    \u0026\u0026 pip install --user --no-cache-dir https://download.pytorch.org/whl/cpu/torch-1.11.0%2Bcpu-cp39-cp39-linux_x86_64.whl \\\n    \u0026\u0026 pip install --user --no-cache-dir https://download.pytorch.org/whl/cpu/torchvision-0.12.0%2Bcpu-cp39-cp39-linux_x86_64.whl \\\n    \u0026\u0026 pip install --user --no-cache-dir -r requirements.txt \u0026\u0026 rm -rf /root/.cache/pip\n\n# Stage 2: run application code\nFROM python:3.9.19-slim\n\nCOPY --from=stg1 /root/.local /root/.local\nENV PATH=/root/.local/bin:$PATH\n\nWORKDIR  /opt/mount/\nCOPY . .\n\nCMD [\"python3\", \"eval.py\"]\n\n```\n\nDockerfile.infer\n```Dockerfile\nFROM python:3.9.19-slim AS stg1\n\n    \nCOPY requirements.txt .\n\nRUN apt-get update -y \u0026\u0026 apt install -y --no-install-recommends git\\\n\u0026\u0026 pip install --no-cache-dir -U pip \\ \n    \u0026\u0026 pip install --user --no-cache-dir https://download.pytorch.org/whl/cpu/torch-1.11.0%2Bcpu-cp39-cp39-linux_x86_64.whl \\\n    \u0026\u0026 pip install --user --no-cache-dir https://download.pytorch.org/whl/cpu/torchvision-0.12.0%2Bcpu-cp39-cp39-linux_x86_64.whl \\\n    \u0026\u0026 pip install --user --no-cache-dir -r requirements.txt \u0026\u0026 rm -rf /root/.cache/pip\n\n# Stage 2: run application code\nFROM python:3.9.19-slim\n\nCOPY --from=stg1 /root/.local /root/.local\nENV PATH=/root/.local/bin:$PATH\n\nWORKDIR /opt/mount\n\nCOPY . 
.\n\nENTRYPOINT [\"python3\", \"infer.py\"]\n\n```\n\n## Build Container\n```sh\n# This command reads docker-compose.yml and, for each service,\n# builds the image from the Dockerfile named in its build section.\n# The `--no-cache` flag makes Docker rebuild every layer instead of reusing the build cache.\n\ndocker compose build --no-cache \n```\n\n## Run Training Service\n- The train service looks for the `checkpoint path` in the `volume`. If it finds a checkpoint file, it will `load trained model` and `resume training`, then save the updated model to the checkpoint location.\n- The container then exits.\n```sh\ndocker compose run train\n```\n\n## Run Evaluate Service\n- Loads the model from the `checkpoint location`, runs evaluation, and stores the evaluation metrics in a JSON file.\n- The container then exits.\n```sh\ndocker compose run eval\n```\n\n## Run Inference Service\n- Loads the model from the `checkpoint location`.\n- Runs inference on 5 random images and saves the results (images with file name, predicted class, and predicted probability).\n- The container then exits.\n```sh\ndocker compose run infer\n```\n\n- After running all the services, the model and results are available in the volume.\n\nsample testing results (logs)\n```log\n✅  Train Run Successfully. ✅ \nclass_names ['bottomwear', 'eyewear', 'footwear', 'handbag', 'topwear']\nCheckpoint found loading from checkpoint\nPID 1  Validation Loss 1.0127081871032715  Validation acc 0.8125\n++++++++++++++++++++++++++++++++++++++++++++++++++\nPID 1  Validation Loss 15.888932093977928  Validation acc 9.0625\n++++++++++++++++++++++++++++++++++++++++++++++++++\nPID 1  Validation Loss 31.48503577709198  Validation acc 17.1875\n++++++++++++++++++++++++++++++++++++++++++++++++++\nepoch 0 PID 1  Validation Loss 1.5017421083016829  Validation acc 0.8236742424242425\n1.5017421083016829 0.8236742424242425\n{'test_loss': 1.5017421083016829, 'test_acc': 0.8236742424242425}\n✅  Evaluation Run Successfully. ✅ \nCheckpoint found loading from checkpoint\nModel loaded from checkpoints\nInference completed. 
Results saved in the 'results' folder.\n✅ Inference  finished  Successfully. ✅  \n🔍 Checking for checkpoint file...\n✅ Checkpoint file found.\n🔍 Checking for eval_results.json file...\n✅ eval_results.json file found.\n📄 Printing the content of eval_results.json file...\n{\"test_loss\": 1.5770455646243962, \"test_acc\": 0.8178030303030304}🔍 Checking for inference results...\n✅ 5 inference result images found.\n\n```\n\n## Results\n\n![Tshirt](responses/results/Men_Blue_Floral_Printed_Tropical_Pure_Cotton_T-shirt_topwear.png) ![shoes](responses/results/Men_Maroon_Mesh_Running_Non-Marking_Shoes_footwear.png)\n\n\n![EyeWear](responses/results/Unisex_Grey_Lens__amp__Silver-Toned_Aviator_Sunglasses_with_UV_Protected_Lens_eyewear.png)![Bottomwear](responses/results/Women_Blue_Pure_Cotton_High-Rise_Slash_Knee_Jeans_bottomwear.png)\n\n![Topwear](responses/results/Men_Red_Brand_Logo_Printed_Bomber_Track_Jacket_topwear.png)![Handbag](responses/results/Women_Pink_Structured_Satchel_handbag.png)\n\n![Footwear](responses/results/Women_Gold-Toned_Textured_Wedges_footwear.png)![Jacket](responses/results/Men_Olive_Green_liver_Colourblocked_Denim_Jacket_topwear.png)\n\n\n\n![Jacket](responses/results/Men_White_Classic_Pure_Cotton_Formal_Shirt_handbag.png)![Handbag](responses/results/White_Solid_Structured_Sling_Bag_handbag.png)\n\n\n## References\n- [docker-compose](https://docs.docker.com/compose/compose-application-model/)\n- [pytorch-mnist-hogwild](https://github.com/pytorch/examples/blob/main/mnist_hogwild/main.py)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcr21%2Fdocker-training-pytorch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcr21%2Fdocker-training-pytorch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcr21%2Fdocker-training-pytorch/lists"}