{"id":20041680,"url":"https://github.com/ajithvcoder/emlo4-session-06-ajithvcoder","last_synced_at":"2025-07-30T14:33:20.086Z","repository":{"id":258467733,"uuid":"873980270","full_name":"ajithvcoder/emlo4-session-06-ajithvcoder","owner":"ajithvcoder","description":null,"archived":false,"fork":false,"pushed_at":"2024-10-19T05:21:20.000Z","size":1072,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-10-19T06:12:13.301Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ajithvcoder.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-17T03:45:28.000Z","updated_at":"2024-10-19T05:21:23.000Z","dependencies_parsed_at":"2024-10-20T14:40:35.686Z","dependency_job_id":null,"html_url":"https://github.com/ajithvcoder/emlo4-session-06-ajithvcoder","commit_stats":null,"previous_names":["ajithvcoder/emlo4-session-06-ajithvcoder"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ajithvcoder%2Femlo4-session-06-ajithvcoder","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ajithvcoder%2Femlo4-session-06-ajithvcoder/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ajithvcoder%2Femlo4-session-06-ajithvcoder/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ajithvcoder%2Femlo4-session-06-ajithvcoder/manifests","owner_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ajithvcoder","download_url":"https://codeload.github.com/ajithvcoder/emlo4-session-06-ajithvcoder/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241470401,"owners_count":19968041,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-13T10:47:27.574Z","updated_at":"2025-03-02T07:14:20.414Z","avatar_url":"https://github.com/ajithvcoder.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## EMLOV4-Session-06 Assignment - Data Version Control\n\n### Contents\n\n**Note: I have completed the optional assignment of integrating comet-ml**\n\n- [Requirements](#requirements)\n- [Development Method](#development-method)\n    - [DVC Integration with Google Cloud Storage](#dvc-integration-with-google-cloud-storage)\n    - [Integrate Comet ML](#integrate-comet-ml)\n    - [Github Actions with DVC Pipeline for training](#github-actions-with-dvc-pipeline-for-training)\n    - [Train-Test-Infer-Comment-CML](#train-test-infer-comment-cml)\n- [Learnings](#learnings)\n- [Results Screenshots](#results)\n\n### Requirements\n\n- Start with your repository from last session\n- Add this dataset: https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_5340.zip.\n- Add DVC Integration with Google Drive\n- Integrate CometML for logging\n- Create a Github Actions with DVC Pipeline for training\n- Train any ViT model for 5 epochs\n- Here are the Plots you will show\n    - train/acc and 
val/acc in one plot\n    - train/loss and val/loss in one plot\n    - Confusion Matrix for test dataset and train dataset as image plot\n- Infer on 10 images from test dataset and display the prediction, target along with image in results.md.\n    - You’ll be using your infer.py script for this\n    - You can save the images in the predictions folder and then add them to the results.md.\n- Change Model to pretrained and create a PR\n\n**Optional Assignment**\n\n- Integrate `CometML` for logging\n\n### Development Method\n\n#### Build Command\n\n**Debug Commands for development**\n\n```docker build -t light_train_test -f ./Dockerfile .```\n\n```docker run -d -v /workspace/emlo4-session-06-ajithvcoder/:/workspace/ light_train_test```\n\n```docker exec -it \u003cc511d4e6ed1a9ca6933c67f02632a2\u003e /bin/bash```\n\n**Train Test Infer Commands**\n\nInstall\n\n```uv sync --extra-index-url https://download.pytorch.org/whl/cpu ```\n\nPull data from cloud\n\n```dvc pull -r myremote1```\n\nTrigger workflow\n\n```dvc repro```\n\nComment in PR or commit\n```cml comment create report.md```\n\n### DVC Integration with Google Cloud Storage\n\n- Follow the first point in the `Using service account method` mentioned here https://dvc.org/doc/user-guide/data-management/remote-storage/google-drive#using-service-accounts\n- Store the API key in a local folder as `credentials.json`, but don't commit it to GitHub. If you do, GitHub will raise a warning, and Google in turn sends a notification and revokes the credentials.\n- It is better to give the user account `owner` permission/`storage admin` permission\n- Create a folder in a Google Cloud Storage bucket and get its URL, for example `gs://dvcmanager/storage`, where `dvcmanager` is the bucket name and `storage` is the folder name\n- After structuring the train and test images in the data folder:\n- Run ```dvc init```\n- Now run the `dvc remote add -d myremote1 gs://\u003cmybucket\u003e/\u003cpath\u003e` command. 
Reference https://dvc.org/doc/user-guide/data-management/remote-storage/google-cloud-storage\n- Run ```dvc add data```\n- Run ```dvc push -r myremote1```\n- Wait for 10 minutes, as the data is 800 MB; if it runs in GitHub Actions, wait for 15 minutes.\n- Now add each and every step to `dvc.yaml` using the ```dvc stage add``` command\n\n**Add Train, test, infer, report_generation stages**\n\n- `dvc stage add -f -n train -d configs/experiment/catdog_ex.yaml -d src/train.py -d data/cat_dog_medium python src/train.py --config-name=train experiment=catdog_ex trainer.max_epochs=5`\n\n- `dvc stage add -f -n test -d configs/experiment/catdog_ex_eval.yaml -d src/eval.py python src/eval.py --config-name=eval experiment=catdog_ex_eval.yaml`\n\n- `dvc stage add -f -n infer -d configs/experiment/catdog_ex_eval.yaml -d src/infer.py python src/infer.py --config-name=infer experiment=catdog_ex_eval.yaml`\n\n- `dvc stage add -n report_genration python scripts/metrics_fetch.py`\n\n- You should now have a `dvc.yaml` file, a `data.dvc` file, and a `dvc.lock` file; push all of these to GitHub\n\n\n### Integrate Comet ML\n\n- Comet-ML is already integrated with PyTorch Lightning, so we just need to add config files in the \"logger\" folder and use a proper API key for it.\n\n\n\n### Github Actions with DVC Pipeline for training\n\n- Set up the cml and uv packages using GitHub Actions and install `python=3.12`\n- Copy the contents of credentials.json and store them in the GitHub repository secrets under the name `GDRIVE_CREDENTIALS_DATA`\n\n### Train-Test-Infer-Comment-CML\n\n**Debugging and development**\n\nUse a subset of the train and test sets for faster debugging and development. You can also reduce the model config to generate a `custom 3 million param vit model`. I reduced the model from 5 million params to 3 million params using the config. 
However, to run the pretrained model, we can change this config.\n\n**Overall Run**\n- `dvc repro`\n\n**Train**\n- `dvc repro train`\n\n**Test**\n- `dvc repro test`\n\n**Infer**\n- `dvc repro infer`\n\n**Create CML report**\n\n- Install the cml package\n- `python scripts/metrics_fetch.py` fetches the files needed for the report and places them in the root folder\n- `report_gen.sh` collects every metric and appends it to the readme file\n- The cml tool is used to comment on GitHub; internally it uses the GitHub token to authorize\n\n\n### CI Pipeline Development\n\n- Using GitHub Actions and `dvc-pipeline.yml`, we run all of the above actions; the workflow can be triggered both manually and on a pull request to the main branch\n\n\n### Learnings\n\n- Learnt about DVC tool usage, Comet-ML, and cml\n\n### Results\n\n**Comet-ML Dashboard**\n\n![comet ml dashboard](./assets/snap_comet_ml.png)\n\n**Workflow success on main branch**\n\nRun details - [here](https://github.com/ajithvcoder/emlo4-session-06-ajithvcoder/actions/runs/11419499613)\n\n![main workflow](./assets/snap_main_workflow.png)\n\n**Workflow success run on PR branch**\n\nRun details - [here](https://github.com/ajithvcoder/emlo4-session-06-ajithvcoder/actions/runs/11419924207)\n\nPull request - [here](https://github.com/ajithvcoder/emlo4-session-06-ajithvcoder/pull/2)\n\n![pr triggered workflow](./assets/snap_pr_testing.png)\n\n**Comments from cml with plots and 10 infer images**\n\nDetails - [here](https://github.com/ajithvcoder/emlo4-session-06-ajithvcoder/pull/2#issuecomment-2424194445)\n\n![cml comment](./assets/snap_cml.png)\n\n\nNote: I used a Google Cloud Storage bucket for this project as it was faster than Google Drive. It is a paid service, so I am going to remove it after successfully completing this assignment. You will need to do the cloud setup again to re-run this experiment.\n\n### Group Members\n\n1. 
Ajith Kumar V (myself)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fajithvcoder%2Femlo4-session-06-ajithvcoder","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fajithvcoder%2Femlo4-session-06-ajithvcoder","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fajithvcoder%2Femlo4-session-06-ajithvcoder/lists"}