{"id":15664281,"url":"https://github.com/deep-diver/gitmlops-test1","last_synced_at":"2026-03-18T17:38:45.885Z","repository":{"id":40697804,"uuid":"507567918","full_name":"deep-diver/gitmlops-test1","owner":"deep-diver","description":null,"archived":false,"fork":false,"pushed_at":"2022-06-26T13:32:41.000Z","size":85756,"stargazers_count":1,"open_issues_count":2,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-07-23T06:26:19.681Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/deep-diver.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-06-26T12:26:27.000Z","updated_at":"2022-06-27T02:12:35.000Z","dependencies_parsed_at":"2022-09-02T05:41:46.704Z","dependency_job_id":null,"html_url":"https://github.com/deep-diver/gitmlops-test1","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":"codingpot/git-mlops","purl":"pkg:github/deep-diver/gitmlops-test1","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deep-diver%2Fgitmlops-test1","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deep-diver%2Fgitmlops-test1/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deep-diver%2Fgitmlops-test1/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deep-diver%2Fgitmlops-test1/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/deep-diver","download_url":"https://codeload.github.com/deep-diver/gitmlops-test1/tar.gz/refs/heads/main","sbom_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deep-diver%2Fgitmlops-test1/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29133397,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-05T19:36:52.185Z","status":"ssl_error","status_checked_at":"2026-02-05T19:35:40.941Z","response_time":65,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-03T13:41:55.212Z","updated_at":"2026-02-05T20:33:15.524Z","avatar_url":"https://github.com/deep-diver.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Git Based MLOps\n\n\u003cimg src='https://svgshare.com/i/htr.svg' title='git-mlops-overview' /\u003e\n\nThis project shows how to practice MLOps with Git/GitHub. To achieve this, it relies heavily on tools such as [DVC](https://dvc.org/), [DVC Studio](https://studio.iterative.ai/), and [DVCLive](https://dvc.org/doc/dvclive) (all products built by [iterative.ai](https://iterative.ai/)), as well as [Google Drive](https://www.google.com/drive/), [Jarvislabs.ai](https://jarvislabs.ai/), and [HuggingFace Hub](https://github.com/huggingface/huggingface_hub).\n\n## Instructions\n\n### Prior work\n\n1. Click the \"Use this template\" button to create your own repository\n2. 
Wait a few seconds; an `Initial Setup` PR will then be created automatically\n3. Merge the PR, and you are good to go\n\n### Basic setup\n\n0. Run `pip install -r requirements.txt` ([requirements.txt](https://github.com/codingpot/git-mlops/blob/main/requirements.txt))\n1. Run `dvc init` to enable DVC\n2. Add your data under the `data` directory\n3. Run `git rm -r --cached 'data' \u0026\u0026 git commit -m \"stop tracking data\"`\n4. Run `dvc add [ADDED FILE OR DIRECTORY]` to track your data with DVC\n5. Run `dvc remote add -d gdrive_storage gdrive://[ID of specific folder in gdrive]` to add Google Drive as the remote data storage\n6. Run `dvc push`; an authentication URL is printed. Copy and paste it into the browser, and authenticate\n7. Copy the content of `.dvc/tmp/gdrive-user-credentials.json` and store it as a [GitHub Secret](https://docs.github.com/en/actions/security-guides/encrypted-secrets#creating-encrypted-secrets-for-a-repository) named `GDRIVE_CREDENTIAL`\n8. Run `git add . \u0026\u0026 git commit -m \"initial commit\" \u0026\u0026 git push origin main` to commit the initial setup\n9. Write your own pipeline under the `pipeline` directory. Code for basic image classification in TensorFlow is provided as a starting point.\n10. 
Run the following `dvc stage add` command for the training stage\n```bash\n# if you want to use Iterative Studio / DVCLive for tracking training progress\n$ dvc stage add -n train \\\n                -p train.train_size,train.batch_size,train.epoch,train.lr \\\n                -d pipeline/modeling.py -d pipeline/train.py -d data \\\n                --plots-no-cache dvclive/scalars/train/loss.tsv \\\n                --plots-no-cache dvclive/scalars/train/sparse_categorical_accuracy.tsv \\\n                --plots-no-cache dvclive/scalars/eval/loss.tsv \\\n                --plots-no-cache dvclive/scalars/eval/sparse_categorical_accuracy.tsv \\\n                -o outputs/model \\\n                python pipeline/train.py outputs/model\n\n# if you want to use W\u0026B for tracking training progress\n$ dvc stage add -n train \\\n                -p train.train_size,train.batch_size,train.epoch,train.lr \\\n                -d pipeline/modeling.py -d pipeline/train.py -d data \\\n                -o outputs/model \\\n                python pipeline/train.py outputs/model\n```\n11. Run the following `dvc stage add` command for the evaluate stage\n```bash\n# if you want to use Iterative Studio / DVCLive for tracking training progress\n$ dvc stage add -n evaluate \\\n                -p evaluate.test,evaluate.batch_size \\\n                -d pipeline/evaluate.py -d data/test -d outputs/model \\\n                -M outputs/metrics.json \\\n                python pipeline/evaluate.py outputs/model\n\n# if you want to use W\u0026B for tracking training progress\n$ dvc stage add -n evaluate \\\n                -p evaluate.test,evaluate.batch_size \\\n                -d pipeline/evaluate.py -d data/test -d outputs/model \\\n                python pipeline/evaluate.py outputs/model\n```\n12. Update `params.yaml` as needed.\n13. Run `git add . \u0026\u0026 git commit -m \"add initial pipeline setup\" \u0026\u0026 git push origin main`\n14. Run `dvc repro` to run the pipeline for the first time\n15. 
Run `dvc add outputs/model.tar.gz` to track a compressed version of the model\n16. Run `dvc push outputs/model.tar.gz`\n17. Run `echo \"/pipeline/__pycache__\" \u003e\u003e .gitignore` to ignore the unnecessary `__pycache__` directory\n18. Run `git add . \u0026\u0026 git commit -m \"add initial pipeline run\" \u0026\u0026 git push origin main`\n19. Add the access token and user email of your [JarvisLabs.ai](https://jarvislabs.ai/) account to GitHub Secrets as `JARVISLABS_ACCESS_TOKEN` and `JARVISLABS_USER_EMAIL`\n20. Add a GitHub access token to GitHub Secrets as `GH_ACCESS_TOKEN`\n21. Create a PR and post `#train` as a comment (you must be the owner of the repo)\n\n### W\u0026B Integration Setup\n\n1. Add your W\u0026B project name to GitHub Secrets as `WANDB_PROJECT`\n2. Add your W\u0026B API key to GitHub Secrets as `WANDB_API_KEY`\n\n### HuggingFace Integration Setup\n\n1. Add your HuggingFace access token to GitHub Secrets as `HF_AT`\n2. Add your HuggingFace username to GitHub Secrets as `HF_USER_ID`\n3. Post `#deploy-hf` as a comment on the PR you want to deploy to HuggingFace Space\n   - The GitHub Action assumes your model is archived as `model.tar.gz` under the `outputs` directory\n   - The GitHub Action also assumes your HuggingFace Space app is written in [Gradio](https://gradio.app/) under the `hf-space` directory. 
Modify [`app_template.py`](https://github.com/codingpot/git-mlops/blob/main/hf-space/app_template.py) as needed (but do not remove any of the environment variables in the file).\n\n## TODO\n\n- [X] Write solid steps to reproduce this repo for other tasks\n- [X] Deploy the experimental model to [HF Space](https://huggingface.co/spaces)\n- [ ] Deploy the current model to [GKE](https://cloud.google.com/kubernetes-engine) with the [auto TFServing deployment project](https://github.com/deep-diver/ml-deployment-k8s-tfserving)\n- [ ] Add more cloud providers offering GPU VMs\n  - [X] [JarvisLabs.ai](https://jarvislabs.ai/)\n  - [ ] [DataCrunch.io](https://datacrunch.io/)\n  - [ ] [GCP Vertex AI Training](https://cloud.google.com/vertex-ai#section-9)\n- [ ] Integrate more managed services for versioning and tracking\n  - [ ] [W\u0026B Artifact](https://wandb.ai/site) for dataset/model versioning and experiment tracking\n  - [ ] [HuggingFace](https://huggingface.co) for dataset/model versioning\n- [ ] Integrate more managed services for deployment\n  - [ ] [AKS](https://docs.microsoft.com/en-us/azure/aks/)\n  - [ ] [EKS](https://aws.amazon.com/ko/eks/)\n  - [ ] [App Engine](https://cloud.google.com/appengine/)\n  - [ ] [AWS Lambda](https://aws.amazon.com/ko/lambda/)\n- [ ] Add more example codebases (pipelines)\n  - [ ] TensorFlow-based Object Detection\n  - [ ] PyTorch-based Image Classification\n  - [ ] HuggingFace Transformers\n\n## Brief description of each tool\n\n- **DVC (Data Version Control)**: Stores data elsewhere (e.g. cloud storage) while keeping the version and remote information in metadata files in the Git repository.\n- **DVCLive**: Provides callbacks for ML frameworks (e.g. TensorFlow, Keras) that record metrics in TSV format during training.\n- **DVC Studio**: Visualizes the metrics from files in the Git repository. What to visualize is specified in `dvc.yaml`.\n- **Google Drive**: Used as the remote data storage. 
However, you can use other storage such as AWS S3, Google Cloud Storage, or your own file server.\n- **Jarvislabs.ai**: Used to provision cloud GPU VM instances to run each experiment.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeep-diver%2Fgitmlops-test1","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdeep-diver%2Fgitmlops-test1","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeep-diver%2Fgitmlops-test1/lists"}