{"id":24573921,"url":"https://github.com/hadii-tech/mlops-blueprint","last_synced_at":"2025-09-06T15:34:46.628Z","repository":{"id":271646394,"uuid":"914128926","full_name":"hadii-tech/mlops-blueprint","owner":"hadii-tech","description":"Modern mlops template for machine learning projects","archived":false,"fork":false,"pushed_at":"2025-02-11T18:37:02.000Z","size":35,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-02-11T19:36:45.145Z","etag":null,"topics":["cloud","gitops","machine-learning","mlops"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hadii-tech.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-09T02:07:32.000Z","updated_at":"2025-02-11T18:37:05.000Z","dependencies_parsed_at":null,"dependency_job_id":"de7b72ef-6d1b-4e81-9585-6c65dbcc8abb","html_url":"https://github.com/hadii-tech/mlops-blueprint","commit_stats":null,"previous_names":["hadii-tech/mlops-template"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hadii-tech%2Fmlops-blueprint","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hadii-tech%2Fmlops-blueprint/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hadii-tech%2Fmlops-blueprint/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hadii-tech%2Fmlops-blueprint/manifests","owner_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hadii-tech","download_url":"https://codeload.github.com/hadii-tech/mlops-blueprint/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244022724,"owners_count":20385134,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cloud","gitops","machine-learning","mlops"],"created_at":"2025-01-23T20:39:57.670Z","updated_at":"2025-03-17T11:15:24.917Z","avatar_url":"https://github.com/hadii-tech.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Modern MLOps Blueprint\n\nThis project demonstrates a **multi-stage** ML system designed to detect anomalous GitHub pull requests. It incorporates **modern cloud-native** technologies, tested on DigitalOcean:\n\n- **Vault** for secure secret management.\n- **GitHub Actions** for CI/CD (building, ephemeral testing).\n- **Argo CD** (GitOps) for continuous deployment to **DigitalOcean** Kubernetes clusters.\n- **Argo Rollouts** for blue/green deployment on the model-serving stage.\n- **Spark** for distributed preprocessing.\n- **MLflow** for experiment tracking and model versioning.\n- **Locust** for performance testing.\n- **MongoDB** and **DigitalOcean Spaces** for data storage.\n\nCheck out this [blog post](https://mfadhel.com/mlops-blueprint/) outlining the motivation for this repository.\n\nYou will need existing k8s clusters (preferably one per environment) to execute the machine learning pipelines in this repo, which are deployed via Argo CD. 
If you need help with this requirement, check out our [existing templates](https://github.com/hadii-tech/cloud-infra) to quickly set up and deploy your own production-ready k8s clusters on DigitalOcean, pre-configured with monitoring, logging, and alerting capabilities.\n\n---\n\n## Stages of the Pipeline\n\n1. **Data-Fetch** (Job)  \n   - Runs `fetch_github_data.py`.  \n   - Pulls secrets from Vault (GitHub token, Mongo URI).  \n   - Upserts PR data into MongoDB.\n\n2. **Spark Preprocess** (Job)  \n   - Runs `spark_preprocess.py`.  \n   - Uses **Spark** to read from MongoDB, create features, and store them as Parquet files in **DigitalOcean Spaces**.  \n   - Secrets for DO Spaces, etc., come from Vault.\n\n3. **ML Training** (Job)  \n   - Runs `train_autoencoder.py`, which trains a simple feedforward autoencoder:\n      - Encoder: Compresses input data into a lower-dimensional representation.\n      - Decoder: Attempts to reconstruct the input from the compressed representation.\n      - A threshold-based anomaly detection method is used:\n           - Anomalies are detected when reconstruction errors exceed `mean + 2 * std_dev`.\n   - Uses **MLflow** for experiment tracking; references preprocessed data from DO Spaces.  \n   - Also references secrets from Vault (like `mlflow_uri`).\n\n4. **Model-Serving** (**Blue/Green** with Argo Rollouts)  \n   - A Flask-based model-serving API designed to detect anomalies in pull requests; it runs as a **long-running** service.  \n   - We define a **Rollout** with `blueGreen` strategy in `model-serving-rollout.yaml`.  \n   - We can deploy to a “preview” environment first, then “promote” it to the active service for minimal downtime.  \n   - We run **Locust** performance tests in ephemeral containers to confirm throughput, latency, etc.\n\n---\n\n## Multi-Branch Approach (Staging \u0026 Production)\n\nWe use **two** main branches:\n\n1. **staging**  \n2. 
**main** (production)\n\nEach environment references the **same** `argo-apps/base` folder, but on a **different** branch. Thus:\n- The **staging** cluster’s Argo CD points to the **staging** branch.  \n- The **production** cluster’s Argo CD points to the **main** branch.  \n\nA typical workflow:\n1. **Developer** merges changes into **staging** → triggers ephemeral container tests + staging cluster update.\n2. Once validated in staging, we **merge staging → main** → triggers ephemeral tests + production cluster update.\n\nNo separate “overlays/staging” or “overlays/production” folders within a single branch are needed. Instead, each environment is captured by its dedicated branch.\n\n---\n\n## CI/CD with GitHub Actions\n\nWe define **four** primary workflow files (one per pipeline stage):\n\n1. **data-fetch.yml**\n2. **spark-preprocess.yml**\n3. **ml-training.yml**\n4. **model-serving.yml**\n\nEach workflow performs the following steps:\n\n- **Lint** (flake8) and run **unit tests** (pytest).\n- **Build** the Docker image if that stage’s code changed.\n- **Spin up** an **ephemeral container** for integration tests (e.g., checking logs or hitting an endpoint).\n- If tests **pass**, **push** the Docker image to **DigitalOcean Container Registry**.\n- **Update** the references for the target environment (staging or main) in `argo-apps/base/*.yaml` so Argo CD picks up the change.\n\n### Single-Build / Reuse of Images\n\nWe **build once**, ephemeral-test that image in the pipeline, then reuse the **identical** image for both staging and production. That ensures environment parity: production runs the same artifact tested in staging, with no separate rebuild.\n\n---\n\n## Vault for Secrets\n\nEach Python script reads environment variables such as:\n\n- `VAULT_ADDR`\n- `VAULT_PATH_AUTH`\n- `VAULT_ROLE`\n- `VAULT_SECRET_PATH`\n\nIt then uses **hvac** to authenticate with Vault via the **Kubernetes auth** method. Actual secrets (e.g., `mongo_uri`, `github_token`, `mlflow_uri`) 
reside in Vault under those paths, ensuring we never store secrets in environment variables or Git.\n\n---\n\n## Blue/Green for Model-Serving\n\nInstead of a standard Deployment, we use:\n\n```yaml\napiVersion: argoproj.io/v1alpha1\nkind: Rollout\nmetadata:\n  name: model-serving\n  labels:\n    app: model-serving\nspec:\n  strategy:\n    blueGreen:\n      activeService: model-serving-active\n      previewService: model-serving-preview\n      autoPromotionEnabled: false\n      # autoPromotionSeconds: 30 # if you want auto promote\n```\n\nThis approach ensures no downtime when updating the serving container. We keep the old version active while spinning up the new version in “preview” mode. After validation, we promote the new version.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhadii-tech%2Fmlops-blueprint","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhadii-tech%2Fmlops-blueprint","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhadii-tech%2Fmlops-blueprint/lists"}