{"id":29777611,"url":"https://github.com/a920604a/stock-mlops","last_synced_at":"2026-04-14T03:32:18.275Z","repository":{"id":302368660,"uuid":"1012201040","full_name":"a920604a/stock-mlops","owner":"a920604a","description":null,"archived":false,"fork":false,"pushed_at":"2025-11-05T02:05:12.000Z","size":1665,"stargazers_count":8,"open_issues_count":0,"forks_count":3,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-05T03:22:43.391Z","etag":null,"topics":["celery","clickhouse","grafana","kafka","minio","mlflow","postgresql","prefect","prometheus","redis"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/a920604a.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-07-02T01:26:14.000Z","updated_at":"2025-11-05T02:05:15.000Z","dependencies_parsed_at":"2025-07-16T10:36:29.011Z","dependency_job_id":"284b66b0-5dda-4e1a-93d3-86bf5ca975d0","html_url":"https://github.com/a920604a/stock-mlops","commit_stats":null,"previous_names":["a920604a/stock-mlops"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/a920604a/stock-mlops","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a920604a%2Fstock-mlops","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a920604a%2Fstock-mlops/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a920604a%2Fstock-mlops/releases","manifests_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a920604a%2Fstock-mlops/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/a920604a","download_url":"https://codeload.github.com/a920604a/stock-mlops/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a920604a%2Fstock-mlops/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31781292,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-14T02:24:21.117Z","status":"ssl_error","status_checked_at":"2026-04-14T02:24:20.627Z","response_time":153,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["celery","clickhouse","grafana","kafka","minio","mlflow","postgresql","prefect","prometheus","redis"],"created_at":"2025-07-27T11:42:50.641Z","updated_at":"2026-04-14T03:32:18.270Z","avatar_url":"https://github.com/a920604a.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\n# Stock Price Prediction with MLOps\n\n\n[繁體中文版](./readme_zh.md)\n## 🎯 Course Project\n\n### Objective\n\nThe goal of this project is to apply everything learned in the course to build an end-to-end machine learning system with full MLOps workflow.\n\n---\n\n## 📍 Problem Statement\n\nThis project aims to build a sustainable and maintainable stock price prediction system, implementing the complete MLOps lifecycle including 
data collection, feature engineering, model training, experiment tracking, real-time inference, deployment, and monitoring.\n\nUsers can query predicted stock prices and historical trend charts through a web interface. Developers can periodically retrain models, track experiments, monitor performance and data drift, and trigger auto-retraining.\n\n---\n\n## 🧰 Technologies Used\n\n| Category                   | Tools \u0026 Frameworks                                                |\n| -------------------------- | ----------------------------------------------------------------- |\n| **Cloud / Infra**          | Docker Compose (extendable to EC2), MinIO, PostgreSQL, ClickHouse |\n| **ML Pipeline**            | FastAPI, Scikit-learn, Pandas, MLflow                             |\n| **Workflow Orchestration** | Prefect 2                                                         |\n| **Monitoring**             | Evidently + Prometheus + Grafana                                  |\n| **CI/CD**                  | GitHub Actions                                                    |\n| **Testing**                | pytest (unit + integration tests)                                 |\n| **Formatting / Hooks**     | black, pre-commit, flake8                                         |\n| **IaC**                    | Docker Compose + Volume + Network (extendable to Terraform)       |\n\n---\n\n## 🏗️ Project Structure\n\n```\n.\n├── backend/                  # Backend with API, ML logic, workflows\n│   ├── api/                  # FastAPI routes (train, predict)\n│   ├── src/                  # Feature engineering, model training/inference\n│   ├── monitor/              # Monitoring logic using Evidently\n│   ├── tasks/                # Celery async tasks\n│   ├── workflows/            # Prefect ETL \u0026 training flows\n│   └── tests/                # Unit \u0026 integration tests\n├── frontend/                 # Frontend (Vite + React)\n├── data/, db/, pgdata/       # Data and DB 
initialization folders\n├── monitor/                  # Prometheus \u0026 Grafana configurations\n├── Dockerfile.*, docker-compose.yml\n├── Makefile, setup.md, implementation_log.md\n├── .github/                  # GitHub Actions configuration\n│   └── workflows/            # GitHub Actions CI/CD workflow\n├── .pre-commit-config.yaml   # Pre-commit configuration\n├── README.md\n```\n\n---\n\n## 🔁 Model Lifecycle\n\n1. ETL and training pipelines are triggered regularly via Prefect\n2. Training results are logged to MLflow and registered as versioned models\n3. FastAPI serves `/predict` and `/train` APIs (Celery-supported)\n4. Evidently exports model drift metrics to Prometheus\n5. Grafana dashboards visualize prediction accuracy, drift metrics, and system metrics\n\n---\n\n## 🖥️ System Architecture (Mermaid)\n\n```mermaid\ngraph TD\n  %% ------------------- User / Frontend -------------------\n  U[User Browser] --\u003e|HTTP/WS Requests| NG[Nginx\u003cbr\u003eStatic + Reverse Proxy]\n\n  subgraph Nginx_Proxy[\"Nginx Proxy\"]\n    NG --\u003e|/api/predict| UP1\n    NG --\u003e|/api/train| UP2\n    NG --\u003e|/api/| UP3\n    NG --\u003e|/ws| W\n    NG --\u003e|Static files\u003cbr\u003e/index.html, /js, /css...| Static[React Build]\n  end\n\n  %% ------------------- Upstream Pools -------------------\n  subgraph Upstream_Pools[\"Upstream Pools\"]\n    direction TB\n    UP1[\"backend_predict\u003cbr\u003e70% to backend1\u003cbr\u003e30% to backend2\"]\n    UP2[\"backend_train\u003cbr\u003e30% to backend1\u003cbr\u003e70% to backend2\"]\n    UP3[\"backend_api\u003cbr\u003e1:1 to backend1, backend2\"]\n  end\n\n  %% ------------------- Backend Containers -------------------\n  subgraph Backend_API[\"Backend API multiple containers\"]\n    B1[backend1:8000]\n    B2[backend2:8000]\n  end\n\n  UP1 --\u003e B1\n  UP1 --\u003e B2\n  UP2 --\u003e B1\n  UP2 --\u003e B2\n  UP3 --\u003e B1\n  UP3 --\u003e B2\n\n  %% ------------------- Data / ETL -------------------\n  subgraph 
Data_ETL[\"Data and ETL\"]\n    P[Prefect Workflow\u003cbr\u003ebackend/src/workflows] --\u003e|ETL processing| D1[(raw_db\u003cbr\u003ePostgreSQL)]\n    P --\u003e|Cleaned data| D2[(OLAP\u003cbr\u003eClickHouse)]\n  end\n\n  B1 --\u003e|Query cleaned data| D2\n  B2 --\u003e|Query cleaned data| D2\n  B1 --\u003e|Push task| E[Redis]\n  B2 --\u003e|Push task| E\n\n  %% ------------------- Model Training -------------------\n  subgraph Model_Training[\"Model Training \u0026 MLflow\"]\n    L[Celery Worker] --\u003e|Read cleaned data| D2\n    L --\u003e|Execute training| G[Model training logic]\n    G --\u003e|Model version management| H[MLflow Registry]\n    G --\u003e|Update model metadata| D3[(mlflow-db\u003cbr\u003ePostgreSQL)]\n    H --\u003e|Model Artifact| S[(MinIO\u003cbr\u003eModel storage)]\n    H --\u003e D4[(mlflow internal DB\u003cbr\u003ePostgreSQL)]\n  end\n\n  %% ------------------- Monitoring -------------------\n  subgraph Monitoring[\"Monitoring \u0026 Real-time Push\"]\n    W[ws_monitor\u003cbr\u003eKafka Consumer + WebSocket]\n    Q[metrics_publisher\u003cbr\u003eFetch \u0026 send to Kafka every 5s]\n    N1[Kafka - prediction topic] --\u003e|Prediction result| W\n    N2[Kafka - metrics topic]\n    Q --\u003e N2\n    N2 --\u003e|Metrics| W\n    J[Prometheus]\n    J --\u003e|Historical data| K[Grafana Dashboard]\n  end\n\n  %% ------------------- Async Queue -------------------\n  subgraph Async_Tasks[\"Async Task Queue\"]\n    E --\u003e |Execute| L\n  end\n\n  %% ------------------- Styles -------------------\n  classDef frontend fill:#FFD966,stroke:#333,stroke-width:2px;\n  classDef nginx fill:#FFB347,stroke:#333,stroke-width:2px;\n  classDef upstream fill:#85C1E9,stroke:#333,stroke-width:2px;\n  classDef backend fill:#ABEBC6,stroke:#333,stroke-width:2px;\n  classDef db fill:#F9E79F,stroke:#333,stroke-width:2px;\n  classDef cache fill:#F5B7B1,stroke:#333,stroke-width:2px;\n  classDef mlflow fill:#D7BDE2,stroke:#333,stroke-width:2px;\n  classDef 
monitoring fill:#FAD7A0,stroke:#333,stroke-width:2px;\n  classDef prom fill:#D5F5E3,stroke:#333,stroke-width:2px;\n\n  class U frontend\n  class NG,Static nginx\n  class UP1,UP2,UP3 upstream\n  class B1,B2 backend\n  class D1,D2,D3,D4,S db\n  class E,L,M cache\n  class G,H mlflow\n  class W,Q,N1,N2 monitoring\n  class J,K prom\n\n\n```\n\n- Visual diagram of the Docker Compose services\n```mermaid\ngraph TD\n  subgraph Users\n    A[Browser]\n  end\n\n  subgraph Frontend\n    B[Vite + React]\n  end\n\n  subgraph Backend\n    C[FastAPI API]\n    D[Model Training / Inference]\n    E[Celery Worker]\n    F[Prefect Flows]\n  end\n\n  subgraph Storage\n    G[PostgreSQL as raw_db]\n    H[ClickHouse as cleaned data]\n    I[MinIO as Model Artifacts]\n    J[MLflow as Tracking DB]\n  end\n\n  subgraph Monitoring\n    K[Prometheus]\n    L[Grafana]\n    M[Evidently]\n  end\n\n  subgraph Messaging\n    N[Kafka]\n    O[Redis]\n  end\n\n  subgraph CI/CD\n    P[GitHub Actions]\n  end\n\n  A --\u003e B\n  B --\u003e C\n  C --\u003e D\n  D --\u003e E\n  E --\u003e G\n  E --\u003e H\n  D --\u003e J\n  D --\u003e I\n  F --\u003e G\n  F --\u003e H\n  M --\u003e K\n  K --\u003e L\n  D --\u003e N\n  M --\u003e N\n  E --\u003e O\n  P --\u003e|CI/CD| C\n\n```\n\n---\n\n## 📈 Evaluation Checklist\n\n### ✅ Problem Definition\n\n* ✔️ Well-defined scope: stock prediction + model lifecycle\n\n### ☁️ Infrastructure\n\n* ✔️ Docker Compose setup with multiple services\n* ✔️ IaC-friendly (MinIO, DB volumes, Prometheus)\n\n### 🔬 Experiment Tracking\n\n* ✔️ MLflow for logging experiments and model versioning\n  - [here](backend/src/model_training/train.py)\n\n### 📅 Workflow Orchestration\n\n* ✔️ Prefect 2 for ETL and training flows\n   - [here](./backend/workflows/etl_core.py)\n\n### 🚀 Model Deployment\n\n* ✔️ FastAPI for model inference (containerized API)\n\n### 📊 Monitoring\n\n* ✔️ Evidently + Prometheus + Grafana for data/model monitoring\n    - 
[docker-compose.monitor.yml](./docker-compose.monitor.yml)\n    - [docker-compose.kafka.yml](./docker-compose.kafka.yml)\n\n* [Webhook to Discord](./.github/workflows/cd-deploy.yml)\n\n### 🔁 Reproducibility\n\n* ✔️ Makefile + setup.md + requirements + Docker for consistent setup\n    ```\n    make dev-setup\n    ```\n\n### 🧪 Best Practices\n\n* [x] Unit tests\n    - [train unit test code](./backend/tests/test_train.py)\n    - [predict unit test code](./backend/tests/test_predict.py)\n* [x] Integration tests\n    - [predict API test code](backend/integraton-test/test_predict_api.py)\n    - [train API test code](backend/integraton-test/test_train_api.py)\n* [x] Code formatting (black, flake8)\n    - [refer to pre-commit-config.yaml](.pre-commit-config.yaml)\n* [x] Makefile automation\n    - [refer to Makefile](./Makefile)\n* [x] Pre-commit hooks\n    - [refer to pre-commit-config.yaml](.pre-commit-config.yaml)\n* [x] GitHub Actions for CI/CD\n    - [refer to .github/workflows/ci-tests.yml](.github/workflows/ci-tests.yml)\n    - [refer to .github/workflows/cd-deploy.yml](.github/workflows/cd-deploy.yml)\n\n---\n\n## ⚙️ Installation Guide\n\n```bash\n# Create virtual environment\npython -m venv .venv\nsource .venv/bin/activate\npip install -r backend/requirements.txt\n\n# Start all services\ndocker compose up --build\n\n# Run one-off training or the Prefect workflow\nmake train\nmake workflow\n```\n\n---\n\n## 📊 Dataset\n\nHistorical stock data from TW \u0026 US markets (e.g., 2330.TW, AAPL, TSM):\n\n* Source: Yahoo Finance\n* Transformed via ETL and stored in Parquet format (see `workflows/parquet/`)\n\n---\n\n## 🔗 Useful Resources\n\n* [MLflow Documentation](https://mlflow.org/)\n* [Evidently AI Docs](https://docs.evidentlyai.com/)\n* [Prefect 2 Docs](https://docs.prefect.io/)\n* [Grafana Dashboards](https://grafana.com/grafana/dashboards)\n\n---\n\n## 📜 License\n\nMIT 
License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fa920604a%2Fstock-mlops","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fa920604a%2Fstock-mlops","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fa920604a%2Fstock-mlops/lists"}