{"id":47184452,"url":"https://github.com/a5chin/ml-pipelines","last_synced_at":"2026-03-13T08:36:25.410Z","repository":{"id":317147016,"uuid":"1064805019","full_name":"a5chin/ml-pipelines","owner":"a5chin","description":null,"archived":false,"fork":false,"pushed_at":"2026-03-08T21:36:43.000Z","size":757,"stargazers_count":1,"open_issues_count":10,"forks_count":0,"subscribers_count":0,"default_branch":"develop","last_synced_at":"2026-03-09T02:23:32.500Z","etag":null,"topics":["kubeflow","kubeflow-pipelines","mlops","python","python314","uv","vertex-ai"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/a5chin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-26T15:21:09.000Z","updated_at":"2026-03-08T21:35:51.000Z","dependencies_parsed_at":"2025-09-29T07:18:16.606Z","dependency_job_id":"d3589fd4-8ed0-4273-aca8-293fdae5330d","html_url":"https://github.com/a5chin/ml-pipelines","commit_stats":null,"previous_names":["a5chin/ml-pipelines"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/a5chin/ml-pipelines","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a5chin%2Fml-pipelines","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a5chin%2Fml-pipelines/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a5chin%2Fml-pipelines/releases","manifests_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a5chin%2Fml-pipelines/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/a5chin","download_url":"https://codeload.github.com/a5chin/ml-pipelines/tar.gz/refs/heads/develop","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/a5chin%2Fml-pipelines/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30462529,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-13T06:34:02.089Z","status":"ssl_error","status_checked_at":"2026-03-13T06:33:49.182Z","response_time":60,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["kubeflow","kubeflow-pipelines","mlops","python","python314","uv","vertex-ai"],"created_at":"2026-03-13T08:36:24.732Z","updated_at":"2026-03-13T08:36:25.388Z","avatar_url":"https://github.com/a5chin.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ML Pipelines Template\n\n\u003cdiv 
align=\"center\"\u003e\n\n[![uv](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json)](https://github.com/astral-sh/uv)\n[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)\n[![ty](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ty/main/assets/badge/v0.json)](https://github.com/astral-sh/ty)\n\n[![Versions](https://img.shields.io/badge/python-3.11%20|%203.12%20|%203.13%20|%203.14%20-green.svg)](https://www.python.org/downloads/)\n[![codecov](https://codecov.io/github/a5chin/ml-pipelines/graph/badge.svg?token=F87CNI6390)](https://codecov.io/github/a5chin/ml-pipelines)\n\n[![Docker](https://github.com/a5chin/ml-pipelines/actions/workflows/docker.yml/badge.svg)](https://github.com/a5chin/ml-pipelines/actions/workflows/docker.yml)\n[![Format](https://github.com/a5chin/ml-pipelines/actions/workflows/format.yml/badge.svg)](https://github.com/a5chin/ml-pipelines/actions/workflows/format.yml)\n[![Lint](https://github.com/a5chin/ml-pipelines/actions/workflows/lint.yml/badge.svg)](https://github.com/a5chin/ml-pipelines/actions/workflows/lint.yml)\n\n\u003c/div\u003e\n\n---\n\n## 📑 Table of Contents\n\n- [📋 Overview](#-overview)\n- [📦 Prerequisites](#-prerequisites)\n- [🚀 Getting Started](#-getting-started)\n- [📁 Project Structure](#-project-structure)\n- [🛠️ Development Commands](#️-development-commands)\n- [🏗️ Architecture Overview](#️-architecture-overview)\n- [➕ Adding New Pipelines](#-adding-new-pipelines)\n- [📚 Related Resources](#-related-resources)\n- [🤝 Contributing](#-contributing)\n- [📄 License](#-license)\n\n---\n\n## 📋 Overview\n\nThis is a production-ready template for building **Kubeflow Pipelines (KFP)** workflows with Python.\nIt provides a structured, scalable architecture for ML pipelines with containerized task execution, type-safe configuration, and comprehensive 
testing.\n\n### ✨ Key Features\n\n- 🔄 **Kubeflow Pipelines Integration**: Build, compile, and deploy KFP workflows\n- 🧩 **Task-Based Architecture**: Modular ML tasks (feature engineering, training, evaluation, inference, export)\n- 🌍 **Environment Management**: Multi-environment support (dev, prod) with isolated configurations\n- ⚡ **Modern Python Tooling**: Built with [uv](https://github.com/astral-sh/uv) and [Ruff](https://github.com/astral-sh/ruff)\n- 🔒 **Type Safety**: Full type hints with ty and Pydantic validation\n- 📝 **SQL Linting**: Automated SQL quality checks with [SQLFluff](https://github.com/sqlfluff/sqlfluff) for BigQuery\n- 🚀 **CI/CD Ready**: GitHub Actions workflows for testing, linting, and Docker builds\n\n## 📦 Prerequisites\n\n- 🐍 [Python 3.11+](https://www.python.org/downloads/) - Programming language (`StrEnum`, used in `const/`, requires 3.11 or later)\n- 📦 [uv](https://docs.astral.sh/uv/getting-started/installation/) - Fast Python package installer and resolver\n- 🐳 [Docker](https://docs.docker.com/get-docker/) - Container platform (for builds)\n- ☸️ [Kubeflow Pipelines](https://www.kubeflow.org/docs/components/pipelines/v2/installation/) - ML workflow orchestration platform\n\n\u003e 💡 **Quick Install uv**:\n\u003e ```bash\n\u003e # macOS/Linux\n\u003e curl -LsSf https://astral.sh/uv/install.sh | sh\n\u003e\n\u003e # Windows\n\u003e powershell -c \"irm https://astral.sh/uv/install.ps1 | iex\"\n\u003e ```\n\n## 🚀 Getting Started\n\n### 1️⃣ Install Dependencies\n\n```bash\nuv sync\n```\n\n### 2️⃣ Run Tests\n\n```bash\nuv run nox -s test\n```\n\n### 3️⃣ Compile a Pipeline\n\n```bash\nuv run nox -s compile_pipeline -- \\\n  --env dev \\\n  --pipeline_name sample-pipeline \\\n  --tag test \\\n  --model_type sample\n```\n\n## 📁 Project Structure\n\n```\n.\n├── const/                          # Shared enumerations\n│   ├── environment.py              # Environment enum (dev, prod)\n│   ├── model_type.py               # Model type enum (sample, ...)\n│   └── task.py                     # Task enum 
(feature_engineering, training, ...)\n├── environments/                   # Environment-specific settings\n│   ├── dev.py                      # Development environment config\n│   ├── prod.py                     # Production environment config\n│   └── settings.py                 # Settings loader\n├── pipelines/                      # KFP pipeline definitions\n│   ├── components.py               # KFP container components\n│   ├── graphs/                     # Pipeline graph definitions\n│   │   └── sample.py               # Sample pipeline graph\n│   ├── main.py                     # Pipeline compiler \u0026 uploader\n│   └── settings.py                 # Pipeline compilation settings\n├── tasks/                          # ML task implementations\n│   ├── base.py                     # BaseTask protocol\n│   ├── feature_engineering/        # Feature engineering task\n│   ├── training/                   # Model training task\n│   ├── evaluation/                 # Model evaluation task\n│   ├── inference/                  # Inference task\n│   └── export/                     # Export task\n├── tests/                          # Test suite (mirrors src structure)\n├── main.py                         # Task executor (runs inside KFP containers)\n├── noxfile.py                      # Task automation with Nox\n├── pyproject.toml                  # Project dependencies \u0026 metadata\n├── pytest.ini                      # Pytest configuration\n├── ruff.toml                       # Ruff linter configuration\n└── .sqlfluff                       # SQLFluff SQL linter configuration\n```\n\n**Key Files**:\n- [`main.py`](./main.py) - Entry point for task execution in containers\n- [`noxfile.py`](./noxfile.py) - Development task automation (test, lint, fmt, compile_pipeline)\n- [`pyproject.toml`](./pyproject.toml) - Project configuration and dependencies\n- [`.sqlfluff`](./.sqlfluff) - SQL linter configuration (BigQuery dialect)\n- [`CLAUDE.md`](./CLAUDE.md) - Architecture 
guide for Claude Code\n\n## 🛠️ Development Commands\n\n### 🧪 Testing\n```bash\n# Run all tests\nuv run nox -s test\n\n# Run specific test file\nuv run pytest tests/path/to/test__file.py\n\n# Run with JUnit XML output\nuv run nox -s test -- --junitxml=results.xml\n```\n\n### ✅ Code Quality\n```bash\n# Format code (Python)\nuv run nox -s fmt -- --ruff\n\n# Format SQL files\nuv run nox -s fmt -- --sqlfluff\n\n# Format all\nuv run nox -s fmt -- --ruff --sqlfluff\n\n# Run all linters\nuv run nox -s lint -- --ruff --sqlfluff --ty\n\n# Run individual linters\nuv run nox -s lint -- --ruff     # Python linting\nuv run nox -s lint -- --sqlfluff # SQL linting\nuv run nox -s lint -- --ty       # Type checking\n```\n\n### 🔧 Pipeline Development\n```bash\n# Compile and upload pipeline\nuv run nox -s compile_pipeline -- \\\n  --env \u003cdev|prod\u003e \\\n  --pipeline_name \u003cname\u003e \\\n  --tag \u003ctag\u003e \\\n  --model_type \u003csample|...\u003e\n```\n\n## 🏗️ Architecture Overview\n\nThis project uses a **dual-mode architecture**:\n\n1. **Pipeline Compilation Mode** (`pipelines/main.py`): Compiles KFP pipeline definitions to YAML and uploads to Kubeflow\n2. **Task Execution Mode** (`main.py`): Runs individual tasks inside KFP containers\n\n### 🔄 How It Works\n\n1. 📝 **Define tasks** in `tasks/\u003ctask_name\u003e/` with settings and run logic\n2. 🔗 **Create pipeline graphs** in `pipelines/graphs/` that chain tasks together\n3. 📋 **Register components**: tasks in `main.py` task_maps and pipelines in `pipelines/main.py` pipeline_types\n4. 📦 **Compile pipeline** with `compile_pipeline` - generates KFP YAML and uploads to registry\n5. 
▶️ **Execute**: KFP runs the pipeline; each component executes `main.py` in its container with task-specific arguments\n\n## ➕ Adding New Pipelines\n\n### Step-by-Step Guide\n\n#### 1️⃣ Define Model Type\nAdd your model type to [`const/model_type.py`](./const/model_type.py):\n```python\nfrom enum import StrEnum\n\n\nclass ModelType(StrEnum):\n    \"\"\"Enumeration for different Model Types.\"\"\"\n\n    SAMPLE = \"sample\"\n    YOUR_MODEL = \"your_model\"  # ← Add this\n```\n\n#### 2️⃣ Create Pipeline Graph\nCreate a new file `pipelines/graphs/your_model.py`:\n```python\nfrom typing import TYPE_CHECKING\n\nfrom kfp import dsl\nfrom pipelines.components import (\n    evaluation,\n    feature_engineering,\n    training,\n    inference,\n    export,\n)\n\nif TYPE_CHECKING:\n    from kfp.dsl.graph_component import GraphComponent\n    from pipelines.settings import PipelineCompileArgs\n\n\ndef get_pipeline(args: \"PipelineCompileArgs\") -\u003e \"GraphComponent\":\n    \"\"\"Get your model pipeline.\n\n    Args:\n        args (PipelineCompileArgs): Pipeline arguments for compilation.\n\n    Returns:\n        GraphComponent: Pipeline Graph Component.\n    \"\"\"\n\n    @dsl.pipeline(name=args.pipeline_name)\n    def pipeline_def(execution_date: str) -\u003e None:\n        fe_task = feature_engineering(\n            image=args.image,\n            execution_date=execution_date,\n            model_type=args.model_type,\n        ).set_display_name(\"Feature Engineering\")\n\n        training_task = (\n            training(\n                image=args.image,\n                execution_date=execution_date,\n                model_type=args.model_type,\n            )\n            .after(fe_task)\n            .set_display_name(\"Train Model\")\n        )\n        # Add more tasks...\n\n    return pipeline_def\n```\n\n#### 3️⃣ Implement Tasks\nCreate task implementations in `tasks/\u003ctask_name\u003e/run.py`:\n```python\nfrom logging import getLogger\n\nfrom tasks.base import T_co\nfrom tasks.training.settings import 
TrainingSettings\n\nlogger = getLogger(__name__)\n\n\nclass TrainingTask:\n    \"\"\"Training Task.\"\"\"\n\n    def __init__(\n        self,\n        *args: tuple[T_co],\n        **kwargs: dict[str, T_co],\n    ) -\u003e None:\n        \"\"\"Initialize the Training Task.\"\"\"\n        self.settings = TrainingSettings()\n\n    def run(self) -\u003e None:\n        \"\"\"Run the Training Task.\"\"\"\n        logger.info(\"settings=%s\", self.settings)\n        # Your training logic here\n```\n\n#### 4️⃣ Register Components\n**Register tasks** in [`main.py`](./main.py):\n```python\ntask_maps: dict[ModelType, dict[Task, type[BaseTask]]] = {\n    ModelType.SAMPLE: {\n        Task.FEATURE_ENGINEERING: FeatureEngineeringTask,\n        Task.TRAINING: TrainingTask,\n        # ...\n    },\n    ModelType.YOUR_MODEL: {  # ← Add this\n        Task.TRAINING: YourTrainingTask,\n        # ...\n    },\n}\n```\n\n**Register pipeline** in [`pipelines/main.py`](./pipelines/main.py):\n```python\nfrom pipelines.graphs import sample, your_model\n\npipeline_types = {\n    ModelType.SAMPLE: sample.get_pipeline,\n    ModelType.YOUR_MODEL: your_model.get_pipeline,  # ← Add this\n}\n```\n\n#### 5️⃣ Compile \u0026 Deploy\n```bash\nuv run nox -s compile_pipeline -- \\\n  --env dev \\\n  --pipeline_name your-model-pipeline \\\n  --tag v1.0.0 \\\n  --model_type your_model\n```\n\n\u003e 💡 **Tip**: See [CLAUDE.md](./CLAUDE.md) for detailed architecture patterns and development guidelines.\n\n---\n\n## 📚 Related Resources\n\n### Official Documentation\n- 📘 [Kubeflow Pipelines v2](https://www.kubeflow.org/docs/components/pipelines/v2/) - KFP documentation\n- 📦 [uv Documentation](https://docs.astral.sh/uv/) - Python package manager\n- 🔍 [Ruff Documentation](https://docs.astral.sh/ruff/) - Linter and formatter\n- 📝 [SQLFluff Documentation](https://docs.sqlfluff.com/) - SQL linter and formatter\n- ✅ [ty](https://github.com/astral-sh/ty) - Static type checker\n- 🧪 [Pytest](https://docs.pytest.org/) - 
Testing framework\n- 🔧 [Nox](https://nox.thea.codes/) - Task automation tool\n\n### Kubeflow Pipelines\n- [KFP SDK Reference](https://kubeflow-pipelines.readthedocs.io/en/stable/) - Python SDK documentation\n- [Container Components Guide](https://www.kubeflow.org/docs/components/pipelines/v2/components/container-components/) - Building container-based components\n- [Pipeline Compilation](https://www.kubeflow.org/docs/components/pipelines/v2/compile-a-pipeline/) - Compiling pipelines to YAML\n\n### Python Libraries\n- [Pydantic](https://docs.pydantic.dev/) - Data validation using Python type annotations\n- [Pydantic Settings](https://docs.pydantic.dev/latest/concepts/pydantic_settings/) - Settings management from environment variables\n\n---\n\n## 🤝 Contributing\n\nWe welcome contributions! Please follow these steps:\n\n### Development Workflow\n\n1. 🍴 **Fork** the repository\n2. 📥 **Clone** your fork:\n   ```bash\n   git clone https://github.com/YOUR_USERNAME/ml-pipelines.git\n   cd ml-pipelines\n   ```\n3. 🌿 **Create** a feature branch:\n   ```bash\n   git checkout -b feature/amazing-feature\n   ```\n4. 📦 **Install** dependencies:\n   ```bash\n   uv sync\n   ```\n5. ✏️ **Make** your changes with tests\n6. 🎨 **Format** code:\n   ```bash\n   uv run nox -s fmt\n   ```\n7. 🔍 **Lint** code:\n   ```bash\n   uv run nox -s lint -- --ruff --ty\n   ```\n8. ✅ **Test** changes:\n   ```bash\n   uv run nox -s test\n   ```\n9. 💾 **Commit** your changes:\n   ```bash\n   git commit -m 'Add amazing feature'\n   ```\n10. 📤 **Push** to your branch:\n    ```bash\n    git push origin feature/amazing-feature\n    ```\n11. 
📮 **Submit** a pull request\n\n### Code Standards\n\n- ✅ Maintain **75%+ test coverage** (enforced by pytest)\n- 🎨 Follow **Ruff** formatting and linting rules ([`ruff.toml`](./ruff.toml))\n- 📝 Follow **SQLFluff** SQL formatting rules ([`.sqlfluff`](./.sqlfluff))\n- 🔍 Pass **ty** type checking ([`ty.toml`](./ty.toml))\n- 📝 Write **clear commit messages**\n- 🧪 Add **tests** for new features\n- 📚 Update **documentation** as needed\n\n### Test Naming Convention\n\nTest files must follow the `test__*.py` format (note the double underscore):\n- ✅ `test__base.py`\n- ✅ `test__training.py`\n- ❌ `test_base.py` (single underscore - won't be discovered)\n\n---\n\n## 📄 License\n\nThis project is licensed under the MIT License. See the [LICENSE](./LICENSE) file for the full terms.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fa5chin%2Fml-pipelines","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fa5chin%2Fml-pipelines","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fa5chin%2Fml-pipelines/lists"}