https://github.com/a5chin/ml-pipelines
# ML Pipelines Template
[uv](https://github.com/astral-sh/uv)
[Ruff](https://github.com/astral-sh/ruff)
[ty](https://github.com/astral-sh/ty)
[Python](https://www.python.org/downloads/)
[Codecov](https://codecov.io/github/a5chin/ml-pipelines)
[Docker workflow](https://github.com/a5chin/ml-pipelines/actions/workflows/docker.yml)
[Format workflow](https://github.com/a5chin/ml-pipelines/actions/workflows/format.yml)
[Lint workflow](https://github.com/a5chin/ml-pipelines/actions/workflows/lint.yml)
---
## Table of Contents
- [Overview](#overview)
- [Prerequisites](#prerequisites)
- [Getting Started](#getting-started)
- [Project Structure](#project-structure)
- [Development Commands](#development-commands)
- [Architecture Overview](#architecture-overview)
- [Adding New Pipelines](#adding-new-pipelines)
- [Related Resources](#related-resources)
- [Contributing](#contributing)
- [License](#license)
---
## Overview
This is a production-ready template for building **Kubeflow Pipelines (KFP)** workflows with Python.
It provides a structured, scalable architecture for ML pipelines with containerized task execution, type-safe configuration, and comprehensive testing.
### Key Features
- **Kubeflow Pipelines Integration**: Build, compile, and deploy KFP workflows
- **Task-Based Architecture**: Modular ML tasks (feature engineering, training, evaluation, inference, export)
- **Environment Management**: Multi-environment support (dev, prod) with isolated configurations
- **Modern Python Tooling**: Built with [uv](https://github.com/astral-sh/uv) and [Ruff](https://github.com/astral-sh/ruff)
- **Type Safety**: Full type hints with ty and Pydantic validation
- **SQL Linting**: Automated SQL quality checks with [SQLFluff](https://github.com/sqlfluff/sqlfluff) for BigQuery
- **CI/CD Ready**: GitHub Actions workflows for testing, linting, and Docker builds
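"Type-safe configuration" here means settings objects that are validated the moment they are constructed. The repository uses Pydantic for this; the stdlib sketch below illustrates the same idea with a frozen dataclass (all names are illustrative assumptions, not the template's actual API):

```python
from dataclasses import dataclass
from enum import Enum


class Environment(Enum):
    DEV = "dev"
    PROD = "prod"


@dataclass(frozen=True)
class PipelineSettings:
    """Settings validated at construction time, not at use time."""

    env: Environment
    pipeline_name: str
    tag: str

    def __post_init__(self) -> None:
        # Reject invalid configuration up front, before any pipeline work starts.
        if not self.pipeline_name:
            raise ValueError("pipeline_name must be non-empty")


settings = PipelineSettings(Environment.DEV, "sample-pipeline", "test")
print(settings.env.value)  # -> dev
```

Pydantic extends this pattern with type coercion and environment-variable loading, but the fail-fast principle is the same.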
## Prerequisites
- [Python 3.10+](https://www.python.org/downloads/) - Programming language
- [uv](https://docs.astral.sh/uv/getting-started/installation/) - Fast Python package installer and resolver
- [Docker](https://docs.docker.com/get-docker/) - Container platform (for builds)
- [Kubeflow Pipelines](https://www.kubeflow.org/docs/components/pipelines/v2/installation/) - ML workflow orchestration platform
> **Quick install uv**:
> ```bash
> # macOS/Linux
> curl -LsSf https://astral.sh/uv/install.sh | sh
>
> # Windows
> powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
> ```
## Getting Started
### 1. Install Dependencies
```bash
uv sync
```
### 2. Run Tests
```bash
uv run nox -s test
```
### 3. Compile a Pipeline
```bash
uv run nox -s compile_pipeline -- \
--env dev \
--pipeline_name sample-pipeline \
--tag test \
--model_type sample
```
## Project Structure
```
.
├── const/                    # Shared enumerations
│   ├── environment.py        # Environment enum (dev, prod)
│   ├── model_type.py         # Model type enum (sample, ...)
│   └── task.py               # Task enum (feature_engineering, training, ...)
├── environments/             # Environment-specific settings
│   ├── dev.py                # Development environment config
│   ├── prod.py               # Production environment config
│   └── settings.py           # Settings loader
├── pipelines/                # KFP pipeline definitions
│   ├── components.py         # KFP container components
│   ├── graphs/               # Pipeline graph definitions
│   │   └── sample.py         # Sample pipeline graph
│   ├── main.py               # Pipeline compiler & uploader
│   └── settings.py           # Pipeline compilation settings
├── tasks/                    # ML task implementations
│   ├── base.py               # BaseTask protocol
│   ├── feature_engineering/  # Feature engineering task
│   ├── training/             # Model training task
│   ├── evaluation/           # Model evaluation task
│   ├── inference/            # Inference task
│   └── export/               # Export task
├── tests/                    # Test suite (mirrors src structure)
├── main.py                   # Task executor (runs inside KFP containers)
├── noxfile.py                # Task automation with Nox
├── pyproject.toml            # Project dependencies & metadata
├── pytest.ini                # Pytest configuration
├── ruff.toml                 # Ruff linter configuration
└── .sqlfluff                 # SQLFluff SQL linter configuration
```
**Key Files**:
- [`main.py`](./main.py) - Entry point for task execution in containers
- [`noxfile.py`](./noxfile.py) - Development task automation (test, lint, fmt, compile_pipeline)
- [`pyproject.toml`](./pyproject.toml) - Project configuration and dependencies
- [`.sqlfluff`](./.sqlfluff) - SQL linter configuration (BigQuery dialect)
- [`CLAUDE.md`](./CLAUDE.md) - Architecture guide for Claude Code
## Development Commands
### Testing
```bash
# Run all tests
uv run nox -s test
# Run specific test file
uv run pytest tests/path/to/test__file.py
# Run with JUnit XML output
uv run nox -s test -- --junitxml=results.xml
```
### Code Quality
```bash
# Format code (Python)
uv run nox -s fmt -- --ruff
# Format SQL files
uv run nox -s fmt -- --sqlfluff
# Format all
uv run nox -s fmt -- --ruff --sqlfluff
# Run all linters
uv run nox -s lint -- --ruff --sqlfluff --ty
# Run individual linters
uv run nox -s lint -- --ruff # Python linting
uv run nox -s lint -- --sqlfluff # SQL linting
uv run nox -s lint -- --ty # Type checking
```
### Pipeline Development
```bash
# Compile and upload pipeline
uv run nox -s compile_pipeline -- \
    --env <dev|prod> \
    --pipeline_name <pipeline-name> \
    --tag <tag> \
    --model_type <model-type>
```
## Architecture Overview
This project uses a **dual-mode architecture**:
1. **Pipeline Compilation Mode** (`pipelines/main.py`): Compiles KFP pipeline definitions to YAML and uploads to Kubeflow
2. **Task Execution Mode** (`main.py`): Runs individual tasks inside KFP containers
### How It Works
1. **Define tasks** in `tasks/<task_name>/` with settings and run logic
2. **Create pipeline graphs** in `pipelines/graphs/` that chain tasks together
3. **Register components**: tasks in `main.py` `task_maps` and pipelines in `pipelines/main.py` `pipeline_types`
4. **Compile the pipeline** with `compile_pipeline`, which generates KFP YAML and uploads it to the registry
5. **Execute**: KFP runs the pipeline; each component executes `main.py` with task-specific arguments inside its container
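The registration-and-dispatch flow in steps 3-5 can be sketched in plain Python. Class and function names below are illustrative assumptions, not the repository's exact API:

```python
from enum import Enum


class ModelType(Enum):
    SAMPLE = "sample"


class Task(Enum):
    FEATURE_ENGINEERING = "feature_engineering"
    TRAINING = "training"


class FeatureEngineeringTask:
    def run(self) -> str:
        # A real task would read settings and touch external systems.
        return "features built"


class TrainingTask:
    def run(self) -> str:
        return "model trained"


# Registry in the style of main.py's task_maps: model type -> task -> class.
task_maps = {
    ModelType.SAMPLE: {
        Task.FEATURE_ENGINEERING: FeatureEngineeringTask,
        Task.TRAINING: TrainingTask,
    },
}


def execute(model_type: str, task: str) -> str:
    """Resolve and run one task, as the container entrypoint would."""
    task_cls = task_maps[ModelType(model_type)][Task(task)]
    return task_cls().run()


print(execute("sample", "training"))  # -> model trained
```

Because every container runs the same `main.py` entrypoint, adding a task is purely a matter of writing the class and registering it in the map; no orchestration code changes.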
## Adding New Pipelines
### Step-by-Step Guide
#### 1. Define Model Type
Add your model type to [`const/model_type.py`](./const/model_type.py):
```python
from enum import StrEnum


class ModelType(StrEnum):
    """Enumeration for different Model Types."""

    SAMPLE = "sample"
    YOUR_MODEL = "your_model"  # <- Add this
```
#### 2. Create Pipeline Graph
Create a new file `pipelines/graphs/your_model.py`:
```python
from __future__ import annotations

from typing import TYPE_CHECKING

from kfp import dsl

from pipelines.components import (
    evaluation,
    export,
    feature_engineering,
    inference,
    training,
)

if TYPE_CHECKING:
    from kfp.dsl.graph_component import GraphComponent

    from pipelines.settings import PipelineCompileArgs


def get_pipeline(args: PipelineCompileArgs) -> GraphComponent:
    """Get your model pipeline.

    Args:
        args (PipelineCompileArgs): Pipeline arguments for compilation.

    Returns:
        GraphComponent: Pipeline Graph Component.
    """

    @dsl.pipeline(name=args.pipeline_name)
    def pipeline_def(execution_date: str) -> None:
        fe_task = feature_engineering(
            image=args.image,
            execution_date=execution_date,
            model_type=args.model_type,
        ).set_display_name("Feature Engineering")

        training_task = (
            training(
                image=args.image,
                execution_date=execution_date,
                model_type=args.model_type,
            )
            .after(fe_task)
            .set_display_name("Train Model")
        )

        # Add more tasks...

    return pipeline_def
```
#### 3. Implement Tasks
Create task implementations in `tasks/<task_name>/run.py`:
```python
from logging import getLogger

from tasks.base import T_co
from tasks.training.settings import TrainingSettings

logger = getLogger(__name__)


class TrainingTask:
    """Training Task."""

    def __init__(
        self,
        *args: tuple[T_co],
        **kwargs: dict[str, T_co],
    ) -> None:
        """Initialize the Training Task."""
        self.settings = TrainingSettings()

    def run(self) -> None:
        """Run the Training Task."""
        logger.info("settings=%s", self.settings)
        # Your training logic here
```
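The project structure describes `tasks/base.py` as a `BaseTask` protocol, meaning task classes like the one above satisfy it structurally rather than by inheritance. A minimal sketch of that idea (the exact signature in the repository may differ):

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class BaseTask(Protocol):
    """Structural interface every task must satisfy."""

    def run(self) -> None:
        """Execute the task."""
        ...


class MinimalTask:
    """Satisfies BaseTask without inheriting from it: having run() is enough."""

    def run(self) -> None:
        print("running")


task: BaseTask = MinimalTask()  # type-checks via structural typing
assert isinstance(task, BaseTask)  # allowed because of @runtime_checkable
```

The protocol keeps tasks decoupled from a shared base class while still letting ty verify the interface statically.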
#### 4. Register Components
**Register tasks** in [`main.py`](./main.py):
```python
task_maps: dict[ModelType, dict[Task, type[BaseTask]]] = {
ModelType.SAMPLE: {
Task.FEATURE_ENGINEERING: FeatureEngineeringTask,
Task.TRAINING: TrainingTask,
# ...
},
    ModelType.YOUR_MODEL: {  # <- Add this
Task.TRAINING: YourTrainingTask,
# ...
},
}
```
**Register pipeline** in [`pipelines/main.py`](./pipelines/main.py):
```python
from pipelines.graphs import sample, your_model
pipeline_types = {
ModelType.SAMPLE: sample.get_pipeline,
    ModelType.YOUR_MODEL: your_model.get_pipeline,  # <- Add this
}
```
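Since `pipeline_types` is a plain dictionary, pipeline selection reduces to one lookup; it is worth failing fast with a readable error when a model type was never registered. A stdlib-only sketch of that dispatch (function names are hypothetical):

```python
from enum import Enum


class ModelType(Enum):
    SAMPLE = "sample"
    YOUR_MODEL = "your_model"


def sample_pipeline(name: str) -> str:
    return f"sample:{name}"


def your_model_pipeline(name: str) -> str:
    return f"your_model:{name}"


pipeline_types = {
    ModelType.SAMPLE: sample_pipeline,
    ModelType.YOUR_MODEL: your_model_pipeline,
}


def resolve_pipeline(model_type: str):
    """Map a CLI string to a registered pipeline factory, or exit cleanly."""
    try:
        key = ModelType(model_type)
    except ValueError:
        valid = ", ".join(m.value for m in ModelType)
        raise SystemExit(f"unknown model_type {model_type!r}; expected one of: {valid}")
    return pipeline_types[key]


print(resolve_pipeline("your_model")("demo"))  # -> your_model:demo
```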
#### 5. Compile & Deploy
```bash
uv run nox -s compile_pipeline -- \
--env dev \
--pipeline_name your-model-pipeline \
--tag v1.0.0 \
--model_type your_model
```
> **Tip**: See [CLAUDE.md](./CLAUDE.md) for detailed architecture patterns and development guidelines.
---
## Related Resources
### Official Documentation
- [Kubeflow Pipelines v2](https://www.kubeflow.org/docs/components/pipelines/v2/) - KFP documentation
- [uv Documentation](https://docs.astral.sh/uv/) - Python package manager
- [Ruff Documentation](https://docs.astral.sh/ruff/) - Linter and formatter
- [SQLFluff Documentation](https://docs.sqlfluff.com/) - SQL linter and formatter
- [ty](https://github.com/astral-sh/ty) - Static type checker
- [Pytest](https://docs.pytest.org/) - Testing framework
- [Nox](https://nox.thea.codes/) - Task automation tool
### Kubeflow Pipelines
- [KFP SDK Reference](https://kubeflow-pipelines.readthedocs.io/en/stable/) - Python SDK documentation
- [Container Components Guide](https://www.kubeflow.org/docs/components/pipelines/v2/components/container-components/) - Building container-based components
- [Pipeline Compilation](https://www.kubeflow.org/docs/components/pipelines/v2/compile-a-pipeline/) - Compiling pipelines to YAML
### Python Libraries
- [Pydantic](https://docs.pydantic.dev/) - Data validation using Python type annotations
- [Pydantic Settings](https://docs.pydantic.dev/latest/concepts/pydantic_settings/) - Settings management from environment variables
---
## Contributing
We welcome contributions! Please follow these steps:
### Development Workflow
1. **Fork** the repository
2. **Clone** your fork:
   ```bash
   git clone https://github.com/YOUR_USERNAME/ml-pipelines.git
   cd ml-pipelines
   ```
3. **Create** a feature branch:
   ```bash
   git checkout -b feature/amazing-feature
   ```
4. **Install** dependencies:
   ```bash
   uv sync
   ```
5. **Make** your changes with tests
6. **Format** code:
   ```bash
   uv run nox -s fmt
   ```
7. **Lint** code:
   ```bash
   uv run nox -s lint -- --ruff --ty
   ```
8. **Test** changes:
   ```bash
   uv run nox -s test
   ```
9. **Commit** your changes:
   ```bash
   git commit -m 'Add amazing feature'
   ```
10. **Push** to your branch:
    ```bash
    git push origin feature/amazing-feature
    ```
11. **Submit** a pull request
### Code Standards
- Maintain **75%+ test coverage** (enforced by pytest)
- Follow **Ruff** formatting and linting rules ([`ruff.toml`](./ruff.toml))
- Follow **SQLFluff** SQL formatting rules ([`.sqlfluff`](./.sqlfluff))
- Pass **ty** type checking ([`ty.toml`](./ty.toml))
- Write **clear commit messages**
- Add **tests** for new features
- Update **documentation** as needed
### Testing Naming Convention
Test files must follow the `test__*.py` pattern (note the double underscore):
- ✅ `test__base.py`
- ✅ `test__training.py`
- ❌ `test_base.py` (single underscore; won't be discovered)
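This convention is usually enforced through the `python_files` option in `pytest.ini`; a hedged sketch of what that entry might look like (the repository's actual configuration may differ):

```ini
[pytest]
python_files = test__*.py
```

With this pattern set, pytest's collector skips single-underscore files entirely rather than silently running them without coverage credit.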
---
## License
This project is licensed under the terms specified in the [LICENSE](./LICENSE) file.