https://github.com/prassanna-ravishankar/modalkit
A powerful Python framework for deploying ML models on Modal with production-ready features
- Host: GitHub
- URL: https://github.com/prassanna-ravishankar/modalkit
- Owner: prassanna-ravishankar
- License: mit
- Created: 2025-07-10T07:37:36.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-08-26T12:24:20.000Z (10 months ago)
- Last Synced: 2025-08-31T09:39:39.908Z (9 months ago)
- Topics: mlops, model-serving
- Language: Python
- Homepage: http://prassanna.io/modalkit/
- Size: 1.23 MB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# Modalkit
A powerful Python framework for deploying ML models on Modal with production-ready features
## 🎯 What Modalkit Offers Over Raw Modal
While Modal provides excellent serverless infrastructure, Modalkit adds a complete ML deployment framework:
### 🏗️ **Standardized ML Architecture**
- **Structured Inference Pipeline**: Enforced `preprocess()` → `predict()` → `postprocess()` pattern
- **Consistent API Endpoints**: `/predict_sync`, `/predict_batch`, `/predict_async` across all deployments
- **Type-Safe Interfaces**: Pydantic models ensure data validation at API boundaries
### ⚙️ **Configuration-Driven Deployments**
- **YAML Configuration**: Version-controlled deployment settings instead of scattered code
- **Environment Management**: Easy dev/staging/prod configs with override capabilities
- **Reproducible Builds**: Declarative infrastructure removes deployment inconsistencies
### 👥 **Team-Friendly Workflows**
- **Shared Standards**: All team members deploy models the same way
- **Code Separation**: Model logic decoupled from Modal deployment boilerplate
- **Collaboration**: Config files in git let the team review infrastructure changes alongside code
### 🚀 **Production Features Out-of-the-Box**
- **Authentication Middleware**: Built-in API key or Modal proxy auth
- **Queue Integration**: Async processing with multiple backend support
- **Cloud Storage**: Direct S3/GCS/R2 mounting without manual setup
- **Batch Processing**: Intelligent request batching for GPU efficiency
- **Error Handling**: Comprehensive error responses and logging
### 💡 **Developer Experience**
- **Less Boilerplate**: Focus on model code, not FastAPI/Modal setup
- **Modern Tooling**: Pre-configured with ruff, mypy, pre-commit hooks
- **Testing Framework**: Built-in patterns for testing ML deployments
**In short**: Modalkit transforms Modal from infrastructure primitives into a complete ML platform, letting teams deploy models consistently while maintaining Modal's performance and scalability.
## ✨ Key Features
- 🚀 **Native Modal Integration**: Seamless deployment on Modal's serverless infrastructure
- 🔐 **Flexible Authentication**: Modal proxy auth or custom API keys with AWS SSM support
- ☁️ **Cloud Storage Support**: Direct mounting of S3, GCS, and R2 buckets
- 🔄 **Flexible Queue Integration**: Optional queue backends with dependency injection - use TaskIQ, SQS, or any custom queue system
- 📦 **Batch Inference**: Efficient batch processing with configurable batch sizes
- 🎯 **Type Safety**: Full Pydantic integration for request/response validation
- 🛠️ **Developer Friendly**: Pre-configured with modern Python tooling (ruff, pre-commit)
- 📊 **Production Ready**: Comprehensive error handling and logging
## 🚀 Quick Start
### Installation
```bash
# Using pip (recommended)
pip install modalkit
# Using uv
uv pip install modalkit
# Development/latest version from GitHub
pip install git+https://github.com/prassanna-ravishankar/modalkit.git
```
### 📚 Complete Examples
Working examples are available in the documentation:
- **[Queue Backend Patterns](https://prassanna-ravishankar.github.io/modalkit/examples/queue-patterns/)** - Queue backend patterns and dependency injection
- **[TaskIQ Integration](https://prassanna-ravishankar.github.io/modalkit/examples/taskiq-integration/)** - Full TaskIQ integration tutorial
Follow the step-by-step tutorials to build complete working examples with your own ML models.
### 1. Define Your Model
Create an inference class that inherits from `InferencePipeline`:
```python
from modalkit.inference_pipeline import InferencePipeline
from pydantic import BaseModel
from typing import List


# Define input/output schemas with Pydantic
class TextInput(BaseModel):
    text: str
    language: str = "en"


class TextOutput(BaseModel):
    translated_text: str
    confidence: float


# Implement your model logic
class TranslationModel(InferencePipeline):
    def __init__(self, model_name: str, all_model_data_folder: str, common_settings: dict, *args, **kwargs):
        super().__init__(model_name, all_model_data_folder, common_settings)
        # Load your model here
        # self.model = load_model(...)

    def preprocess(self, input_list: List[TextInput]) -> dict:
        """Prepare inputs for the model"""
        texts = [item.text for item in input_list]
        return {"texts": texts, "languages": [item.language for item in input_list]}

    def predict(self, input_list: List[TextInput], preprocessed_data: dict) -> dict:
        """Run model inference"""
        # Your model prediction logic
        translations = [text.upper() for text in preprocessed_data["texts"]]  # Example
        return {"translations": translations, "scores": [0.95] * len(translations)}

    def postprocess(self, input_list: List[TextInput], raw_output: dict) -> List[TextOutput]:
        """Format model outputs"""
        return [
            TextOutput(translated_text=text, confidence=score)
            for text, score in zip(raw_output["translations"], raw_output["scores"])
        ]
```
### 2. Create Your Modal App
```python
import modal
from modalkit.modal_service import ModalService, create_web_endpoints
from modalkit.modal_config import ModalConfig

# Initialize with your config
modal_config = ModalConfig()
app = modal.App(name=modal_config.app_name)


# Define your Modal app class
@app.cls(**modal_config.get_app_cls_settings())
class TranslationApp(ModalService):
    inference_implementation = TranslationModel
    model_name: str = modal.parameter(default="translation_model")
    modal_utils: ModalConfig = modal_config

    # Optional: Inject custom queue backend
    # def __init__(self, queue_backend=None):
    #     super().__init__(queue_backend=queue_backend)


# Create API endpoints
@app.function(**modal_config.get_handler_settings())
@modal.asgi_app(**modal_config.get_asgi_app_settings())
def web_endpoints():
    return create_web_endpoints(
        app_cls=TranslationApp,
        input_model=TextInput,
        output_model=TextOutput
    )
```
> 💡 **Queue backends are optional** - your service works perfectly without any queue configuration. Add TaskIQ or custom queues when you need async processing. See the [documentation examples](https://prassanna-ravishankar.github.io/modalkit/examples/) for working implementations.
### 3. Configure Your Deployment
Create a `modalkit.yaml` configuration file:
```yaml
# modalkit.yaml
app_settings:
  app_prefix: "translation-service"

  # Authentication configuration
  auth_config:
    # Option 1: Use API key from AWS SSM
    ssm_key: "/translation/api-key"
    auth_header: "x-api-key"
    # Option 2: Use hardcoded API key (not recommended for production)
    # api_key: "your-api-key-here"
    # auth_header: "x-api-key"

  # Container configuration
  build_config:
    image: "python:3.11-slim"  # or your custom image
    tag: "latest"
    workdir: "/app"
    env:
      MODEL_VERSION: "v1.0"

  # Deployment settings
  deployment_config:
    gpu: "T4"  # Options: T4, A10G, A100, or null for CPU
    concurrency_limit: 10
    container_idle_timeout: 300
    secure: false  # Set to true for Modal proxy auth

    # Cloud storage mounts (optional)
    cloud_bucket_mounts:
      - mount_point: "/mnt/models"
        bucket_name: "my-model-bucket"
        secret: "aws-credentials"
        read_only: true
        key_prefix: "models/"

  # Batch processing settings
  batch_config:
    max_batch_size: 32
    wait_ms: 100  # Wait up to 100ms to fill batch

  # Queue configuration (optional - for async endpoints)
  # Leave empty to disable queues, or configure fallback backend
  queue_config:
    backend: "memory"  # Options: "sqs", "memory", or omit for no queues
    # broker_url: "redis://localhost:6379"  # For TaskIQ via dependency injection

# Model configuration
model_settings:
  local_model_repository_folder: "./models"
  common:
    cache_dir: "./cache"
    device: "cuda"  # or "cpu"
  model_entries:
    translation_model:
      model_path: "path/to/model.pt"
      vocab_size: 50000
```
### 4. Deploy to Modal
```bash
# Test locally
modal serve app.py
# Deploy to production
modal deploy app.py
# View logs
modal logs -f
```
### 5. Use Your API
```python
import requests
import asyncio

# For standard API key auth
headers = {"x-api-key": "your-api-key"}

# Synchronous endpoint
response = requests.post(
    "https://your-org--translation-service.modal.run/predict_sync",
    json={"text": "Hello world", "language": "en"},
    headers=headers
)
print(response.json())
# {"translated_text": "HELLO WORLD", "confidence": 0.95}

# Asynchronous endpoint (returns immediately)
response = requests.post(
    "https://your-org--translation-service.modal.run/predict_async",
    json={"text": "Hello world", "language": "en"},
    headers=headers
)
print(response.json())
# {"message_id": "550e8400-e29b-41d4-a716-446655440000"}

# Batch endpoint
response = requests.post(
    "https://your-org--translation-service.modal.run/predict_batch",
    json=[
        {"text": "Hello", "language": "en"},
        {"text": "World", "language": "en"}
    ],
    headers=headers
)
print(response.json())
# [{"translated_text": "HELLO", "confidence": 0.95}, {"translated_text": "WORLD", "confidence": 0.95}]
```
## 🔐 Authentication
Modalkit provides flexible authentication options:
### Option 1: Custom API Key (Default)
Configure with `secure: false` in your deployment config.
```yaml
# modalkit.yaml
deployment_config:
  secure: false

auth_config:
  # Store in AWS SSM (recommended)
  ssm_key: "/myapp/api-key"
  # OR hardcode (not recommended)
  # api_key: "sk-1234567890"
  auth_header: "x-api-key"
```
```python
# Client usage
headers = {"x-api-key": "your-api-key"}
response = requests.post(url, json=data, headers=headers)
```
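On the server side, a check like this comes down to comparing the configured header against the expected key. The following is a minimal sketch of that idea, not Modalkit's actual middleware: `is_authorized` is a hypothetical helper, and the constant-time comparison via `hmac.compare_digest` is an assumption about good practice rather than something this README states.

```python
import hmac
from typing import Mapping


def is_authorized(headers: Mapping[str, str], expected_key: str,
                  auth_header: str = "x-api-key") -> bool:
    """Return True when the request carries the expected API key.

    Hypothetical helper for illustration; Modalkit's real middleware may differ.
    """
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get(auth_header.lower())
    if supplied is None:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(supplied.encode(), expected_key.encode())
```

A request with a missing or mismatched header is rejected, while `{"X-API-Key": "<expected key>"}` passes regardless of header casing.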
### Option 2: Modal Proxy Authentication
Configure with `secure: true` for Modal's built-in auth:
```yaml
# modalkit.yaml
deployment_config:
  secure: true  # Enables Modal proxy auth
```
```python
# Client usage
headers = {
    "Modal-Key": "your-modal-key",
    "Modal-Secret": "your-modal-secret"
}
response = requests.post(url, json=data, headers=headers)
```
> 💡 **Tip**: Modal proxy auth is recommended for production as it's managed by Modal and requires no additional setup.
## ⚙️ Configuration
### Configuration Structure
Modalkit uses YAML configuration with two main sections:
```yaml
# modalkit.yaml
app_settings:                    # Application deployment settings
  app_prefix: str                # Prefix for your Modal app name
  auth_config:                   # Authentication configuration
  build_config:                  # Container build settings
  deployment_config:             # Runtime deployment settings
  batch_config:                  # Batch processing settings
  queue_config:                  # Async queue settings

model_settings:                  # Model-specific settings
  local_model_repository_folder: str
  common: dict                   # Shared settings across models
  model_entries:                 # Model-specific configurations
    model_name: dict
```
### Environment Variables
Set configuration file location:
```bash
# Default location
export MODALKIT_CONFIG="modalkit.yaml"
# Multiple configs (later files override earlier ones)
export MODALKIT_CONFIG="base.yaml,prod.yaml"
# Other environment variables
export MODALKIT_APP_POSTFIX="-prod" # Appended to app name
```
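To make the override rule concrete, here is a sketch of one way "later files override earlier ones" could behave once each YAML file has been parsed into a dict. The recursive merge and the `deep_merge` name are assumptions for illustration; this README does not say whether Modalkit merges nested sections key by key or replaces them wholesale.

```python
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where values from `override` win over `base`.

    Nested dicts are merged key by key; any other value is replaced outright.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Parsed contents of a hypothetical base.yaml and prod.yaml
base = {"app_settings": {"app_prefix": "translation-service",
                         "deployment_config": {"gpu": "T4", "concurrency_limit": 10}}}
prod = {"app_settings": {"deployment_config": {"gpu": "A100"}}}

config = deep_merge(base, prod)
print(config["app_settings"]["deployment_config"])
# {'gpu': 'A100', 'concurrency_limit': 10}
```

Under this assumed behavior, `prod.yaml` only needs to list the keys that differ from `base.yaml`.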
### Advanced Configuration Options
```yaml
deployment_config:
  # GPU configuration
  gpu: "T4"  # T4, A10G, A100, H100, or null

  # Resource limits
  concurrency_limit: 10
  container_idle_timeout: 300
  retries: 3

  # Memory/CPU (when gpu is null)
  memory: 8192  # MB
  cpu: 4.0      # cores

  # Volumes and mounts
  volumes:
    "/mnt/cache": "model-cache-vol"
  mounts:
    - local_path: "configs/prod.json"
      remote_path: "/app/config.json"
      type: "file"
```
## ☁️ Cloud Storage Integration
Modalkit seamlessly integrates with cloud storage providers through Modal's CloudBucketMount:
### Supported Providers
| Provider | Configuration |
|----------|--------------|
| AWS S3 | Native support with IAM credentials |
| Google Cloud Storage | Service account authentication |
| Cloudflare R2 | S3-compatible API |
| MinIO/Others | Any S3-compatible endpoint |
### Quick Examples
#### AWS S3 Configuration
```yaml
cloud_bucket_mounts:
  - mount_point: "/mnt/models"
    bucket_name: "my-ml-models"
    secret: "aws-credentials"  # Modal secret name
    key_prefix: "production/"  # Only mount this prefix
    read_only: true
```
First, create the Modal secret:
```bash
modal secret create aws-credentials \
  AWS_ACCESS_KEY_ID=xxx \
  AWS_SECRET_ACCESS_KEY=yyy \
  AWS_DEFAULT_REGION=us-east-1
```
#### Google Cloud Storage
```yaml
cloud_bucket_mounts:
  - mount_point: "/mnt/datasets"
    bucket_name: "my-datasets"
    bucket_endpoint_url: "https://storage.googleapis.com"
    secret: "gcp-credentials"
```
Create secret from service account:
```bash
modal secret create gcp-credentials \
  --from-gcp-service-account path/to/key.json
```
#### Cloudflare R2
```yaml
cloud_bucket_mounts:
  - mount_point: "/mnt/artifacts"
    bucket_name: "ml-artifacts"
    bucket_endpoint_url: "https://accountid.r2.cloudflarestorage.com"
    secret: "r2-credentials"
```
### Using Mounted Storage
```python
import json

import torch
from modalkit.inference_pipeline import InferencePipeline


class MyInference(InferencePipeline):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Load model from mounted bucket
        model_path = "/mnt/models/my_model.pt"
        self.model = torch.load(model_path)

        # Load dataset
        with open("/mnt/datasets/vocab.json") as f:
            self.vocab = json.load(f)
```
### Best Practices
- ✅ Use read-only mounts for model artifacts
- ✅ Mount only required prefixes with `key_prefix`
- ✅ Use separate buckets for models vs. data
- ✅ Cache frequently accessed files locally
- ❌ Avoid writing logs to mounted buckets
- ❌ Don't mount entire buckets if you only need specific files
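The "cache frequently accessed files locally" advice can be sketched as a copy-on-first-use helper. This is an illustration under assumptions: `cached_copy` and the cache directory are hypothetical names, and nothing here is part of Modalkit's API.

```python
import shutil
from pathlib import Path


def cached_copy(source: Path, cache_dir: Path) -> Path:
    """Copy `source` into `cache_dir` on first use and return the local path."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    local_path = cache_dir / source.name
    if not local_path.exists():
        # Reads from a bucket mount go over the network, so copy only once.
        shutil.copy2(source, local_path)
    return local_path
```

For example, `cached_copy(Path("/mnt/models/my_model.pt"), Path("/tmp/model-cache"))` would return a local path that later loads can reuse. Note that this simple version never refreshes the cached file if the bucket object changes.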
## 🚀 Advanced Features
### Flexible Queue Processing
Modalkit offers **optional** queue processing with multiple approaches:
#### 1. No Queues (Default)
Perfect for sync-only APIs:
```python
class MyService(ModalService):
    inference_implementation = MyModel


# No queue backend - async requests process but don't queue responses
service = MyService()
```
#### 2. TaskIQ Integration (Recommended for Production)
Use dependency injection for full TaskIQ support:
```python
from taskiq_redis import AsyncRedisTaskiqBroker


class TaskIQBackend:
    def __init__(self):
        self.broker = AsyncRedisTaskiqBroker("redis://localhost:6379")

    async def send_message(self, queue_name: str, message: str) -> bool:
        @self.broker.task(task_name=f"process_{queue_name}")
        async def process_result(msg: str) -> str:
            # Your custom processing logic
            return f"Processed: {msg}"

        await process_result.kiq(message)
        return True


# Inject TaskIQ backend
service = MyService(queue_backend=TaskIQBackend())
```
#### 3. Configuration-Based Queues
Use YAML configuration for simple setups:
```yaml
queue_config:
  backend: "sqs"  # or "memory"
  # Additional backend-specific settings
```
#### 4. Custom Queue Systems
Implement any queue system:
```python
class MyCustomQueue:
    async def send_message(self, queue_name: str, message: str) -> bool:
        # Your custom queue implementation (RabbitMQ, Kafka, etc.)
        return True


service = MyService(queue_backend=MyCustomQueue())
```
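For local experiments, the same `send_message` interface can be backed by plain Python lists. The sketch below is a hypothetical `InMemoryQueueBackend` written for illustration; it is not Modalkit's built-in `"memory"` backend, and the only part taken from this README is the `send_message(queue_name, message) -> bool` signature.

```python
import asyncio
from collections import defaultdict
from typing import DefaultDict, List


class InMemoryQueueBackend:
    """Keeps messages in per-queue lists; useful for tests, not production."""

    def __init__(self) -> None:
        self.queues: DefaultDict[str, List[str]] = defaultdict(list)

    async def send_message(self, queue_name: str, message: str) -> bool:
        self.queues[queue_name].append(message)
        return True


async def main() -> InMemoryQueueBackend:
    backend = InMemoryQueueBackend()
    await backend.send_message("results", '{"translated_text": "HELLO"}')
    await backend.send_message("errors", '{"detail": "bad input"}')
    return backend


backend = asyncio.run(main())
print(dict(backend.queues))
```

Such a backend would be injected the same way as the examples above, through the `queue_backend` argument, which makes it easy to assert on what a service tried to enqueue.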
#### Working Examples
See complete tutorials in the documentation:
- **[Queue Backend Patterns](https://prassanna-ravishankar.github.io/modalkit/examples/queue-patterns/)** - Queue backend patterns
- **[TaskIQ Integration](https://prassanna-ravishankar.github.io/modalkit/examples/taskiq-integration/)** - Full TaskIQ integration
```python
# Async endpoint usage
import requests

response = requests.post(
    "https://your-org--translation-service.modal.run/predict_async",
    json={
        "message": {"text": "Process this"},
        "success_queue": "results",
        "failure_queue": "errors",
    },
)
# {"job_id": "uuid"}
```
### Batch Processing
Configure intelligent batching for better GPU utilization:
```yaml
batch_config:
  max_batch_size: 32
  wait_ms: 100  # Max time to wait for batch to fill
```
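The interaction between `max_batch_size` and `wait_ms` can be sketched with a standard-library queue: a batch closes as soon as it is full or the wait budget runs out, whichever comes first. This is an illustration of the general technique only; `collect_batch` is a hypothetical function, and in a Modal deployment the batching itself is handled by the platform rather than by code like this.

```python
import queue
import time
from typing import Any, List


def collect_batch(pending: "queue.Queue[Any]", max_batch_size: int, wait_ms: int) -> List[Any]:
    """Pull up to `max_batch_size` items, waiting at most `wait_ms` in total."""
    batch: List[Any] = []
    deadline = time.monotonic() + wait_ms / 1000.0
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # Wait budget exhausted: ship whatever has arrived.
        try:
            batch.append(pending.get(timeout=remaining))
        except queue.Empty:
            break  # Nothing else arrived before the deadline.
    return batch


requests_queue: "queue.Queue[int]" = queue.Queue()
for request_id in range(5):
    requests_queue.put(request_id)

first_batch = collect_batch(requests_queue, max_batch_size=3, wait_ms=50)
second_batch = collect_batch(requests_queue, max_batch_size=3, wait_ms=50)
print(first_batch, second_batch)
```

The first batch fills immediately, while the second is partial and returns only after the wait expires. A larger `wait_ms` trades latency for fuller batches.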
### Volume Reloading
Auto-reload Modal volumes for model updates:
```yaml
deployment_config:
  volumes:
    "/mnt/models": "model-volume"
  volume_reload_interval_seconds: 300  # Reload every 5 minutes
```
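The reload interval can be thought of as a simple elapsed-time check. The sketch below assumes that model; `ReloadTimer` and its injected `reload` callable are hypothetical, and how Modalkit actually schedules reloads is not described in this README.

```python
import time
from typing import Callable, Optional


class ReloadTimer:
    """Calls `reload` at most once per `interval_seconds`."""

    def __init__(self, reload: Callable[[], None], interval_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._reload = reload
        self._interval = interval_seconds
        self._clock = clock  # Injectable so tests can control time.
        self._last_reload: Optional[float] = None

    def maybe_reload(self) -> bool:
        """Reload if the interval has elapsed; return whether a reload ran."""
        now = self._clock()
        if self._last_reload is not None and now - self._last_reload < self._interval:
            return False
        self._reload()
        self._last_reload = now
        return True
```

In a Modal container, the `reload` callable might wrap a volume's `reload()` method, with `maybe_reload()` called before each request so that new model files become visible within the configured interval.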
## 🛠️ Development
### Setup
```bash
# Clone repository
git clone https://github.com/prassanna-ravishankar/modalkit.git
cd modalkit
# Install with uv (recommended)
uv sync
# Install pre-commit hooks
uv run pre-commit install
```
### Testing
```bash
# Run all tests
uv run pytest --cov --cov-config=pyproject.toml --cov-report=xml
# Run specific tests
uv run pytest tests/test_modal_service.py -v
# Run with HTML coverage report
uv run pytest --cov=modalkit --cov-report=html
```
### Code Quality
```bash
# Run all checks
uv run pre-commit run -a
# Run type checking
uv run mypy modalkit/
# Format code
uv run ruff format modalkit/ tests/
# Lint code
uv run ruff check modalkit/ tests/
```
## 📖 API Reference
### Endpoints
| Endpoint | Method | Description | Returns |
|----------|---------|-------------|----------|
| `/predict_sync` | POST | Synchronous inference | Model output |
| `/predict_async` | POST | Async inference (queued) | Message ID |
| `/predict_batch` | POST | Batch inference | List of outputs |
| `/health` | GET | Health check | Status |
### InferencePipeline Methods
Your model class must implement:
```python
def preprocess(self, input_list: List[InputModel]) -> dict
def predict(self, input_list: List[InputModel], preprocessed_data: dict) -> dict
def postprocess(self, input_list: List[InputModel], raw_output: dict) -> List[OutputModel]
```
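The three methods are meant to be chained in order. Here is a minimal stand-alone sketch of that flow, assuming the call order implied by the `preprocess()` → `predict()` → `postprocess()` pattern. `EchoPipeline` and `run_pipeline` are hypothetical stand-ins that do not import Modalkit, and plain strings and dicts replace the Pydantic models.

```python
from typing import List


class EchoPipeline:
    """Hypothetical stand-in with the same three-stage shape."""

    def preprocess(self, input_list: List[str]) -> dict:
        return {"texts": [text.strip() for text in input_list]}

    def predict(self, input_list: List[str], preprocessed_data: dict) -> dict:
        return {"translations": [text.upper() for text in preprocessed_data["texts"]]}

    def postprocess(self, input_list: List[str], raw_output: dict) -> List[dict]:
        return [{"translated_text": text} for text in raw_output["translations"]]


def run_pipeline(pipeline: EchoPipeline, input_list: List[str]) -> List[dict]:
    """Feed each stage's output into the next, passing the raw inputs along."""
    preprocessed = pipeline.preprocess(input_list)
    raw_output = pipeline.predict(input_list, preprocessed)
    return pipeline.postprocess(input_list, raw_output)


print(run_pipeline(EchoPipeline(), [" hello ", "world"]))
# [{'translated_text': 'HELLO'}, {'translated_text': 'WORLD'}]
```

Every stage receives the original `input_list`, so `predict` and `postprocess` can still consult request fields that `preprocess` did not carry forward.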
## 🤝 Contributing
We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.
### Development Workflow
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Run tests and linting (`uv run pytest && uv run pre-commit run -a`)
5. Commit your changes (pre-commit hooks will run automatically)
6. Push to your fork and open a Pull Request
## 📝 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🙏 Acknowledgments
Built with ❤️ using:
- [Modal](https://modal.com) - Serverless infrastructure for ML
- [FastAPI](https://fastapi.tiangolo.com) - Modern web framework
- [Pydantic](https://pydantic-docs.helpmanual.io) - Data validation
- [Taskiq](https://taskiq-python.github.io) - Async task processing
---