https://github.com/prassanna-ravishankar/modalkit

A powerful Python framework for deploying ML models on Modal with production-ready features
Host: GitHub
URL: https://github.com/prassanna-ravishankar/modalkit
Owner: prassanna-ravishankar
License: mit
Created: 2025-07-10T07:37:36.000Z (11 months ago)
Default Branch: main
Last Pushed: 2025-08-26T12:24:20.000Z (10 months ago)
Last Synced: 2025-08-31T09:39:39.908Z (9 months ago)
Topics: mlops, model-serving
Language: Python
Homepage: http://prassanna.io/modalkit/
Size: 1.23 MB
Stars: 2
Watchers: 0
Forks: 0
Open Issues: 3
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE

# Modalkit

A powerful Python framework for deploying ML models on Modal with production-ready features



## 🎯 What Modalkit Offers Over Raw Modal

While Modal provides excellent serverless infrastructure, Modalkit adds a complete ML deployment framework:

### 🏗️ **Standardized ML Architecture**

- **Structured Inference Pipeline**: Enforced `preprocess()` → `predict()` → `postprocess()` pattern

- **Consistent API Endpoints**: `/predict_sync`, `/predict_batch`, `/predict_async` across all deployments

- **Type-Safe Interfaces**: Pydantic models ensure data validation at API boundaries

### ⚙️ **Configuration-Driven Deployments**

- **YAML Configuration**: Version-controlled deployment settings instead of scattered code

- **Environment Management**: Easy dev/staging/prod configs with override capabilities

- **Reproducible Builds**: Declarative infrastructure removes deployment inconsistencies

### 👥 **Team-Friendly Workflows**

- **Shared Standards**: All team members deploy models the same way

- **Code Separation**: Model logic decoupled from Modal deployment boilerplate

- **Collaboration**: Config files in git enable infrastructure review and collaboration

### 🚀 **Production Features Out-of-the-Box**

- **Authentication Middleware**: Built-in API key or Modal proxy auth

- **Queue Integration**: Async processing with multiple backend support

- **Cloud Storage**: Direct S3/GCS/R2 mounting without manual setup

- **Batch Processing**: Intelligent request batching for GPU efficiency

- **Error Handling**: Comprehensive error responses and logging

### 💡 **Developer Experience**

- **Less Boilerplate**: Focus on model code, not FastAPI/Modal setup

- **Modern Tooling**: Pre-configured with ruff, mypy, pre-commit hooks

- **Testing Framework**: Built-in patterns for testing ML deployments

**In short**: Modalkit transforms Modal from infrastructure primitives into a complete ML platform, letting teams deploy models consistently while maintaining Modal's performance and scalability.

## ✨ Key Features

- 🚀 **Native Modal Integration**: Seamless deployment on Modal's serverless infrastructure

- 🔐 **Flexible Authentication**: Modal proxy auth or custom API keys with AWS SSM support

- ☁️ **Cloud Storage Support**: Direct mounting of S3, GCS, and R2 buckets

- 🔄 **Flexible Queue Integration**: Optional queue backends with dependency injection - use TaskIQ, SQS, or any custom queue system

- 📦 **Batch Inference**: Efficient batch processing with configurable batch sizes

- 🎯 **Type Safety**: Full Pydantic integration for request/response validation

- 🛠️ **Developer Friendly**: Pre-configured with modern Python tooling (ruff, mypy, pre-commit)

- 📊 **Production Ready**: Comprehensive error handling and logging

## 🚀 Quick Start

### Installation

```bash
# Using pip (recommended)
pip install modalkit

# Using uv
uv pip install modalkit

# Development/latest version from GitHub
pip install git+https://github.com/prassanna-ravishankar/modalkit.git
```

### 📚 Complete Examples

Working examples are available in the documentation:

- **[Queue Backend Patterns](https://prassanna-ravishankar.github.io/modalkit/examples/queue-patterns/)** - Queue backend patterns and dependency injection

- **[TaskIQ Integration](https://prassanna-ravishankar.github.io/modalkit/examples/taskiq-integration/)** - Full TaskIQ integration tutorial

Follow the step-by-step tutorials to build complete working examples with your own ML models.

### 1. Define Your Model

Create an inference class that inherits from `InferencePipeline`:

```python
from typing import List

from pydantic import BaseModel

from modalkit.inference_pipeline import InferencePipeline


# Define input/output schemas with Pydantic
class TextInput(BaseModel):
    text: str
    language: str = "en"


class TextOutput(BaseModel):
    translated_text: str
    confidence: float


# Implement your model logic
class TranslationModel(InferencePipeline):
    def __init__(self, model_name: str, all_model_data_folder: str, common_settings: dict, *args, **kwargs):
        super().__init__(model_name, all_model_data_folder, common_settings)
        # Load your model here
        # self.model = load_model(...)

    def preprocess(self, input_list: List[TextInput]) -> dict:
        """Prepare inputs for the model"""
        texts = [item.text for item in input_list]
        return {"texts": texts, "languages": [item.language for item in input_list]}

    def predict(self, input_list: List[TextInput], preprocessed_data: dict) -> dict:
        """Run model inference"""
        # Your model prediction logic
        translations = [text.upper() for text in preprocessed_data["texts"]]  # Example
        return {"translations": translations, "scores": [0.95] * len(translations)}

    def postprocess(self, input_list: List[TextInput], raw_output: dict) -> List[TextOutput]:
        """Format model outputs"""
        return [
            TextOutput(translated_text=text, confidence=score)
            for text, score in zip(raw_output["translations"], raw_output["scores"])
        ]
```

### 2. Create Your Modal App

```python
import modal

from modalkit.modal_config import ModalConfig
from modalkit.modal_service import ModalService, create_web_endpoints

# TranslationModel, TextInput, and TextOutput are the classes defined in step 1;
# define them in this file or import them from wherever you placed them.

# Initialize with your config
modal_config = ModalConfig()
app = modal.App(name=modal_config.app_name)


# Define your Modal app class
@app.cls(**modal_config.get_app_cls_settings())
class TranslationApp(ModalService):
    inference_implementation = TranslationModel
    model_name: str = modal.parameter(default="translation_model")
    modal_utils: ModalConfig = modal_config

    # Optional: Inject custom queue backend
    # def __init__(self, queue_backend=None):
    #     super().__init__(queue_backend=queue_backend)


# Create API endpoints
@app.function(**modal_config.get_handler_settings())
@modal.asgi_app(**modal_config.get_asgi_app_settings())
def web_endpoints():
    return create_web_endpoints(
        app_cls=TranslationApp,
        input_model=TextInput,
        output_model=TextOutput
    )
```

> 💡 **Queue backends are optional** - your service works perfectly without any queue configuration. Add TaskIQ or custom queues when you need async processing. See the [documentation examples](https://prassanna-ravishankar.github.io/modalkit/examples/) for working implementations.

### 3. Configure Your Deployment

Create a `modalkit.yaml` configuration file:

```yaml
# modalkit.yaml
app_settings:
  app_prefix: "translation-service"

  # Authentication configuration
  auth_config:
    # Option 1: Use API key from AWS SSM
    ssm_key: "/translation/api-key"
    auth_header: "x-api-key"
    # Option 2: Use hardcoded API key (not recommended for production)
    # api_key: "your-api-key-here"
    # auth_header: "x-api-key"

  # Container configuration
  build_config:
    image: "python:3.11-slim"  # or your custom image
    tag: "latest"
    workdir: "/app"
    env:
      MODEL_VERSION: "v1.0"

  # Deployment settings
  deployment_config:
    gpu: "T4"  # Options: T4, A10G, A100, or null for CPU
    concurrency_limit: 10
    container_idle_timeout: 300
    secure: false  # Set to true for Modal proxy auth

    # Cloud storage mounts (optional)
    cloud_bucket_mounts:
      - mount_point: "/mnt/models"
        bucket_name: "my-model-bucket"
        secret: "aws-credentials"
        read_only: true
        key_prefix: "models/"

  # Batch processing settings
  batch_config:
    max_batch_size: 32
    wait_ms: 100  # Wait up to 100ms to fill batch

  # Queue configuration (optional - for async endpoints)
  # Leave empty to disable queues, or configure fallback backend
  queue_config:
    backend: "memory"  # Options: "sqs", "memory", or omit for no queues
    # broker_url: "redis://localhost:6379"  # For TaskIQ via dependency injection

# Model configuration
model_settings:
  local_model_repository_folder: "./models"
  common:
    cache_dir: "./cache"
    device: "cuda"  # or "cpu"
  model_entries:
    translation_model:
      model_path: "path/to/model.pt"
      vocab_size: 50000
```

### 4. Deploy to Modal

```bash
# Test locally
modal serve app.py

# Deploy to production
modal deploy app.py

# View logs (replace your-app-name with the name of your deployed app)
modal app logs your-app-name
```

### 5. Use Your API

```python
import requests

# For standard API key auth
headers = {"x-api-key": "your-api-key"}

# Synchronous endpoint
response = requests.post(
    "https://your-org--translation-service.modal.run/predict_sync",
    json={"text": "Hello world", "language": "en"},
    headers=headers
)
print(response.json())
# {"translated_text": "HELLO WORLD", "confidence": 0.95}

# Asynchronous endpoint (returns immediately)
response = requests.post(
    "https://your-org--translation-service.modal.run/predict_async",
    json={"text": "Hello world", "language": "en"},
    headers=headers
)
print(response.json())
# {"message_id": "550e8400-e29b-41d4-a716-446655440000"}

# Batch endpoint
response = requests.post(
    "https://your-org--translation-service.modal.run/predict_batch",
    json=[
        {"text": "Hello", "language": "en"},
        {"text": "World", "language": "en"}
    ],
    headers=headers
)
print(response.json())
# [{"translated_text": "HELLO", "confidence": 0.95}, {"translated_text": "WORLD", "confidence": 0.95}]
```
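If you call several endpoints from the same client, it can help to build the URL, body, and auth header in one place. The following is a minimal standard-library sketch, not part of Modalkit: `build_predict_request` is a hypothetical helper name, and the base URL and `x-api-key` header are the placeholder values used in the examples above.

```python
import json
from urllib.request import Request


def build_predict_request(base_url: str, endpoint: str, payload, api_key: str) -> Request:
    """Build (but do not send) a POST request for a prediction endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}",
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


request = build_predict_request(
    "https://your-org--translation-service.modal.run",
    "predict_sync",
    {"text": "Hello world", "language": "en"},
    api_key="your-api-key",
)
# Send it with urllib.request.urlopen(request) once the service is deployed.
```

Building the request separately from sending it keeps the helper easy to unit test without a running deployment.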

## 🔐 Authentication

Modalkit provides flexible authentication options:

### Option 1: Custom API Key (Default)

Configure with `secure: false` in your deployment config.

```yaml
# modalkit.yaml
app_settings:
  deployment_config:
    secure: false

  auth_config:
    # Store in AWS SSM (recommended)
    ssm_key: "/myapp/api-key"
    # OR hardcode (not recommended)
    # api_key: "sk-1234567890"
    auth_header: "x-api-key"
```

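```python
# Client usage
import requests

url = "https://your-org--translation-service.modal.run/predict_sync"
data = {"text": "Hello world", "language": "en"}
headers = {"x-api-key": "your-api-key"}
response = requests.post(url, json=data, headers=headers)
```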
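To make the server side of API key auth concrete, here is a minimal sketch of the kind of check such middleware performs. It is an illustration under assumptions, not Modalkit's actual implementation: `is_authorized` is a hypothetical function, and the default header name simply mirrors the `auth_header` value from the config above.

```python
import hmac


def is_authorized(headers: dict, expected_key: str, auth_header: str = "x-api-key") -> bool:
    """Return True when the request carries the expected API key."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get(auth_header.lower(), "")
    # compare_digest avoids leaking key contents through timing differences.
    return hmac.compare_digest(supplied.encode(), expected_key.encode())


accepted = is_authorized({"X-API-Key": "secret"}, expected_key="secret")
rejected = is_authorized({"X-API-Key": "wrong"}, expected_key="secret")
missing = is_authorized({}, expected_key="secret")
```

A request with a missing or mismatched key would typically be answered with an HTTP 401 or 403 before it reaches the model.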

### Option 2: Modal Proxy Authentication

Configure with `secure: true` for Modal's built-in auth:

```yaml
# modalkit.yaml
app_settings:
  deployment_config:
    secure: true  # Enables Modal proxy auth
```

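```python
# Client usage
import requests

url = "https://your-org--translation-service.modal.run/predict_sync"
data = {"text": "Hello world", "language": "en"}
headers = {
    "Modal-Key": "your-modal-key",
    "Modal-Secret": "your-modal-secret"
}
response = requests.post(url, json=data, headers=headers)
```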

> 💡 **Tip**: Modal proxy auth is recommended for production because Modal manages the credentials: you create a proxy auth token in your Modal workspace, and no separate key store (such as AWS SSM) is needed.

## ⚙️ Configuration

### Configuration Structure

Modalkit uses YAML configuration with two main sections:

```yaml
# modalkit.yaml
app_settings:        # Application deployment settings
  app_prefix: str    # Prefix for your Modal app name
  auth_config:       # Authentication configuration
  build_config:      # Container build settings
  deployment_config: # Runtime deployment settings
  batch_config:      # Batch processing settings
  queue_config:      # Async queue settings

model_settings:      # Model-specific settings
  local_model_repository_folder: str
  common: dict       # Shared settings across models
  model_entries:     # Model-specific configurations
    model_name: dict
```

### Environment Variables

Set the configuration file location:

```bash
# Default location
export MODALKIT_CONFIG="modalkit.yaml"

# Multiple configs (later files override earlier ones)
export MODALKIT_CONFIG="base.yaml,prod.yaml"

# Other environment variables
export MODALKIT_APP_POSTFIX="-prod"  # Appended to app name
```
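The "later files override earlier ones" rule can be pictured as a merge over the parsed files. The sketch below is an assumption about the semantics, not Modalkit's loader: it uses a recursive (deep) merge, plain dictionaries stand in for the parsed YAML, and `config_paths` and `deep_merge` are hypothetical helper names.

```python
def config_paths(env_value: str) -> list:
    """Split a comma-separated MODALKIT_CONFIG value into file paths."""
    return [part.strip() for part in env_value.split(",") if part.strip()]


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which values from `override` win over `base`."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections instead of replacing them wholesale.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Stand-ins for the parsed contents of base.yaml and prod.yaml.
base = {
    "app_settings": {
        "app_prefix": "translation-service",
        "deployment_config": {"gpu": "T4", "concurrency_limit": 10},
    }
}
prod = {"app_settings": {"deployment_config": {"gpu": "A100"}}}

paths = config_paths("base.yaml, prod.yaml")
merged = deep_merge(base, prod)
```

With a deep merge, `prod.yaml` only needs to list the keys it changes; if the real loader merges shallowly, an override file would have to repeat each section it touches in full.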

### Advanced Configuration Options

```yaml
deployment_config:
  # GPU configuration
  gpu: "T4"  # T4, A10G, A100, H100, or null

  # Resource limits
  concurrency_limit: 10
  container_idle_timeout: 300
  retries: 3

  # Memory/CPU (when gpu is null)
  memory: 8192  # MB
  cpu: 4.0      # cores

  # Volumes and mounts
  volumes:
    "/mnt/cache": "model-cache-vol"
  mounts:
    - local_path: "configs/prod.json"
      remote_path: "/app/config.json"
      type: "file"
```

## ☁️ Cloud Storage Integration

Modalkit seamlessly integrates with cloud storage providers through Modal's CloudBucketMount:

### Supported Providers

| Provider | Configuration |
|----------|---------------|
| AWS S3 | Native support with IAM credentials |
| Google Cloud Storage | Service account authentication |
| Cloudflare R2 | S3-compatible API |
| MinIO/Others | Any S3-compatible endpoint |

### Quick Examples

#### AWS S3 Configuration

```yaml
cloud_bucket_mounts:
  - mount_point: "/mnt/models"
    bucket_name: "my-ml-models"
    secret: "aws-credentials"  # Modal secret name
    key_prefix: "production/"  # Only mount this prefix
    read_only: true
```

First, create the Modal secret:

```bash
modal secret create aws-credentials \
  AWS_ACCESS_KEY_ID=xxx \
  AWS_SECRET_ACCESS_KEY=yyy \
  AWS_DEFAULT_REGION=us-east-1
```

#### Google Cloud Storage

```yaml
cloud_bucket_mounts:
  - mount_point: "/mnt/datasets"
    bucket_name: "my-datasets"
    bucket_endpoint_url: "https://storage.googleapis.com"
    secret: "gcp-credentials"
```

Create the secret from HMAC keys generated for a service account:

```bash
modal secret create gcp-credentials \
  GOOGLE_ACCESS_KEY_ID=xxx \
  GOOGLE_ACCESS_KEY_SECRET=yyy
```

#### Cloudflare R2

```yaml
cloud_bucket_mounts:
  - mount_point: "/mnt/artifacts"
    bucket_name: "ml-artifacts"
    bucket_endpoint_url: "https://accountid.r2.cloudflarestorage.com"
    secret: "r2-credentials"
```

### Using Mounted Storage

```python
import json

import torch

from modalkit.inference_pipeline import InferencePipeline


class MyInference(InferencePipeline):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        # Load model from mounted bucket
        model_path = "/mnt/models/my_model.pt"
        self.model = torch.load(model_path)

        # Load dataset
        with open("/mnt/datasets/vocab.json") as f:
            self.vocab = json.load(f)
```

### Best Practices

- ✅ Use read-only mounts for model artifacts

- ✅ Mount only required prefixes with `key_prefix`

- ✅ Use separate buckets for models vs. data

- ✅ Cache frequently accessed files locally

- ❌ Avoid writing logs to mounted buckets

- ❌ Don't mount entire buckets if you only need specific files
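The "cache frequently accessed files locally" practice can be as simple as copying a file off the mount the first time it is needed. The following is a minimal sketch, assuming a hypothetical `cached_copy` helper; temporary directories stand in for the bucket mount and the container-local cache so the example runs anywhere.

```python
import shutil
import tempfile
from pathlib import Path


def cached_copy(source: Path, cache_dir: Path) -> Path:
    """Copy `source` into `cache_dir` once and return the local path."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / source.name
    if not target.exists():
        shutil.copy2(source, target)
    return target


# Temporary directories stand in for /mnt/models and a local cache folder.
with tempfile.TemporaryDirectory() as tmp:
    mount = Path(tmp) / "mnt" / "models"
    mount.mkdir(parents=True)
    (mount / "vocab.json").write_text('{"hello": 1}')

    local_path = cached_copy(mount / "vocab.json", Path(tmp) / "cache")
    cached_text = local_path.read_text()
    # A second call reuses the existing copy instead of reading the mount again.
    same_path = cached_copy(mount / "vocab.json", Path(tmp) / "cache")
```

This sketch never invalidates the cache, so it suits immutable artifacts such as versioned model files rather than data that changes in place.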

## 🚀 Advanced Features

### Flexible Queue Processing

Modalkit offers **optional** queue processing with multiple approaches:

#### 1. No Queues (Default)

Perfect for sync-only APIs:

```python
class MyService(ModalService):
    inference_implementation = MyModel

# No queue backend - async requests process but don't queue responses
service = MyService()
```

#### 2. TaskIQ Integration (Recommended for Production)

Use dependency injection for full TaskIQ support:

```python
from taskiq_redis import ListQueueBroker


class TaskIQBackend:
    def __init__(self):
        self.broker = ListQueueBroker(url="redis://localhost:6379")

    async def send_message(self, queue_name: str, message: str) -> bool:
        @self.broker.task(task_name=f"process_{queue_name}")
        async def process_result(msg: str) -> str:
            # Your custom processing logic
            return f"Processed: {msg}"

        await process_result.kiq(message)
        return True


# Inject TaskIQ backend
service = MyService(queue_backend=TaskIQBackend())
```

#### 3. Configuration-Based Queues

Use YAML configuration for simple setups:

```yaml
queue_config:
  backend: "sqs"  # or "memory"
  # Additional backend-specific settings
```

#### 4. Custom Queue Systems

Implement any queue system:

```python
class MyCustomQueue:
    async def send_message(self, queue_name: str, message: str) -> bool:
        # Your custom queue implementation (RabbitMQ, Kafka, etc.)
        return True


service = MyService(queue_backend=MyCustomQueue())
```
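For local testing, a backend that satisfies the same `send_message(queue_name, message) -> bool` shape can simply record messages in memory. This is a sketch under assumptions: `InMemoryQueue` is a hypothetical class, and it is not the `"memory"` backend that Modalkit ships.

```python
import asyncio
from collections import defaultdict


class InMemoryQueue:
    """Toy backend that stores messages in per-queue lists."""

    def __init__(self) -> None:
        self.messages = defaultdict(list)

    async def send_message(self, queue_name: str, message: str) -> bool:
        self.messages[queue_name].append(message)
        return True


async def demo() -> InMemoryQueue:
    backend = InMemoryQueue()
    await backend.send_message("results", '{"status": "ok"}')
    await backend.send_message("errors", '{"status": "failed"}')
    return backend


backend = asyncio.run(demo())
```

Because the backend keeps everything it receives, a test can assert on `backend.messages` after exercising the async endpoint.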

#### Working Examples

See complete tutorials in the documentation:

- **[Queue Backend Patterns](https://prassanna-ravishankar.github.io/modalkit/examples/queue-patterns/)** - Queue backend patterns

- **[TaskIQ Integration](https://prassanna-ravishankar.github.io/modalkit/examples/taskiq-integration/)** - Full TaskIQ integration

```python
# Async endpoint usage
import requests

response = requests.post(
    "https://your-org--translation-service.modal.run/predict_async",
    json={
        "message": {"text": "Process this"},
        "success_queue": "results",
        "failure_queue": "errors",
    },
)
print(response.json())
# {"job_id": "uuid"}
```

### Batch Processing

Configure intelligent batching for better GPU utilization:

```yaml
batch_config:
  max_batch_size: 32
  wait_ms: 100  # Max time to wait for batch to fill
```
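To illustrate how `max_batch_size` and `wait_ms` interact, the sketch below collects queued requests until either limit is reached. It is a simplified, single-threaded illustration of the idea and not how Modalkit or Modal implement batching; `collect_batch` is a hypothetical function.

```python
import queue
import time


def collect_batch(pending: queue.Queue, max_batch_size: int, wait_ms: int) -> list:
    """Gather up to `max_batch_size` items, waiting at most `wait_ms` in total."""
    batch = []
    deadline = time.monotonic() + wait_ms / 1000
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(pending.get(timeout=remaining))
        except queue.Empty:
            # The wait budget ran out before the batch filled up.
            break
    return batch


pending = queue.Queue()
for text in ["a", "b", "c", "d", "e"]:
    pending.put(text)

first = collect_batch(pending, max_batch_size=3, wait_ms=100)
second = collect_batch(pending, max_batch_size=3, wait_ms=100)
```

The first batch fills immediately and is returned at full size; the second holds only the two remaining items and is returned once the 100 ms budget expires. A larger `wait_ms` trades latency for fuller batches.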

### Volume Reloading

Auto-reload Modal volumes for model updates:

```yaml
deployment_config:
  volumes:
    "/mnt/models": "model-volume"
  volume_reload_interval_seconds: 300  # Reload every 5 minutes
```
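Conceptually, an interval-based reload means "call the reload hook only if enough time has passed since the last one". The sketch below shows that logic in isolation, with a fake clock so it runs instantly. `IntervalReloader` is a hypothetical class, not Modalkit's implementation; in a real deployment the `reload_fn` might be the `reload()` method of a Modal volume.

```python
import time
from typing import Callable


class IntervalReloader:
    """Call `reload_fn` at most once per `interval_seconds`."""

    def __init__(self, reload_fn: Callable[[], None], interval_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.reload_fn = reload_fn
        self.interval_seconds = interval_seconds
        self.clock = clock
        self.last_reload = clock()

    def maybe_reload(self) -> bool:
        """Reload if the interval has elapsed; report whether a reload happened."""
        now = self.clock()
        if now - self.last_reload >= self.interval_seconds:
            self.reload_fn()
            self.last_reload = now
            return True
        return False


# Fake clock and counter so the behaviour can be observed without waiting.
fake_time = [0.0]
reload_count = [0]


def count_reload() -> None:
    reload_count[0] += 1


reloader = IntervalReloader(count_reload, interval_seconds=300, clock=lambda: fake_time[0])

fake_time[0] = 120.0
early = reloader.maybe_reload()   # only 120 s elapsed, so nothing happens
fake_time[0] = 301.0
late = reloader.maybe_reload()    # past the 300 s interval, so the hook runs
```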

## 🛠️ Development

### Setup

```bash
# Clone repository
git clone https://github.com/prassanna-ravishankar/modalkit.git
cd modalkit

# Install with uv (recommended)
uv sync

# Install pre-commit hooks
uv run pre-commit install
```

### Testing

```bash
# Run all tests
uv run pytest --cov --cov-config=pyproject.toml --cov-report=xml

# Run specific tests
uv run pytest tests/test_modal_service.py -v

# Run with HTML coverage report
uv run pytest --cov=modalkit --cov-report=html
```

### Code Quality

```bash
# Run all checks
uv run pre-commit run -a

# Run type checking
uv run mypy modalkit/

# Format code
uv run ruff format modalkit/ tests/

# Lint code
uv run ruff check modalkit/ tests/
```

## 📖 API Reference

### Endpoints

| Endpoint | Method | Description | Returns |
|----------|--------|-------------|---------|
| `/predict_sync` | POST | Synchronous inference | Model output |
| `/predict_async` | POST | Async inference (queued) | Message ID |
| `/predict_batch` | POST | Batch inference | List of outputs |
| `/health` | GET | Health check | Status |

### InferencePipeline Methods

Your model class must implement:

```python
def preprocess(self, input_list: List[InputModel]) -> dict: ...

def predict(self, input_list: List[InputModel], preprocessed_data: dict) -> dict: ...

def postprocess(self, input_list: List[InputModel], raw_output: dict) -> List[OutputModel]: ...
```
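The way these three methods fit together can be sketched without Modalkit installed. The example below is an illustration under assumptions: `ThreeStagePipeline` is a hypothetical stand-in for `InferencePipeline`, and `run_inference` is an assumed name for whatever method chains the stages. Only the three method signatures come from the reference above.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class ThreeStagePipeline(ABC):
    """Hypothetical stand-in for a preprocess/predict/postprocess base class."""

    @abstractmethod
    def preprocess(self, input_list: List[Any]) -> dict: ...

    @abstractmethod
    def predict(self, input_list: List[Any], preprocessed_data: dict) -> dict: ...

    @abstractmethod
    def postprocess(self, input_list: List[Any], raw_output: dict) -> List[Any]: ...

    def run_inference(self, input_list: List[Any]) -> List[Any]:
        # Each stage receives the original inputs plus the previous stage's output.
        preprocessed = self.preprocess(input_list)
        raw_output = self.predict(input_list, preprocessed)
        return self.postprocess(input_list, raw_output)


class UppercasePipeline(ThreeStagePipeline):
    def preprocess(self, input_list):
        return {"texts": [text.strip() for text in input_list]}

    def predict(self, input_list, preprocessed_data):
        return {"outputs": [text.upper() for text in preprocessed_data["texts"]]}

    def postprocess(self, input_list, raw_output):
        return [
            {"input": original, "output": result}
            for original, result in zip(input_list, raw_output["outputs"])
        ]


results = UppercasePipeline().run_inference([" hello ", "world"])
```

Passing `input_list` to every stage lets `postprocess` pair each output with the request that produced it, which matters once requests are batched.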

## 🤝 Contributing

We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.

### Development Workflow

1. Fork the repository

2. Create a feature branch (`git checkout -b feature/amazing-feature`)

3. Make your changes

4. Run tests and linting (`uv run pytest && uv run pre-commit run -a`)

5. Commit your changes (pre-commit hooks will run automatically)

6. Push to your fork and open a Pull Request

## 📝 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

Built with ❤️ using:

- [Modal](https://modal.com) - Serverless infrastructure for ML

- [FastAPI](https://fastapi.tiangolo.com) - Modern web framework

- [Pydantic](https://pydantic-docs.helpmanual.io) - Data validation

- [Taskiq](https://taskiq-python.github.io) - Async task processing
