https://github.com/runpod/flash
Application framework for Multimodal Distributed inference & Orchestration.
- Host: GitHub
- URL: https://github.com/runpod/flash
- Owner: runpod
- License: mit
- Created: 2025-03-25T22:16:46.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2026-04-10T21:30:18.000Z (3 months ago)
- Last Synced: 2026-04-10T22:12:27.826Z (3 months ago)
- Language: Python
- Homepage:
- Size: 4.86 MB
- Stars: 100
- Watchers: 1
- Forks: 10
- Open Issues: 7
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Security: SECURITY.md
README
# Flash
Flash is a Python SDK for developing cloud-native AI apps where you define everything—hardware, remote functions, and dependencies—using local code.
```python
import asyncio
from runpod_flash import Endpoint, GpuType

# Mark the function below for remote execution
@Endpoint(name="hello-gpu", gpu=GpuType.NVIDIA_GEFORCE_RTX_4090, dependencies=["torch"])
async def hello():  # This function runs on Runpod
    import torch
    gpu_name = torch.cuda.get_device_name(0)
    print(f"Hello from your GPU! ({gpu_name})")
    return {"gpu": gpu_name}

asyncio.run(hello())
print("Done!")  # This runs locally
```
Write `@Endpoint`-decorated Python functions on your local machine. When you run them, Flash automatically handles GPU/CPU provisioning and worker scaling on [Runpod Serverless](https://docs.runpod.io/serverless/overview).
## Setup
### Install Flash
Install Flash using `pip` or `uv`:
```bash
# Install with pip
pip install runpod-flash
# Or uv
uv add runpod-flash
```
Flash requires [Python 3.10+](https://www.python.org/downloads/) and is currently available for macOS and Linux. Windows support is in development.
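If you want to confirm that your machine meets these requirements before installing, a small preflight script can help. The sketch below uses only the Python standard library; `check_environment` is a hypothetical helper written for this illustration and is not part of Flash.

```python
import platform
import sys

MIN_PYTHON = (3, 10)                     # Python 3.10+ as stated above
SUPPORTED_SYSTEMS = {"Darwin", "Linux"}  # platform.system() reports macOS as "Darwin"


def check_environment(version=None, system=None):
    """Return a list of problems; an empty list means the requirements are met.

    Hypothetical helper for illustration only; not part of the Flash SDK.
    """
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    system = platform.system() if system is None else system

    problems = []
    if version < MIN_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} found; 3.10+ is required")
    if system not in SUPPORTED_SYSTEMS:
        problems.append(f"{system} is not supported yet (macOS or Linux only)")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment looks OK" if not issues else "\n".join(issues))
```

Calling `check_environment()` with no arguments inspects the running interpreter; the optional arguments exist only so the check can be exercised with other values.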
### Authentication
Before you can use Flash, you need to authenticate with your Runpod account:
```bash
flash login
```
This saves your API key securely and allows you to use the Flash CLI and run `@Endpoint` functions.
### Coding agent integration (optional)
Install the Flash skill package for AI coding agents like Claude Code, Cline, and Cursor:
```bash
npx skills add runpod/skills
```
You can review the `SKILL.md` file in the [runpod/skills repository](https://github.com/runpod/skills/blob/main/flash/SKILL.md).
## Quickstart
Create `gpu_demo.py`:
```python
import asyncio
from runpod_flash import Endpoint, GpuType

@Endpoint(
    name="flash-quickstart",
    gpu=GpuType.NVIDIA_GEFORCE_RTX_4090,
    workers=3,
    dependencies=["numpy", "torch"]
)
def gpu_matrix_multiply(size):
    # IMPORTANT: Import packages INSIDE the function
    import numpy as np
    import torch

    # Get GPU name
    device_name = torch.cuda.get_device_name(0)

    # Create random matrices
    A = np.random.rand(size, size)
    B = np.random.rand(size, size)

    # Multiply matrices
    C = np.dot(A, B)

    return {
        "matrix_size": size,
        "result_mean": float(np.mean(C)),
        "gpu": device_name
    }

# Call the function
async def main():
    print("Running matrix multiplication on Runpod GPU...")
    result = await gpu_matrix_multiply(1000)
    print(f"\n✓ Matrix size: {result['matrix_size']}x{result['matrix_size']}")
    print(f"✓ Result mean: {result['result_mean']:.4f}")
    print(f"✓ GPU used: {result['gpu']}")

if __name__ == "__main__":
    asyncio.run(main())
```
Run it:
```bash
python gpu_demo.py
```
The first run takes 30-60 seconds while workers are provisioned. Subsequent runs take 2-3 seconds.
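To see the gap between the first and later runs yourself, you can time consecutive calls. The sketch below is an illustration under stated assumptions: `FakeEndpoint` is a hypothetical local stand-in that sleeps instead of provisioning anything, and `time_calls` is a helper written for this example. Neither is part of Flash. In a real script you would pass your own awaitable endpoint function to `time_calls` instead.

```python
import asyncio
import time


class FakeEndpoint:
    """Hypothetical stand-in for a remote function: slow on first call, fast after."""

    def __init__(self, cold_start=0.2, warm=0.01):
        self.cold_start = cold_start  # simulated provisioning delay in seconds
        self.warm = warm              # simulated delay once a worker is running
        self._started = False

    async def __call__(self, size):
        await asyncio.sleep(self.warm if self._started else self.cold_start)
        self._started = True
        return {"matrix_size": size}


async def time_calls(func, arg, repeats=3):
    """Await func(arg) `repeats` times in sequence and return each duration in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        await func(arg)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    durations = asyncio.run(time_calls(FakeEndpoint(), 1000))
    for i, seconds in enumerate(durations, start=1):
        print(f"call {i}: {seconds:.3f}s")
```

The simulated delays are far shorter than the real figures quoted above; only the shape of the result (one slow call followed by fast ones) is meant to carry over.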
## What Flash does
- **Remote execution**: `@Endpoint` functions run on Runpod Serverless GPUs/CPUs
- **Auto-scaling**: Workers scale from 0 to N based on demand
- **Dependency management**: Packages install automatically on remote workers
- **Two patterns**: Queue-based (`@Endpoint`) for batch work, load-balanced (`Endpoint()` + routes) for REST APIs
- **Concurrency control**: `max_concurrency` lets each worker process multiple jobs simultaneously
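The concurrency idea in the last bullet can be illustrated without any Flash code. The sketch below uses a plain `asyncio.Semaphore` to cap how many jobs are in flight at once. It is a conceptual model only and makes no claim about how Flash implements `max_concurrency`; `run_jobs` and its doubling "work" are hypothetical.

```python
import asyncio


async def run_jobs(jobs, max_concurrency):
    """Process jobs with at most `max_concurrency` in flight.

    Returns (results, peak), where peak is the highest number of jobs
    observed running at the same time. Illustrative helper, not Flash API.
    """
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def handle(job):
        nonlocal in_flight, peak
        async with semaphore:          # blocks while the limit is reached
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for real work
            in_flight -= 1
            return job * 2

    # gather preserves input order in its result list
    results = await asyncio.gather(*(handle(job) for job in jobs))
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(run_jobs(range(10), max_concurrency=3))
    print(f"processed {len(results)} jobs, peak concurrency {peak}")
```

Raising the limit lets more jobs overlap on a single worker; a limit of 1 processes them strictly one at a time.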
## Documentation
Full documentation: **[docs.runpod.io/flash](https://docs.runpod.io/flash)**
- [Quickstart](https://docs.runpod.io/flash/quickstart) - First GPU workload in 5 minutes
- [Create endpoints](https://docs.runpod.io/flash/endpoint-functions) - Queue-based, load-balancing, and custom Docker endpoints
- [CLI reference](https://docs.runpod.io/flash/cli/overview) - `flash run`, `flash deploy`, `flash build`
- [Configuration](https://docs.runpod.io/flash/configuration/parameters) - All endpoint parameters
## Flash apps
When you're ready to move beyond scripts and build a production-ready API, you can create a [Flash app](https://docs.runpod.io/flash/apps/overview) (a collection of interconnected endpoints with diverse hardware configurations) and deploy it to Runpod.
[Follow this tutorial to build your first Flash app](https://docs.runpod.io/flash/apps/build-app).
## Flash CLI
The Flash CLI provides a set of commands for managing your Flash apps and endpoints.
```bash
flash --help
```
[Learn more about the Flash CLI](https://docs.runpod.io/flash/cli/overview).
## Examples
Browse working examples: **[github.com/runpod/flash-examples](https://github.com/runpod/flash-examples)**
## Requirements
- Python 3.10+
- macOS or Linux (Windows support in development)
- A [Runpod account](https://runpod.io/console) (email must be verified) with an API key
## Contributing
We welcome contributions! See [RELEASE_SYSTEM.md](RELEASE_SYSTEM.md) for development workflow.
```bash
# Clone and install
git clone https://github.com/runpod/flash.git
cd flash
pip install -e ".[dev]"
# Use conventional commits
git commit -m "feat: add new feature"
git commit -m "fix: resolve issue"
```
## Support
- [Discord](https://discord.gg/cUpRmau42V) - Community support
- [GitHub Issues](https://github.com/runpod/flash/issues) - Bug reports
## License
MIT License - see [LICENSE](LICENSE) for details.