{"id":28352291,"url":"https://github.com/runpod/tetra-rp","last_synced_at":"2026-02-03T01:26:38.301Z","repository":{"id":285417837,"uuid":"954961206","full_name":"runpod/tetra-rp","owner":"runpod","description":"Application framework for Multimodal Distributed inference \u0026 Orchestration. ","archived":false,"fork":false,"pushed_at":"2026-01-29T15:27:48.000Z","size":3018,"stargazers_count":18,"open_issues_count":9,"forks_count":6,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-29T17:05:11.153Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/runpod.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-03-25T22:16:46.000Z","updated_at":"2026-01-29T01:41:32.000Z","dependencies_parsed_at":null,"dependency_job_id":"ef4780a9-6253-4609-b9ae-2423c076a659","html_url":"https://github.com/runpod/tetra-rp","commit_stats":null,"previous_names":["runpod/tetra-rp"],"tags_count":33,"template":false,"template_full_name":null,"purl":"pkg:github/runpod/tetra-rp","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/runpod%2Ftetra-rp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/runpod%2Ftetra-rp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/runpod%2Ftetra-rp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/run
pod%2Ftetra-rp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/runpod","download_url":"https://codeload.github.com/runpod/tetra-rp/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/runpod%2Ftetra-rp/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28929864,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-31T04:05:25.756Z","status":"ssl_error","status_checked_at":"2026-01-31T04:02:35.005Z","response_time":128,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-05-27T23:06:17.090Z","updated_at":"2026-02-03T01:26:38.292Z","avatar_url":"https://github.com/runpod.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Flash: Serverless computing for AI workloads\n\nRunpod Flash is a Python SDK that streamlines the development and deployment of AI workflows on Runpod's [Serverless infrastructure](http://docs.runpod.io/serverless/overview). 
Write Python functions locally, and Flash handles the infrastructure, provisioning GPUs and CPUs, managing dependencies, and transferring data, allowing you to focus on building AI applications.\n\nYou can find a repository of prebuilt Flash examples at [runpod/flash-examples](https://github.com/runpod/flash-examples).\n\n\u003e [!Note]\n\u003e **New feature - Consolidated template management:** `PodTemplate` overrides now seamlessly integrate with `ServerlessResource` defaults, providing more consistent resource configuration and reducing deployment complexity.\n\n## Table of contents\n\n- [Overview](#overview)\n- [Get started](#get-started)\n- [Create Flash API endpoints](#create-flash-api-endpoints)\n- [Key concepts](#key-concepts)\n- [How it works](#how-it-works)\n- [Advanced features](#advanced-features)\n- [Configuration](#configuration)\n- [Workflow examples](#workflow-examples)\n- [Use cases](#use-cases)\n- [Limitations](#limitations)\n- [Contributing](#contributing)\n- [Troubleshooting](#troubleshooting)\n\n## Overview\n\nThere are two basic modes for using Flash. 
You can:\n\n- Build and run standalone Python scripts using the `@remote` decorator.\n- Create Flash API endpoints with FastAPI (using the same script syntax).\n\nFollow the steps in the next section to install Flash and create your first script before learning how to [create Flash API endpoints](#create-flash-api-endpoints).\n\nTo learn more about how Flash works, see [Key concepts](#key-concepts).\n\n## Get started\n\nBefore you can use Flash, you'll need:\n\n- Python 3.9 (or higher) installed on your local machine.\n- A Runpod account with API key ([sign up here](https://runpod.io/console)).\n- Basic knowledge of Python and async programming.\n\n### Step 1: Install Flash\n\n```bash\npip install tetra_rp\n```\n\n### Step 2: Set your API key\n\nGenerate an API key from the [Runpod account settings](https://docs.runpod.io/get-started/api-keys) page and set it as an environment variable:\n\n```bash\nexport RUNPOD_API_KEY=[YOUR_API_KEY]\n```\n\nOr save it in a `.env` file in your project directory:\n\n```bash\necho \"RUNPOD_API_KEY=[YOUR_API_KEY]\" \u003e .env\n```\n\n### Step 3: Create your first Flash function\n\nAdd the following code to a new Python file:\n\n```python\nimport asyncio\nfrom tetra_rp import remote, LiveServerless\nfrom dotenv import load_dotenv\n\n# Uncomment if using a .env file\n# load_dotenv()\n\n# Configure GPU resources\ngpu_config = LiveServerless(name=\"flash-quickstart\")\n\n@remote(\n    resource_config=gpu_config,\n    dependencies=[\"torch\", \"numpy\"]\n)\ndef gpu_compute(data):\n    import torch\n    import numpy as np\n    \n    # This runs on a GPU in Runpod's cloud\n    tensor = torch.tensor(data, device=\"cuda\")\n    result = tensor.sum().item()\n    \n    return {\n        \"result\": result,\n        \"device\": torch.cuda.get_device_name(0)\n    }\n\nasync def main():\n    # This runs locally\n    result = await gpu_compute([1, 2, 3, 4, 5])\n    print(f\"Sum: {result['result']}\")\n    print(f\"Computed on: 
{result['device']}\")\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n```\n\nRun the example:\n\n```bash\npython your_script.py\n```\n\nThe first time you run the script, it will take significantly longer to process than successive runs (about one minute for first run vs. one second for future runs), as your endpoint must be initialized.\n\nWhen it's finished, you should see output similar to this:\n\n```bash\n2025-11-19 12:35:15,109 | INFO  | Created endpoint: rb50waqznmn2kg - flash-quickstart-fb\n2025-11-19 12:35:15,112 | INFO  | URL: https://console.runpod.io/serverless/user/endpoint/rb50waqznmn2kg\n2025-11-19 12:35:15,114 | INFO  | LiveServerless:rb50waqznmn2kg | API /run\n2025-11-19 12:35:15,655 | INFO  | LiveServerless:rb50waqznmn2kg | Started Job:b0b341e7-e460-4305-9acd-fc2dfd1bd65c-u2\n2025-11-19 12:35:15,762 | INFO  | Job:b0b341e7-e460-4305-9acd-fc2dfd1bd65c-u2 | Status: IN_QUEUE\n2025-11-19 12:35:16,301 | INFO  | Job:b0b341e7-e460-4305-9acd-fc2dfd1bd65c-u2 | .\n2025-11-19 12:35:17,756 | INFO  | Job:b0b341e7-e460-4305-9acd-fc2dfd1bd65c-u2 | ..\n2025-11-19 12:35:22,610 | INFO  | Job:b0b341e7-e460-4305-9acd-fc2dfd1bd65c-u2 | ...\n2025-11-19 12:35:37,163 | INFO  | Job:b0b341e7-e460-4305-9acd-fc2dfd1bd65c-u2 | ....\n2025-11-19 12:35:59,248 | INFO  | Job:b0b341e7-e460-4305-9acd-fc2dfd1bd65c-u2 | .....\n2025-11-19 12:36:09,983 | INFO  | Job:b0b341e7-e460-4305-9acd-fc2dfd1bd65c-u2 | Status: COMPLETED\n2025-11-19 12:36:10,068 | INFO  | Worker:icmkdgnrmdf8gz | Delay Time: 51842 ms\n2025-11-19 12:36:10,068 | INFO  | Worker:icmkdgnrmdf8gz | Execution Time: 1533 ms\n2025-11-19 17:36:07,485 | INFO  | Installing Python dependencies: ['torch', 'numpy']\nSum: 15\nComputed on: NVIDIA GeForce RTX 4090\n```\n\n## Create Flash API endpoints\n\n\u003e [!Note]\n\u003e **Flash API endpoints are currently only available for local testing:** Using `flash run` will start the API server on your local machine. 
Future updates will add the ability to build and deploy API servers for production deployments.\n\nYou can use Flash to deploy and serve API endpoints that compute responses using GPU and CPU Serverless workers. These endpoints will run scripts using the same Python remote decorators [demonstrated above](#get-started).\n\n### Step 1: Initialize a new project\n\nUse the `flash init` command to generate a structured project template with a preconfigured FastAPI application entry point.\n\nRun this command to initialize a new project directory:\n\n```bash\nflash init my_project\n```\n\nYou can also initialize your current directory:\n\n```bash\nflash init\n```\n\n### Step 2: Explore the project template\n\nThis is the structure of the project template created by `flash init`:\n\n```txt\nmy_project/\n├── main.py                    # FastAPI application entry point\n├── workers/\n│   ├── gpu/                   # GPU worker example\n│   │   ├── __init__.py        # FastAPI router\n│   │   └── endpoint.py        # GPU script with @remote decorated function\n│   └── cpu/                   # CPU worker example\n│       ├── __init__.py        # FastAPI router\n│       └── endpoint.py        # CPU script with @remote decorated function\n├── .env                       # Environment variable template\n├── .gitignore                 # Git ignore patterns\n├── .flashignore               # Flash deployment ignore patterns\n├── requirements.txt           # Python dependencies\n└── README.md                  # Project documentation\n```\n\nThis template includes:\n\n- A FastAPI application entry point and routers.\n- Templates for Python dependencies, `.env`, `.gitignore`, etc.\n- Flash scripts (`endpoint.py`) for both GPU and CPU workers, which include:\n    - Pre-configured worker scaling limits using the `LiveServerless()` object.\n    - A `@remote` decorated function that returns a response from a worker.\n\nWhen you start the FastAPI server, it creates API endpoints at `/gpu/hello` and 
`/cpu/hello`, which call the remote function described in their respective `endpoint.py` files.\n\n### Step 3: Install Python dependencies\n\nAfter initializing the project, navigate into the project directory:\n\n```bash\ncd my_project\n```\n\nInstall required dependencies:\n\n```bash\npip install -r requirements.txt\n```\n\n### Step 4: Configure your API key\n\nOpen the `.env` template file in a text editor and add your [Runpod API key](https://docs.runpod.io/get-started/api-keys):\n\n```bash\n# Use your text editor of choice, e.g.\ncursor .env\n```\n\nRemove the `#` symbol from the beginning of the `RUNPOD_API_KEY` line and replace `your_api_key_here` with your actual Runpod API key:\n\n```txt\nRUNPOD_API_KEY=your_api_key_here\n# FLASH_HOST=localhost\n# FLASH_PORT=8888\n# LOG_LEVEL=INFO\n```\n\nSave the file and close it.\n\n### Step 5: Start the local API server\n\nUse `flash run` to start the API server:\n\n```bash\nflash run\n```\n\nOpen a new terminal tab or window and test your GPU API using cURL:\n\n```bash\ncurl -X POST http://localhost:8888/gpu/hello \\\n    -H \"Content-Type: application/json\" \\\n    -d '{\"message\": \"Hello from the GPU!\"}'\n```\n\nIf you switch back to the terminal tab where you used `flash run`, you'll see the details of the job's progress.\n\n### Faster testing with auto-provisioning\n\nFor development with multiple endpoints, use `--auto-provision` to deploy all resources before testing:\n\n```bash\nflash run --auto-provision\n```\n\nThis eliminates cold-start delays by provisioning all serverless endpoints upfront. Endpoints are cached and reused across server restarts, making subsequent runs much faster. Resources are identified by name, so the same endpoint won't be re-deployed if configuration hasn't changed.\n\n### Step 6: Open the API explorer\n\nBesides starting the API server, `flash run` also starts an interactive API explorer. 
Point your web browser at [http://localhost:8888/docs](http://localhost:8888/docs) to explore the API.\n\nTo run remote functions in the explorer:\n\n1. Expand one of the functions under **GPU Workers** or **CPU Workers**.\n2. Click **Try it out** and then **Execute**.\n\nYou'll get a response from your workers right in the explorer.\n\n### Step 7: Customize your API\n\nTo customize your API endpoint and functionality:\n\n1. Add/edit remote functions in your `endpoint.py` files.\n2. Test the scripts individually by running `python endpoint.py`.\n3. Configure your FastAPI routers by editing the `__init__.py` files.\n4. Add any new endpoints to your `main.py` file.\n\n## Key concepts\n\n### Remote functions\n\nThe Flash `@remote` decorator marks functions for execution on Runpod's infrastructure. Everything inside the decorated function runs remotely, while code outside runs locally.\n\n```python\n@remote(resource_config=config, dependencies=[\"pandas\"])\ndef process_data(data):\n    # This code runs remotely\n    import pandas as pd\n    df = pd.DataFrame(data)\n    return df.describe().to_dict()\n\nasync def main():\n    # This code runs locally\n    result = await process_data(my_data)\n```\n\n### Resource configuration\n\nFlash provides fine-grained control over hardware allocation through configuration objects:\n\n```python\nfrom tetra_rp import LiveServerless, GpuGroup, CpuInstanceType, PodTemplate\n\n# GPU configuration\ngpu_config = LiveServerless(\n    name=\"ml-inference\",\n    gpus=[GpuGroup.AMPERE_80],  # A100 80GB\n    workersMax=5,\n    template=PodTemplate(containerDiskInGb=100)  # Extra disk space\n)\n\n# CPU configuration\ncpu_config = LiveServerless(\n    name=\"data-processor\",\n    instanceIds=[CpuInstanceType.CPU3G_4_16],  # 4 vCPU, 16GB RAM\n    workersMax=3\n)\n```\n\n### Dependency management\n\nSpecify Python packages in the decorator, and Flash installs them automatically:\n\n```python\n@remote(\n    resource_config=gpu_config,\n    
dependencies=[\"transformers==4.36.0\", \"torch\", \"pillow\"]\n)\ndef generate_image(prompt):\n    # Import inside the function\n    from transformers import pipeline\n    import torch\n    from PIL import Image\n    \n    # Your code here\n```\n\n### Parallel execution\n\nRun multiple remote functions concurrently using Python's async capabilities:\n\n```python\n# Process multiple items in parallel\nresults = await asyncio.gather(\n    process_item(item1),\n    process_item(item2),\n    process_item(item3)\n)\n```\n\n### Load-Balanced Endpoints with HTTP Routing\n\nFor API endpoints requiring low-latency HTTP access with direct routing, use load-balanced endpoints:\n\n```python\nfrom tetra_rp import LiveLoadBalancer, remote\n\napi = LiveLoadBalancer(name=\"api-service\")\n\n@remote(api, method=\"POST\", path=\"/api/process\")\nasync def process_data(x: int, y: int):\n    return {\"result\": x + y}\n\n@remote(api, method=\"GET\", path=\"/api/health\")\ndef health_check():\n    return {\"status\": \"ok\"}\n\n# Call functions directly\nresult = await process_data(5, 3)  # → {\"result\": 8}\n```\n\n**Key differences from queue-based endpoints:**\n- **Direct HTTP routing** - Requests routed directly to workers, no queue\n- **Lower latency** - No queuing overhead\n- **Custom HTTP methods** - GET, POST, PUT, DELETE, PATCH support\n- **No automatic retries** - Users handle errors directly\n\nLoad-balanced endpoints are ideal for REST APIs, webhooks, and real-time services. Queue-based endpoints are better for batch processing and fault-tolerant workflows.\n\nFor detailed information:\n- **User guide:** [Using @remote with Load-Balanced Endpoints](docs/Using_Remote_With_LoadBalancer.md)\n- **Runtime architecture:** [LoadBalancer Runtime Architecture](docs/LoadBalancer_Runtime_Architecture.md) - details on deployment, request flows, and execution\n\n## How it works\n\nFlash orchestrates workflow execution through a sophisticated multi-step process:\n\n1. 
**Function identification**: The `@remote` decorator marks functions for remote execution, enabling Flash to distinguish between local and remote operations.\n2. **Dependency analysis**: Flash automatically analyzes function dependencies to construct an optimal execution order, ensuring data flows correctly between sequential and parallel operations.\n3. **Resource provisioning and execution**: For each remote function, Flash:\n   - Dynamically provisions endpoint and worker resources on Runpod's infrastructure.\n   - Serializes and securely transfers input data to the remote worker.\n   - Executes the function on the remote infrastructure with the specified GPU or CPU resources.\n   - Returns results to your local environment for further processing.\n4. **Data orchestration**: Results flow seamlessly between functions according to your local Python code structure, maintaining the same programming model whether functions run locally or remotely.\n\n\n## Advanced features\n\n### Custom Docker images\n\n`LiveServerless` resources use a fixed Docker image that's optimized for Flash runtime, and supports full remote code execution. 
For specialized environments that require a custom Docker image, use `ServerlessEndpoint` or `CpuServerlessEndpoint`:\n\n```python\nfrom tetra_rp import ServerlessEndpoint, GpuGroup\n\ncustom_gpu = ServerlessEndpoint(\n    name=\"custom-ml-env\",\n    imageName=\"pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime\",\n    gpus=[GpuGroup.AMPERE_80]\n)\n```\n\nUnlike `LiveServerless`, these endpoints only support dictionary payloads in the form of `{\"input\": {...}}` (similar to a traditional [Serverless endpoint request](https://docs.runpod.io/serverless/endpoints/send-requests)), and cannot execute arbitrary Python functions remotely.\n\n### Persistent storage with network volumes\n\nAttach [network volumes](https://docs.runpod.io/storage/network-volumes) for persistent storage across workers and endpoints:\n\n```python\nfrom tetra_rp import LiveServerless, PodTemplate\n\nconfig = LiveServerless(\n    name=\"model-server\",\n    networkVolumeId=\"vol_abc123\",  # Your volume ID\n    template=PodTemplate(containerDiskInGb=100)\n)\n```\n\n### Environment variables\n\nPass configuration to remote functions:\n\n```python\nconfig = LiveServerless(\n    name=\"api-worker\",\n    env={\"HF_TOKEN\": \"your_token\", \"MODEL_ID\": \"gpt2\"}\n)\n```\n\nEnvironment variables are excluded from configuration hashing, which means changing environment values won't trigger endpoint recreation. This allows different processes to load environment variables from `.env` files without causing false drift detection. Only structural changes (like GPU type, image, or template modifications) trigger endpoint updates.\n\n### Build Process\n\nFlash uses a sophisticated build process to package your application for deployment.\n\n#### How Flash Builds Your Application\n\nWhen you run `flash build`, the following happens:\n\n1. **Discovery**: Flash scans your code for `@remote` decorated functions\n2. **Grouping**: Functions are grouped by their `resource_config`\n3. **Manifest Creation**: A `flash_manifest.json` file maps functions to their endpoints\n4. 
**Dependency Installation**: Python packages are installed with Linux x86_64 compatibility\n5. **Packaging**: Everything is bundled into `artifact.tar.gz` for deployment\n\n#### Cross-Platform Builds\n\nFlash automatically handles cross-platform builds, ensuring your deployments work correctly regardless of your development platform:\n\n- **Automatic Platform Targeting**: Dependencies are installed for Linux x86_64 (RunPod's serverless platform), even when building on macOS or Windows\n- **Python Version Matching**: The build uses your current Python version to ensure package compatibility\n- **Binary Wheel Enforcement**: Only pre-built binary wheels are used, preventing platform-specific compilation issues\n\nThis means you can build on macOS ARM64, Windows, or any other platform, and the resulting package will run correctly on RunPod serverless.\n\n#### Cross-Endpoint Function Calls\n\nFlash enables functions on different endpoints to call each other. The runtime automatically discovers endpoints using the manifest and routes calls appropriately:\n\n```python\n# CPU endpoint function\n@remote(resource_config=cpu_config)\ndef preprocess(data):\n    # Placeholder cleaning step: drop missing values\n    clean_data = [x for x in data if x is not None]\n    return clean_data\n\n# GPU endpoint function\n@remote(resource_config=gpu_config)\nasync def inference(data):\n    # Can call CPU endpoint function (awaited, like any other remote call)\n    clean = await preprocess(data)\n    # Placeholder for the actual model output\n    result = {\"count\": len(clean)}\n    return result\n```\n\nThe runtime wrapper handles service discovery and routing automatically.\n\n#### Build Artifacts\n\nAfter `flash build` completes:\n- `.flash/.build/`: Temporary build directory (removed unless `--keep-build` is passed)\n- `.flash/artifact.tar.gz`: Deployment package\n- `.flash/flash_manifest.json`: Service discovery configuration\n\nFor information on load-balanced endpoints (required for Mothership and HTTP services), see [docs/Load_Balancer_Endpoints.md](docs/Load_Balancer_Endpoints.md).\n\n#### Troubleshooting Build Issues\n\n**No @remote functions found:**\n- Ensure your functions are decorated with 
`@remote(resource_config)`\n- Check that Python files are not excluded by `.gitignore` or `.flashignore`\n- Verify function decorators have valid syntax\n\n**Build succeeded but deployment failed:**\n- Verify all function imports work in the deployment environment\n- Check that environment variables required by your functions are available\n- Review the generated `flash_manifest.json` for correct function mappings\n\n**Dependency installation failed:**\n- If a package doesn't have pre-built Linux x86_64 wheels, the build will fail with an error\n- For newer Python versions (3.13+), some packages may require manylinux_2_27 or higher\n- Ensure you have standard pip installed (`python -m ensurepip --upgrade`) for best compatibility\n- uv pip has known issues with newer manylinux tags - standard pip is recommended\n- Check PyPI to verify the package supports your Python version on Linux\n\n#### Managing Bundle Size\n\nRunPod serverless has a **500MB deployment limit**. Exceeding this limit will cause deployment failures.\n\nUse `--exclude` to skip packages already in your worker-tetra Docker image:\n\n```bash\n# For GPU deployments (PyTorch pre-installed)\nflash build --exclude torch,torchvision,torchaudio\n\n# Check your resource config to determine which base image you're using\n```\n\n**Which packages to exclude depends on your resource config:**\n- **GPU resources** → PyTorch images have torch/torchvision/torchaudio pre-installed\n- **CPU resources** → Python slim images have NO ML frameworks pre-installed\n- **Load-balanced** → Same as above, depends on GPU vs CPU variant\n\nSee [worker-tetra](https://github.com/runpod-workers/worker-tetra) for base image details.\n\n## Configuration\n\n### GPU configuration parameters\n\nThe following parameters can be used with `LiveServerless` (full remote code execution) and `ServerlessEndpoint` (dictionary payload only) to configure your Runpod GPU endpoints:\n\n| Parameter          | Description                               
      | Default       | Example Values                      |\n|--------------------|-------------------------------------------------|---------------|-------------------------------------|\n| `name`             | (Required) Name for your endpoint               | `\"\"`          | `\"stable-diffusion-server\"`         |\n| `gpus`             | GPU pool IDs that can be used by workers        | `[GpuGroup.ANY]` | `[GpuGroup.ADA_24]` for RTX 4090 |\n| `gpuCount`         | Number of GPUs per worker                       | 1             | 1, 2, 4                             |\n| `workersMin`       | Minimum number of workers                       | 0             | Set to 1 for persistence            |\n| `workersMax`       | Maximum number of workers                       | 3             | Higher for more concurrency         |\n| `idleTimeout`      | Minutes before scaling down                     | 5             | 10, 30, 60                          |\n| `env`              | Environment variables                           | `None`        | `{\"HF_TOKEN\": \"xyz\"}`               |\n| `networkVolumeId`  | Persistent storage ID                           | `None`        | `\"vol_abc123\"`                      |\n| `executionTimeoutMs`| Max execution time (ms)                        | 0 (no limit)  | 600000 (10 min)                     |\n| `scalerType`       | Scaling strategy                                | `QUEUE_DELAY` | `REQUEST_COUNT`                     |\n| `scalerValue`      | Scaling parameter value                         | 4             | 1-10 range typical                  |\n| `locations`        | Preferred datacenter locations                  | `None`        | `\"us-east,eu-central\"`              |\n| `imageName`        | Custom Docker image (`ServerlessEndpoint` only)   | Fixed for LiveServerless | `\"pytorch/pytorch:latest\"`, `\"my-registry/custom:v1.0\"` |\n\n### CPU configuration parameters\n\nThe same GPU configuration parameters above apply to 
`LiveServerless` (full remote code execution) and `CpuServerlessEndpoint` (dictionary payload only), with these additional CPU-specific parameters:\n\n| Parameter          | Description                                     | Default       | Example Values                      |\n|--------------------|-------------------------------------------------|---------------|-------------------------------------|\n| `instanceIds`      | CPU Instance Types (forces a CPU endpoint type) | `None`        | `[CpuInstanceType.CPU5C_2_4]`       |\n| `imageName`        | Custom Docker image (`CpuServerlessEndpoint` only) | Fixed for `LiveServerless` | `\"python:3.11-slim\"`, `\"my-registry/custom:v1.0\"` |\n\n### Resource class comparison\n\n| Feature | LiveServerless | ServerlessEndpoint | CpuServerlessEndpoint |\n|---------|----------------|-------------------|----------------------|\n| **Remote code execution** | ✅ Full Python function execution | ❌ Dictionary payload only | ❌ Dictionary payload only |\n| **Custom Docker images** | ❌ Fixed optimized images | ✅ Any Docker image | ✅ Any Docker image |\n| **Use case** | Dynamic remote functions | Traditional API endpoints | Traditional CPU endpoints |\n| **Function returns** | Any Python object | Dictionary only | Dictionary only |\n| **@remote decorator** | Full functionality | Limited to payload passing | Limited to payload passing |\n\n### Available GPU types\n\nSome common GPU groups available through `GpuGroup`:\n\n- `GpuGroup.ANY` - Any available GPU (default)\n- `GpuGroup.ADA_24` - NVIDIA GeForce RTX 4090\n- `GpuGroup.AMPERE_80` - NVIDIA A100 80GB\n- `GpuGroup.AMPERE_48` - NVIDIA A40, RTX A6000\n- `GpuGroup.AMPERE_24` - NVIDIA RTX A5000, L4, RTX 3090\n\n\n### Available CPU instance types\n\n- `CpuInstanceType.CPU3G_1_4` - (cpu3g-1-4) 3rd gen general purpose, 1 vCPU, 4GB RAM\n- `CpuInstanceType.CPU3G_2_8` - (cpu3g-2-8) 3rd gen general purpose, 2 vCPU, 8GB RAM\n- `CpuInstanceType.CPU3G_4_16` - (cpu3g-4-16) 3rd gen general 
purpose, 4 vCPU, 16GB RAM\n- `CpuInstanceType.CPU3G_8_32` - (cpu3g-8-32) 3rd gen general purpose, 8 vCPU, 32GB RAM\n- `CpuInstanceType.CPU3C_1_2` - (cpu3c-1-2) 3rd gen compute-optimized, 1 vCPU, 2GB RAM\n- `CpuInstanceType.CPU3C_2_4` - (cpu3c-2-4) 3rd gen compute-optimized, 2 vCPU, 4GB RAM\n- `CpuInstanceType.CPU3C_4_8` - (cpu3c-4-8) 3rd gen compute-optimized, 4 vCPU, 8GB RAM\n- `CpuInstanceType.CPU3C_8_16` - (cpu3c-8-16) 3rd gen compute-optimized, 8 vCPU, 16GB RAM\n- `CpuInstanceType.CPU5C_1_2` - (cpu5c-1-2) 5th gen compute-optimized, 1 vCPU, 2GB RAM\n- `CpuInstanceType.CPU5C_2_4` - (cpu5c-2-4) 5th gen compute-optimized, 2 vCPU, 4GB RAM\n- `CpuInstanceType.CPU5C_4_8` - (cpu5c-4-8) 5th gen compute-optimized, 4 vCPU, 8GB RAM\n- `CpuInstanceType.CPU5C_8_16` - (cpu5c-8-16) 5th gen compute-optimized, 8 vCPU, 16GB RAM\n\n## Workflow examples\n\n### Basic GPU workflow\n\n```python\nimport asyncio\nfrom tetra_rp import remote, LiveServerless\n\n# Simple GPU configuration\ngpu_config = LiveServerless(name=\"example-gpu-server\")\n\n@remote(\n    resource_config=gpu_config,\n    dependencies=[\"torch\", \"numpy\"]\n)\ndef gpu_compute(data):\n    import torch\n    import numpy as np\n    \n    # Convert to tensor and perform computation on GPU\n    tensor = torch.tensor(data, device=\"cuda\")\n    result = tensor.sum().item()\n    \n    # Get GPU info\n    gpu_info = torch.cuda.get_device_properties(0)\n    \n    return {\n        \"result\": result,\n        \"gpu_name\": gpu_info.name,\n        \"cuda_version\": torch.version.cuda\n    }\n\nasync def main():\n    result = await gpu_compute([1, 2, 3, 4, 5])\n    print(f\"Result: {result['result']}\")\n    print(f\"Computed on: {result['gpu_name']} with CUDA {result['cuda_version']}\")\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n```\n\n### Advanced GPU workflow with template configuration\n\n```python\nimport asyncio\nfrom tetra_rp import remote, LiveServerless, GpuGroup, PodTemplate\nimport base64\n\n# 
Advanced GPU configuration with consolidated template overrides\nsd_config = LiveServerless(\n    gpus=[GpuGroup.AMPERE_80],  # A100 80GB GPUs\n    name=\"example_image_gen_server\",\n    template=PodTemplate(containerDiskInGb=100),  # Large disk for models\n    workersMax=3,\n    idleTimeout=10\n)\n\n@remote(\n    resource_config=sd_config,\n    dependencies=[\"diffusers\", \"transformers\", \"torch\", \"accelerate\", \"safetensors\"]\n)\ndef generate_image(prompt, width=512, height=512):\n    import torch\n    from diffusers import StableDiffusionPipeline\n    import io\n    import base64\n    \n    # Load pipeline (benefits from large container disk)\n    pipeline = StableDiffusionPipeline.from_pretrained(\n        \"runwayml/stable-diffusion-v1-5\",\n        torch_dtype=torch.float16\n    )\n    pipeline = pipeline.to(\"cuda\")\n    \n    # Generate image\n    image = pipeline(prompt=prompt, width=width, height=height).images[0]\n    \n    # Convert to base64 for return\n    buffered = io.BytesIO()\n    image.save(buffered, format=\"PNG\")\n    img_str = base64.b64encode(buffered.getvalue()).decode()\n    \n    return {\"image\": img_str, \"prompt\": prompt}\n\nasync def main():\n    result = await generate_image(\"A serene mountain landscape at sunset\")\n    print(f\"Generated image for: {result['prompt']}\")\n    # Save image locally if needed\n    # img_data = base64.b64decode(result[\"image\"])\n    # with open(\"output.png\", \"wb\") as f:\n    #     f.write(img_data)\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n```\n\n### Basic CPU workflow\n\n```python\nimport asyncio\nfrom tetra_rp import remote, LiveServerless, CpuInstanceType\n\n# Simple CPU configuration\ncpu_config = LiveServerless(\n    name=\"example-cpu-server\",\n    instanceIds=[CpuInstanceType.CPU3G_2_8],  # 2 vCPU, 8GB RAM\n)\n\n@remote(\n    resource_config=cpu_config,\n    dependencies=[\"pandas\", \"numpy\"]\n)\ndef cpu_data_processing(data):\n    import pandas as pd\n    
import numpy as np\n    import platform\n    \n    # Process data using CPU\n    df = pd.DataFrame(data)\n    \n    return {\n        \"row_count\": len(df),\n        \"column_count\": len(df.columns) if not df.empty else 0,\n        \"mean_values\": df.select_dtypes(include=[np.number]).mean().to_dict(),\n        \"system_info\": platform.processor(),\n        \"platform\": platform.platform()\n    }\n\nasync def main():\n    sample_data = [\n        {\"name\": \"Alice\", \"age\": 30, \"score\": 85},\n        {\"name\": \"Bob\", \"age\": 25, \"score\": 92},\n        {\"name\": \"Charlie\", \"age\": 35, \"score\": 78}\n    ]\n    \n    result = await cpu_data_processing(sample_data)\n    print(f\"Processed {result['row_count']} rows on {result['platform']}\")\n    print(f\"Mean values: {result['mean_values']}\")\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n```\n\n### Advanced CPU workflow with template configuration\n\n```python\nimport asyncio\nimport base64\nfrom tetra_rp import remote, LiveServerless, CpuInstanceType, PodTemplate\n\n# Advanced CPU configuration with template overrides\ndata_processing_config = LiveServerless(\n    name=\"advanced-cpu-processor\",\n    instanceIds=[CpuInstanceType.CPU5C_4_8, CpuInstanceType.CPU3C_4_8],  # Fallback options\n    template=PodTemplate(\n        containerDiskInGb=20,  # Extra disk space for data processing\n        env=[{\"key\": \"PYTHONPATH\", \"value\": \"/workspace\"}]  # Custom environment\n    ),\n    workersMax=5,\n    idleTimeout=15,\n    env={\"PROCESSING_MODE\": \"batch\", \"DEBUG\": \"false\"}  # Additional env vars\n)\n\n@remote(\n    resource_config=data_processing_config,\n    dependencies=[\"pandas\", \"numpy\", \"scipy\", \"scikit-learn\"]\n)\ndef advanced_data_analysis(dataset, analysis_type=\"full\"):\n    import pandas as pd\n    import numpy as np\n    from sklearn.preprocessing import StandardScaler\n    from sklearn.decomposition import PCA\n    import platform\n    \n    # Create 
DataFrame\n    df = pd.DataFrame(dataset)\n    \n    # Perform analysis based on type\n    results = {\n        \"platform\": platform.platform(),\n        \"dataset_shape\": df.shape,\n        \"memory_usage\": df.memory_usage(deep=True).sum()\n    }\n    \n    if analysis_type == \"full\":\n        # Advanced statistical analysis\n        numeric_cols = df.select_dtypes(include=[np.number]).columns\n        if len(numeric_cols) \u003e 0:\n            # Standardize data\n            scaler = StandardScaler()\n            scaled_data = scaler.fit_transform(df[numeric_cols])\n            \n            # PCA analysis\n            pca = PCA(n_components=min(len(numeric_cols), 3))\n            pca_result = pca.fit_transform(scaled_data)\n            \n            results.update({\n                \"correlation_matrix\": df[numeric_cols].corr().to_dict(),\n                \"pca_explained_variance\": pca.explained_variance_ratio_.tolist(),\n                \"pca_shape\": pca_result.shape\n            })\n    \n    return results\n\nasync def main():\n    # numpy is otherwise only imported inside the remote function,\n    # so import it here to build the sample data locally\n    import numpy as np\n    \n    # Generate sample dataset\n    sample_data = [\n        {\"feature1\": np.random.randn(), \"feature2\": np.random.randn(), \n         \"feature3\": np.random.randn(), \"category\": f\"cat_{i%3}\"}\n        for i in range(1000)\n    ]\n    \n    result = await advanced_data_analysis(sample_data, \"full\")\n    print(f\"Processed dataset with shape: {result['dataset_shape']}\")\n    print(f\"Memory usage: {result['memory_usage']} bytes\")\n    print(f\"PCA explained variance: {result.get('pca_explained_variance', 'N/A')}\")\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n```\n\n### Hybrid GPU/CPU workflow\n\n```python\nimport asyncio\nimport numpy as np  # used locally by main() to build the sample dataset\nfrom tetra_rp import remote, LiveServerless, GpuGroup, CpuInstanceType, PodTemplate\n\n# GPU configuration for model inference\ngpu_config = LiveServerless(\n    name=\"ml-inference-gpu\",\n    gpus=[GpuGroup.AMPERE_24],  # RTX 3090/A5000\n    
template=PodTemplate(containerDiskInGb=50),  # Space for models\n    workersMax=2\n)\n\n# CPU configuration for data preprocessing\ncpu_config = LiveServerless(\n    name=\"data-preprocessor\",\n    instanceIds=[CpuInstanceType.CPU5C_4_16],  # 4 vCPU, 16GB RAM\n    template=PodTemplate(\n        containerDiskInGb=30,\n        env=[{\"key\": \"NUMPY_NUM_THREADS\", \"value\": \"4\"}]\n    ),\n    workersMax=3\n)\n\n@remote(\n    resource_config=cpu_config,\n    dependencies=[\"pandas\", \"numpy\", \"scikit-learn\"]\n)\ndef preprocess_data(raw_data):\n    import pandas as pd\n    import numpy as np\n    from sklearn.preprocessing import StandardScaler\n    \n    # Data cleaning and preprocessing\n    df = pd.DataFrame(raw_data)\n    \n    # Handle missing values\n    df = df.fillna(df.mean(numeric_only=True))\n    \n    # Normalize numeric features\n    numeric_cols = df.select_dtypes(include=[np.number]).columns\n    if len(numeric_cols) \u003e 0:\n        scaler = StandardScaler()\n        df[numeric_cols] = scaler.fit_transform(df[numeric_cols])\n    \n    return {\n        \"processed_data\": df.to_dict('records'),\n        \"shape\": df.shape,\n        \"columns\": list(df.columns)\n    }\n\n@remote(\n    resource_config=gpu_config,\n    dependencies=[\"torch\", \"transformers\", \"numpy\"]\n)\ndef run_inference(processed_data):\n    import torch\n    import numpy as np\n    \n    # Simulate ML model inference on GPU\n    device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n    \n    # Convert to tensor\n    data_array = np.array([list(item.values()) for item in processed_data[\"processed_data\"]])\n    tensor = torch.tensor(data_array, dtype=torch.float32).to(device)\n    \n    # Simple neural network simulation\n    with torch.no_grad():\n        # Simulate model computation\n        result = torch.nn.functional.softmax(tensor.mean(dim=1), dim=0)\n        predictions = result.cpu().numpy().tolist()\n    \n    return {\n        
\"predictions\": predictions,\n        \"device_used\": str(device),\n        \"input_shape\": tensor.shape\n    }\n\nasync def ml_pipeline(raw_dataset):\n    \"\"\"Complete ML pipeline: CPU preprocessing -\u003e GPU inference\"\"\"\n    print(\"Step 1: Preprocessing data on CPU...\")\n    preprocessed = await preprocess_data(raw_dataset)\n    print(f\"Preprocessed data shape: {preprocessed['shape']}\")\n    \n    print(\"Step 2: Running inference on GPU...\")\n    results = await run_inference(preprocessed)\n    print(f\"Inference completed on: {results['device_used']}\")\n    \n    return {\n        \"preprocessing\": preprocessed,\n        \"inference\": results\n    }\n\nasync def main():\n    # Sample dataset\n    raw_data = [\n        {\"feature1\": np.random.randn(), \"feature2\": np.random.randn(), \n         \"feature3\": np.random.randn(), \"label\": i % 2}\n        for i in range(100)\n    ]\n    \n    # Run the complete pipeline\n    results = await ml_pipeline(raw_data)\n    \n    print(\"\\nPipeline Results:\")\n    print(f\"Data processed: {results['preprocessing']['shape']}\")\n    print(f\"Predictions generated: {len(results['inference']['predictions'])}\")\n    print(f\"GPU device: {results['inference']['device_used']}\")\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n```\n\n### Multi-stage ML pipeline example\n\n```python\nimport os\nimport asyncio\nfrom tetra_rp import remote, LiveServerless\n\n# Configure Runpod resources\nrunpod_config = LiveServerless(name=\"multi-stage-pipeline-server\")\n\n# Feature extraction on GPU\n@remote(\n    resource_config=runpod_config,\n    dependencies=[\"torch\", \"transformers\"]\n)\ndef extract_features(texts):\n    import torch\n    from transformers import AutoTokenizer, AutoModel\n    \n    tokenizer = AutoTokenizer.from_pretrained(\"bert-base-uncased\")\n    model = AutoModel.from_pretrained(\"bert-base-uncased\")\n    model.to(\"cuda\")\n    \n    features = []\n    for text in texts:\n        
inputs = tokenizer(text, return_tensors=\"pt\").to(\"cuda\")\n        with torch.no_grad():\n            outputs = model(**inputs)\n        features.append(outputs.last_hidden_state[:, 0].cpu().numpy().tolist()[0])\n    \n    return features\n\n# Classification on GPU\n# Note: the PyPI package is scikit-learn; sklearn is only the import name\n@remote(\n    resource_config=runpod_config,\n    dependencies=[\"torch\", \"scikit-learn\"]\n)\ndef classify(features, labels=None):\n    import torch\n    import numpy as np\n    from sklearn.linear_model import LogisticRegression\n    \n    features_np = np.array(features[1:] if labels is None and isinstance(features, list) and len(features) \u003e 0 and isinstance(features[0], dict) else features)\n    \n    if labels is not None:\n        labels_np = np.array(labels)\n        classifier = LogisticRegression()\n        classifier.fit(features_np, labels_np)\n        \n        coefficients = {\n            \"coef\": classifier.coef_.tolist(),\n            \"intercept\": classifier.intercept_.tolist(),\n            \"classes\": classifier.classes_.tolist()\n        }\n        return coefficients\n    else:\n        coefficients = features[0]\n        \n        classifier = LogisticRegression()\n        classifier.coef_ = np.array(coefficients[\"coef\"])\n        classifier.intercept_ = np.array(coefficients[\"intercept\"])\n        classifier.classes_ = np.array(coefficients[\"classes\"])\n        \n        # Predict\n        predictions = classifier.predict(features_np)\n        probabilities = classifier.predict_proba(features_np)\n        \n        return {\n            \"predictions\": predictions.tolist(),\n            \"probabilities\": probabilities.tolist()\n        }\n\n# Complete pipeline\nasync def text_classification_pipeline(train_texts, train_labels, test_texts):\n    train_features = await extract_features(train_texts)\n    test_features = await extract_features(test_texts)\n    \n    model_coeffs = await classify(train_features, train_labels)\n    \n    # For inference, pass model 
coefficients along with test features\n    # The classify function expects a list where the first element is the model (coeffs)\n    # and subsequent elements are features for prediction.\n    predictions = await classify([model_coeffs] + test_features)\n    \n    return predictions\n```\n\n### More examples\n\nYou can find many more examples in the [flash-examples repository](https://github.com/runpod/flash-examples).\n\n## Use cases\n\nFlash is well-suited for a diverse range of AI and data processing workloads:\n\n- **Multi-modal AI pipelines**: Orchestrate unified workflows combining text, image, and audio models with GPU acceleration.\n- **Distributed model training**: Scale training operations across multiple GPU workers for faster model development.\n- **AI research experimentation**: Rapidly prototype and test complex model combinations without infrastructure overhead.\n- **Production inference systems**: Deploy sophisticated multi-stage inference pipelines for real-world applications.\n- **Data processing workflows**: Efficiently process large datasets using CPU workers for general computation and GPU workers for accelerated tasks.\n- **Hybrid GPU/CPU workflows**: Optimize cost and performance by combining CPU preprocessing with GPU inference.\n\n## Limitations\n\n- Serverless deployments using Flash are currently restricted to the `EU-RO-1` datacenter.\n- Flash is designed primarily for local development and live-testing workflows.\n- While Flash supports provisioning traditional Serverless endpoints (non-Live endpoints), the interface for interacting with these resources will change in upcoming releases. For now, focus on using `LiveServerless` for the most stable development experience, as it provides full remote code execution without requiring custom Docker images.\n- As you work through the Flash examples repository, you'll accumulate multiple endpoints in your Runpod account. These endpoints persist until manually deleted through the Runpod console. 
A `flash undeploy` command is in development to streamline cleanup, but for now, regular manual deletion of unused endpoints is recommended to avoid unnecessary charges.\n- Finally, be aware of your account's maximum worker capacity limits. Flash can rapidly scale workers across multiple endpoints, and you may hit capacity constraints faster than with traditional deployment patterns. If you find yourself consistently reaching worker limits, contact Runpod support to increase your account's capacity allocation.\n\n## Contributing\n\nWe welcome contributions to Flash! Whether you're fixing bugs, adding features, or improving documentation, your help makes this project better.\n\n### Development setup\n\n1. Fork and clone the repository.\n2. Set up your development environment following the project guidelines.\n3. Make your changes following our coding standards.\n4. Test your changes thoroughly.\n5. Submit a pull request.\n\n### Release process\n\nThis project uses an automated release system built on Release Please. 
For detailed information about how releases work, including conventional commits, versioning, and the CI/CD pipeline, see our [Release System Documentation](RELEASE_SYSTEM.md).\n\n**Quick reference for contributors:**\n- Use conventional commits: `feat:`, `fix:`, `docs:`, etc.\n- CI automatically runs quality checks on all PRs.\n- Release PRs are created automatically when changes are merged to main.\n- Releases are published to PyPI automatically when release PRs are merged.\n\n## Troubleshooting\n\n### Authentication errors\n\nVerify your API key is set correctly:\n\n```bash\necho $RUNPOD_API_KEY  # Should show your key\n```\n\n### Import errors in remote functions\n\nRemember to import packages inside remote functions:\n\n```python\n@remote(dependencies=[\"requests\"])\ndef fetch_data(url):\n    import requests  # Import here, not at top of file\n    return requests.get(url).json()\n```\n\n### Performance optimization\n\n- Set `workersMin=1` to keep workers warm and avoid cold starts.\n- Use `idleTimeout` to balance cost and responsiveness.\n- Choose appropriate GPU types for your workload.\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/runpod/tetra-rp\"\u003eFlash\u003c/a\u003e •\n  \u003ca href=\"https://runpod.io\"\u003eRunpod\u003c/a\u003e\n\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frunpod%2Ftetra-rp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frunpod%2Ftetra-rp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frunpod%2Ftetra-rp/lists"}