https://github.com/run-house/kubetorch

Distribute and run AI workloads magically in Python, like PyTorch for ML infra.

Host: GitHub
URL: https://github.com/run-house/kubetorch
Owner: run-house
License: apache-2.0
Created: 2022-05-10T14:10:51.000Z (about 4 years ago)
Default Branch: main
Last Pushed: 2025-10-23T12:00:37.000Z (9 months ago)
Last Synced: 2025-10-24T09:56:25.129Z (9 months ago)
Topics: api, artificial-intelligence, aws, azure, collaboration, data-science, deployment, distributed, fastapi, gcp, infrastructure, machine-learning, middleware, observability, python, pytorch, ray, sagemaker, serverless
Language: Python
Homepage: https://run.house
Size: 30.8 MB
Stars: 1,061
Watchers: 7
Forks: 43
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# 📦Kubetorch🔥

**A Python interface for running ML workloads on Kubernetes**

Kubetorch enables you to run any Python code on Kubernetes at any scale by specifying required resources, distribution, and scaling directly in code. It provides caching and hot redeployment for 1-2 second iteration cycles, handles hardware faults and preemptions programmatically, and orchestrates complex, heterogeneous workloads with built-in observability and fault tolerance.

## Hello World

```python
import kubetorch as kt


def hello_world():
    return "Hello from Kubetorch!"


if __name__ == "__main__":
    # Define your compute
    compute = kt.Compute(cpus=".1")

    # Send local function to freshly launched remote compute
    remote_hello = kt.fn(hello_world).to(compute)

    # Runs remotely on your Kubernetes cluster
    result = remote_hello()
    print(result)  # "Hello from Kubetorch!"
```

## What Kubetorch Enables

- **100x faster iteration**, from 10+ minutes down to 1-3 seconds, for complex ML applications such as RL and distributed training

- **50%+ compute cost savings** through intelligent resource allocation, bin-packing, and dynamic scaling

- **95% fewer production faults** through built-in fault handling, with programmatic error recovery and resource adjustment
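
To make "programmatic error recovery and resource adjustment" concrete, here is a minimal, library-agnostic sketch of the pattern: catch a fault, raise the resource request, and retry. It deliberately does not use the Kubetorch API. `OutOfMemory`, `Resources`, `run_with_recovery`, and `fake_training_step` are hypothetical names invented for this illustration, and the doubling policy is an assumption rather than Kubetorch's documented behavior.

```python
from dataclasses import dataclass, replace
from typing import Callable, TypeVar

T = TypeVar("T")


class OutOfMemory(Exception):
    """Hypothetical stand-in for an out-of-memory fault reported by a worker."""


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource request; not a Kubetorch class."""

    cpus: float
    memory_gb: int


def run_with_recovery(
    task: Callable[[Resources], T],
    resources: Resources,
    max_attempts: int = 3,
) -> T:
    """Run `task`, doubling the memory request after each OutOfMemory fault."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(resources)
        except OutOfMemory:
            if attempt == max_attempts:
                raise  # Out of retries: surface the fault to the caller.
            resources = replace(resources, memory_gb=resources.memory_gb * 2)
    raise ValueError("max_attempts must be at least 1")


def fake_training_step(resources: Resources) -> str:
    """Simulated workload that needs at least 8 GB to succeed."""
    if resources.memory_gb < 8:
        raise OutOfMemory(f"{resources.memory_gb} GB is not enough")
    return f"finished with {resources.memory_gb} GB"


if __name__ == "__main__":
    # Starts at 2 GB, fails twice, then succeeds on the third attempt at 8 GB.
    print(run_with_recovery(fake_training_step, Resources(cpus=1.0, memory_gb=2)))
```

In a real deployment the simulated task would be replaced by a remote call, and the exception type by whatever fault the platform actually raises; see the documentation for the supported error classes.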

## Installation

### 1. Python Client

```bash
pip install "kubetorch[client]"
```
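
One way to confirm the client landed in the active environment is to check that the `kubetorch` module is importable, using only the standard library. This is a small sketch; `client_installed` is a hypothetical helper name, not part of Kubetorch.

```python
import importlib.util


def client_installed(module_name: str = "kubetorch") -> bool:
    """Return True if `module_name` can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if client_installed():
        print("kubetorch client found")
    else:
        print('kubetorch client missing; run: pip install "kubetorch[client]"')
```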

### 2. Kubernetes Deployment (Helm)

```bash
# Option 1: Install directly from OCI registry
helm upgrade --install kubetorch oci://ghcr.io/run-house/charts/kubetorch \
  --version 0.2.2 -n kubetorch --create-namespace

# Option 2: Download chart locally first
helm pull oci://ghcr.io/run-house/charts/kubetorch --version 0.2.2 --untar
helm upgrade --install kubetorch ./kubetorch -n kubetorch --create-namespace
```

For detailed setup instructions, see our [Installation Guide](https://www.run.house/kubetorch/installation).

## Kubetorch Serverless

Contact us ([email](mailto:hello@run.house), [Slack](https://join.slack.com/t/kubetorch/shared_invite/zt-3g76q5i4j-uP60AdydxnAmjGVAQhtALA)) to try out Kubetorch on our fully managed cloud platform.

## Learn More

- **[Documentation](https://www.run.house/kubetorch/introduction)** - API Reference, concepts, and guides

- **[Examples](https://www.run.house/examples)** - Real-world usage patterns and tutorials

- **[Join our Slack](https://join.slack.com/t/kubetorch/shared_invite/zt-3g76q5i4j-uP60AdydxnAmjGVAQhtALA)** - Connect with the community and get support

---

[Apache 2.0 License](LICENSE)

**🏃‍♀️ Built by [Runhouse](https://www.run.house) 🏠**
