https://github.com/run-house/kubetorch
Distribute and run AI workloads magically in Python, like PyTorch for ML infra.
- Host: GitHub
- URL: https://github.com/run-house/kubetorch
- Owner: run-house
- License: apache-2.0
- Created: 2022-05-10T14:10:51.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2025-10-23T12:00:37.000Z (7 months ago)
- Last Synced: 2025-10-24T09:56:25.129Z (7 months ago)
- Topics: api, artificial-intelligence, aws, azure, collaboration, data-science, deployment, distributed, fastapi, gcp, infrastructure, machine-learning, middleware, observability, python, pytorch, ray, sagemaker, serverless
- Language: Python
- Homepage: https://run.house
- Size: 30.8 MB
- Stars: 1,061
- Watchers: 7
- Forks: 43
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# 📦Kubetorch🔥
**A Python interface for running ML workloads on Kubernetes**
Kubetorch enables you to run any Python code on Kubernetes at any scale by specifying required resources, distribution, and scaling directly in code. It provides caching and hot redeployment for 1-2 second iteration cycles, handles hardware faults and preemptions programmatically, and orchestrates complex, heterogeneous workloads with built-in observability and fault tolerance.
## Hello World
```python
import kubetorch as kt


def hello_world():
    return "Hello from Kubetorch!"


if __name__ == "__main__":
    # Define your compute
    compute = kt.Compute(cpus=".1")

    # Send local function to freshly launched remote compute
    remote_hello = kt.fn(hello_world).to(compute)

    # Runs remotely on your Kubernetes cluster
    result = remote_hello()
    print(result)  # "Hello from Kubetorch!"
```
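Because `hello_world` is an ordinary Python function, it can be checked locally before it is sent to a cluster. The sketch below wraps both paths in a hypothetical `run_hello` helper (not part of Kubetorch). The remote branch uses only the `kt.Compute` and `kt.fn(...).to(...)` calls shown above and assumes the client is installed and a cluster is reachable.

```python
def hello_world():
    return "Hello from Kubetorch!"


def run_hello(remote: bool = False) -> str:
    """Hypothetical helper: call hello_world locally or on remote compute."""
    if not remote:
        # Plain local call; needs neither a cluster nor the Kubetorch client.
        return hello_world()

    # Imported lazily so the local path works without the client installed.
    import kubetorch as kt

    compute = kt.Compute(cpus=".1")
    remote_hello = kt.fn(hello_world).to(compute)
    return remote_hello()


if __name__ == "__main__":
    print(run_hello(remote=False))  # Hello from Kubetorch!
```

Running the local path first separates bugs in the function itself from problems with cluster setup.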
## What Kubetorch Enables
- **100x faster iteration**, from 10+ minutes to 1-3 seconds, for complex ML applications such as reinforcement learning (RL) and distributed training
- **50%+ compute cost savings** through intelligent resource allocation, bin-packing, and dynamic scaling
- **95% fewer production faults** through built-in fault handling, with programmatic error recovery and resource adjustment
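To make "programmatic error recovery and resource adjustment" concrete, here is a minimal plain-Python sketch of the general pattern: catch a fault, enlarge the resource request, and retry. Every name in it (`Resources`, `OutOfMemory`, `run_with_recovery`, `train`) is hypothetical and chosen for illustration; none of them is Kubetorch's API, and the toy `train` function stands in for a real remote workload.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource request; not a Kubetorch class."""
    memory_gb: int


class OutOfMemory(Exception):
    """Stand-in for an out-of-memory fault reported by a worker."""


def run_with_recovery(
    task: Callable[[Resources], str],
    resources: Resources,
    max_attempts: int = 3,
) -> str:
    """Run task, doubling the memory request after each OutOfMemory fault."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task(resources)
        except OutOfMemory:
            if attempt == max_attempts:
                raise  # Out of retries: surface the fault to the caller.
            # Adjust the request before retrying.
            resources = replace(resources, memory_gb=resources.memory_gb * 2)
    raise AssertionError("unreachable")


def train(resources: Resources) -> str:
    # Toy workload that only succeeds with at least 16 GB.
    if resources.memory_gb < 16:
        raise OutOfMemory(f"{resources.memory_gb} GB is not enough")
    return f"trained with {resources.memory_gb} GB"


if __name__ == "__main__":
    print(run_with_recovery(train, Resources(memory_gb=4)))  # trained with 16 GB
```

Starting from 4 GB, the request grows to 8 GB and then 16 GB, so the third attempt succeeds. With fewer allowed attempts the original exception propagates unchanged.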
## Installation
### 1. Python Client
```bash
pip install "kubetorch[client]"
```
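To confirm the client landed in the active environment, one option is to query the installed package metadata with the standard library. This is a small sketch, assuming the distribution is named `kubetorch` (inferred from the `pip` command above); `installed_version` is a hypothetical helper, not something the package provides.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "kubetorch") -> Optional[str]:
    """Return the installed version of dist_name, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("kubetorch is not installed in this environment")
    else:
        print(f"kubetorch {found} is installed")
```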
### 2. Kubernetes Deployment (Helm)
```bash
# Option 1: Install directly from OCI registry
helm upgrade --install kubetorch oci://ghcr.io/run-house/charts/kubetorch \
  --version 0.2.2 -n kubetorch --create-namespace

# Option 2: Download chart locally first
helm pull oci://ghcr.io/run-house/charts/kubetorch --version 0.2.2 --untar
helm upgrade --install kubetorch ./kubetorch -n kubetorch --create-namespace
```
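After either option, the release can be checked from Python by shelling out to `helm status`. The sketch below assumes the release name and namespace used above (`kubetorch` for both) and that `helm` is on the `PATH`; the helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional


def helm_status_command(
    release: str = "kubetorch", namespace: str = "kubetorch"
) -> List[str]:
    """Build the `helm status` command for a release in a namespace."""
    return ["helm", "status", release, "-n", namespace]


def release_is_installed(
    release: str = "kubetorch", namespace: str = "kubetorch"
) -> Optional[bool]:
    """Return True if helm reports the release, False on a non-zero exit
    (release missing or cluster unreachable), None if helm is not on PATH."""
    if shutil.which("helm") is None:
        return None
    result = subprocess.run(
        helm_status_command(release, namespace),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    print(release_is_installed())
```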
For detailed setup instructions, see our [Installation Guide](https://www.run.house/kubetorch/installation).
## Kubetorch Serverless
Contact us ([email](mailto:hello@run.house), [Slack](https://join.slack.com/t/kubetorch/shared_invite/zt-3g76q5i4j-uP60AdydxnAmjGVAQhtALA)) to try out Kubetorch on our fully managed cloud platform.
## Learn More
- **[Documentation](https://www.run.house/kubetorch/introduction)** - API Reference, concepts, and guides
- **[Examples](https://www.run.house/examples)** - Real-world usage patterns and tutorials
- **[Join our Slack](https://join.slack.com/t/kubetorch/shared_invite/zt-3g76q5i4j-uP60AdydxnAmjGVAQhtALA)** - Connect with the community and get support
---
[Apache 2.0 License](LICENSE)
**🏃‍♀️ Built by [Runhouse](https://www.run.house) 🏠**