https://github.com/cyrildiagne/kuda
Serverless APIs on remote GPUs
- Host: GitHub
- URL: https://github.com/cyrildiagne/kuda
- Owner: cyrildiagne
- License: apache-2.0
- Created: 2019-10-04T11:22:24.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2023-05-01T21:19:29.000Z (about 3 years ago)
- Last Synced: 2025-04-04T21:39:46.301Z (about 1 year ago)
- Topics: gpu, kubernetes, serverless
- Language: Go
- Size: 438 KB
- Stars: 57
- Watchers: 3
- Forks: 11
- Open Issues: 9
Metadata Files:
- Readme: README.md
- License: LICENSE
README

[CircleCI](https://circleci.com/gh/cyrildiagne/kuda)
[Releases](https://github.com/cyrildiagne/kuda/releases)
**Status:** Experimental 🧪
## Easily deploy GPU models as serverless APIs
- Deploy an API from a template
```bash
$ kuda deploy -f https://github.com/cyrildiagne/kuda/releases/download/v0.4.0-preview/example-hello-gpu.yaml
```
- Call it!
```bash
$ curl https://hello-gpu.default.$your_domain
```
```
Hello GPU!
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   37C    P8    10W /  70W |      0MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
## Serverless GPU inference
Kuda builds on [Knative](https://knative.dev) to allocate cloud GPUs only when there is traffic.
This is ideal when you want to share ML projects online without keeping
expensive GPUs allocated all the time.
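
Because GPUs are only allocated when traffic arrives, the first request after an idle period will likely take longer than usual while an instance starts. The sketch below shows one way a client could tolerate that delay by retrying with exponential backoff. It is not part of Kuda: `call_with_retry` and its parameters are hypothetical names, and the default timeout and delays are assumptions to tune for your own deployment.

```python
import time
import urllib.request


def call_with_retry(url, attempts=5, timeout=60.0, backoff=2.0,
                    fetch=None, sleep=time.sleep):
    """Call `url`, retrying while the service may still be scaling up.

    `fetch` and `sleep` can be swapped out, which keeps the function
    testable without network access.
    """
    if fetch is None:
        def fetch(target):
            # Generous timeout: a cold start can take a while.
            with urllib.request.urlopen(target, timeout=timeout) as resp:
                return resp.read().decode("utf-8")

    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError as err:
            # URLError, HTTPError and socket timeouts all derive from OSError.
            last_error = err
            if attempt < attempts - 1:
                # Wait 2s, 4s, 8s, ... with the default backoff.
                sleep(backoff * (2 ** attempt))
    raise RuntimeError(f"no response after {attempts} attempts") from last_error
```

Calling `call_with_retry("https://hello-gpu.default." + your_domain)` would then behave like the `curl` call above, except that transient failures during scale-up are retried instead of surfacing immediately.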
## Turn any model into a serverless API
Kuda deploys APIs as containers, so you can use any language, any
framework, and there is no library to import in your code.
All you need is a Dockerfile.
Here's a minimal example that just prints the result of `nvidia-smi` using
[Flask](http://flask.palletsprojects.com):
- `main.py`
```python
import os

import flask

app = flask.Flask(__name__)


@app.route('/')
def hello():
    # Run nvidia-smi in the container and return its output as the response.
    return 'Hello GPU:\n' + os.popen('nvidia-smi').read()
```
- `Dockerfile`
```Dockerfile
FROM nvidia/cuda:10.1-base
# Refresh the package index first; the base image ships without package lists.
RUN apt-get update && apt-get install -y python3 python3-pip
RUN pip3 install setuptools Flask gunicorn
COPY main.py ./main.py
CMD exec gunicorn --bind :80 --workers 1 --threads 8 main:app
```
- `kuda.yaml`
```yaml
name: hello-gpu
deploy:
  dockerfile: ./Dockerfile
```
Running `kuda deploy` in this example would build and deploy the API to a URL
such as `https://hello-gpu.my-namespace.example.com`.
Check out the full example with annotations in
[examples/hello-gpu-flask](examples/hello-gpu-flask).
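
The example URLs in this README follow the pattern `https://<name>.<namespace>.<domain>`. As a rough sketch, assuming your deployment uses the same pattern, a Python client could build the URL from the `name` in `kuda.yaml` and call the API with the standard library. `service_url` and `call_api` are hypothetical helper names, not part of Kuda.

```python
import urllib.request


def service_url(name, namespace, domain):
    """Build the API URL using the pattern from the README examples."""
    return f"https://{name}.{namespace}.{domain}"


def call_api(name, namespace, domain, timeout=60.0):
    """Send a GET request to the deployed API and return the body as text."""
    url = service_url(name, namespace, domain)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder values: replace them with your own namespace and domain.
    print(service_url("hello-gpu", "my-namespace", "example.com"))
```

Run as a script, this only prints the URL; call `call_api(...)` with your real namespace and domain to send an actual request.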
## Features
- Provision GPUs & scale based on traffic (from zero to N)
- Interactive development on cloud GPUs from any workstation
- Protect & control access to your APIs using API Keys
- HTTPS with TLS termination & automatic certificate management
## Get Started
- [Install](docs/install_cli.md)
- [Getting Started](docs/getting_started.md)
- [CLI Reference](docs/cli.md)