https://github.com/marwan116/raycraft
A drop-in replacement of fastapi to enable scalable and fault tolerant deployments with ray serve
- Host: GitHub
- URL: https://github.com/marwan116/raycraft
- Owner: marwan116
- License: mit
- Created: 2023-11-04T03:58:02.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-11-07T16:21:02.000Z (over 2 years ago)
- Last Synced: 2025-07-05T02:02:05.291Z (12 months ago)
- Topics: fastapi, fault-tolerance, ray, ray-serve, scalability
- Language: Python
- Homepage:
- Size: 204 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# RayCraft
## Motivation
FastAPI + Ray = <3
Let's take a FastAPI app and supercharge it with raycraft
```python
from fastapi import FastAPI

simple_service = FastAPI()


@simple_service.post("/")
async def read_root() -> dict[str, str]:
    return {"Hello": "World"}
```
You can now run it with raycraft by using `RayCraftAPI` instead of `FastAPI`, a change of only two lines:
```diff
- from fastapi import FastAPI
+ from raycraft import RayCraftAPI

- simple_service = FastAPI()
+ simple_service = RayCraftAPI()

  @simple_service.post("/")
  async def read_root() -> dict[str, str]:
      return {"Hello": "World"}
```
## How to use
### Basic example
OK, an endpoint returning `{"Hello": "World"}` isn't much of a basic example, so let's try something more interesting and more relevant to why you might want to use raycraft.
Let's say you build a translation service using the following FastAPI code:
```python
from fastapi import FastAPI
from transformers import pipeline

app = FastAPI()


def load_model():
    return pipeline("translation_en_to_fr", model="t5-small")


@app.post("/")
async def translate(text: str):
    model = load_model()
    translated = model(text)[0]["translation_text"]
    return {"translation": translated}
```
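We can now build this app with raycraft using the same two-line change (swapping the import and the app constructor); here the translation logic is also moved into a plain helper function: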
```python
from raycraft import RayCraftAPI
from transformers import pipeline

app = RayCraftAPI()


def load_model():
    return pipeline("translation_en_to_fr", model="t5-small")


def translate_text(text: str) -> str:
    # Plain helper, named differently from the endpoint so the endpoint
    # does not shadow it (and end up calling itself).
    model = load_model()
    return model(text)[0]["translation_text"]


@app.post("/")
async def translate(text: str):
    return translate_text(text)
```
Assuming the code above is saved as `demo.py`, run the app with the following command:
```bash
raycraft run demo:app
```
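The `demo:app` argument appears to follow the `module:attribute` convention used by ASGI servers such as uvicorn: import the module `demo` (that is, `demo.py`), then look up its `app` attribute. Whether `raycraft run` resolves the target exactly this way is an assumption; the sketch below only illustrates the convention, and `load_app` is a hypothetical helper, not part of raycraft.

```python
import importlib
from typing import Any


def load_app(target: str) -> Any:
    """Resolve a "module:attribute" string such as "demo:app".

    Hypothetical helper illustrating the convention; not raycraft's own code.
    """
    module_name, sep, attr_path = target.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {target!r}")
    # Import the module; demo.py must be importable (e.g. in the current directory).
    obj: Any = importlib.import_module(module_name)
    # Follow dotted attributes so targets like "pkg.mod:factory.app" also work.
    for name in attr_path.split("."):
        obj = getattr(obj, name)
    return obj
```

For example, `load_app("importlib:import_module")` returns the standard library's `importlib.import_module` function, just as `load_app("demo:app")` would return the `app` object defined in `demo.py`.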
Now for the distributed part. Let's say we want to run this app on 2 "replicas", each "replica" taking half a GPU, with requests properly load-balanced between the replicas. We can configure this when creating the app:
```python
from raycraft import RayCraftAPI
from transformers import pipeline

# 2 replicas, each reserving half a GPU.
app = RayCraftAPI(ray_actor_options={"num_gpus": 0.5}, num_replicas=2)


def load_model():
    return pipeline("translation_en_to_fr", model="t5-small")


def translate_text(text: str) -> str:
    model = load_model()
    return model(text)[0]["translation_text"]


@app.post("/")
async def translate(text: str):
    return translate_text(text)
```
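Ray Serve does the routing for you, so nothing below is needed to use raycraft. The toy sketch only illustrates what "load balancing between replicas" and "half a GPU per replica" amount to: two in-process stand-ins for replicas, plain round-robin routing, and the resource arithmetic. `Replica`, `RoundRobinRouter`, `GPU_PER_REPLICA` and `NUM_REPLICAS` are hypothetical names, and Ray Serve's own request routing policy differs from this round-robin toy.

```python
import itertools
from dataclasses import dataclass, field
from typing import List

# Hypothetical values mirroring ray_actor_options={"num_gpus": 0.5}, num_replicas=2.
GPU_PER_REPLICA = 0.5
NUM_REPLICAS = 2


@dataclass
class Replica:
    """In-process stand-in for one replica; it records the requests it serves."""
    name: str
    handled: List[str] = field(default_factory=list)

    def handle(self, text: str) -> str:
        self.handled.append(text)
        return f"{self.name} translated {text!r}"


class RoundRobinRouter:
    """Send each incoming request to the next replica in turn (toy policy)."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = replicas
        self._cycle = itertools.cycle(replicas)

    def route(self, text: str) -> str:
        return next(self._cycle).handle(text)


replicas = [Replica(f"replica-{i}") for i in range(NUM_REPLICAS)]
router = RoundRobinRouter(replicas)
for sentence in ["Hello", "How are you?", "Good night", "See you"]:
    router.route(sentence)

# Two replicas requesting half a GPU each need 1 GPU in total,
# since Ray can place fractional-GPU actors on the same GPU.
total_gpus = NUM_REPLICAS * GPU_PER_REPLICA
print(total_gpus)                          # 1.0
print([len(r.handled) for r in replicas])  # [2, 2]
```

Round-robin is the simplest policy that spreads requests evenly; a real router also has to account for replicas that are busier or slower than others.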
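The version above still loads the model on every request. To avoid that, we can load the model once in the app's constructor by decorating a function with `@app.init`; in the example below, the pipeline returned by `model()` is then available as `app.model` inside the handlers: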
```python
from raycraft import RayCraftAPI, App
from transformers import pipeline

app = RayCraftAPI(ray_actor_options={"num_gpus": 0.5}, num_replicas=2)


@app.init
def model():
    # Runs once when the app is constructed, not on every request.
    return pipeline("translation_en_to_fr", model="t5-small")


def translate_text(app: App, text: str) -> str:
    return app.model(text)[0]["translation_text"]


@app.post("/")
async def translate(app: App, text: str):
    return translate_text(app, text)
```
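raycraft's internals are not shown in this README, so the following is only a guess at the pattern the `@app.init` example implies: each replica runs the init functions once when it is constructed and exposes each return value as an attribute named after the function, which is why the handler can call `app.model(...)`. `ToyRayCraft`, `ToyReplica` and `load_count` are hypothetical names, a fake model stands in for the transformers pipeline, and the handler parameter is called `state` to avoid shadowing the module-level `app`.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, Awaitable, Callable, Dict, List

load_count = 0  # how many times the (fake) model has been loaded


class ToyRayCraft:
    """Hypothetical stand-in that only records init functions and POST routes."""

    def __init__(self) -> None:
        self.init_fns: List[Callable[[], Any]] = []
        self.routes: Dict[str, Callable[..., Awaitable[Any]]] = {}

    def init(self, fn: Callable[[], Any]) -> Callable[[], Any]:
        self.init_fns.append(fn)
        return fn

    def post(self, path: str) -> Callable:
        def register(fn: Callable[..., Awaitable[Any]]) -> Callable[..., Awaitable[Any]]:
            self.routes[path] = fn
            return fn
        return register


class ToyReplica:
    """Hypothetical replica: runs every init function once, in its constructor."""

    def __init__(self, api: ToyRayCraft) -> None:
        self.api = api
        # Attribute names come from the init functions' names (e.g. "model").
        self.state = SimpleNamespace(**{fn.__name__: fn() for fn in api.init_fns})

    async def call(self, path: str, **kwargs: Any) -> Any:
        return await self.api.routes[path](self.state, **kwargs)


app = ToyRayCraft()


@app.init
def model() -> Callable[[str], List[Dict[str, str]]]:
    global load_count
    load_count += 1
    # Fake "pipeline" with the same output shape as the transformers one.
    return lambda text: [{"translation_text": f"<fr>{text}</fr>"}]


def translate_text(state: Any, text: str) -> str:
    return state.model(text)[0]["translation_text"]


@app.post("/")
async def translate(state: Any, text: str) -> str:
    return translate_text(state, text)


async def main() -> List[str]:
    replica = ToyReplica(app)  # the model is loaded here, once
    return [await replica.call("/", text=t) for t in ["Hello", "Thanks"]]


results = asyncio.run(main())
print(results)     # ['<fr>Hello</fr>', '<fr>Thanks</fr>']
print(load_count)  # 1
```

Two requests are served but the fake model is loaded only once; with 2 replicas you would expect one load per replica.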
RayCraft is a thin layer built on top of [Ray Serve](https://docs.ray.io/en/latest/serve/index.html) that adopts a functional interface to ease the migration of FastAPI apps.
With Ray Serve, you can now:
- Scale your app deployment to multiple replicas running on different machines
- Define the resources allocated to each replica including fractional GPUs
- Batch requests together to improve throughput
- Get fault tolerance and automatic retries
- Stream responses using WebSockets
- Compose different services together using RPC calls that are strictly typed and faster than HTTP requests
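To make "batch requests together" concrete, here is a minimal asyncio sketch of dynamic batching: concurrent callers submit single inputs, and a background worker passes them to one batched function call once `max_batch_size` items are waiting or `batch_wait_timeout_s` has elapsed. Ray Serve offers this through its `@serve.batch` decorator, whose options include `max_batch_size` and `batch_wait_timeout_s`; the `MicroBatcher` class and `fake_translate_batch` below are hypothetical illustrations of the idea, not Ray Serve's implementation or raycraft's API.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Tuple


class MicroBatcher:
    """Hypothetical sketch: group concurrent single-item calls into batched calls."""

    def __init__(
        self,
        batch_fn: Callable[[List[str]], Awaitable[List[str]]],
        max_batch_size: int = 4,
        batch_wait_timeout_s: float = 0.05,
    ) -> None:
        self.batch_fn = batch_fn
        self.max_batch_size = max_batch_size
        self.batch_wait_timeout_s = batch_wait_timeout_s
        self.queue: "asyncio.Queue[Tuple[str, asyncio.Future]]" = asyncio.Queue()
        self.batch_sizes: List[int] = []  # recorded for inspection
        self._worker: Optional[asyncio.Task] = None

    async def submit(self, item: str) -> str:
        if self._worker is None:
            self._worker = asyncio.create_task(self._run())
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((item, future))
        return await future

    def close(self) -> None:
        if self._worker is not None:
            self._worker.cancel()

    async def _run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            # Wait for the first request, then keep collecting until the
            # batch is full or the wait timeout expires.
            batch = [await self.queue.get()]
            deadline = loop.time() + self.batch_wait_timeout_s
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            self.batch_sizes.append(len(batch))
            try:
                outputs = await self.batch_fn([item for item, _ in batch])
            except Exception as exc:  # propagate failures to every caller in the batch
                for _, future in batch:
                    future.set_exception(exc)
                continue
            for (_, future), output in zip(batch, outputs):
                future.set_result(output)


async def fake_translate_batch(texts: List[str]) -> List[str]:
    # One "model call" for the whole batch.
    return [f"<fr>{t}</fr>" for t in texts]


async def main() -> Tuple[List[str], List[int]]:
    batcher = MicroBatcher(fake_translate_batch, max_batch_size=4)
    texts = [f"sentence {i}" for i in range(10)]
    results = await asyncio.gather(*(batcher.submit(t) for t in texts))
    batcher.close()
    return list(results), batcher.batch_sizes


results, batch_sizes = asyncio.run(main())
print(results[:2], batch_sizes)
```

With ten concurrent callers and `max_batch_size=4`, the fake model is called on batches of at most four texts (typically 4, 4 and 2) rather than ten separate times, which is where the throughput gain on a GPU comes from.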
### Composing models
## How to set up
Using Poetry:
```bash
poetry add raycraft
```
Using pip:
```bash
pip install raycraft
```
## Roadmap
- Streaming support using WebSockets
- Deployment guide