https://github.com/marwan116/raycraft

A drop-in replacement of fastapi to enable scalable and fault tolerant deployments with ray serve

fastapi fault-tolerance ray ray-serve scalability

Host: GitHub
URL: https://github.com/marwan116/raycraft
Owner: marwan116
License: mit
Created: 2023-11-04T03:58:02.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2023-11-07T16:21:02.000Z (over 2 years ago)
Last Synced: 2025-07-05T02:02:05.291Z (12 months ago)
Topics: fastapi, fault-tolerance, ray, ray-serve, scalability
Language: Python
Homepage:
Size: 204 KB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

# RayCraft

## Motivation

FastAPI + Ray = <3

Let's take a FastAPI app and supercharge it with raycraft:

```python
from fastapi import FastAPI

simple_service = FastAPI()

@simple_service.post("/")
async def read_root() -> dict[str, str]:
    return {"Hello": "World"}
```

To run it with raycraft, use `RayCraftAPI` instead of `FastAPI`. Only two lines change:

```diff
- from fastapi import FastAPI
+ from raycraft import RayCraftAPI

- simple_service = FastAPI()
+ simple_service = RayCraftAPI()

  @simple_service.post("/")
  async def read_root() -> dict[str, str]:
      return {"Hello": "World"}
```

## How to use

### Basic example

An endpoint returning `{"Hello": "World"}` is not much of an example, so let's try something more interesting and closer to why you might want to use raycraft.

Let's say you build a translation service using the following FastAPI code:

```python
from fastapi import FastAPI
from transformers import pipeline

app = FastAPI()

def load_model():
    return pipeline("translation_en_to_fr", model="t5-small")

@app.post("/")
async def translate(text: str):
    model = load_model()
    translated = model(text)[0]["translation_text"]
    return {"translation": translated}
```

We can port this app to raycraft with the same two-line change. Here the translation logic is also moved into a helper function:

```python
from raycraft import RayCraftAPI
from transformers import pipeline

app = RayCraftAPI()

def load_model():
    return pipeline("translation_en_to_fr", model="t5-small")

# Named differently from the endpoint below so the handler does not shadow it.
def translate_text(text: str):
    model = load_model()
    translated = model(text)[0]["translation_text"]
    return translated

@app.post("/")
async def translate(text: str):
    return translate_text(text)
```

Assuming the code is saved as `demo.py`, we then run the app with the following command:

```bash
raycraft run demo:app
```
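Once the app is running, it can be called like any HTTP service. The sketch below is an illustration, not part of raycraft. It assumes the app listens on `http://127.0.0.1:8000/` (Ray Serve's default HTTP address) and that `text` is read as a query parameter, which is how FastAPI treats a bare `str` argument. `build_url` and `request_translation` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the app is served on Ray Serve's default host and port.
BASE_URL = "http://127.0.0.1:8000/"


def build_url(text: str, base_url: str = BASE_URL) -> str:
    """Encode `text` as a query parameter on the root endpoint."""
    return f"{base_url}?{urllib.parse.urlencode({'text': text})}"


def request_translation(text: str, base_url: str = BASE_URL):
    """POST to the running app and decode the JSON response body."""
    request = urllib.request.Request(build_url(text, base_url), method="POST")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


url = build_url("Hello world")
```

With the server up, `request_translation("Hello world")` sends the request and returns whatever JSON value the endpoint produced.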

Now for the distributed part. Let's say we want to run this app on 2 replicas, each taking half a GPU, with requests load balanced between them. We can do this by passing `ray_actor_options` and `num_replicas` to `RayCraftAPI`:

```python
from raycraft import RayCraftAPI
from transformers import pipeline

app = RayCraftAPI(ray_actor_options={"num_gpus": 0.5}, num_replicas=2)

def load_model():
    return pipeline("translation_en_to_fr", model="t5-small")

# Named differently from the endpoint below so the handler does not shadow it.
def translate_text(text: str):
    model = load_model()
    translated = model(text)[0]["translation_text"]
    return translated

@app.post("/")
async def translate(text: str):
    return translate_text(text)
```
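As a quick sanity check on the resource request, the arithmetic can be sketched as follows. `gpu_budget` is a hypothetical helper that only does arithmetic and does not query Ray. It assumes the best case where replicas with fractional requests are packed onto the same device.

```python
import math


def gpu_budget(num_replicas: int, gpus_per_replica: float) -> tuple[float, int]:
    """Return (total GPU demand, whole GPUs needed with ideal packing)."""
    if not 0 < gpus_per_replica <= 1:
        raise ValueError("this sketch only covers requests in (0, 1] GPUs")
    total_demand = num_replicas * gpus_per_replica
    # Each replica must fit on a single device, so count replicas per GPU first.
    replicas_per_gpu = math.floor(1 / gpus_per_replica)
    whole_gpus = math.ceil(num_replicas / replicas_per_gpu)
    return total_demand, whole_gpus


two_half_gpu_replicas = gpu_budget(num_replicas=2, gpus_per_replica=0.5)
three_half_gpu_replicas = gpu_budget(num_replicas=3, gpus_per_replica=0.5)
```

Under these assumptions, the two half-GPU replicas above add up to one GPU of demand, while a third replica would require a second device.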

To avoid loading the model on every request, we can load it once when the app is constructed. In the example below, the function decorated with `@app.init` is exposed to handlers as `app.model`:

```python
from raycraft import RayCraftAPI, App
from transformers import pipeline

app = RayCraftAPI(ray_actor_options={"num_gpus": 0.5}, num_replicas=2)

@app.init
def model():
    return pipeline("translation_en_to_fr", model="t5-small")

# Helper named differently from the endpoint below to avoid shadowing it.
def translate_text(app: App, text: str):
    translated = app.model(text)[0]["translation_text"]
    return translated

@app.post("/")
async def translate(app: App, text: str):
    return translate_text(app, text)
```
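The difference between loading per request and loading once can be seen in a plain-Python sketch. This is not how raycraft is implemented. `FakeModel`, `handle_per_request`, and `Replica` are hypothetical stand-ins, with `FakeModel` counting how often an "expensive" model gets constructed.

```python
class FakeModel:
    """Stand-in for an expensive model such as a transformers pipeline."""

    loads = 0  # how many times a model has been constructed

    def __init__(self) -> None:
        FakeModel.loads += 1

    def __call__(self, text: str) -> str:
        return text.upper()


def handle_per_request(text: str) -> str:
    # Builds a fresh model on every call, like calling load_model() in the handler.
    return FakeModel()(text)


class Replica:
    """Hypothetical replica that builds its model once, at construction time."""

    def __init__(self) -> None:
        self.model = FakeModel()

    def handle(self, text: str) -> str:
        return self.model(text)


requests = ["one", "two", "three"]

for text in requests:
    handle_per_request(text)
per_request_loads = FakeModel.loads

FakeModel.loads = 0  # reset the counter before the second measurement
replica = Replica()
for text in requests:
    replica.handle(text)
init_loads = FakeModel.loads
```

Three requests construct three models in the first style and one model in the second, which is the saving the init-time load is after.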

RayCraft is a thin layer built on top of [Ray Serve](https://docs.ray.io/en/latest/serve/index.html) that adopts a functional interface to ease migration from FastAPI apps.

With Ray Serve, you can now:

- Scale your app deployment to multiple replicas running on different machines

- Define the resources allocated to each replica including fractional GPUs

- Batch requests together to improve throughput

- Get fault tolerance and automatic retries

- Stream responses using websockets

- Compose different services together using RPC calls that are strictly typed and faster than HTTP requests
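To illustrate why batching helps throughput, here is a minimal plain-Python sketch. It is not Ray Serve's batching mechanism, which groups requests that arrive concurrently. It only shows that grouping inputs reduces the number of model invocations. `run_in_batches` and `fake_model` are hypothetical names.

```python
from typing import Callable


def run_in_batches(
    texts: list[str],
    model: Callable[[list[str]], list[str]],
    max_batch_size: int,
) -> list[str]:
    """Call `model` once per batch of at most `max_batch_size` inputs."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    results: list[str] = []
    for start in range(0, len(texts), max_batch_size):
        results.extend(model(texts[start:start + max_batch_size]))
    return results


batch_sizes: list[int] = []  # records the size of each model invocation


def fake_model(batch: list[str]) -> list[str]:
    # Stand-in for a model that processes a whole batch in one forward pass.
    batch_sizes.append(len(batch))
    return [text.upper() for text in batch]


outputs = run_in_batches(["a", "b", "c", "d", "e"], fake_model, max_batch_size=2)
```

Five inputs with a batch size of two result in three model calls instead of five, with outputs returned in the original order.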

### Composing models

## How to set up

Using poetry:

```bash
poetry add raycraft
```

Using pip:

```bash
pip install raycraft
```

## Roadmap

- Streaming support using websockets

- Deployment guide
