https://github.com/epwalsh/batched-fn
🦀 Rust server plugin for deploying deep learning models with batched prediction
- Host: GitHub
- URL: https://github.com/epwalsh/batched-fn
- Owner: epwalsh
- License: apache-2.0
- Created: 2020-03-20T22:06:11.000Z (about 6 years ago)
- Default Branch: main
- Last Pushed: 2024-03-10T23:17:46.000Z (about 2 years ago)
- Last Synced: 2025-03-10T09:09:39.272Z (about 1 year ago)
- Topics: batching, deep-learning, rust
- Language: Rust
- Homepage: https://crates.io/crates/batched-fn
- Size: 71.3 KB
- Stars: 21
- Watchers: 2
- Forks: 2
- Open Issues: 5
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# batched-fn
Rust server plugin for deploying deep learning models with batched prediction.
Deep learning models are usually implemented to make efficient use of a GPU by batching inputs together
into "mini-batches". However, applications serving these models often receive requests one-by-one,
so a conventional single- or multi-threaded server approach will under-utilize the GPU and lead to latency that increases
linearly with the volume of requests.
`batched-fn` is a drop-in solution for deep learning webservers that queues individual requests and provides them as a batch
to your model. It can be added to any application with minimal refactoring simply by inserting the [`batched_fn`](https://docs.rs/batched-fn/latest/batched_fn/macro.batched_fn.html)
macro into the function that runs requests through the model.
## Features
- 🚀 Easy to use: drop the `batched_fn!` macro into existing code.
- 🔥 Lightweight and fast: queue system implemented on top of the blazingly fast [flume crate](https://github.com/zesterer/flume).
- 🙌 Easy to tune: simply adjust [`max_delay`](https://docs.rs/batched-fn/latest/batched_fn/macro.batched_fn.html#config) and [`max_batch_size`](https://docs.rs/batched-fn/latest/batched_fn/macro.batched_fn.html#config).
- 🛑 [Back pressure](https://medium.com/@jayphelps/backpressure-explained-the-flow-of-data-through-software-2350b3e77ce7) mechanism included:
just set [`channel_cap`](https://docs.rs/batched-fn/latest/batched_fn/macro.batched_fn.html#config) and handle
[`Error::Full`](https://docs.rs/batched-fn/latest/batched_fn/enum.Error.html#variant.Full) by returning a 503 from your webserver.
## Examples
Suppose you have a model API that looks like this:
```rust
// `Batch` could be anything that implements the `batched_fn::Batch` trait.
type Batch<T> = Vec<T>;

#[derive(Debug)]
struct Input {
    // ...
}

#[derive(Debug)]
struct Output {
    // ...
}

struct Model {
    // ...
}

impl Model {
    fn predict(&self, batch: Batch<Input>) -> Batch<Output> {
        // ...
    }

    fn load() -> Self {
        // ...
    }
}
```
Without `batched-fn` a webserver route would need to call `Model::predict` on each
individual input, resulting in a bottleneck from under-utilizing the GPU:
```rust
use once_cell::sync::Lazy;

static MODEL: Lazy<Model> = Lazy::new(Model::load);

fn predict_for_http_request(input: Input) -> Output {
    let mut batched_input = Batch::with_capacity(1);
    batched_input.push(input);
    MODEL.predict(batched_input).pop().unwrap()
}
```
But by dropping the [`batched_fn`](https://docs.rs/batched-fn/latest/batched_fn/macro.batched_fn.html) macro into your code you automatically get batched
inference behind the scenes without changing the one-to-one relationship between inputs and
outputs:
```rust
async fn predict_for_http_request(input: Input) -> Output {
    let batch_predict = batched_fn! {
        handler = |batch: Batch<Input>, model: &Model| -> Batch<Output> {
            model.predict(batch)
        };
        config = {
            max_batch_size: 16,
            max_delay: 50,
        };
        context = {
            model: Model::load(),
        };
    };
    batch_predict(input).await.unwrap()
}
```
❗️ *Note that the `predict_for_http_request` function now has to be `async`.*
Here we set the [`max_batch_size`](https://docs.rs/batched-fn/latest/batched_fn/macro.batched_fn.html#config) to 16 and [`max_delay`](https://docs.rs/batched-fn/latest/batched_fn/macro.batched_fn.html#config)
to 50 milliseconds. This means the batched function will wait at most 50 milliseconds after receiving a single
input to fill a batch of 16. If 15 more inputs are not received within 50 milliseconds,
the partial batch will be run as-is.
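The back-pressure mechanism mentioned in the feature list slots into the same example. The sketch below is illustrative rather than definitive: `HttpResponse`, `http_ok`, and `http_503` are hypothetical stand-ins for your web framework's response types, and it assumes `channel_cap` is given as an `Option<usize>` and that `Error::Full` is a unit variant:
```rust
async fn predict_for_http_request(input: Input) -> HttpResponse {
    let batch_predict = batched_fn! {
        handler = |batch: Batch<Input>, model: &Model| -> Batch<Output> {
            model.predict(batch)
        };
        config = {
            max_batch_size: 16,
            max_delay: 50,
            channel_cap: Some(32), // queue at most 32 pending inputs
        };
        context = {
            model: Model::load(),
        };
    };
    match batch_predict(input).await {
        Ok(output) => http_ok(output),
        // The queue is full: shed load instead of letting latency pile up.
        Err(batched_fn::Error::Full) => http_503(),
        Err(err) => panic!("unexpected error: {:?}", err),
    }
}
```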
## Tuning max batch size and max delay
The optimal batch size and delay will depend on the specifics of your use case, such as how large a batch you can fit in memory
(typically on the order of 8, 16, 32, or 64 for a deep learning model) and how long a delay you can afford.
In general you want to set `max_batch_size` as high as you can, assuming the total processing time for `N` examples is minimized
with a batch size of `N`, and keep `max_delay` small relative to the time it takes for your
handler function to process a batch. A quick benchmark like the sketch below can help find the point where larger batches stop paying off.
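One way to find that point is to time the model directly at a few candidate batch sizes. This is just an illustration, not part of the crate; it reuses the `Model`, `Input`, and `Batch` types from the example above and assumes `inputs` holds at least 121 (= 1 + 8 + 16 + 32 + 64) elements:
```rust
use std::time::Instant;

/// Time `Model::predict` at several batch sizes. Raising `max_batch_size`
/// stops paying off once the per-example time stops shrinking.
fn benchmark_batch_sizes(model: &Model, mut inputs: Vec<Input>) {
    for batch_size in [1, 8, 16, 32, 64] {
        let batch: Batch<Input> = inputs.drain(..batch_size).collect();
        let start = Instant::now();
        model.predict(batch);
        let elapsed = start.elapsed();
        println!(
            "batch size {:>2}: {:?} total, {:?} per example",
            batch_size,
            elapsed,
            elapsed / batch_size as u32,
        );
    }
}
```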
## Implementation details
When the `batched_fn` macro is invoked it spawns a new thread where the
[`handler`](https://docs.rs/batched-fn/latest/batched_fn/macro.batched_fn.html#handler) will
be run. Within that thread, every object specified in the [`context`](https://docs.rs/batched-fn/latest/batched_fn/macro.batched_fn.html#context)
is initialized and then passed by reference to the handler each time it is run.
The object returned by the macro is just a closure that sends a single input and a callback
through an asynchronous channel to the handler thread. When the handler finishes
running a batch, it invokes the callback corresponding to each input with the corresponding output,
which triggers the closure to wake up and return the output.
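As a concrete picture of that flow, here is a minimal sketch of the same pattern, not the macro's actual expansion; `spawn_batched` is a hypothetical helper and error handling is simplified:
```rust
use std::thread;
use std::time::{Duration, Instant};

/// Spawn a handler thread that collects inputs into batches of up to
/// `max_batch_size`, waiting at most `max_delay` past the first input.
fn spawn_batched<I, O, F>(
    handler: F,
    max_batch_size: usize,
    max_delay: Duration,
) -> flume::Sender<(I, flume::Sender<O>)>
where
    I: Send + 'static,
    O: Send + 'static,
    F: Fn(Vec<I>) -> Vec<O> + Send + 'static,
{
    let (tx, rx) = flume::unbounded::<(I, flume::Sender<O>)>();
    thread::spawn(move || {
        // Block until the first input of the next batch arrives.
        while let Ok((input, callback)) = rx.recv() {
            let mut inputs = vec![input];
            let mut callbacks = vec![callback];
            let deadline = Instant::now() + max_delay;
            // Fill the batch until it is full or the deadline passes.
            while inputs.len() < max_batch_size {
                let remaining = deadline.saturating_duration_since(Instant::now());
                match rx.recv_timeout(remaining) {
                    Ok((i, cb)) => {
                        inputs.push(i);
                        callbacks.push(cb);
                    }
                    Err(_) => break, // deadline hit: run the partial batch
                }
            }
            // Each output wakes up the caller that sent the matching input.
            for (cb, output) in callbacks.into_iter().zip(handler(inputs)) {
                let _ = cb.send(output);
            }
        }
    });
    tx
}
```
The closure returned by the macro then amounts to creating a one-shot channel, sending `(input, sender)` through the queue returned here, and awaiting the receiving end (e.g. with flume's `recv_async`).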