https://github.com/theibrahimstudio/finder.livebatch

# LiveBatch ⚡

**LiveBatch** is a lightweight, framework-agnostic middleware that dynamically batches inference requests in real time to maximize GPU/TPU utilization. It's designed for ML teams and microservices that want to increase throughput without modifying their model code.

---

## Features

- Plug-and-play: Drop-in proxy over any HTTP ML server
- Dynamic batching: Configurable by max latency or batch size
- Written in Go: Fast, concurrent, production-ready
- Transparent: Accepts and returns single inference calls
- Efficient: Great for reducing GPU/TPU underutilization

---

## Architecture

LiveBatch acts as a sidecar or proxy in front of your model service.

```
[Client] ---> [LiveBatch] ---> [Model Server]
                  |
                  |---> Queues requests
                  |---> Batches & dispatches
```

---

## Quick Start

Via command-line flags:

```bash
go run main.go --max-batch-size=4 --max-latency-ms=100 --listen-addr=":9000"
```

Via environment variables:

```bash
LIVEBATCH_MAX_BATCH_SIZE=16 LIVEBATCH_MAX_LATENCY_MS=200 go run main.go
```

---

## Configuration

LiveBatch supports configuration via **environment variables** or **command-line flags**, using `viper` + `pflag`.

| Name | Flag | Env Var | Default | Description |
| ----------------- | ------------------ | -------------------------- | ------- | -------------------------------- |
| Max Batch Size | `--max-batch-size` | `LIVEBATCH_MAX_BATCH_SIZE` | `8` | Max number of requests per batch |
| Max Latency (ms) | `--max-latency-ms` | `LIVEBATCH_MAX_LATENCY_MS` | `50` | Max wait time before dispatching |
| Listening Address | `--listen-addr` | `LIVEBATCH_LISTEN_ADDR` | `:8080` | HTTP server bind address |

---

## Roadmap

- [x] HTTP dynamic batching proxy (MVP)
- [x] Config via environment or CLI
- [ ] gRPC and ONNX backend support
- [ ] Prometheus metrics
- [ ] Deadline-based and priority queueing
- [ ] Docker + Helm chart for Kubernetes
- [ ] Python client SDK

---

## Contributing

PRs welcome! Check out the [CONTRIBUTING.md](https://github.com/theIbrahimStudio/.github/blob/main/CONTRIBUTING.md) for guidelines.