An open API service indexing awesome lists of open source software.

https://github.com/barrel-platform/barrel_inference

OTP-native LLM inference runtime with token-exact tiered KV cache, plus an OpenAI/Anthropic/Ollama-compatible HTTP daemon. Erlang/OTP + llama.cpp.
https://github.com/barrel-platform/barrel_inference

erlang inference kv-cache llama-cpp llm otp

Last synced: 6 days ago
JSON representation

OTP-native LLM inference runtime with token-exact tiered KV cache, plus an OpenAI/Anthropic/Ollama-compatible HTTP daemon. Erlang/OTP + llama.cpp.

Awesome Lists containing this project

README

          

# Barrel Inference

OTP-native LLM inference for the BEAM: dirty NIFs over `llama.cpp`, supervised
per-model processes, and a byte-exact tiered KV cache, with an
OpenAI/Anthropic/Ollama-compatible HTTP daemon on top.

Inference as a first-class OTP citizen, not a Python sidecar. The wedge is
supervision, per-model queues, and cancel-on-disconnect, with the cache more
warm state than fits in RAM.

## Layout

A rebar3 umbrella; each app is a separately publishable Hex package and the
repo is versioned as a whole.

| App | What it is |
|-----|------------|
| [`apps/barrel_inference`](apps/barrel_inference) | The runtime: dirty NIFs over llama.cpp, supervised model processes, byte-exact tiered KV cache. |
| [`apps/barrel_inference_server`](apps/barrel_inference_server) | The API daemon: OpenAI-, Anthropic-, and Ollama-compatible HTTP, model registry, per-model queues, keep-alive, metrics. |
| [`apps/barrel_inference_cli`](apps/barrel_inference_cli) | The `barrel-inference` CLI: `serve` boots the daemon; `pull`/`run`/`ps`/`rm` drive a running one over HTTP. |

A distributed control plane (`barrel_inference_cluster`: routing, cache-aware
placement, node discovery) is a planned follow-up.

## Build

rebar3 compile # builds the NIF (vendored llama.cpp via cmake)
rebar3 as prod release # the barrel_inference_server daemon release
rebar3 escriptize # the barrel-inference CLI

Requires Erlang/OTP 28 and rebar3 3.25+, plus cmake and a C/C++ toolchain for
the NIF. See each app's README for the public API and configuration.

## Run

barrel-inference serve # start the API server
barrel-inference pull # fetch a model
barrel-inference run "hello" # one-shot completion
barrel-inference ps # list loaded models

Or with Docker:

docker compose up

## Documentation

One site covers the whole project, for both operators and contributors:
. It is a mkdocs build at
the repo root (`mkdocs.yml`, `docs/`) that surfaces each app's guides in place.

Function-level API reference ships per package on hexdocs:
[barrel_inference](https://hexdocs.pm/barrel_inference) and
[barrel_inference_server](https://hexdocs.pm/barrel_inference_server).

Build the site locally:

pip install -r docs-requirements.txt
mkdocs serve

## License

MIT. Part of the [barrel-platform](https://github.com/barrel-platform) project.