https://github.com/barrel-platform/barrel_inference

OTP-native LLM inference runtime with token-exact tiered KV cache, plus an OpenAI/Anthropic/Ollama-compatible HTTP daemon. Erlang/OTP + llama.cpp.
https://github.com/barrel-platform/barrel_inference

erlang inference kv-cache llama-cpp llm otp

Last synced: 6 days ago
JSON representation

OTP-native LLM inference runtime with token-exact tiered KV cache, plus an OpenAI/Anthropic/Ollama-compatible HTTP daemon. Erlang/OTP + llama.cpp.

Host: GitHub
URL: https://github.com/barrel-platform/barrel_inference
Owner: barrel-platform
Created: 2026-05-23T20:07:01.000Z (23 days ago)
Default Branch: main
Last Pushed: 2026-06-09T20:42:50.000Z (6 days ago)
Last Synced: 2026-06-09T21:09:36.512Z (6 days ago)
Topics: erlang, inference, kv-cache, llama-cpp, llm, otp
Language: C++
Size: 7.37 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

          # Barrel Inference

OTP-native LLM inference for the BEAM: dirty NIFs over `llama.cpp`, supervised

per-model processes, and a byte-exact tiered KV cache, with an

OpenAI/Anthropic/Ollama-compatible HTTP daemon on top.

Inference as a first-class OTP citizen, not a Python sidecar. The wedge is

supervision, per-model queues, and cancel-on-disconnect, with the cache more

warm state than fits in RAM.

## Layout

A rebar3 umbrella; each app is a separately publishable Hex package and the

repo is versioned as a whole.

| App | What it is |

|-----|------------|

| [`apps/barrel_inference`](apps/barrel_inference) | The runtime: dirty NIFs over llama.cpp, supervised model processes, byte-exact tiered KV cache. |

| [`apps/barrel_inference_server`](apps/barrel_inference_server) | The API daemon: OpenAI-, Anthropic-, and Ollama-compatible HTTP, model registry, per-model queues, keep-alive, metrics. |

| [`apps/barrel_inference_cli`](apps/barrel_inference_cli) | The `barrel-inference` CLI: `serve` boots the daemon; `pull`/`run`/`ps`/`rm` drive a running one over HTTP. |

A distributed control plane (`barrel_inference_cluster`: routing, cache-aware

placement, node discovery) is a planned follow-up.

## Build

    rebar3 compile              # builds the NIF (vendored llama.cpp via cmake)

    rebar3 as prod release      # the barrel_inference_server daemon release

    rebar3 escriptize           # the barrel-inference CLI

Requires Erlang/OTP 28 and rebar3 3.25+, plus cmake and a C/C++ toolchain for

the NIF. See each app's README for the public API and configuration.

## Run

    barrel-inference serve                 # start the API server

    barrel-inference pull           # fetch a model

    barrel-inference run  "hello"   # one-shot completion

    barrel-inference ps                    # list loaded models

Or with Docker:

    docker compose up

## Documentation

One site covers the whole project, for both operators and contributors:

. It is a mkdocs build at

the repo root (`mkdocs.yml`, `docs/`) that surfaces each app's guides in place.

Function-level API reference ships per package on hexdocs:

[barrel_inference](https://hexdocs.pm/barrel_inference) and

[barrel_inference_server](https://hexdocs.pm/barrel_inference_server).

Build the site locally:

    pip install -r docs-requirements.txt

    mkdocs serve

## License

MIT. Part of the [barrel-platform](https://github.com/barrel-platform) project.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/barrel-platform/barrel_inference

Awesome Lists containing this project

README