Fast LLM inference with Elixir and Bumblebee
https://github.com/seanmor5/honeycomb
- Host: GitHub
- URL: https://github.com/seanmor5/honeycomb
- Owner: seanmor5
- Created: 2024-08-01T16:17:29.000Z
- Default Branch: main
- Last Pushed: 2024-08-06T18:24:05.000Z
- Last Synced: 2024-10-16T10:38:10.900Z
- Language: Elixir
- Size: 36.1 KB
- Stars: 51
- Watchers: 3
- Forks: 1
- Open Issues: 11
Metadata Files:
- Readme: README.md
# Honeycomb
Fast LLM inference built on Elixir, [Bumblebee](https://github.com/elixir-nx/bumblebee), and [EXLA](https://github.com/elixir-nx/nx/tree/main/exla).
## Usage
Honeycomb can be used as a standalone inference service or as a dependency in an existing Elixir project.
### As a separate service
To use Honeycomb as a separate service, clone the project and run:
```shell
mix honeycomb.serve
```

The following arguments are required:
* `--model` - HuggingFace model repo to use
* `--chat-template` - Chat template to use
The following arguments are optional:
* `--max-sequence-length` - Text generation max sequence length. Total sequence length accounts for both input and output tokens.
* `--hf-auth-token` - HuggingFace auth token for accessing private or gated repos.
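For example, to serve the same Phi-3 model used in the configuration example later in this README (the model repo and chat template name are illustrative; substitute your own):

```shell
# Serve Phi-3 Mini with its matching chat template
mix honeycomb.serve \
  --model microsoft/Phi-3-mini-4k-instruct \
  --chat-template phi3
```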
The Honeycomb server is compatible with the OpenAI API, so you can use it as a drop-in replacement by changing the `api_url` in the OpenAI client.
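Since the server follows the OpenAI chat completions API, a plain `curl` request is enough to smoke-test it. The host and port below are placeholders; point them at wherever your Honeycomb server is listening:

```shell
# Placeholder address: substitute your server's actual host and port
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```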
### As a dependency
To use Honeycomb as a dependency, first add it to your `deps`:
```elixir
defp deps do
[{:honeycomb, github: "seanmor5/honeycomb"}]
end
```

Next, you'll need to configure the serving options:
```elixir
config :honeycomb, Honeycomb.Serving,
model: "microsoft/Phi-3-mini-4k-instruct",
chat_template: "phi3",
auth_token: System.fetch_env!("HF_TOKEN")
```

Then you can call Honeycomb directly:
```elixir
messages = [%{role: "user", content: "Hello!"}]
Honeycomb.chat_completion(messages: messages)
```

## Benchmarks
Honeycomb ships with some basic benchmarks and profiling utilities. You can benchmark and/or profile your inference configuration by running:
```shell
mix honeycomb.benchmark
```