Fast LLM inference with Elixir and Bumblebee
https://github.com/seanmor5/honeycomb
- Host: GitHub
- URL: https://github.com/seanmor5/honeycomb
- Owner: seanmor5
- Created: 2024-08-01T16:17:29.000Z
- Default Branch: main
- Last Pushed: 2024-08-06T18:24:05.000Z
- Last Synced: 2024-10-16T10:38:10.900Z
- Language: Elixir
- Size: 36.1 KB
- Stars: 51
- Watchers: 3
- Forks: 1
- Open Issues: 11
Metadata Files:
- Readme: README.md
# Honeycomb
Fast LLM inference built on Elixir, [Bumblebee](https://github.com/elixir-nx/bumblebee), and [EXLA](https://github.com/elixir-nx/nx/tree/main/exla).
## Usage
Honeycomb can be used as a standalone inference service or as a dependency in an existing Elixir project.
### As a separate service
To use Honeycomb as a separate service, clone the project and run:
```shell
mix honeycomb.serve
```

The following arguments are required:
* `--model` - HuggingFace model repo to use
* `--chat-template` - Chat template to use
The following arguments are optional:
* `--max-sequence-length` - Text generation max sequence length. Total sequence length accounts for both input and output tokens.
* `--hf-auth-token` - HuggingFace auth token for accessing private or gated repos.
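For example, to serve the same Phi-3 model used in the configuration example later in this README (the model repo and chat template name are illustrative; substitute your own):

```shell
# Serve Phi-3 Mini with its matching chat template
mix honeycomb.serve \
  --model microsoft/Phi-3-mini-4k-instruct \
  --chat-template phi3
```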
The Honeycomb server is compatible with the OpenAI API, so you can use it as a drop-in replacement by changing the `api_url` in the OpenAI client.
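Since the server follows the OpenAI chat completions API, a plain `curl` request is enough to smoke-test it. The host and port below are placeholders; point them at wherever your Honeycomb server is listening:

```shell
# Placeholder address: substitute your server's actual host and port
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```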
### As a dependency
To use Honeycomb as a dependency, first add it to your `deps`:
```elixir
defp deps do
[{:honeycomb, github: "seanmor5/honeycomb"}]
end
```

Next, you'll need to configure the serving options:
```elixir
config :honeycomb, Honeycomb.Serving,
model: "microsoft/Phi-3-mini-4k-instruct",
chat_template: "phi3",
auth_token: System.fetch_env!("HF_TOKEN")
```

Then you can call Honeycomb directly:
```elixir
messages = [%{role: "user", content: "Hello!"}]
Honeycomb.chat_completion(messages: messages)
```

## Benchmarks
Honeycomb ships with some basic benchmarks and profiling utilities. You can benchmark and/or profile your inference configuration by running:
```shell
mix honeycomb.benchmark
```