https://github.com/noizu-labs-ml/ex_llama
Elixir NIFs for interacting with llama_cpp.rust managed GGUF models.
- Host: GitHub
- URL: https://github.com/noizu-labs-ml/ex_llama
- Owner: noizu-labs-ml
- License: MIT
- Created: 2024-04-11T18:27:35.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-04-12T09:30:22.000Z (about 1 year ago)
- Last Synced: 2024-08-04T01:09:52.289Z (9 months ago)
- Language: Elixir
- Size: 45.9 KB
- Stars: 7
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-elixir - ExLLama - LlamaCpp Nif Extensions for Elixir/Erlang. ([Docs](https://hexdocs.pm/ex_llama/ExLLama.html)). (Artificial Intelligence)
README
ExLLama: LlamaCpp.rs NIF wrapper for Elixir/Erlang
===================================================

This is an alpha library for loading and interacting with models via the llama_cpp Rust client, exposed as NIF extensions.
Inspired by [llama_cpp_ex](https://github.com/jeregrine/llama_cpp_ex).

## Getting Started
1. Add the `ex_llama` dependency to your `mix.exs` file:

```elixir
def deps do
  [
    {:ex_llama, "~> 0.0.1"}
  ]
end
```
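
2. Fetch the dependency with `mix deps.get`.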

## Chat Completion

As of this build, only the `<|role|>message` chat-completion format is supported, such as that used by TinyLlama.

```elixir
{:ok, llama} = ExLLama.load_model("./test/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf")
thread = [
%{role: :user, content: "Say Hello. And only hello. Example \"Hello\"."},
%{role: :assistant, content: "Hello"},
%{role: :user, content: "Repeat what you just said."},
%{role: :assistant, content: "Hello"},
%{role: :user, content: "Say Goodbye."},
%{role: :assistant, content: "Goodbye"},
%{role: :user, content: "Say Apple."},
%{role: :assistant, content: "Apple"},
%{role: :user, content: "What did you just say?."},
]{:ok, response} = ExLLama.chat_completion(llama, thread, %{seed: 2})
# response = %{
#   choices: [
#     %{reason: :end, role: "assistant", content: "Apple"},
#     %{reason: :end, role: "assistant", content: "Apple"},
#     %{reason: :end, role: "assistant", content: "Apple"}
#   ]
# }
```
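
The `<|role|>message` layout corresponds to the raw prompt strings shown in the sections below. A minimal sketch of that mapping, for illustration only (`PromptSketch` is a hypothetical helper, not the library's internal serializer):

```elixir
defmodule PromptSketch do
  # Illustration only: flatten a chat thread into the <|role|>message
  # layout used by TinyLlama-style chat models, ending with an open
  # <|assistant|> turn for the model to complete.
  def to_prompt(thread) do
    turns =
      Enum.map(thread, fn %{role: role, content: content} ->
        "<|#{role}|>\n#{content}\n"
      end)

    Enum.join(turns) <> "<|assistant|>\n"
  end
end
```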
## Simple Completion (direct)
```elixir
{:ok, llama} = ExLLama.load_model("./test/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf")
{:ok, options} = ExLLama.Session.default_options()
{:ok, session} = ExLLama.create_session(llama, %{options | seed: 2})
ExLLama.advance_context(session, "<|user|>\n Say Hello. And only hello. Example \"Hello\".\n<|assistant|>\n Hello\n<|user|>\n Repeat what you just said.\n<|assistant|>\n Hello\n<|user|>\n Say Goodbye.\n<|assistant|>\n")
{:ok, response} = ExLLama.completion(session, 512, "\n*")
response = String.trim_leading(response)
# "Goodbye."
```

## Streaming Completion (the final mechanism will be replaced with a `Stream`)
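
`ExLLama.Session.start_completing_with/2` sends generated text back to the calling process as messages; the helper below collects them until a terminal message arrives.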
```elixir
def receive_text(acc \\ []) do
  receive do
    # Chunks arrive as plain messages and are accumulated; a terminal
    # {:ok, _} or {:error, _} ends the run and is included in the result,
    # while :fin ends the run without a payload.
    x = {:ok, _} -> Enum.reverse([x | acc])
    x = {:error, _} -> Enum.reverse([x | acc])
    :fin -> Enum.reverse(acc)
    x -> receive_text([x | acc])
  end
end

# ...
{:ok, llama} = ExLLama.load_model("./test/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf")
{:ok, options} = ExLLama.Session.default_options()
{:ok, session} = ExLLama.create_session(llama, %{options | seed: 2})
ExLLama.advance_context(session, "<|user|>\n Say Hello. And only hello. Example \"Hello\".\n<|assistant|>\n Hello\n<|user|>\n Repeat what you just said.\n<|assistant|>\n Hello\n<|user|>\n Say Goodbye.\n<|assistant|>\n")
ExLLama.Session.start_completing_with(session, %{max_tokens: 512})
receive_text()
```
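
A possible `Stream`-based wrapper over the current message protocol, sketched under the assumption that chunks arrive as plain messages and that `{:ok, _}`, `{:error, _}`, or `:fin` terminates the run (as in `receive_text/1` above); `StreamSketch` is hypothetical, not part of the library:

```elixir
defmodule StreamSketch do
  # Sketch only: expose the message-based completion protocol as a lazy
  # Elixir Stream. The terminal {:ok, _} payload is dropped here for
  # simplicity; receive_text/1 above keeps it.
  def stream_completion(session, opts \\ %{max_tokens: 512}) do
    Stream.resource(
      fn -> ExLLama.Session.start_completing_with(session, opts) end,
      fn state ->
        receive do
          {:ok, _} -> {:halt, state}
          {:error, _} -> {:halt, state}
          :fin -> {:halt, state}
          chunk -> {[chunk], state}
        end
      end,
      fn _state -> :ok end
    )
  end
end

# Usage sketch:
# StreamSketch.stream_completion(session) |> Enum.join()
```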