Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/edgenai/llama_cpp-rs
High-level, optionally asynchronous Rust bindings to llama.cpp
- Host: GitHub
- URL: https://github.com/edgenai/llama_cpp-rs
- Owner: edgenai
- License: apache-2.0
- Created: 2023-10-19T14:11:55.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-06-05T16:11:00.000Z (7 months ago)
- Last Synced: 2024-12-21T20:05:22.836Z (13 days ago)
- Language: Rust
- Homepage:
- Size: 389 KB
- Stars: 187
- Watchers: 5
- Forks: 37
- Open Issues: 17
Metadata Files:
- Readme: README.md
- License: LICENSE-APACHE
Awesome Lists containing this project
- awesome-rust-list - edgenai/llama_cpp-rs : High-level, optionally asynchronous Rust bindings to llama.cpp. (Machine Learning)
README
# llama_cpp-rs
[![Documentation](https://docs.rs/llama_cpp/badge.svg)](https://docs.rs/llama_cpp/)
[![Crate](https://img.shields.io/crates/v/llama_cpp.svg)](https://crates.io/crates/llama_cpp)

Safe, high-level Rust bindings to the C++ project [of the same name](https://github.com/ggerganov/llama.cpp), meant to be as user-friendly as possible. Run GGUF-based large language models directly on your CPU in fifteen lines of code, no ML experience required!

```rust
use std::io::{self, Write};
use llama_cpp::{standard_sampler::StandardSampler, LlamaModel, LlamaParams, SessionParams};

// Create a model from anything that implements `AsRef<Path>`:
let model = LlamaModel::load_from_file("path_to_model.gguf", LlamaParams::default())
    .expect("Could not load model");

// A `LlamaModel` holds the weights shared across many _sessions_; while your model may be
// several gigabytes large, a session is typically a few dozen to a hundred megabytes!
let mut ctx = model
    .create_session(SessionParams::default())
    .expect("Failed to create session");

// You can feed anything that implements `AsRef<[u8]>` into the model's context.
ctx.advance_context("This is the story of a man named Stanley.").unwrap();

// LLMs are typically used to predict the next word in a sequence. Let's generate some tokens!
let max_tokens = 1024;
let mut decoded_tokens = 0;

// `ctx.start_completing_with` creates a worker thread that generates tokens. When the completion
// handle is dropped, tokens stop generating!
let completions = ctx.start_completing_with(StandardSampler::default(), 1024).into_strings();

for completion in completions {
    print!("{completion}");
    let _ = io::stdout().flush();

    decoded_tokens += 1;

    if decoded_tokens > max_tokens {
        break;
    }
}
```

This repository hosts the high-level bindings (`crates/llama_cpp`) as well as automatically generated bindings to llama.cpp's low-level C API (`crates/llama_cpp_sys`). Contributions are welcome -- just keep the UX clean!
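Since sessions are cheap relative to the weights they share, one `LlamaModel` can back several independent contexts at once. Below is a minimal sketch using only the calls from the example above; the prompts and parameter values are illustrative:

```rust
use llama_cpp::{LlamaModel, LlamaParams, SessionParams};

fn main() {
    // Load the multi-gigabyte weights once...
    let model = LlamaModel::load_from_file("path_to_model.gguf", LlamaParams::default())
        .expect("Could not load model");

    // ...then create several lightweight sessions that share them.
    let mut storyteller = model
        .create_session(SessionParams::default())
        .expect("Failed to create session");
    let mut assistant = model
        .create_session(SessionParams::default())
        .expect("Failed to create session");

    // Each session keeps its own context; advancing one leaves the other untouched.
    storyteller
        .advance_context("This is the story of a man named Stanley.")
        .unwrap();
    assistant
        .advance_context("Q: What is the capital of France?\nA:")
        .unwrap();
}
```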
## Building

Keep in mind that [llama.cpp](https://github.com/ggerganov/llama.cpp) is very computationally heavy, meaning standard debug builds (running just `cargo build`/`cargo run`) will suffer greatly from the lack of optimisations. Unless debugging is really necessary, it is therefore highly recommended to build and run using Cargo's `--release` flag; see the example invocations after the feature list below.
### Cargo Features

Several of [llama.cpp](https://github.com/ggerganov/llama.cpp)'s backends are supported through Cargo features:

- `cuda` - Enables the CUDA backend; the CUDA Toolkit is required for compilation.
- `vulkan` - Enables the Vulkan backend; the Vulkan SDK is required for compilation.
- `metal` - Enables the Metal backend (macOS only).
- `hipblas` - Enables the hipBLAS/ROCm backend; ROCm is required for compilation.
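As a concrete example, both points combine into standard Cargo invocations; the `cuda` feature here is just one of the options above:

```bash
# Optimised build, no GPU backend (debug builds are far too slow for inference):
cargo build --release

# Optimised build with the CUDA backend enabled (requires the CUDA Toolkit):
cargo build --release --features cuda
```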
## Experimental

These bindings also attempt to predict how much memory a session's context will occupy. This is a highly experimental feature, since it is not something [llama.cpp](https://github.com/ggerganov/llama.cpp) itself provides: the returned values may be quite inaccurate, although an attempt is made to never return values lower than the real size.
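As an illustration only, here is how such an estimate might be consumed. `estimate_session_size` below is a hypothetical stand-in, not the crate's actual entry point; check the [docs.rs documentation](https://docs.rs/llama_cpp/) for the real API:

```rust
use llama_cpp::{LlamaModel, LlamaParams, SessionParams};

// Hypothetical stand-in for the crate's experimental estimation API; the
// real function name and signature may differ.
fn estimate_session_size(_model: &LlamaModel, _params: &SessionParams) -> usize {
    unimplemented!("see the llama_cpp docs for the actual entry point")
}

fn main() {
    let model = LlamaModel::load_from_file("path_to_model.gguf", LlamaParams::default())
        .expect("Could not load model");
    let params = SessionParams::default();

    // The estimate aims to be an upper bound on the session's real memory
    // use, but since llama.cpp provides no such figure, treat it as rough.
    let bytes = estimate_session_size(&model, &params);
    println!("Estimated session size: ~{} MiB", bytes / (1024 * 1024));
}
```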
## License

MIT or Apache-2.0, at your option (the "Rust" license). See `LICENSE-MIT` and `LICENSE-APACHE`.