https://github.com/kassane/llama-cpp-d
D bindings for llama.cpp
- Host: GitHub
- URL: https://github.com/kassane/llama-cpp-d
- Owner: kassane
- License: mit
- Created: 2026-03-19T11:59:10.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-03-31T13:03:28.000Z (3 months ago)
- Last Synced: 2026-06-08T14:34:25.660Z (14 days ago)
- Topics: bindings, d, dlang, ggml, llama-cpp
- Language: D
- Homepage: https://llama-cpp-d.dub.pm
- Size: 56.6 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# llama-cpp-d
D bindings for [llama.cpp](https://github.com/ggml-org/llama.cpp).
## Requirements
| Tool | Minimum |
|------|---------|
| LDC or DMD | D frontend ≥ 2.111 (`importC` required) |
| CMake | ≥ 3.14 |
| C++17 compiler | GCC / Clang / MSVC |
## How to use
```sh
dub add llama-cpp-d
```
## Tools
### hf-download
List and download GGUF files from the Hugging Face Hub:
```sh
cd tools && dub build --build=release
# List available .gguf files in a repository
./build/hf-download -r unsloth/Qwen3.5-0.8B-GGUF
# Download a specific file
./build/hf-download -r unsloth/Qwen3.5-0.8B-GGUF -f Qwen3.5-0.8B-Q4_K_M.gguf -o ~/models
# With authentication (private repos / higher rate limits)
HF_TOKEN=hf_xxx ./build/hf-download -r myorg/mymodel -f model.gguf
```
| Flag | Description |
|------|-------------|
| `-r owner/repo` | Hugging Face repository (required) |
| `-f filename` | File to download; omit to list `.gguf` files |
| `-o outdir` | Output directory (default: `.`) |
| `-t token` | HF access token (or `HF_TOKEN` env var) |
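The README does not say how `hf-download` talks to the Hub. As a sketch assuming the public Hugging Face Hub HTTP API (model info at `https://huggingface.co/api/models/<repo>`, whose `siblings` entries carry an `rfilename`; direct downloads at `https://huggingface.co/<repo>/resolve/<revision>/<file>`; the token sent as a `Bearer` value in the `Authorization` header), the listing and URL logic could look like this. Every function name here is hypothetical, and the actual HTTP requests are left out.

```d
import std.algorithm : endsWith;
import std.format : format;
import std.json : parseJSON;

/// Hypothetical helper: model-info endpoint that describes a repository.
string modelInfoUrl(string repo)
{
    return "https://huggingface.co/api/models/" ~ repo;
}

/// Hypothetical helper: extract `.gguf` file names from a model-info JSON
/// document, assuming files are listed under "siblings" with an "rfilename" key.
string[] ggufFiles(string json)
{
    auto root = parseJSON(json);
    string[] files;
    foreach (entry; root["siblings"].array)
    {
        auto name = entry["rfilename"].str;
        if (name.endsWith(".gguf"))
            files ~= name;
    }
    return files;
}

/// Hypothetical helper: direct download URL for one file at a given revision.
string resolveUrl(string repo, string file, string revision = "main")
{
    return format("https://huggingface.co/%s/resolve/%s/%s", repo, revision, file);
}

/// Hypothetical helper: value for the HTTP Authorization header, or null
/// when no token is available (public repositories need none).
string authorization(string token)
{
    return token.length ? "Bearer " ~ token : null;
}
```

With this shape, omitting `-f` maps to printing `ggufFiles(...)` of the fetched listing, and the token could come from `std.process.environment.get("HF_TOKEN", "")` when `-t` is absent.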
## Examples
```sh
# Text completion
dub run :simple -- -m model.gguf -n 64 "Tell me a joke"
# Tokenization inspector
dub run :tokenize -- -m model.gguf -s "Hello, world!"
# Sentence embeddings (cosine similarity between prompts)
dub run :embedding -- -m model.gguf
dub run :embedding -- -m model.gguf -p "custom sentence"
# Context state save/load (verifies two runs produce identical output)
dub run :save-load-state -- -m model.gguf -n 32
# Multimodal (vision/audio) — text only
dub run :multimodal -c default -- -m model.gguf --mmproj mmproj.gguf -n 200 "Describe this."
# Multimodal with an image
dub run :multimodal -c default -- -m model.gguf --mmproj mmproj.gguf -i photo.jpg "What do you see?"
```
| Example | Required flags | Optional flags |
|---------|----------------|----------------|
| `simple` | `-m` model file | `-n` tokens to generate (default 32), `-ngl` GPU layers (default 99) |
| `tokenize` | `-m` model file | `-s` (include BOS/EOS) |
| `embedding` | `-m` model file | `-p` custom sentence, `-ngl` GPU layers (default 99) |
| `save-load-state` | `-m` model file | `-n` tokens to generate (default 16), `-ngl` GPU layers, `--state-file` state file path |
| `multimodal` | `-m` model file, `--mmproj` projector file | `-i` image file, `-n` tokens to generate (default 512), `-ngl` GPU layers (default 99), `--no-gpu` |
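The `embedding` example reports the cosine similarity between prompt embeddings, cos(a, b) = a·b / (‖a‖ ‖b‖). A minimal pure-D sketch of that metric follows; `cosineSimilarity` is a hypothetical helper, not part of the bindings, and it assumes the embeddings have already been copied out into plain `float` arrays.

```d
import std.math : sqrt;

/// Hypothetical helper: cosine similarity of two embeddings of equal
/// dimension, in [-1, 1]. Returns 0 when either vector has zero norm,
/// since the angle is undefined in that case.
double cosineSimilarity(const(float)[] a, const(float)[] b)
{
    assert(a.length == b.length, "embeddings must have the same dimension");
    double dot = 0, normA = 0, normB = 0;
    foreach (i; 0 .. a.length)
    {
        // Accumulate in double to limit rounding error on long vectors
        dot += double(a[i]) * b[i];
        normA += double(a[i]) * a[i];
        normB += double(b[i]) * b[i];
    }
    if (normA == 0 || normB == 0)
        return 0;
    return dot / (sqrt(normA) * sqrt(normB));
}
```

Prompts with similar meaning should score closer to 1; unrelated ones drift toward 0.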
### Configurations
Build configurations, selected with dub's `-c` (`--config`) option:
| Config | Description |
|--------|-------------|
| `default` | CPU only |
| `mtmd` | CPU multimodal (llama + libmtmd) |
| `cuda` | CUDA GPU acceleration |
| `vulkan` | Vulkan GPU acceleration |
| `metal` | Apple Metal (macOS) |
| `hipblas` | AMD ROCm/HIP |
| `openblas` | OpenBLAS |
| `openmp` | OpenMP threading |
| `sycl` | Intel oneAPI SYCL |
## Quick start
### Text completion
```d
import llama;
import std.stdio : write;

void main()
{
    loadAllBackends();
    // D-string overload; second arg is GPU layer count (0 = CPU only)
    auto model = LlamaModel.loadFromFile("model.gguf", 99);
    assert(model);

    // Context window = prompt tokens + 32 generated tokens; batch size = prompt length
    auto tokens = tokenize(model.vocab, "Hello");
    auto ctx = LlamaContext.fromModel(model,
        cast(uint) tokens.length + 32, // nCtx
        cast(uint) tokens.length);     // nBatch
    assert(ctx);

    // Two-statement form: SamplerChain is non-copyable, so no chaining on init
    auto smpl = SamplerChain.create();
    smpl.topK(40).topP(0.9f).temp(0.8f).dist();

    // Evaluate the whole prompt in one batch
    auto batch = batchGetOne(tokens);
    ctx.decode(batch);

    // Generate up to 32 tokens, feeding each sampled token back in
    llama_token[1] buf;
    foreach (i; 0 .. 32)
    {
        auto next = smpl.sample(ctx); // samples from the last output position
        if (isEog(model.vocab, next)) break;
        write(tokenToString(model.vocab, next));
        buf[0] = next;
        ctx.decode(batchGetOne(buf[]));
    }
}
```
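The chain `smpl.topK(40).topP(0.9f).temp(0.8f).dist()` narrows the candidate tokens in that order before drawing one. To show what each stage does conceptually, here is a pure-D sketch; it is not the binding's or llama.cpp's implementation, and `Candidate`, `topK`, `topP`, `softmax` and `sampleDist` are hypothetical names. One simplification: llama.cpp's `dist` stage draws from its own seeded RNG, whereas this sketch takes the uniform number `u` as a parameter so results are reproducible.

```d
import std.algorithm : sort;
import std.math : exp;

/// Hypothetical candidate type: a token id and its raw logit.
struct Candidate
{
    int id;
    float logit;
}

/// Top-k: keep the k highest-logit candidates, sorted by descending logit.
Candidate[] topK(const(Candidate)[] cands, size_t k)
{
    auto sorted = cands.dup;
    sort!((a, b) => a.logit > b.logit)(sorted);
    return sorted[0 .. (k < sorted.length ? k : sorted.length)];
}

/// Softmax of the logits divided by temperature `temp` (> 0).
double[] softmax(const(Candidate)[] cands, double temp = 1.0)
{
    double maxLogit = -double.infinity;
    foreach (c; cands)
        if (c.logit > maxLogit) maxLogit = c.logit;
    auto probs = new double[cands.length];
    double total = 0;
    foreach (i, c; cands)
    {
        probs[i] = exp((c.logit - maxLogit) / temp); // shift by max for stability
        total += probs[i];
    }
    foreach (ref p; probs)
        p /= total;
    return probs;
}

/// Top-p (nucleus): keep the shortest prefix of the sorted candidates whose
/// cumulative probability reaches p; always keeps at least one candidate.
Candidate[] topP(Candidate[] sorted, double p)
{
    auto probs = softmax(sorted);
    double cumulative = 0;
    foreach (i, prob; probs)
    {
        cumulative += prob;
        if (cumulative >= p)
            return sorted[0 .. i + 1];
    }
    return sorted;
}

/// Temperature + final draw: inverse-CDF sampling with a caller-supplied
/// uniform number u in [0, 1).
int sampleDist(const(Candidate)[] cands, double temp, double u)
{
    auto probs = softmax(cands, temp);
    double cumulative = 0;
    foreach (i, prob; probs)
    {
        cumulative += prob;
        if (u < cumulative)
            return cands[i].id;
    }
    return cands[$ - 1].id; // guard against floating-point round-off
}

unittest
{
    auto logits = [Candidate(0, 1.0f), Candidate(1, 3.0f),
                   Candidate(2, 2.0f), Candidate(3, 0.5f)];
    // Top-2 keeps ids 1 and 2; id 1 alone has p ≈ 0.73, enough for p = 0.5
    auto kept = topP(topK(logits, 2), 0.5);
    assert(kept.length == 1 && kept[0].id == 1);
    assert(sampleDist(kept, 0.8, 0.99) == 1);
}
```

In this sketch, applying temperature last means it only reshapes the distribution over tokens that survived top-k and top-p; the multimodal example below orders `temp` first, which changes the probabilities top-p sees.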
### Multimodal (vision/audio)
```d
import llama;
import std.stdio : write;
import std.string : fromStringz, toStringz;

void main() @trusted
{
    loadAllBackends();
    auto model = LlamaModel.loadFromFile("model.gguf", 99);
    assert(model);

    auto mparams = mtmd_context_params_default();
    mparams.use_gpu = true;
    auto mtmd = MtmdContext.initFromFile("mmproj.gguf", model.ptr, mparams);
    assert(mtmd);

    // Load an image (or skip for text-only)
    auto bitmap = mtmd.loadBitmap("photo.jpg");
    assert(bitmap);

    // The media marker shows libmtmd where the image goes in the prompt
    string marker = fromStringz(mtmd_default_marker()).idup;
    string prompt = marker ~ "\nDescribe the image.";

    auto chunks = InputChunks.create();
    // A concatenated D string is not NUL-terminated, so pass toStringz(prompt)
    // rather than &prompt[0]; the two bools are add_special and parse_special
    auto inputTxt = mtmd_input_text(toStringz(prompt), true, true);
    const(mtmd_bitmap)*[1] bitmaps = [bitmap.ptr];
    mtmd.tokenize(chunks, inputTxt, bitmaps[]);

    auto ctx = LlamaContext.fromModel(model,
        cast(uint)(chunks.nTokens + 256), // nCtx: prompt + up to 256 new tokens
        512);                             // nBatch
    assert(ctx);

    // Evaluate all chunks from position 0 in sequence 0, 512 tokens per batch,
    // with logits for the last token; nPast receives the position after the prompt
    llama_pos nPast;
    mtmd.evalChunks(ctx.ptr, chunks, 0, 0, 512, true, nPast);

    auto smpl = SamplerChain.create();
    smpl.temp(0.8f).topK(40).topP(0.95f).dist();

    // Generation loop
    llama_token[1] buf;
    foreach (i; 0 .. 256)
    {
        auto tok = smpl.sample(ctx);
        if (isEog(model.vocab, tok)) break;
        write(tokenToString(model.vocab, tok));
        smpl.accept(tok);
        buf[0] = tok;
        ctx.decode(batchGetOne(buf[]));
    }
}
```
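As we understand libmtmd's `mtmd.h`, the prompt must contain one media marker (the string returned by `mtmd_default_marker()`) per bitmap passed to tokenization, and `mtmd_tokenize` reports an error when the counts differ. The pure-D sketch below builds and checks such a prompt for several images; `buildMediaPrompt` and `markersMatch` are hypothetical helpers, and with the bindings you would pass `fromStringz(mtmd_default_marker()).idup` as `marker`.

```d
import std.algorithm : count;
import std.array : replicate;

/// Hypothetical helper: one marker line per image, followed by the question.
string buildMediaPrompt(string marker, size_t nImages, string question)
{
    return replicate(marker ~ "\n", nImages) ~ question;
}

/// Hypothetical helper: true when the prompt holds exactly one marker per bitmap.
bool markersMatch(string prompt, string marker, size_t nBitmaps)
{
    return prompt.count(marker) == nBitmaps;
}
```

Checking `markersMatch` before tokenizing turns a mismatch into an early, explicit failure instead of an error code from the C side.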
## License
[MIT](./LICENSE)