https://github.com/lun-4/llamabench
end-to-end benchmarking script for llama.cpp
- Host: GitHub
- URL: https://github.com/lun-4/llamabench
- Owner: lun-4
- Created: 2024-04-20T01:53:36.000Z (9 months ago)
- Default Branch: mistress
- Last Pushed: 2024-05-19T22:15:01.000Z (8 months ago)
- Last Synced: 2024-12-01T10:54:18.746Z (about 1 month ago)
- Language: Python
- Size: 127 KB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# llamabench
end-to-end benchmarking script for llama.cpp

## how

first, get llama.cpp set up

```sh
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# using this commit for my llama3 benchmarks
git checkout 5cf5e7d490dfdd2e70bface2d35dfd14aa44b4fb

# verify it builds
make -j8

# verify it builds with openblas
make LLAMA_OPENBLAS=1 -j8

# verify it builds with Vulkan (optional when benching a system)
# requires vulkan headers, vulkan icd(??), vulkan drivers(??)
make LLAMA_VULKAN=1 -j8
```

then, get this repo set up. all done in pure python3, no pip required

```sh
git clone https://github.com/lun-4/llamabench
cd llamabench

# using this model for benches
wget 'https://huggingface.co/NousResearch/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf?download=true' -O 'Meta-Llama-3-8B-Instruct-Q4_K_M.gguf'

# prepare system
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

# actually run the bench
# set MAXTHREADS to amount of threads in yr system
# llamabench will rebuild llama.cpp using make to provide isolation
# VULKAN=1 ONLY IF YOU HAVE A VULKAN-CAPABLE DEVICE AND LIBRARIES
env VULKAN=1 MAXTHREADS=12 MAKEFLAGS=-j8 LLAMACPP=/path/to/directory/llama.cpp MODEL=/path/to/model/file/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf python3 ./bench.py

# a lot of data is spit out to stdout/stderr, capture everything to a file
```

(TODO: process data for pretty visualizations)
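
since the run command sets several environment variables and the output should be captured to a file, here is a minimal sketch of a wrapper that does both from python3 using only the standard library. this is not part of llamabench: `build_bench_env`, `run_and_capture`, and the `bench.log` filename are hypothetical names made up for illustration. it assumes `MAXTHREADS` should default to the logical CPU count, and that leaving `VULKAN` unset is how you skip Vulkan (the README only shows `VULKAN=1`, so check `bench.py` before relying on that).

```python
import os
import subprocess
import sys


def build_bench_env(llamacpp_dir, model_path, vulkan=False, make_jobs=8, max_threads=None):
    """Return a copy of the current environment plus the variables from the run command.

    Hypothetical helper: only the variable names (VULKAN, MAXTHREADS, MAKEFLAGS,
    LLAMACPP, MODEL) come from the README; the defaults are assumptions.
    """
    env = dict(os.environ)
    env["LLAMACPP"] = str(llamacpp_dir)
    env["MODEL"] = str(model_path)
    # Assumption: "amount of threads in yr system" means logical CPUs.
    env["MAXTHREADS"] = str(max_threads or os.cpu_count() or 1)
    env["MAKEFLAGS"] = f"-j{make_jobs}"
    if vulkan:
        env["VULKAN"] = "1"
    else:
        # Assumption: an unset VULKAN means "no Vulkan"; drop any inherited value.
        env.pop("VULKAN", None)
    return env


def run_and_capture(cmd, env, log_path):
    """Run cmd with env, writing stdout and stderr into one log file.

    Returns the process exit code.
    """
    with open(log_path, "wb") as log:
        proc = subprocess.run(cmd, env=env, stdout=log, stderr=subprocess.STDOUT)
    return proc.returncode


# Example usage, run from inside the llamabench checkout (paths are the
# README's placeholders, replace them with real ones):
#
#   env = build_bench_env(
#       "/path/to/directory/llama.cpp",
#       "/path/to/model/file/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf",
#       vulkan=True,
#   )
#   sys.exit(run_and_capture([sys.executable, "./bench.py"], env, "bench.log"))
```

stderr is redirected into the same file as stdout so the log keeps both streams in one place, which matches the "capture everything to a file" note above.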