https://github.com/eniompw/llama-cpp-gpu
Load larger models by offloading model layers to both GPU and CPU
- Host: GitHub
- URL: https://github.com/eniompw/llama-cpp-gpu
- Owner: eniompw
- License: MIT
- Created: 2023-06-23T07:05:53.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-07-28T14:00:05.000Z (over 2 years ago)
- Last Synced: 2024-10-18T23:15:32.186Z (about 1 year ago)
- Topics: colab, colab-notebook, gpu, gpu-acceleration, llama, llama-cpp, llamacpp
- Language: Jupyter Notebook
- Homepage:
- Size: 109 KB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# LLaMA.cpp GPU
Offloads some of the model's layers to the GPU while keeping the rest on the CPU, allowing models larger than available VRAM to be loaded.
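A minimal sketch of the idea using llama-cpp-python (the model filename and layer count are illustrative, and the code assumes a cuBLAS-enabled build as described in the link below):

```python
from llama_cpp import Llama

# Assumes llama-cpp-python was built against cuBLAS, e.g. in a Colab cell:
#   !CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
#
# n_gpu_layers controls how many transformer layers are offloaded to the GPU;
# the rest stay on the CPU, so a 13B model can load with limited VRAM.
llm = Llama(
    model_path="Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_0.bin",  # illustrative GGML file
    n_gpu_layers=32,  # tune to fit your GPU's VRAM; 0 = CPU only
    n_ctx=2048,       # context window
)
```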

* [cuBLAS](https://github.com/ggerganov/llama.cpp#cublas): build llama.cpp with cuBLAS support for GPU acceleration
* [Model](https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML#how-to-run-in-llamacpp): Wizard-Vicuna-13B-Uncensored-GGML and its llama.cpp run instructions
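Once loaded, generation is a plain llama-cpp-python call; a short usage sketch (the prompt and sampling parameters are arbitrary):

```python
# `llm` is the Llama instance created above.
# The call returns an OpenAI-style completion dict.
output = llm(
    "Q: Name the planets in the solar system. A:",
    max_tokens=64,
    temperature=0.7,
    stop=["Q:"],
)
print(output["choices"][0]["text"])
```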