https://github.com/eniompw/llama-cpp-gpu

Load larger models by offloading model layers to both GPU and CPU

# LLaMA.cpp GPU

Offloads some of the model's layers to the GPU while keeping the rest on the CPU, allowing larger models to be loaded than GPU memory alone could hold.
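The GPU/CPU split is controlled by how many layers are offloaded (llama.cpp's `-ngl` / `n_gpu_layers` setting). As a rough, purely illustrative sketch, one can estimate a workable layer count from the model size and available VRAM; `choose_gpu_layers` below is a hypothetical helper, not part of llama.cpp:

```python
def choose_gpu_layers(total_layers: int, model_bytes: int, vram_bytes: int) -> int:
    """Estimate how many layers fit in VRAM (illustrative heuristic).

    Assumes the transformer layers dominate the model's size and are
    roughly equal in size, which is only an approximation.
    """
    per_layer = model_bytes / total_layers          # rough bytes per layer
    fitting = int(vram_bytes // per_layer)          # layers that fit in VRAM
    return min(total_layers, fitting)               # never exceed the model's layer count


# e.g. a ~8 GB 40-layer model on a 6 GB GPU: offload 30 layers, run 10 on CPU
print(choose_gpu_layers(40, 8_000_000_000, 6_000_000_000))
```

In practice the remaining layers stay in system RAM and run on the CPU, which is exactly what lets a 13B model run on a Colab-class GPU.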

![model](https://github.com/eniompw/llama-cpp-gpu/blob/main/model-details.png)
![colab resources](https://github.com/eniompw/llama-cpp-gpu/blob/main/colab-resources.png)
![parameters](https://github.com/eniompw/llama-cpp-gpu/blob/main/size-parameters.png)

* [cuBLAS](https://github.com/ggerganov/llama.cpp#cublas)
* [Model](https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML#how-to-run-in-llamacpp)
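Putting the two links together, a minimal sketch of the GGML-era workflow might look like the following; the `LLAMA_CUBLAS=1` build flag and `-ngl` option match the linked cuBLAS instructions, while the model filename is only an example from the linked model card:

```shell
# build llama.cpp with cuBLAS GPU support (GGML-era flag)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make LLAMA_CUBLAS=1

# run with 32 layers offloaded to the GPU; the rest execute on the CPU
# (example model filename; download the GGML file from the model link above)
./main -m Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_0.bin -ngl 32 -p "Hello"
```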