https://github.com/mathis-lambert/dockerize-llamacpp
- Host: GitHub
- URL: https://github.com/mathis-lambert/dockerize-llamacpp
- Owner: mathis-lambert
- Created: 2023-07-27T20:41:51.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-07-27T20:42:02.000Z (almost 3 years ago)
- Last Synced: 2025-03-02T22:34:13.313Z (over 1 year ago)
- Size: 1.95 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Dockerizing llama.cpp
------------------------
## Prerequisites
- Docker Engine
- git
## Resources
- GitHub repo: https://github.com/ggerganov/llama.cpp/tree/master
- TheBloke's Hugging Face page: https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML
## Installation
#### 1. Clone the repo
```bash
git clone https://github.com/ggerganov/llama.cpp.git llama
```
#### 2. Change into the repo directory
```bash
cd llama
```
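#### 3. Download the model for the chosen \{{bit_number}} and \{{version}}
Model files follow this naming pattern, where `{{bit_number}}` is the quantization bit width and `{{version}}` is the quantization variant:
```text
llama-2-13b-chat.ggmlv3.q{{bit_number}}_{{version}}.bin
```
For the other variants, see [TheBloke's Hugging Face page](https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML).
For example, for variant 1 of the 4-bit quantization (`q4_1`):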
```bash
wget https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_1.bin
```
#### 4. Move the model into the ./models directory
```bash
mv llama-2-13b-chat.ggmlv3.q{{bit_number}}_{{version}}.bin ./models
```
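Steps 3 and 4 can be sketched with shell variables in place of the `{{bit_number}}` and `{{version}}` placeholders. The variable names (`BIT_NUMBER`, `VERSION`, `MODEL_FILE`, `MODEL_URL`) are illustrative and not part of the original instructions, and the sketch assumes every file on the Hugging Face page follows the naming pattern shown above. It only prints the resulting file name and URL; the download and move commands are left commented out.

```bash
#!/bin/bash
# Hypothetical variable names, chosen for this sketch only.
BIT_NUMBER=4   # quantization bit width
VERSION=1      # quantization variant

# Build the file name and download URL from the naming pattern.
MODEL_FILE="llama-2-13b-chat.ggmlv3.q${BIT_NUMBER}_${VERSION}.bin"
MODEL_URL="https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/${MODEL_FILE}"

echo "$MODEL_FILE"
echo "$MODEL_URL"

# To actually download the file and move it into ./models, uncomment:
# wget "$MODEL_URL" && mv "$MODEL_FILE" ./models
```

With `BIT_NUMBER=4` and `VERSION=1`, the printed URL is the same one used in the `wget` example above.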
## Usage
#### 1. Shell script (`run.sh`) to place in the llama.cpp directory
```bash
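#!/bin/bash
# Start llama.cpp in instruction mode with the Alpaca prompt.
#   -ins    instruction (Alpaca-style) mode
#   -f      prompt file
#   -t      number of CPU threads
#   -ngl    number of layers to offload to the GPU (only has an effect in GPU-enabled builds)
#   -m      path to the model file
#   -s      RNG seed
#   -n -1   no limit on the number of generated tokens
./main -ins \
  -f /llama/prompts/alpaca.txt \
  -t 8 \
  -ngl 1 \
  -m /llama/models/llama-2-13b-chat.ggmlv3.q4_1.bin \
  --color \
  --temp 0.7 \
  --repeat_penalty 1.1 \
  -s 42 \
  -n -1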
```
#### 2. Dockerfile (in the llama.cpp directory)
```dockerfile
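FROM ubuntu:latest
# Install the build toolchain, Python and git
RUN apt-get update && \
    apt-get install -y build-essential python3 python3-pip git
# Copy the llama.cpp checkout (including ./models and run.sh) into the image
COPY . /llama
WORKDIR /llama
# Rebuild from a clean state
RUN make clean
RUN make
RUN chmod +x run.sh
CMD ["/llama/run.sh"]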
```
#### 3. Build the image
```bash
docker build -t llama .
```
#### 4. Run the container
```bash
docker run -it llama
```
#### 5. Running the script
The script starts automatically when the container starts. The `-it` flags (interactive mode) attach a terminal so that you can interact with the model.
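## Docker with CUDA
See the [Docker with CUDA section of the llama.cpp README](https://github.com/ggerganov/llama.cpp/blob/master/README.md#docker-with-cuda) for details. The main changes are:
- `make LLAMA_CUBLAS=1`: build llama.cpp with cuBLAS (CUDA) support
- `--gpus all`: pass to `docker run` to expose the host GPUs to the container
- `--n-gpu-layers 1`: pass to `./main` to offload layers to the GPU (long form of `-ngl 1`)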
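
As a rough sketch of where these options fit, the wrapper below prints the build and run commands for a GPU-enabled image. The helper names (`run_cmd`, `build_gpu_image`, `run_gpu_container`) and the `DRY_RUN` variable are hypothetical and not part of the original repo. The sketch assumes that the Dockerfile's build step has been changed to `RUN make LLAMA_CUBLAS=1`, that its base image provides the CUDA toolkit (a plain `ubuntu:latest` image does not), and that the NVIDIA Container Toolkit is installed on the host.

```bash
#!/bin/bash
# Hypothetical wrapper, not part of the original repo.
# DRY_RUN=1 (the default here) only prints the commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

run_cmd() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build the image. Assumes the Dockerfile uses `RUN make LLAMA_CUBLAS=1`
# on top of a base image that ships the CUDA toolkit.
build_gpu_image() {
  run_cmd docker build -t llama .
}

# Start the container with all host GPUs exposed to it.
run_gpu_container() {
  run_cmd docker run --gpus all -it llama
}

build_gpu_image
run_gpu_container
```

Setting `DRY_RUN=0` executes the commands instead of printing them. The `--n-gpu-layers` value itself is set in `run.sh`, where `-ngl 1` is already present.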