- Host: GitHub
- URL: https://github.com/kevinknights29/llama-v2-gpu-gtx-1650
- Owner: kevinknights29
- License: MIT
- Created: 2023-07-31T04:01:45.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-03-04T02:53:59.000Z (over 1 year ago)
- Last Synced: 2025-03-30T02:04:30.423Z (8 months ago)
- Topics: docker, gpu, llama-cpp, python
- Language: Python
- Homepage:
- Size: 51.8 KB
- Stars: 6
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Llama-v2-GPU-GTX-1650
Running Llama v2 with Llama.cpp in a 4GB VRAM GTX 1650.
## Setup
To expose your NVIDIA GPU and its drivers to a Docker container, you need to install the [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-container-toolkit).
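As a sanity check after installing the toolkit (assuming a recent NVIDIA driver on the host), you can confirm the GPU is visible from inside a container before building the app:

```bash
# Run nvidia-smi inside a throwaway CUDA base container;
# it should list the GTX 1650 if the toolkit is wired up correctly.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```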
## Results
### Llama.cpp recognizing the cuBLAS backend

### After optimizing values for inference
```bash
N_GPU_LAYERS=35
N_BATCH=4096
N_THREADS=4
```
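These environment variables correspond to `llama-cpp-python`'s `Llama()` constructor arguments. A minimal sketch of how they might be read at startup (assuming the app uses `llama-cpp-python`; the repo's actual wiring and model path may differ):

```python
import os

def llama_kwargs_from_env() -> dict:
    """Map the env vars above to llama-cpp-python Llama() kwargs."""
    return {
        # Transformer layers offloaded to the GPU; 35 lets a 7B
        # 4-bit quantized model fit in the GTX 1650's 4 GB of VRAM.
        "n_gpu_layers": int(os.environ.get("N_GPU_LAYERS", 35)),
        # Prompt tokens processed per batch.
        "n_batch": int(os.environ.get("N_BATCH", 4096)),
        # CPU threads for whatever stays on the host.
        "n_threads": int(os.environ.get("N_THREADS", 4)),
    }

# Hypothetical usage (model path is illustrative, not from the repo):
# from llama_cpp import Llama
# llm = Llama(model_path="models/llama-2-7b.Q4_K_M.gguf",
#             **llama_kwargs_from_env())
```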

### Streaming support
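When called with `stream=True`, `llama-cpp-python` returns an iterator of completion chunks rather than a single response. A small sketch of consuming that stream (the chunk shape follows the library's OpenAI-style completion dicts):

```python
from typing import Iterable, Iterator

def stream_text(chunks: Iterable[dict]) -> Iterator[str]:
    """Yield the partial text carried by each streamed chunk."""
    for chunk in chunks:
        yield chunk["choices"][0]["text"]

# With a real model this would look like:
#   for piece in stream_text(llm("Q: What is llama.cpp?", stream=True)):
#       print(piece, end="", flush=True)
```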
### Generation Parameters

## Usage
### Build the app image
```bash
docker compose build
```
### Get everything up and running
```bash
docker compose down && docker compose up -d
```
### Have fun
Visit: `http://localhost:7861/` to access the Gradio Chatbot UI.
## Contributing
### Installing pre-commit
Pre-commit is already part of this project's dependencies.
If you would like to install it standalone, run:
```bash
pip install pre-commit
```
To activate pre-commit, run the following commands:
- Install Git hooks:
```bash
pre-commit install
```
- Update current hooks:
```bash
pre-commit autoupdate
```
To test your installation of pre-commit, run:
```bash
pre-commit run --all-files
```