https://github.com/paulpierre/vllm-docker
Quickly test a 4-bit quantized Llama-3.2-11B-Vision-Instruct on an A100 40GB
- Host: GitHub
- URL: https://github.com/paulpierre/vllm-docker
- Owner: paulpierre
- Created: 2024-11-21T22:40:48.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-21T23:18:16.000Z (over 1 year ago)
- Last Synced: 2025-03-16T20:25:51.101Z (over 1 year ago)
- Topics: docker, docker-compose, llama, llama3, llm, llm-inference, llms, vllm
- Homepage:
- Size: 2.93 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# ▀▄▀ █▄ █▄ █▚▞▌ docker-compose
Run Llama-3.2-11B-Vision-Instruct with 4-bit quantization using vLLM and Open WebUI.
## Prerequisites
- NVIDIA GPU with CUDA support
- Docker and Docker Compose
- Hugging Face token
- Cloudflare tunnel token (optional, for remote access)
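Before starting, it may help to confirm that the tools listed above are installed. The following is a minimal sketch, not part of the repository: `have_cmd` and `check_prereqs` are hypothetical helper names, and the check assumes that the presence of `nvidia-smi` is a reasonable proxy for a working NVIDIA driver. It does not verify that Docker containers can actually reach the GPU, which typically also requires the NVIDIA Container Toolkit.

```shell
# Return success if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Print one line per prerequisite; return non-zero if anything is missing.
# (Hypothetical helper; the tool list is an assumption based on the
# prerequisites above.)
check_prereqs() {
  local missing=0
  local cmd
  for cmd in docker nvidia-smi; do
    if have_cmd "$cmd"; then
      echo "ok: $cmd"
    else
      echo "missing: $cmd"
      missing=1
    fi
  done
  # Docker Compose v2 is invoked as a docker subcommand.
  if have_cmd docker && docker compose version >/dev/null 2>&1; then
    echo "ok: docker compose"
  else
    echo "missing: docker compose"
    missing=1
  fi
  return "$missing"
}
```

Running `check_prereqs` prints a short report and returns a non-zero status if any tool is missing, so it can gate a setup script.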
## Setup
1. Copy `env-example` to `.env` and fill in:
   - `HF_TOKEN`: Your Hugging Face token
   - `VLLM_API_KEY`: Generate a random API key
   - `CLOUDFLARE_TUNNEL_TOKEN`: Your Cloudflare tunnel token (optional)
2. Start the services:
   ```bash
   # Start all services
   docker compose up -d
   # View logs
   docker compose logs -f
   ```
3. Access the UI:
   - Local: http://localhost:3000
   - Remote: If using Cloudflare tunnel, access via your configured domain
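As an illustration of step 1, here is one possible way to generate a random `VLLM_API_KEY` and write a `.env` file. This is a sketch under assumptions: `generate_api_key` and `write_env_file` are hypothetical helpers, and the file is written from scratch with only the three variables named above. If `env-example` contains other settings, copying it and editing the values by hand is the safer route.

```shell
# Generate a 64-character hex string from 32 random bytes.
generate_api_key() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Write a .env file into the given directory (default: current directory).
# HF_TOKEN and CLOUDFLARE_TUNNEL_TOKEN are taken from the calling
# environment and left empty if unset. Assumption: these three variables
# are the only ones the compose file needs.
write_env_file() {
  local dir="${1:-.}"
  local env_file="${dir}/.env"
  {
    printf 'HF_TOKEN=%s\n' "${HF_TOKEN:-}"
    printf 'VLLM_API_KEY=%s\n' "$(generate_api_key)"
    printf 'CLOUDFLARE_TUNNEL_TOKEN=%s\n' "${CLOUDFLARE_TUNNEL_TOKEN:-}"
  } > "$env_file"
  # The file holds secrets, so restrict it to the current user.
  chmod 600 "$env_file"
}
```

For example, `HF_TOKEN=<your token> write_env_file .` creates `.env` in the current directory with a fresh key on each call. Note that it overwrites any existing `.env`.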