https://github.com/paulpierre/vllm-docker

Test Llama-3.2-11B-Vision-Instruct 4-bit quant quickly on an A100 40GB

Host: GitHub
URL: https://github.com/paulpierre/vllm-docker
Owner: paulpierre
Created: 2024-11-21T22:40:48.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-11-21T23:18:16.000Z (over 1 year ago)
Last Synced: 2025-03-16T20:25:51.101Z (over 1 year ago)
Topics: docker, docker-compose, llama, llama3, llm, llm-inference, llms, vllm
Homepage:
Size: 2.93 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# ▀▄▀ █▄ █▄ █▚▞▌ docker-compose

Run Llama-3.2-11B-Vision-Instruct with 4-bit quantization using vLLM and Open WebUI.

## Prerequisites
- NVIDIA GPU with CUDA support (the repository description targets an A100 40GB)
- Docker and Docker Compose
- Hugging Face token
- Cloudflare Tunnel token (optional, for remote access)
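A quick check that the required tools are installed can save a failed `docker compose up`. The helpers below are a minimal sketch, not part of the repository: `require_cmds` and `has_compose_v2` are hypothetical names, and treating `nvidia-smi` on `PATH` as evidence of a working NVIDIA driver is an assumption (it does not prove that Docker itself can reach the GPU, which also needs the NVIDIA Container Toolkit).

```bash
#!/usr/bin/env bash
# Sketch: report missing prerequisites before running `docker compose up`.
# Assumption: `nvidia-smi` on PATH is used as a proxy for "NVIDIA driver installed".

# require_cmds CMD... : print each command not found on PATH to stderr;
# return the number of missing commands (0 means all are present).
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing prerequisite: $cmd" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# has_compose_v2 : succeed only if the `docker compose` (v2 plugin) subcommand works.
has_compose_v2() {
  docker compose version >/dev/null 2>&1
}
```

For example, `require_cmds docker nvidia-smi && has_compose_v2` should succeed on a ready host, and `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader` shows whether the card and its memory match the A100 40GB the project targets.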

## Setup
1. Copy `env-example` to `.env` and fill in:
   - `HF_TOKEN`: your Hugging Face access token
   - `VLLM_API_KEY`: a random secret you generate; it serves as the API key for the vLLM server
   - `CLOUDFLARE_TUNNEL_TOKEN`: your Cloudflare Tunnel token (optional)

2. Start the services:
   ```bash
   # Start all services in the background
   docker compose up -d

   # Follow the logs (Ctrl+C stops following; the services keep running)
   docker compose logs -f
   ```

3. Access the UI:
   - Local: http://localhost:3000
   - Remote: if you set up a Cloudflare Tunnel, open the domain you configured for it
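Steps 1 and 3 can also be scripted. The sketch below is not from the repository: `set_env_key`, `env_has_value`, and `wait_for_url` are hypothetical helpers, it assumes `env-example` uses plain `KEY=value` lines, and it uses `openssl rand -hex 32` as one way to produce the random `VLLM_API_KEY`. Note that polling http://localhost:3000 only confirms that Open WebUI is answering; the model behind it may still be downloading or loading, so `docker compose logs -f` remains the way to follow vLLM's progress.

```bash
#!/usr/bin/env bash
# Sketch of setup steps 1 and 3 as shell helpers (hypothetical, not part of the repo).
# Assumptions: .env uses plain KEY=value lines; `curl` is installed.

# set_env_key FILE KEY VALUE : drop any existing KEY= line and append KEY=VALUE.
# Avoids `sed -i`, whose syntax differs between GNU and BSD. Because the file is
# rebuilt via mktemp, it ends up with mktemp's 0600 permissions.
set_env_key() {
  local file=$1 key=$2 value=$3 tmp
  tmp=$(mktemp) || return 1
  grep -v "^${key}=" "$file" > "$tmp" || true  # grep exits 1 when nothing is kept
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}

# env_has_value FILE KEY : succeed if FILE has a KEY= line with a non-empty value.
env_has_value() {
  grep -Eq "^$2=.+" "$1"
}

# wait_for_url URL TIMEOUT_SECONDS : poll URL every 5 s until curl gets a
# non-error (below 400) HTTP response; return 1 once TIMEOUT_SECONDS have elapsed.
wait_for_url() {
  local url=$1 timeout=$2 waited=0
  until curl -fsS --max-time 5 -o /dev/null "$url" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for $url" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
}

# Example usage from the repo root (the 900 s timeout is an arbitrary choice):
#   cp env-example .env
#   set_env_key .env VLLM_API_KEY "$(openssl rand -hex 32)"
#   env_has_value .env HF_TOKEN || echo "set HF_TOKEN in .env first" >&2
#   docker compose up -d && wait_for_url http://localhost:3000 900
```

Rebuilding the file instead of appending keeps exactly one `VLLM_API_KEY=` line, so there is no ambiguity about which value Docker Compose reads.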
