https://github.com/outerbounds/vllm-ws-setup


Host: GitHub
URL: https://github.com/outerbounds/vllm-ws-setup
Owner: outerbounds
Created: 2025-08-08T07:31:45.000Z
Default Branch: main
Last Pushed: 2025-08-09T02:19:01.000Z
Last Synced: 2025-08-09T04:10:52.270Z
Language: Dockerfile
Size: 340 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


## Step 1. Create a vllm-enabled workstation

To run a 32B model, use a compute pool backed by a 4-GPU instance, such as `g5.12xlarge` on AWS.
Note the following:
1. Shared memory is set to 10 GB because the default is insufficient for the inter-process communication (IPC) that vLLM performs across GPUs.
2. Use an image with NVIDIA GPU drivers installed. This repository contains an [example image](./Dockerfile) that pre-installs vLLM, PyTorch, and other dependencies. A public image is hosted at `docker.io/eddieob/vllm-flashinfer-metaflow` for demo purposes.
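To confirm the shared-memory setting took effect from inside the running workstation, you can read the size of the `/dev/shm` mount with Python's standard library. This is a minimal sketch assuming a Linux container where shared memory is mounted at `/dev/shm`; the helper names are hypothetical, and the threshold treats "10 GB" as 10 decimal gigabytes so that a 10 GiB mount also passes.

```python
import os
import shutil

# Assumption: "10 GB" read as decimal gigabytes; a 10 GiB mount is larger and also passes.
MIN_SHM_BYTES = 10 * 10**9


def shm_total_bytes(path: str = "/dev/shm") -> int:
    """Return the total capacity, in bytes, of the filesystem mounted at `path`."""
    return shutil.disk_usage(path).total


def shm_is_sufficient(total_bytes: int, min_bytes: int = MIN_SHM_BYTES) -> bool:
    """True if the shared-memory mount is at least `min_bytes` large."""
    return total_bytes >= min_bytes


if __name__ == "__main__":
    if os.path.isdir("/dev/shm"):
        total = shm_total_bytes()
        status = "OK" if shm_is_sufficient(total) else "likely too small for multi-GPU vLLM"
        print(f"/dev/shm: {total / 1024**3:.1f} GiB ({status})")
    else:
        print("/dev/shm not found; this check assumes a Linux container.")
```

Docker's default `/dev/shm` is 64 MB, which this check would report as too small.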

![vLLM workstation settings](./vllm-ws.png)
![Workstation being set up](./ws-setting-up.png)

## Step 2. Run vLLM

The image mentioned in the previous section already has `vllm` installed.
If you opt to bring your own image, please ensure you have `vllm` installed in the active environment.
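A quick way to check this from the active environment is to ask Python whether the package is importable and which version is installed. This is a sketch using only the standard library; `installed_version` is a hypothetical helper name, and the package names `vllm` and `torch` are the usual PyPI distribution names.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it cannot be imported."""
    if importlib.util.find_spec(package) is None:
        return None
    try:
        return version(package)
    except PackageNotFoundError:
        # Importable but without distribution metadata (e.g. a stdlib module or a source checkout).
        return "unknown"


if __name__ == "__main__":
    for pkg in ("vllm", "torch"):
        v = installed_version(pkg)
        print(f"{pkg}: {v if v else 'NOT installed in this environment'}")
```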

### Run the OpenAI-compatible server

Choose your model and [inference server parameters](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html).

```bash
vllm serve Qwen/Qwen3-32B --tensor-parallel-size 4
```
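`--tensor-parallel-size 4` splits the model's weights across the four GPUs. The back-of-envelope arithmetic below shows why a 32B model needs all four cards on this instance type. It is a rough sketch under stated assumptions: about 32e9 parameters stored in 16-bit precision (2 bytes each), 24 GiB per GPU (the A10G cards in a `g5.12xlarge`), vLLM's default `--gpu-memory-utilization` of 0.9, and an even split of weights. It ignores activations, CUDA graphs, and quantization; the function names are hypothetical.

```python
GIB = 1024**3


def weights_per_gpu_gib(num_params: float, bytes_per_param: float, tp_size: int) -> float:
    """Approximate weight memory per GPU when tensor parallelism splits weights evenly."""
    return num_params * bytes_per_param / tp_size / GIB


def kv_headroom_gib(gpu_mem_gib: float, weights_gib: float, utilization: float = 0.9) -> float:
    """Memory left per GPU for the KV cache after loading weights, within vLLM's budget."""
    return gpu_mem_gib * utilization - weights_gib


if __name__ == "__main__":
    # Assumptions: ~32e9 params, 2 bytes/param (bf16/fp16), 24 GiB per GPU.
    for tp in (1, 2, 4):
        w = weights_per_gpu_gib(32e9, 2, tp)
        print(f"tp={tp}: weights/GPU ~ {w:.1f} GiB, headroom ~ {kv_headroom_gib(24, w):.1f} GiB")
```

Under these assumptions only `tp=4` leaves positive headroom (about 6.7 GiB per GPU for the KV cache), which is consistent with choosing a 4-GPU instance. Actual usage will differ.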

Gated Hugging Face models require the `HF_TOKEN` environment variable to be set before they can be pulled.
The initial load and model compilation can take around 10 minutes for larger models.
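Because startup can take several minutes, a script that sends requests may want to wait until the server answers. The sketch below polls the OpenAI-compatible `GET /v1/models` endpoint until it returns HTTP 200. It assumes the server listens on `localhost:8000` (the address used in the query example); `wait_until_ready` is a hypothetical helper, and the 900-second default is an arbitrary margin over the roughly 10-minute load time.

```python
import http.client
import time
import urllib.request


def wait_until_ready(base_url: str = "http://localhost:8000",
                     timeout_s: float = 900.0,
                     interval_s: float = 10.0) -> bool:
    """Poll GET {base_url}/v1/models until it returns HTTP 200 or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, HTTP error, or timeout: the server is likely still loading.
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval_s, remaining))

# Usage, after starting `vllm serve` in another terminal:
#     if wait_until_ready():
#         print("server is ready")
```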

### Query the server

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-32B",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "temperature": 0.7,
    "max_tokens": 100
  }'
```
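The same request can be sent from Python without extra dependencies. This sketch mirrors the curl body field for field and reads the reply from the standard OpenAI-style `choices[0].message.content` path. It assumes the server is reachable at `http://localhost:8000`; `build_chat_request`, `extract_reply`, and `chat` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # Assumption: the server started above, on its default port.


def build_chat_request(model: str, prompt: str,
                       temperature: float = 0.7, max_tokens: int = 100) -> dict:
    """Build the same chat-completions body that the curl example sends."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


def extract_reply(response: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat-completions response."""
    return response["choices"][0]["message"]["content"]


def chat(prompt: str, model: str = "Qwen/Qwen3-32B", base_url: str = BASE_URL) -> str:
    """POST a single-turn chat request and return the model's reply text."""
    body = json.dumps(build_chat_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return extract_reply(json.load(resp))

# Usage (requires the server to be running):
#     print(chat("Hello, how are you?"))
```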
