{"id":30198859,"url":"https://github.com/outerbounds/vllm-ws-setup","last_synced_at":"2026-02-08T09:34:02.863Z","repository":{"id":308983372,"uuid":"1034312781","full_name":"outerbounds/vllm-ws-setup","owner":"outerbounds","description":null,"archived":false,"fork":false,"pushed_at":"2025-08-09T02:19:01.000Z","size":348,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-09T04:10:52.270Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Dockerfile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/outerbounds.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-08T07:31:45.000Z","updated_at":"2025-08-09T02:19:05.000Z","dependencies_parsed_at":"2025-08-09T04:12:14.818Z","dependency_job_id":"b730a9b9-1ac8-42fd-a030-cdce0447be10","html_url":"https://github.com/outerbounds/vllm-ws-setup","commit_stats":null,"previous_names":["outerbounds/vllm-ws-setup"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/outerbounds/vllm-ws-setup","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outerbounds%2Fvllm-ws-setup","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outerbounds%2Fvllm-ws-setup/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outerbounds%2Fvllm-ws-setup/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outerbounds%2Fvllm-ws-setup/manifests","owner_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/owners/outerbounds","download_url":"https://codeload.github.com/outerbounds/vllm-ws-setup/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outerbounds%2Fvllm-ws-setup/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29226470,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-08T09:15:18.648Z","status":"ssl_error","status_checked_at":"2026-02-08T09:14:33.745Z","response_time":57,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-08-13T07:29:45.075Z","updated_at":"2026-02-08T09:34:02.857Z","avatar_url":"https://github.com/outerbounds.png","language":"Dockerfile","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Step 1. Create a vLLM-enabled workstation\n\nTo run a 32B model, use a compute pool with a 4-GPU instance, such as `g5.12xlarge` on AWS.\nNote two things:\n1. Shared memory is set to 10GB, because the default is insufficient for vLLM's IPC across GPUs.\n2. Use an image that has NVIDIA GPU drivers installed. This repository contains an [example image](./Dockerfile) that pre-installs vLLM, PyTorch, and other dependencies. A public image is hosted at `docker.io/eddieob/vllm-flashinfer-metaflow` for demo purposes. 
\n\n![](./vllm-ws.png)\n![](./ws-setting-up.png)\n\n## Step 2. Run vLLM\n\nThe image mentioned in the previous section already has `vllm` installed.\nIf you bring your own image, make sure `vllm` is installed in the active environment.\n\n### Run the OpenAI-compatible server\n\nChoose your model and [inference server parameters](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html).\n\n```bash\nvllm serve Qwen/Qwen3-32B --tensor-parallel-size 4\n```\n\nGated Hugging Face models require the `HF_TOKEN` environment variable to be set before they can be pulled.\nThe initial load and model compilation can take around 10 minutes for larger models.\n\n### Query the server\n\n```bash\ncurl -X POST http://localhost:8000/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"model\": \"Qwen/Qwen3-32B\",\n    \"messages\": [\n      {\"role\": \"user\", \"content\": \"Hello, how are you?\"}\n    ],\n    \"temperature\": 0.7,\n    \"max_tokens\": 100\n  }'\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fouterbounds%2Fvllm-ws-setup","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fouterbounds%2Fvllm-ws-setup","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fouterbounds%2Fvllm-ws-setup/lists"}