{"id":46831326,"url":"https://github.com/embeddedllm/vllm-responses","last_synced_at":"2026-04-06T11:00:53.414Z","repository":{"id":343417460,"uuid":"1166316370","full_name":"EmbeddedLLM/vllm-responses","owner":"EmbeddedLLM","description":null,"archived":false,"fork":false,"pushed_at":"2026-03-25T03:59:37.000Z","size":1070,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-25T17:15:21.801Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://embeddedllm.github.io/vllm-responses/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/EmbeddedLLM.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-25T05:06:21.000Z","updated_at":"2026-03-19T04:09:23.000Z","dependencies_parsed_at":null,"dependency_job_id":"c41eb5d8-ef8e-444d-a3d3-3950c190adf9","html_url":"https://github.com/EmbeddedLLM/vllm-responses","commit_stats":null,"previous_names":["embeddedllm/vllm-responses"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/EmbeddedLLM/vllm-responses","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EmbeddedLLM%2Fvllm-responses","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EmbeddedLLM%2Fvllm-responses/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EmbeddedLLM%2Fvllm-responses/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EmbeddedLLM%2Fvllm-responses/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/EmbeddedLLM","download_url":"https://codeload.github.com/EmbeddedLLM/vllm-responses/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EmbeddedLLM%2Fvllm-responses/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31469743,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-06T08:36:52.050Z","status":"ssl_error","status_checked_at":"2026-04-06T08:36:51.267Z","response_time":112,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-03-10T10:08:46.907Z","updated_at":"2026-04-06T11:00:53.393Z","avatar_url":"https://github.com/EmbeddedLLM.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# vLLM Responses\n\nFastAPI gateway that exposes an OpenAI-style **Responses API** (`/v1/responses`) in front of a vLLM **OpenAI-compatible** server (`/v1/chat/completions`), with:\n\n- SSE streaming event shape + ordering\n- `previous_response_id` statefulness (ResponseStore)\n- gateway-executed built-in tool: `code_interpreter`\n- gateway-hosted MCP tools (`tools[].type=\"mcp\"` with configured `server_label`)\n\nCurrent MCP boundary:\n\n- `tools[].type=\"mcp\"` is gateway-hosted MCP resolved via `VR_MCP_CONFIG_PATH`.\n- Request-declared MCP targets (`server_url`, `connector_id`) are not supported yet.\n\n**[📚 Full User Documentation](https://embeddedllm.github.io/vllm-responses/)** (Guides, API Reference, Examples)\n\nDesign docs (maintainer-facing): `design_docs/index.md`.\n\n## Install\n\nThe `vllm-responses` CLI is provided by the Python package in `responses/`.\n\n**Prerequisites:** Python 3.12+ and `uv`.\n\n### Install from a prebuilt wheel (Linux x86_64) (Recommended)\n\nDownload a prebuilt wheel (`vllm_responses-*.whl`) from GitHub Releases (preferred) or a CI run artifact, then install it:\n\n```bash\nuv venv --python=3.12\nsource .venv/bin/activate\nuv pip install vllm\nuv pip install path/to/vllm_responses-*.whl\n```\n\nOn Linux x86_64 wheels, the Code Interpreter server binary is bundled, so **Bun is not required**.\nCurrently, wheels are only built for Linux x86_64.\n\nInstalling `vllm-responses` provides:\n\n- `vllm-responses` for the standalone supervisor mode\n- `vllm` as a CLI shim that supports `vllm serve --responses` and delegates all non-Responses paths to the upstream\n  `vllm` Python package\n\n### Install from source (repo checkout) (Development)\n\n```bash\ngit clone https://github.com/EmbeddedLLM/vllm-responses\ncd vllm-responses\n\nuv venv --python=3.12\nsource .venv/bin/activate\nuv pip install vllm\nuv pip install -e ./responses\n\n# Development: enable Code Interpreter via Bun fallback\n# - Required for source checkouts when running with `code_interpreter` enabled (default)\ncd responses/python/vllm_responses/tools/code_interpreter\nbun install\nexport VR_CODE_INTERPRETER_DEV_BUN_FALLBACK=1\ncd -\n\nvllm-responses --help\n```\n\nVerify installation:\n\n```bash\nvllm-responses --help\nvllm --help\n```\n\n### Optional dependency sets (extras)\n\nInstall any combination via:\n\n```bash\nuv pip install -e './responses[\u003cextra1\u003e,\u003cextra2\u003e]'\n```\n\nAvailable extras:\n\n- `docs`: MkDocs toolchain (contributors).\n- `lint`: Ruff + Markdown formatting.\n- `test`: Pytest + coverage + load testing tools.\n- `tracing`: OpenTelemetry tracing support (only needed if you enable `VR_TRACING_ENABLED=true`).\n- `build`: Package build/publish tools.\n- `all`: Everything above.\n\n## Build a wheel from source\n\nIf you want to produce a local wheel from this checkout, build from the\n`responses/` package directory.\n\n### Rebuild the bundled Code Interpreter binary (Linux x86_64 only)\n\nThis step is only needed if you want the wheel to include a freshly compiled\nCode Interpreter binary.\n\n```bash\nbash scripts/ci/prebuild_code_interpreter_linux_x86_64.sh responses\n```\n\nThe script writes the bundled executable under:\n\n- `responses/python/vllm_responses/tools/code_interpreter/bin/linux/x86_64/code-interpreter-server`\n\n### Build wheel and sdist\n\n```bash\nuv pip install -e './responses[build]'\ncd responses\npython -m build --wheel --sdist\n```\n\nBuild artifacts are written to:\n\n- `responses/dist/`\n\nOn Linux x86_64, wheels built after the prebuild step bundle the native Code\nInterpreter binary. On other platforms, use the source-install Bun fallback or\ndisable Code Interpreter.\n\n## Run\n\n### remote-upstream gateway mode (`vllm-responses serve`)\n\nPrereqs:\n\n- If `code_interpreter` is enabled (default), the first start may download the Pyodide runtime (~400MB) into a cache\n    directory (see `VR_PYODIDE_CACHE_DIR`). This requires `tar` to be installed.\n- For non-Linux platforms (or source installs without the bundled binary), you can disable the tool via\n    `--code-interpreter disabled`. For development you can also enable the Bun-based fallback via\n    `VR_CODE_INTERPRETER_DEV_BUN_FALLBACK=1`.\n\nExternal upstream (you start vLLM yourself; `/v1` is optional):\n\n```bash\nvllm-responses serve --upstream http://127.0.0.1:8457\n```\n\nThe Responses endpoint is:\n\n- `POST http://127.0.0.1:5969/v1/responses`\n\nRemote access note:\n\n- If you bind the gateway with `--gateway-host 0.0.0.0`, use the machine’s IP/hostname to connect (not `0.0.0.0`).\n\n### integrated runtime (`vllm serve --responses`)\n\nPrereq:\n\n- install upstream `vllm` first, then install `vllm-responses` into the same environment\n\nExample:\n\n```bash\nCUDA_VISIBLE_DEVICES=0 vllm serve Qwen/Qwen3.5-0.8B \\\n  --responses \\\n  --reasoning-parser qwen3 \\\n  --enable-auto-tool-choice \\\n  --tool-call-parser qwen3_coder \\\n  --host 0.0.0.0 \\\n  --port 8457\n```\n\nCLI help:\n\n- `vllm serve --help` shows upstream vLLM help\n- `vllm serve --responses --help` shows the Responses-owned integrated flags\n\n### Optional: ResponseStore hot cache (Redis)\n\n`previous_response_id` hydration reads the previous response state from the DB. For multi-worker deployments, you can optionally enable a Redis-backed hot cache to reduce DB reads/latency.\n\nEnv vars (default off):\n\n- `VR_RESPONSE_STORE_CACHE=1`\n- `VR_RESPONSE_STORE_CACHE_TTL_SECONDS=3600`\n\nRedis connection:\n\n- `VR_REDIS_HOST`, `VR_REDIS_PORT`\n\n## Quick smoke test (OpenAI Python SDK)\n\n```python\nfrom openai import OpenAI\n\nclient = OpenAI(base_url=\"http://127.0.0.1:5969/v1\", api_key=\"dummy\")\n\nwith client.responses.stream(\n    model=\"MiniMaxAI/MiniMax-M2.1\",\n    input=[{\"role\": \"user\", \"content\": \"You MUST call the code_interpreter tool. Execute: 2+2. Reply with ONLY the number.\"}],\n    tools=[{\"type\": \"code_interpreter\"}],\n    tool_choice=\"auto\",\n    include=[\"code_interpreter_call.outputs\"],\n) as stream:\n    for evt in stream:\n        if getattr(evt, \"type\", \"\").endswith(\".delta\"):\n            continue\n        print(getattr(evt, \"type\", evt))\n    r1 = stream.get_final_response().id\n\nwith client.responses.stream(\n    model=\"MiniMaxAI/MiniMax-M2.1\",\n    previous_response_id=r1,\n    input=[{\"role\": \"user\", \"content\": \"What number did you just compute? Reply with ONLY the number.\"}],\n    tool_choice=\"none\",\n) as stream:\n    for evt in stream:\n        if getattr(evt, \"type\", \"\").endswith(\".delta\"):\n            continue\n        print(getattr(evt, \"type\", evt))\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fembeddedllm%2Fvllm-responses","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fembeddedllm%2Fvllm-responses","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fembeddedllm%2Fvllm-responses/lists"}