{"id":49346723,"url":"https://github.com/legionio/lex-llm-vllm","last_synced_at":"2026-05-03T14:01:02.500Z","repository":{"id":354076744,"uuid":"1221896450","full_name":"LegionIO/lex-llm-vllm","owner":"LegionIO","description":null,"archived":false,"fork":false,"pushed_at":"2026-05-02T03:13:32.000Z","size":50,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-05-02T13:03:42.300Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LegionIO.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-26T20:20:30.000Z","updated_at":"2026-05-02T03:13:36.000Z","dependencies_parsed_at":"2026-05-02T13:01:08.923Z","dependency_job_id":null,"html_url":"https://github.com/LegionIO/lex-llm-vllm","commit_stats":null,"previous_names":["legionio/lex-llm-vllm"],"tags_count":10,"template":false,"template_full_name":null,"purl":"pkg:github/LegionIO/lex-llm-vllm","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LegionIO%2Flex-llm-vllm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LegionIO%2Flex-llm-vllm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LegionIO%2Flex-llm-vllm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LegionIO%2Flex-llm-vllm/manifests",
"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LegionIO","download_url":"https://codeload.github.com/LegionIO/lex-llm-vllm/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LegionIO%2Flex-llm-vllm/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32571456,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-03T06:36:36.687Z","status":"ssl_error","status_checked_at":"2026-05-03T06:36:09.306Z","response_time":103,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-04-27T08:02:52.123Z","updated_at":"2026-05-03T14:01:02.427Z","avatar_url":"https://github.com/LegionIO.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# lex-llm-vllm\n\nLegionIO LLM provider extension for [vLLM](https://docs.vllm.ai/).\n\nThis gem lives under `Legion::Extensions::Llm::Vllm` and depends on `lex-llm` for shared provider-neutral routing, fleet, and schema primitives.\n\nLoad it with `require 'legion/extensions/llm/vllm'`.\n\n## What It Provides\n\n- `Legion::Extensions::Llm::Provider` registration as `:vllm`\n- Shared `Legion::Extensions::Llm::Provider::OpenAICompatible` request and response handling\n- Chat requests through `POST /v1/chat/completions`\n- Streaming chat with `stream_usage_supported?` for token usage reporting\n- Model discovery through 
`GET /v1/models`\n- Embeddings through `POST /v1/embeddings`\n- vLLM thinking mode via `chat_template_kwargs` (configurable through `Legion::Settings`)\n- Best-effort publishing of readiness and model availability events to the `llm.registry` exchange when transport is loaded\n- vLLM management helpers: `/health`, `/version`, `/reset_prefix_cache`, `/reset_mm_cache`, `/sleep`, `/wake_up`\n- Normalized OpenAI-compatible capability and modality metadata for discovered models\n- Shared fleet/default settings via `Legion::Extensions::Llm.provider_settings`\n- Full `Legion::Logging::Helper` integration with structured `handle_exception` across all classes\n\n## Defaults\n\n```ruby\nLegion::Extensions::Llm::Vllm.default_settings\n# {\n#   provider_family: :vllm,\n#   instances: {\n#     default: {\n#       endpoint: \"http://localhost:8000\",\n#       tier: :private,\n#       transport: :http,\n#       usage: { inference: true, embedding: true },\n#       limits: { concurrency: 8 }\n#     }\n#   }\n# }\n```\n\n## Configuration\n\n```ruby\nLegion::Extensions::Llm.configure do |config|\n  config.vllm_api_base = \"http://localhost:8000\"\n  config.vllm_api_key = ENV[\"VLLM_API_KEY\"]\n  config.default_model = \"meta-llama/Llama-3.1-8B-Instruct\"\n  config.default_embedding_model = \"BAAI/bge-base-en-v1.5\"\nend\n```\n\n### Thinking Mode\n\nEnable vLLM thinking mode globally via settings:\n\n```ruby\n# In Legion::Settings or settings JSON\n{ llm: { providers: { vllm: { enable_thinking: true } } } }\n```\n\nOr pass `thinking: { enabled: true }` per-request. 
When enabled, the provider adds `chat_template_kwargs: { enable_thinking: true }` to the payload and strips `reasoning_effort`.\n\n## Management Endpoints\n\nThe provider exposes helpers for vLLM server management:\n\n| Method | Endpoint | Description |\n|--------|----------|-------------|\n| `health` | `GET /health` | Server health check |\n| `version` | `GET /version` | Server version info |\n| `reset_prefix_cache` | `POST /reset_prefix_cache` | Clear the prefix cache |\n| `reset_mm_cache` | `POST /reset_mm_cache` | Clear the multimodal cache |\n| `sleep(level:)` | `POST /sleep` | Put the server into sleep mode at the given `level` |\n| `wake_up(tags:)` | `POST /wake_up` | Wake the server from sleep mode for the given `tags` |\n\n## Registry Publishing\n\nWhen `lex-llm` routing and Legion transport are available, the provider publishes best-effort availability events to the `llm.registry` exchange:\n\n- **Readiness events** on `readiness(live: true)` calls\n- **Model availability events** on `list_models` discovery\n\nPublishing runs asynchronously on background threads and never blocks the caller. All failures are caught and reported through `handle_exception`.\n\n## Development\n\n```bash\nbundle install\nbundle exec rspec\nbundle exec rubocop\n```\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flegionio%2Flex-llm-vllm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flegionio%2Flex-llm-vllm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flegionio%2Flex-llm-vllm/lists"}