https://github.com/stabgan/openrouter-mcp-multimodal
MCP server for OpenRouter: 300+ LLMs with vision, image gen, audio in/out, and video analysis + generation (Veo 3.1 / Sora 2 Pro / Seedance / Wan). Structured errors, IPv6 SSRF guards, path sandbox.
- Host: GitHub
- URL: https://github.com/stabgan/openrouter-mcp-multimodal
- Owner: stabgan
- License: mit
- Created: 2025-03-26T16:49:03.000Z
- Default Branch: main
- Last Pushed: 2026-05-03T16:36:45.000Z
- Last Synced: 2026-05-03T17:29:13.709Z
- Topics: ai, audio-generation, audio-transcription, claude, docker, image-analysis, image-generation, llm, mcp, mcp-server, model-context-protocol, multimodal, nodejs, openrouter, seedance, sora, typescript, veo, video-generation, video-understanding
- Language: TypeScript
- Homepage: https://www.npmjs.com/package/@stabgan/openrouter-mcp-multimodal
- Size: 1.74 MB
- Stars: 34
- Watchers: 2
- Forks: 18
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
- metorial-index - OpenRouter Multimodal Server - Combines text chat and image analysis capabilities to conduct multimodal conversations and handle custom queries seamlessly. Optimizes workflows with intelligent model selection and performance improvements. (Multimodal Input Processing)
- toolsdk-mcp-registry - @stabgan/openrouter-mcp-multimodal
- awesome-mcp-servers - **openrouter-mcp-multimodal** - MCP server for OpenRouter providing text chat and image analysis tools `typescript` `mcp` `server` `npm install stabgan/openrouter-mcp-multimodal` (Web Development)
- awesome-openrouter - Documentation
- awesome-mcp-servers - OpenRouter MCP Multimodal - MCP server for OpenRouter providing text chat and image analysis tools, enabling AI assistants to leverage multimodal capabilities through various LLM providers via OpenRouter. ([Read more](/details/openrouter-mcp-multimodal.md)) `Openrouter` `Multimodal` `Image Analysis` (AI & Machine Learning)
README
# OpenRouter MCP Multimodal Server
The all-in-one MCP server for 300+ LLMs: text, vision, audio, and video in a single package.
3,800+ installs across npm + Docker Hub · ~950 npm installs/month and accelerating
---
Access 300+ LLMs through [OpenRouter](https://openrouter.ai) via the [Model Context Protocol](https://modelcontextprotocol.io). Analyze images, audio, and video. Generate images, audio, and video. Chat with any model. Every tool's error responses carry a structured `_meta.code`, so MCP clients can switch on failure modes without parsing strings.
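Because the codes come from a closed set (listed in the Tools section), a client can route failures with a plain `switch`. The following is a minimal sketch. It assumes the code arrives on an MCP tool result as `result._meta.code` alongside `isError: true`. The `ToolResult` type and the retry/fix/resume policy in the hypothetical `nextStep` helper are illustrations and are not part of this package:

```typescript
// Closed error taxonomy, as listed in this README.
type ErrorCode =
  | "INVALID_INPUT" | "UNSAFE_PATH" | "UPSTREAM_HTTP" | "UPSTREAM_TIMEOUT"
  | "UPSTREAM_REFUSED" | "UNSUPPORTED_FORMAT" | "RESOURCE_TOO_LARGE"
  | "ZDR_INCOMPATIBLE" | "MODEL_NOT_FOUND" | "JOB_FAILED"
  | "JOB_STILL_RUNNING" | "INTERNAL";

// Hypothetical minimal shape of a tool result; real MCP results carry more fields.
interface ToolResult {
  isError?: boolean;
  content: { type: string; text?: string }[];
  _meta?: { code?: ErrorCode };
}

type NextStep = "ok" | "retry" | "fix_input" | "change_model" | "resume_job" | "give_up";

// Example client policy (an assumption, not the server's): one action per code.
function nextStep(result: ToolResult): NextStep {
  if (!result.isError) return "ok";
  switch (result._meta?.code) {
    case "UPSTREAM_HTTP":
    case "UPSTREAM_TIMEOUT":
      return "retry";
    case "INVALID_INPUT":
    case "UNSAFE_PATH":
    case "UNSUPPORTED_FORMAT":
    case "RESOURCE_TOO_LARGE":
      return "fix_input";
    case "MODEL_NOT_FOUND":
    case "ZDR_INCOMPATIBLE":
      return "change_model";
    case "JOB_STILL_RUNNING":
      return "resume_job"; // e.g. call get_video_status with the returned job id
    default:
      return "give_up"; // UPSTREAM_REFUSED, JOB_FAILED, INTERNAL, or no code at all
  }
}
```

Keying on the code rather than the message text keeps the client stable even if the human-readable wording changes between releases.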
## One-Click Install
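| Client | How to install |
| :--- | :--- |
| Kiro | One-click deeplink |
| Cursor | One-click deeplink |
| VS Code | One-click deeplink |
| VS Code Insiders | One-click deeplink |
| Claude Desktop | Install guide: add to `claude_desktop_config.json` |
| Windsurf | Install guide: add to `~/.codeium/windsurf/mcp_config.json` |
| Cline | Install guide: add via Cline MCP settings |
| Smithery | `npx -y @smithery/cli install @stabgan/openrouter-mcp-multimodal --client claude` |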
> After clicking, the target client opens a confirmation prompt. You'll need to paste your `OPENROUTER_API_KEY`; the deeplink ships a placeholder so no secrets end up in shared links.
## Why This One?
| Feature | Status |
| :--- | :--- |
| Text chat with 300+ models | ✅ |
| Image analysis (vision) | ✅ Native, with sharp optimization |
| Audio analysis | ✅ Transcription + analysis, base64 auto-encoded |
| Audio generation | ✅ Conversational, speech, and music with format auto-detection |
| Image generation | ✅ Path-sandboxed disk output |
| **Video understanding** | ✅ **v3**: mp4, mpeg, mov, webm from files, URLs, or data URLs |
| **Video generation** | ✅ **v3**: Veo 3.1 / Sora 2 Pro / Seedance / Wan via async API with progress notifications |
| Auto image resize + compress | ✅ Configurable (defaults: 800px max, JPEG 80%) |
| Model search + validation | ✅ Filter by vision / audio / video modality |
| Free model support | ✅ Default: free Nemotron VL |
| Docker support | ✅ Multi-arch (amd64 + arm64), ~345 MB Alpine |
| Retry-After + jitter | ✅ Honors `Retry-After` header, avoids thundering herd |
| IPv4 + IPv6 SSRF blocklist | ✅ Covers mapped, compat, multicast, 6to4, Teredo, ORCHID |
| Structured error taxonomy | ✅ Closed `_meta.code` so clients can switch on failure modes |
| Reasoning-model awareness | ✅ Detects `max_tokens` cutoff during CoT, guides the caller |
| MCP 2025 tool annotations | ✅ `readOnlyHint` / `destructiveHint` / `idempotentHint` on every tool |
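The "Retry-After + jitter" row combines two standard ideas: obey the server's `Retry-After` value when one is sent, and otherwise spread retries out randomly. The sketch below is one way to do that. The server's actual backoff constants and jitter formula are not documented here, so `retryDelayMs`, its defaults, and the use of "full jitter" are assumptions:

```typescript
// Sketch: pick a retry delay in ms. Retry-After may be delay-seconds or an
// HTTP-date (RFC 9110). Base/cap values and "full jitter" are assumptions.
function retryDelayMs(
  attempt: number,
  retryAfter: string | null,
  nowMs: number = Date.now(),
  random: () => number = Math.random,
  baseMs = 500,
  capMs = 30_000,
): number {
  if (retryAfter !== null) {
    const value = retryAfter.trim();
    if (/^\d+$/.test(value)) return Number(value) * 1000; // delay-seconds form
    const dateMs = Date.parse(value); // HTTP-date form
    if (!Number.isNaN(dateMs)) return Math.max(0, dateMs - nowMs);
  }
  // No usable header: exponential backoff with full jitter, so clients that
  // failed together do not all retry at the same instant.
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

Injecting `nowMs` and `random` keeps the function deterministic under test.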
## Tools
| Tool | Description |
| :--- | :--- |
| `chat_completion` | Send messages to any OpenRouter model. Detects reasoning-model cutoffs. |
| `analyze_image` | Analyze images from local files, URLs, or data URIs. Auto-optimized with sharp. |
| `analyze_audio` | Analyze/transcribe audio (WAV, MP3, FLAC, OGG, etc.) from files, URLs, or data URIs. |
| `analyze_video` | Analyze/transcribe video (mp4, mpeg, mov, webm) from files, URLs, or data URIs. |
| `generate_image` | Generate images from text prompts. Optional path-sandboxed disk save. |
| `generate_audio` | Generate audio from text. Auto-detects format, wraps raw PCM in WAV. |
| `generate_video` | Generate video via OpenRouter's async API (Veo 3.1 / Sora 2 Pro / Seedance / Wan). Submits, polls, downloads, saves. |
| `get_video_status` | Resume polling a `generate_video` job by id. Download + save when complete. |
| `search_models` | Search/filter models by name, provider, or capabilities (vision / audio / video). |
| `get_model_info` | Get pricing, context length, and capabilities for any model. |
| `validate_model` | Check if a model ID exists on OpenRouter. |
> All error responses carry `_meta.code` from a closed taxonomy: `INVALID_INPUT` · `UNSAFE_PATH` · `UPSTREAM_HTTP` · `UPSTREAM_TIMEOUT` · `UPSTREAM_REFUSED` · `UNSUPPORTED_FORMAT` · `RESOURCE_TOO_LARGE` · `ZDR_INCOMPATIBLE` · `MODEL_NOT_FOUND` · `JOB_FAILED` · `JOB_STILL_RUNNING` · `INTERNAL`
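The `generate_audio` row mentions wrapping raw PCM in WAV. That comes down to prepending the canonical 44-byte RIFF header. The sketch below assumes 16-bit little-endian mono PCM at 24 kHz, because the sample format the server actually receives is not stated here. `wrapPcmAsWav` is a hypothetical name:

```typescript
// Sketch: wrap raw little-endian PCM in a canonical 44-byte WAV (RIFF) header.
// Default sample rate / channels / bit depth are assumptions for illustration.
function wrapPcmAsWav(
  pcm: Uint8Array,
  sampleRate = 24_000,
  channels = 1,
  bitsPerSample = 16,
): Uint8Array {
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign;
  const out = new Uint8Array(44 + pcm.length);
  const view = new DataView(out.buffer);
  const ascii = (offset: number, s: string): void => {
    for (let i = 0; i < s.length; i++) out[offset + i] = s.charCodeAt(i);
  };
  ascii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // RIFF chunk size = file size - 8
  ascii(8, "WAVE");
  ascii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for plain PCM
  view.setUint16(20, 1, true); // audio format 1 = integer PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  ascii(36, "data");
  view.setUint32(40, pcm.length, true); // data chunk size
  out.set(pcm, 44);
  return out;
}
```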
## Quick Start
### Prerequisites
Get a free API key from [openrouter.ai/keys](https://openrouter.ai/keys).
### Option 1: npx (no install)
```json
{
  "mcpServers": {
    "openrouter": {
      "command": "npx",
      "args": ["-y", "@stabgan/openrouter-mcp-multimodal"],
      "env": {
        "OPENROUTER_API_KEY": "sk-or-v1-..."
      }
    }
  }
}
```
### Option 2: Docker
```json
{
  "mcpServers": {
    "openrouter": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "-e", "OPENROUTER_API_KEY=sk-or-v1-...",
        "stabgan/openrouter-mcp-multimodal:latest"
      ]
    }
  }
}
```
### Option 3: Global install
```bash
npm install -g @stabgan/openrouter-mcp-multimodal
```
```json
{
  "mcpServers": {
    "openrouter": {
      "command": "openrouter-multimodal",
      "env": { "OPENROUTER_API_KEY": "sk-or-v1-..." }
    }
  }
}
```
### Option 4: Smithery
```bash
npx -y @smithery/cli install @stabgan/openrouter-mcp-multimodal --client claude
```
## Configuration
Environment variables:
| Variable | Required | Default | Description |
| :--- | :---: | :--- | :--- |
| `OPENROUTER_API_KEY` | Yes | (none) | Your OpenRouter API key |
| `OPENROUTER_DEFAULT_MODEL` | No | `nvidia/nemotron-nano-12b-v2-vl:free` | Default model for chat + analyze tools |
| `DEFAULT_MODEL` | No | (none) | Alias for `OPENROUTER_DEFAULT_MODEL` |
| `OPENROUTER_MODEL_CACHE_TTL_MS` | No | `3600000` | Model cache TTL (ms) |
| `OPENROUTER_IMAGE_MAX_DIMENSION` | No | `800` | Longest edge for resize (px) |
| `OPENROUTER_IMAGE_JPEG_QUALITY` | No | `80` | JPEG quality (1–100) |
| `OPENROUTER_IMAGE_FETCH_TIMEOUT_MS` | No | `30000` | Image URL timeout |
| `OPENROUTER_IMAGE_MAX_DOWNLOAD_BYTES` | No | `26214400` | Image URL size cap (~25 MB) |
| `OPENROUTER_IMAGE_MAX_REDIRECTS` | No | `8` | Image URL redirect cap |
| `OPENROUTER_IMAGE_MAX_DATA_URL_BYTES` | No | `20971520` | Image data URL size cap (~20 MB) |
| `OPENROUTER_AUDIO_FETCH_TIMEOUT_MS` | No | `30000` | Audio URL timeout |
| `OPENROUTER_AUDIO_MAX_DOWNLOAD_BYTES` | No | `26214400` | Audio URL size cap (~25 MB) |
| `OPENROUTER_AUDIO_MAX_REDIRECTS` | No | `8` | Audio URL redirect cap |
| `OPENROUTER_AUDIO_MAX_DATA_URL_BYTES` | No | `20971520` | Audio data URL size cap (~20 MB) |
| `OPENROUTER_DEFAULT_VIDEO_MODEL` | No | `google/gemini-2.5-flash` | Default for `analyze_video` |
| `OPENROUTER_DEFAULT_VIDEO_GEN_MODEL` | No | `google/veo-3.1` | Default for `generate_video` |
| `OPENROUTER_VIDEO_FETCH_TIMEOUT_MS` | No | `60000` | Video URL timeout |
| `OPENROUTER_VIDEO_MAX_DOWNLOAD_BYTES` | No | `104857600` | Video URL size cap (~100 MB) |
| `OPENROUTER_VIDEO_MAX_REDIRECTS` | No | `8` | Video URL redirect cap |
| `OPENROUTER_VIDEO_MAX_DATA_URL_BYTES` | No | `104857600` | Video data URL size cap (~100 MB) |
| `OPENROUTER_VIDEO_POLL_INTERVAL_MS` | No | `15000` | Async video poll cadence |
| `OPENROUTER_VIDEO_MAX_WAIT_MS` | No | `600000` | Max wait before returning a resumable handle |
| `OPENROUTER_VIDEO_GEN_MAX_BYTES` | No | `268435456` | Generated video download cap (~256 MB) |
| `OPENROUTER_VIDEO_INLINE_MAX_BYTES` | No | `10485760` | Inline video ceiling (~10 MB) |
| `OPENROUTER_OUTPUT_DIR` | No | `process.cwd()` | Sandbox root for `save_path` |
| `OPENROUTER_ALLOW_UNSAFE_PATHS` | No | (unset) | Set to `1` to disable the sandbox |
| `OPENROUTER_LOG_LEVEL` | No | `info` | `error` / `warn` / `info` / `debug` |
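`OPENROUTER_OUTPUT_DIR` and `OPENROUTER_ALLOW_UNSAFE_PATHS` together decide where a `save_path` may land. The sketch below shows one common lexical way to enforce that kind of sandbox. It is not the contents of `path-safety.ts`. The helper name `resolveSavePath` and the `UNSAFE_PATH:` message prefix are hypothetical:

```typescript
import * as path from "node:path";

// Sketch (hypothetical helper): resolve save_path inside a sandbox root and
// reject anything that escapes it, unless the sandbox is explicitly disabled.
function resolveSavePath(
  savePath: string,
  root: string = process.env.OPENROUTER_OUTPUT_DIR ?? process.cwd(),
  allowUnsafe: boolean = process.env.OPENROUTER_ALLOW_UNSAFE_PATHS === "1",
): string {
  const absRoot = path.resolve(root);
  // An absolute save_path replaces absRoot entirely, so it is checked like any other.
  const target = path.resolve(absRoot, savePath);
  if (allowUnsafe) return target;
  const rel = path.relative(absRoot, target);
  // Outside the root if the relative path climbs upward or lands on another drive.
  const escapes = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (escapes) throw new Error(`UNSAFE_PATH: ${savePath} resolves outside ${absRoot}`);
  return target;
}
```

This check is purely lexical and does not follow symlinks. A stricter version would also `realpath` the target's parent directory before writing.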
### Security notes
- **Analyze tools** can read local files and fetch HTTP(S) URLs. URL fetches block private/link-local/reserved IPv4 and IPv6 targets (SSRF mitigation) and cap response size.
- **Generate tools** write to disk through a path sandbox: `save_path` is resolved against `OPENROUTER_OUTPUT_DIR` and any traversal attempt is rejected. Override with `OPENROUTER_ALLOW_UNSAFE_PATHS=1`.
- **IPv6 SSRF blocklist** covers loopback, unspecified, IPv4-mapped, IPv4-compatible, link-local, site-local, ULA, multicast, documentation, Teredo, ORCHID, and 6to4 of private IPv4.
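The IPv6 bullet names several address families. The trick is to unwrap the ones that embed an IPv4 address (mapped, compatible, 6to4) and re-check that inner address. The sketch below classifies one literal address against the listed ranges. It is an illustration, not the server's `fetch-utils.ts`. Blocking all of Teredo (`2001::/32`) rather than decoding it, and the exact IPv4 ranges chosen, are assumptions. A real guard must also apply the check to every DNS-resolved address and after every redirect:

```typescript
// Sketch: is a literal IP address a blocked (non-public) SSRF target?
function parseIPv4(s: string): number | null {
  const parts = s.split(".");
  if (parts.length !== 4) return null;
  let n = 0;
  for (const p of parts) {
    if (!/^\d{1,3}$/.test(p) || Number(p) > 255) return null;
    n = n * 256 + Number(p); // arithmetic (not bitwise) keeps the value unsigned
  }
  return n;
}

// Unspecified, private, CGNAT, loopback, link-local, multicast, reserved.
const V4_BLOCKED: Array<[string, number]> = [
  ["0.0.0.0", 8], ["10.0.0.0", 8], ["100.64.0.0", 10], ["127.0.0.0", 8],
  ["169.254.0.0", 16], ["172.16.0.0", 12], ["192.168.0.0", 16],
  ["224.0.0.0", 4], ["240.0.0.0", 4],
];

function isBlockedV4(n: number): boolean {
  return V4_BLOCKED.some(([base, prefix]) => {
    const start = parseIPv4(base) as number;
    return n >= start && n < start + 2 ** (32 - prefix);
  });
}

// Expand an IPv6 literal ("::", zone id, dotted-IPv4 tail allowed) to 8 hextets.
function parseIPv6(input: string): number[] | null {
  let s = input.toLowerCase().split("%")[0]; // drop a zone id such as %eth0
  const lastColon = s.lastIndexOf(":");
  const tail = s.slice(lastColon + 1);
  if (tail.includes(".")) {
    const v4 = parseIPv4(tail);
    if (v4 === null) return null;
    s = s.slice(0, lastColon + 1) +
      Math.floor(v4 / 65536).toString(16) + ":" + (v4 % 65536).toString(16);
  }
  const halves = s.split("::");
  if (halves.length > 2) return null;
  const groupsOf = (part: string): string[] => (part === "" ? [] : part.split(":"));
  const left = groupsOf(halves[0]);
  const right = halves.length === 2 ? groupsOf(halves[1]) : [];
  const missing = 8 - left.length - right.length;
  if (halves.length === 2 ? missing < 1 : missing !== 0) return null;
  const groups = [...left, ...new Array<string>(missing).fill("0"), ...right];
  if (!groups.every((g) => /^[0-9a-f]{1,4}$/.test(g))) return null;
  return groups.map((g) => parseInt(g, 16));
}

function isBlockedV6(h: number[]): boolean {
  const allZero = (from: number, to: number) => h.slice(from, to).every((x) => x === 0);
  const embeddedV4 = (hi: number, lo: number) => h[hi] * 65536 + h[lo];
  if (allZero(0, 8)) return true;                                   // :: unspecified
  if (allZero(0, 7) && h[7] === 1) return true;                     // ::1 loopback
  if (allZero(0, 5) && h[5] === 0xffff) return isBlockedV4(embeddedV4(6, 7)); // IPv4-mapped
  if (allZero(0, 6)) return isBlockedV4(embeddedV4(6, 7));          // IPv4-compatible
  if ((h[0] & 0xffc0) === 0xfe80) return true;                      // fe80::/10 link-local
  if ((h[0] & 0xffc0) === 0xfec0) return true;                      // fec0::/10 site-local
  if ((h[0] & 0xfe00) === 0xfc00) return true;                      // fc00::/7 ULA
  if ((h[0] & 0xff00) === 0xff00) return true;                      // ff00::/8 multicast
  if (h[0] === 0x2001 && h[1] === 0x0db8) return true;              // 2001:db8::/32 documentation
  if (h[0] === 0x2001 && h[1] === 0x0000) return true;              // 2001::/32 Teredo
  if (h[0] === 0x2001 && (h[1] & 0xfff0) === 0x0010) return true;   // 2001:10::/28 ORCHID
  if (h[0] === 0x2001 && (h[1] & 0xfff0) === 0x0020) return true;   // 2001:20::/28 ORCHIDv2
  if (h[0] === 0x2002) return isBlockedV4(embeddedV4(1, 2));         // 6to4 of a private IPv4
  return false;
}

function isBlockedAddress(ip: string): boolean {
  if (!ip.includes(":")) {
    const v4 = parseIPv4(ip);
    return v4 === null ? true : isBlockedV4(v4); // unparseable: fail closed
  }
  const v6 = parseIPv6(ip);
  return v6 === null ? true : isBlockedV6(v6);
}
```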
## Usage Examples
```
# Chat
Use chat_completion to explain quantum computing in simple terms.
# Vision
Use analyze_image on /path/to/photo.jpg and tell me what you see.
# Audio transcription
Use analyze_audio on /path/to/recording.mp3 to transcribe it.
# Video understanding
Use analyze_video on /path/to/clip.mp4: what happens at 00:15?
# Generate audio
Use generate_audio with prompt "Explain neural networks" and voice "alloy", save to ./response.wav
# Generate music
Use generate_audio with model "google/lyria-3-clip-preview" and prompt "upbeat jazz piano trio"
# Generate image
Use generate_image with prompt "a cat astronaut on mars" and save to ./cat.png
# Generate video
Use generate_video with model "google/veo-3.1", prompt "a calm river at sunrise",
resolution 720p, duration 4, save to ./river.mp4
# Resume a video job
Use get_video_status with video_id "vid_abc123" and save_path "./river.mp4"
```
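The last two examples rely on a submit-then-poll pattern. `generate_video` polls until the job finishes or the wait budget (`OPENROUTER_VIDEO_MAX_WAIT_MS`) runs out. After that, `get_video_status` can resume from the job id. Below is a generic sketch of such a loop. The `JobStatus` shape and the `getStatus` callback are hypothetical stand-ins, not OpenRouter's `/videos` schema:

```typescript
// Hypothetical status shape for an async video job.
type JobStatus =
  | { state: "pending" | "running" }
  | { state: "completed"; url: string }
  | { state: "failed"; error: string };

type PollOutcome =
  | { kind: "completed"; url: string }
  | { kind: "failed"; error: string }
  | { kind: "still_running"; jobId: string }; // resumable handle

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the job finishes, or hand back the id once the wait budget is spent.
async function pollJob(
  jobId: string,
  getStatus: (id: string) => Promise<JobStatus>,
  pollIntervalMs = 15_000, // mirrors the OPENROUTER_VIDEO_POLL_INTERVAL_MS default
  maxWaitMs = 600_000, // mirrors the OPENROUTER_VIDEO_MAX_WAIT_MS default
  onProgress?: (elapsedMs: number) => void,
): Promise<PollOutcome> {
  const start = Date.now();
  for (;;) {
    const status = await getStatus(jobId);
    if (status.state === "completed") return { kind: "completed", url: status.url };
    if (status.state === "failed") return { kind: "failed", error: status.error };
    const elapsed = Date.now() - start;
    onProgress?.(elapsed); // e.g. forward as an MCP progress notification
    if (elapsed + pollIntervalMs > maxWaitMs) return { kind: "still_running", jobId };
    await sleep(pollIntervalMs);
  }
}
```

Returning the id instead of failing on timeout keeps long renders recoverable. The caller can simply resume later.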
## Architecture
```
src/
├── index.ts              # Entry, env validation, graceful shutdown
├── tool-handlers.ts      # 11 tools (annotated) + dispatch
├── model-cache.ts        # TTL + in-flight coalescing
├── openrouter-api.ts     # REST client (chat + /videos)
├── errors.ts             # Closed ErrorCode enum
├── logger.ts             # JSON-line structured logger
└── tool-handlers/
    ├── fetch-utils.ts         # SSRF, bounded fetch, data-URL parser
    ├── openrouter-errors.ts   # SDK/HTTP → ErrorCode classifier
    ├── completion-utils.ts    # Reasoning-model cutoff detection
    ├── path-safety.ts         # save_path sandbox
    ├── chat-completion.ts     # Text + multimodal chat
    ├── analyze-image.ts       # Vision analysis
    ├── analyze-audio.ts       # Audio transcription
    ├── analyze-video.ts       # Video understanding
    ├── generate-image.ts      # Image generation
    ├── generate-audio.ts      # Audio generation + streaming
    ├── generate-video.ts      # Video generation (async)
    ├── image-utils.ts         # Sharp optimization, MIME sniffing
    ├── audio-utils.ts         # Audio format detection
    ├── video-utils.ts         # Video format detection
    ├── search-models.ts       # Model search
    ├── get-model-info.ts      # Model detail lookup
    └── validate-model.ts      # Model existence check
```
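`model-cache.ts` is described as "TTL + in-flight coalescing". In that pattern, a fresh value is served from memory. When the value is stale, concurrent callers share a single pending fetch rather than each hitting the API. Below is a generic sketch of the pattern. The `CoalescingCache` class and its constructor are hypothetical, not this repository's code:

```typescript
// Sketch: single-value TTL cache whose concurrent misses share one in-flight fetch.
class CoalescingCache<T> {
  private cached: { value: T; at: number } | null = null;
  private inFlight: Promise<T> | null = null;
  private readonly fetcher: () => Promise<T>;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(fetcher: () => Promise<T>, ttlMs = 3_600_000, now: () => number = () => Date.now()) {
    this.fetcher = fetcher;
    this.ttlMs = ttlMs; // default mirrors OPENROUTER_MODEL_CACHE_TTL_MS
    this.now = now; // injectable clock for testing
  }

  async get(): Promise<T> {
    // Fresh hit: no network call.
    const cached = this.cached;
    if (cached !== null && this.now() - cached.at < this.ttlMs) return cached.value;
    // Miss or stale: start one fetch; callers arriving meanwhile reuse it.
    let pending = this.inFlight;
    if (pending === null) {
      pending = this.fetcher()
        .then((value) => {
          this.cached = { value, at: this.now() };
          return value;
        })
        .finally(() => {
          this.inFlight = null; // allow a new fetch after success or failure
        });
      this.inFlight = pending;
    }
    return pending;
  }
}
```

Clearing `inFlight` in `finally` means a failed fetch is not cached, so the next caller simply tries again.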
## Development
```bash
git clone https://github.com/stabgan/openrouter-mcp-multimodal.git
cd openrouter-mcp-multimodal
npm install
cp .env.example .env # Add your API key
npm run build
npm start
```
```bash
npm test # 163 unit tests, <1s
npm run test:integration # Live API tests
npm run lint
node scripts/live-e2e.mjs # 16 live E2E scenarios
```
## Upgrading from v2
v3 is **additive**: no tool schemas or env vars were removed.
- Three new tools: `analyze_video`, `generate_video`, `get_video_status`
- Structured `_meta.code` on every error response (text messages preserved)
- `save_path` sandboxed by default โ set `OPENROUTER_OUTPUT_DIR` or `OPENROUTER_ALLOW_UNSAFE_PATHS=1`
- Reasoning-model awareness: `content: null` + `finish_reason: length` now returns `INVALID_INPUT` with a preview instead of an empty string
- IPv6 SSRF coverage extended to mapped, compat, multicast, 6to4, Teredo, ORCHID
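The reasoning-model change above amounts to one check on the first completion choice. Below is a minimal sketch of that check. The `Choice` type is a hypothetical subset of a chat-completions response. Using `message.reasoning` as the source of the preview, and the message wording, are assumptions:

```typescript
// Hypothetical subset of a chat-completions choice.
interface Choice {
  finish_reason: string | null;
  message: { content: string | null; reasoning?: string | null };
}

type CompletionOutcome =
  | { ok: true; text: string }
  | { ok: false; code: "INVALID_INPUT"; message: string };

// Sketch: flag a reasoning model that spent its whole max_tokens budget
// thinking, instead of silently returning an empty answer.
function checkCompletion(choice: Choice, previewChars = 200): CompletionOutcome {
  const content = choice.message.content;
  if ((content === null || content === "") && choice.finish_reason === "length") {
    const preview = (choice.message.reasoning ?? "").slice(0, previewChars);
    return {
      ok: false,
      code: "INVALID_INPUT",
      message:
        "Model hit max_tokens before producing an answer; raise max_tokens. " +
        `Reasoning preview: ${preview}`,
    };
  }
  return { ok: true, text: content ?? "" };
}
```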
## Compatibility
Works with any MCP client: [Kiro](https://kiro.dev) · [Claude Desktop](https://claude.ai/download) · [Cursor](https://cursor.sh) · [Windsurf](https://codeium.com/windsurf) · [Cline](https://github.com/cline/cline) · any MCP-compatible client.
## License
MIT
## Contributing
Issues and PRs welcome. Please open an issue first for major changes.