https://github.com/idea-research/dino-x-mcp
Official DINO-X Model Context Protocol (MCP) server that empowers LLMs with real-world visual perception through image object detection, localization, and captioning APIs.
image-recognition mcp mcp-server object-detection pose-estimation
- Host: GitHub
- URL: https://github.com/idea-research/dino-x-mcp
- Owner: IDEA-Research
- License: apache-2.0
- Created: 2025-06-04T09:04:21.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-08-08T09:06:00.000Z (3 months ago)
- Last Synced: 2025-09-15T10:44:14.486Z (2 months ago)
- Topics: image-recognition, mcp, mcp-server, object-detection, pose-estimation
- Language: TypeScript
- Homepage: https://cloud.deepdataspace.com/
- Size: 33.6 MB
- Stars: 54
- Watchers: 1
- Forks: 6
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- toolsdk-mcp-registry - ❌ @deepdataspace/dinox-mcp - grained visual understanding — detect, localize, and describe anything in images with natural language prompts. (node) (Art & Culture / How to Submit)
README
# DINO-X MCP
**English** | [中文](README_ZH.md)
DINO-X Official MCP Server — powered by the DINO-X and Grounding DINO models — brings fine-grained object detection and image understanding to your multimodal applications.
## Why DINO-X MCP?
With DINO-X MCP, you can:
- Fine-Grained Understanding: Full image detection, object detection, and region-level descriptions.
- Structured Outputs: Get object categories, counts, locations, and attributes for VQA and multi-step reasoning tasks.
- Composable: Works seamlessly with other MCP servers to build end-to-end visual agents or automation pipelines.
## Transport Modes
DINO-X MCP supports two transport modes:
| Feature | STDIO (default) | Streamable HTTP |
|:--|:--|:--|
| Runtime | Local | Local or Cloud |
| Transport | Standard I/O | HTTP (streaming responses) |
| Input source | `file://` and `https://` | `https://` only |
| Visualization | Supported (saves annotated images locally) | Not supported (for now) |
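The input-source row of the table can be turned into a small pre-flight check. The sketch below is illustrative only: the `Transport` labels and the function name `transportsForImage` are made up for this example and are not identifiers from the server.

```typescript
// Labels for this sketch only; the server does not necessarily use these names.
type Transport = "stdio" | "streamable-http";

// Returns the transport modes that accept the given image URI,
// following the "Input source" row of the table above.
function transportsForImage(imageUri: string): Transport[] {
  const uri = imageUri.trim().toLowerCase();
  if (uri.indexOf("https://") === 0) {
    return ["stdio", "streamable-http"];
  }
  if (uri.indexOf("file://") === 0) {
    return ["stdio"];
  }
  // Any other scheme (http://, data:, ...) is not listed as supported.
  return [];
}

console.log(transportsForImage("file:///tmp/cat.png")); // STDIO only
console.log(transportsForImage("https://example.com/cat.png")); // both modes
```

A client could call such a check before choosing a server entry, so that a local `file://` image is never sent to a Streamable HTTP endpoint.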
## Quick Start
### 1. Prepare an MCP client
Any MCP-compatible client works, e.g.:
- [Cursor](https://www.cursor.com/)
- [WindSurf](https://windsurf.com/)
- [Trae](https://www.trae.ai/)
- [Cherry Studio](https://www.cherry-ai.com/)
### 2. Get your API key
Apply on the DINO-X platform: [Request API Key](https://cloud.deepdataspace.com/request_api) (new users get free quota).
### 3. Configure MCP
#### Option A: Official Hosted Streamable HTTP (Recommended)
Add to your MCP client config and replace with your API key:
```json
{
"mcpServers": {
"dinox-mcp": {
"url": "https://mcp.deepdataspace.com/mcp?key=your-api-key"
}
}
}
```
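If you generate this config from a script, the key should be URL-encoded before it is placed in the query string. A minimal sketch, assuming the same endpoint and `key` parameter shown above; `buildHostedConfig` is a hypothetical helper, not part of the package:

```typescript
interface HostedServerConfig {
  mcpServers: { [name: string]: { url: string } };
}

// Builds the Option A config object for a given API key.
// Rejects an empty key or the literal placeholder from the sample config.
function buildHostedConfig(apiKey: string): HostedServerConfig {
  const key = apiKey.trim();
  if (key === "" || key === "your-api-key") {
    throw new Error("Set a real DINO-X API key before generating the config");
  }
  return {
    mcpServers: {
      "dinox-mcp": {
        // encodeURIComponent keeps characters such as '&' or '+' from breaking the query string
        url: `https://mcp.deepdataspace.com/mcp?key=${encodeURIComponent(key)}`,
      },
    },
  };
}

console.log(JSON.stringify(buildHostedConfig("abc123"), null, 2));
```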
#### Option B: Use the NPM package locally (STDIO)
Install Node.js first:
- Download the installer from [nodejs.org](https://nodejs.org/)
- Or install it from the command line:
```bash
# macOS / Linux
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
# or
wget -qO- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
# load nvm into current shell (choose the one you use)
source ~/.bashrc || true
source ~/.zshrc || true
# install and use LTS Node.js
nvm install --lts
nvm use --lts
# Windows (one of the following)
winget install OpenJS.NodeJS.LTS
# or with Chocolatey (in admin PowerShell)
iwr -useb https://raw.githubusercontent.com/chocolatey/chocolatey/master/chocolateyInstall/InstallChocolatey.ps1 | iex
choco install nodejs-lts -y
```
Configure your MCP client:
```json
{
"mcpServers": {
"dinox-mcp": {
"command": "npx",
"args": ["-y", "@deepdataspace/dinox-mcp"],
"env": {
"DINOX_API_KEY": "your-api-key-here",
"IMAGE_STORAGE_DIRECTORY": "/path/to/your/image/directory"
}
}
}
}
```
Note: Replace `your-api-key-here` with your real key.
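Leftover placeholders are an easy mistake to make when copying these configs. The sketch below scans a config object for the placeholder fragments used in this README (`your-api-key` and `/path/to/`); the function name `findPlaceholders` and the dotted-path output format are choices made for this example only.

```typescript
// Placeholder fragments taken from the sample configs in this README.
const PLACEHOLDERS = ["your-api-key", "/path/to/"];

// Recursively collects the paths of string values that still contain a placeholder.
function findPlaceholders(value: unknown, path: string = ""): string[] {
  const found: string[] = [];
  if (typeof value === "string") {
    for (const fragment of PLACEHOLDERS) {
      if (value.indexOf(fragment) !== -1) {
        found.push(path);
        break;
      }
    }
  } else if (Array.isArray(value)) {
    value.forEach((item, i) => {
      found.push(...findPlaceholders(item, `${path}[${i}]`));
    });
  } else if (value !== null && typeof value === "object") {
    const obj = value as { [key: string]: unknown };
    for (const key of Object.keys(obj)) {
      found.push(...findPlaceholders(obj[key], path === "" ? key : `${path}.${key}`));
    }
  }
  return found;
}

const sampleConfig = {
  mcpServers: {
    "dinox-mcp": {
      command: "npx",
      args: ["-y", "@deepdataspace/dinox-mcp"],
      env: {
        DINOX_API_KEY: "your-api-key-here", // still a placeholder
        IMAGE_STORAGE_DIRECTORY: "/home/me/images",
      },
    },
  },
};

console.log(findPlaceholders(sampleConfig));
```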
#### Option C: Run from source locally
Make sure Node.js is installed (see Option B), then:
```bash
# clone
git clone https://github.com/IDEA-Research/DINO-X-MCP.git
cd DINO-X-MCP
# install deps
npm install
# build
npm run build
```
Configure your MCP client:
```json
{
"mcpServers": {
"dinox-mcp": {
"command": "node",
"args": ["/path/to/DINO-X-MCP/build/index.js"],
"env": {
"DINOX_API_KEY": "your-api-key-here",
"IMAGE_STORAGE_DIRECTORY": "/path/to/your/image/directory"
}
}
}
}
```
## CLI Flags & Environment Variables
- Common flags
  - `--http`: start in Streamable HTTP mode (otherwise STDIO by default)
  - `--stdio`: force STDIO mode
  - `--dinox-api-key=...`: set API key
  - `--enable-client-key`: allow API key via URL `?key=` (Streamable HTTP only)
  - `--port=8080`: set the HTTP port (default: 3020)
- Environment variables
  - `DINOX_API_KEY` (required unless the key is passed with `--dinox-api-key` or supplied by clients via `--enable-client-key`): DINO-X platform API key
  - `IMAGE_STORAGE_DIRECTORY` (optional, STDIO): directory to save annotated images
  - `AUTH_TOKEN` (optional, HTTP): if set, clients must send `Authorization: Bearer <token>`
Examples:
```bash
# STDIO (local)
node build/index.js --dinox-api-key=your-api-key
# Streamable HTTP (server provides a shared API key)
node build/index.js --http --dinox-api-key=your-api-key
# Streamable HTTP (custom port)
node build/index.js --http --dinox-api-key=your-api-key --port=8080
# Streamable HTTP (require client-provided API key via URL)
node build/index.js --http --enable-client-key
```
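To make the interaction between these flags concrete, here is one way they could be resolved into launch options. This is a sketch and not the server's actual argument parser: the `LaunchOptions` shape, the rule that a flag overrides the `DINOX_API_KEY` environment variable, and the rule that `--stdio` wins over `--http` are all assumptions.

```typescript
interface LaunchOptions {
  mode: "stdio" | "http";
  port: number; // only meaningful in HTTP mode
  apiKey?: string;
  enableClientKey: boolean;
}

// Hypothetical resolver: reads the documented flags and falls back to the
// documented defaults (STDIO mode, port 3020).
function resolveLaunchOptions(
  argv: string[],
  env: { [name: string]: string | undefined } = {}
): LaunchOptions {
  const options: LaunchOptions = {
    mode: "stdio",
    port: 3020,
    apiKey: env["DINOX_API_KEY"],
    enableClientKey: false,
  };
  let forceStdio = false;
  for (const arg of argv) {
    if (arg === "--http") {
      options.mode = "http";
    } else if (arg === "--stdio") {
      forceStdio = true;
    } else if (arg === "--enable-client-key") {
      options.enableClientKey = true;
    } else if (arg.indexOf("--port=") === 0) {
      const port = Number(arg.slice("--port=".length));
      if (!(port > 0 && port < 65536 && Math.floor(port) === port)) {
        throw new Error(`Invalid port: ${arg}`);
      }
      options.port = port;
    } else if (arg.indexOf("--dinox-api-key=") === 0) {
      // Assumption: an explicit flag takes precedence over the environment variable.
      options.apiKey = arg.slice("--dinox-api-key=".length);
    }
  }
  // Assumption: "force STDIO mode" means --stdio wins when both flags are given.
  if (forceStdio) {
    options.mode = "stdio";
  }
  return options;
}

console.log(resolveLaunchOptions(["--http", "--port=8080"], { DINOX_API_KEY: "abc" }));
```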
Client config when using `?key=`:
```json
{
"mcpServers": {
"dinox-mcp": {
"url": "http://localhost:3020/mcp?key=your-api-key"
}
}
}
```
Using `AUTH_TOKEN` with a gateway that injects `Authorization: Bearer <token>`:
```bash
AUTH_TOKEN=my-token node build/index.js --http --enable-client-key
```
Client example with `supergateway`:
```json
{
"mcpServers": {
"dinox-mcp": {
"command": "npx",
"args": [
"-y",
"supergateway",
"--streamableHttp",
"http://localhost:3020/mcp?key=your-api-key",
"--oauth2Bearer",
"my-token"
]
}
}
}
```
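Conceptually, the `AUTH_TOKEN` check compares the token in the incoming `Authorization` header with the configured value. The sketch below illustrates that idea only; it is not the server's code, and `isAuthorized` is a name invented for this example.

```typescript
// Returns true when the request may proceed.
// expectedToken plays the role of AUTH_TOKEN; when it is unset, no check is applied.
function isAuthorized(
  authorizationHeader: string | undefined,
  expectedToken: string | undefined
): boolean {
  if (!expectedToken) {
    return true;
  }
  if (!authorizationHeader) {
    return false;
  }
  // Accept "Bearer <token>" with a case-insensitive scheme name.
  const match = /^Bearer\s+(.+)$/i.exec(authorizationHeader.trim());
  return match !== null && match[1] === expectedToken;
}

console.log(isAuthorized("Bearer my-token", "my-token")); // true
console.log(isAuthorized("Bearer other", "my-token")); // false
```

In the `supergateway` setup above, the `--oauth2Bearer my-token` argument is what makes the gateway attach the matching header. A production check would typically use a constant-time comparison rather than `===`.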
## Tools
| Capability | Tool ID | Transport | Input | Output |
|:--|:--|:--|:--|:--|
| Full-scene object detection | `detect-all-objects` | STDIO / HTTP | Image URL | Category + bbox + (optional) captions |
| Text-prompted object detection | `detect-objects-by-text` | STDIO / HTTP | Image URL + English nouns (dot-separated for multiple, e.g., `person.car`) | Target object bbox + (optional) captions |
| Human pose estimation | `detect-human-pose-keypoints` | STDIO / HTTP | Image URL | 17 keypoints + bbox + (optional) captions |
| Visualization | `visualize-detection-result` | STDIO only | Image URL + detection results array | Local path to annotated image |
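The text prompt for `detect-objects-by-text` is a dot-separated list of nouns. A small helper can build it from an array and catch nouns that would break the format. `buildTextPrompt` is a hypothetical name, and the duplicate removal is a convenience of this sketch rather than a documented requirement.

```typescript
// Joins nouns into the dot-separated format described in the table, e.g. "person.car".
function buildTextPrompt(nouns: string[]): string {
  const cleaned = nouns
    .map((noun) => noun.trim())
    .filter((noun) => noun.length > 0);
  for (const noun of cleaned) {
    // A dot inside a noun would be read as a separator.
    if (noun.indexOf(".") !== -1) {
      throw new Error(`Noun must not contain '.': ${noun}`);
    }
  }
  // Drop duplicates while preserving the original order.
  const unique = cleaned.filter((noun, i) => cleaned.indexOf(noun) === i);
  if (unique.length === 0) {
    throw new Error("Provide at least one noun");
  }
  return unique.join(".");
}

console.log(buildTextPrompt(["person", " car ", "person"])); // person.car
```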
## 🎬 Use Cases
| 🎯 Scenario | 📝 Input | ✨ Output |
|---------|---------|---------|
| **Detection & Localization** | **💬 Prompt:** `Detect and visualize the fire areas in the forest` | Image (not shown) |
| **Object Counting** | **💬 Prompt:** `Please analyze this warehouse image, detect all the cardboard boxes, count the total number` | Image (not shown) |
| **Feature Detection** | **💬 Prompt:** `Find all red cars in the image` | Image (not shown) |
| **Attribute Reasoning** | **💬 Prompt:** `Find the tallest person in the image, describe their clothing` | Image (not shown) |
| **Full Scene Detection** | **💬 Prompt:** `Find the fruit with the highest vitamin C content in the image` | *Answer: Kiwi fruit (93mg/100g)* |
| **Pose Analysis** | **💬 Prompt:** `Please analyze what yoga pose this is` | Image (not shown) |
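For the counting scenario, the step after detection is a simple tally over the returned objects. The sketch below assumes a hypothetical `Detection` shape with `category`, `bbox`, and optional `caption` fields; the real response schema is not specified in this README and may differ.

```typescript
// Hypothetical result shape, loosely following "Category + bbox + (optional) captions".
interface Detection {
  category: string;
  bbox: [number, number, number, number]; // coordinate convention is an assumption
  caption?: string;
}

// Counts how many detections were returned for each category.
function countByCategory(detections: Detection[]): { [category: string]: number } {
  const counts: { [category: string]: number } = {};
  for (const detection of detections) {
    const seen = Object.prototype.hasOwnProperty.call(counts, detection.category);
    counts[detection.category] = (seen ? counts[detection.category] : 0) + 1;
  }
  return counts;
}

const warehouseDetections: Detection[] = [
  { category: "cardboard box", bbox: [10, 20, 110, 120] },
  { category: "cardboard box", bbox: [130, 20, 230, 120] },
  { category: "pallet", bbox: [0, 150, 300, 200] },
];

console.log(countByCategory(warehouseDetections));
```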
## FAQ
- Supported image sources?
  - STDIO: `file://` and `https://`
  - Streamable HTTP: `https://` only
- Supported image formats?
  - jpg, jpeg, webp, png
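A quick extension check can catch unsupported formats before a request is sent. This is a heuristic sketch with a made-up helper name: it only looks at the file extension, so a URL that serves a valid image without an extension would be reported as unsupported.

```typescript
// Formats listed in the FAQ above.
const SUPPORTED_EXTENSIONS = ["jpg", "jpeg", "webp", "png"];

// Returns true when the URI's file extension is one of the listed formats.
function hasSupportedImageExtension(imageUri: string): boolean {
  // Strip the fragment and query string before looking at the extension.
  const withoutQuery = imageUri.split("#")[0].split("?")[0];
  const lastSegment = withoutQuery.slice(withoutQuery.lastIndexOf("/") + 1);
  const dot = lastSegment.lastIndexOf(".");
  if (dot === -1) {
    return false;
  }
  const extension = lastSegment.slice(dot + 1).toLowerCase();
  return SUPPORTED_EXTENSIONS.indexOf(extension) !== -1;
}

console.log(hasSupportedImageExtension("https://example.com/a/photo.JPG?x=1")); // true
console.log(hasSupportedImageExtension("file:///tmp/scan.tiff")); // false
```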
## Development & Debugging
Use watch mode to auto-rebuild during development:
```bash
npm run watch
```
Use MCP Inspector for debugging:
```bash
npm run inspector
```
## License
Apache License 2.0