{"id":29018180,"url":"https://github.com/groundlight/mcp-vision","last_synced_at":"2026-04-29T09:02:11.586Z","repository":{"id":294275154,"uuid":"960085885","full_name":"groundlight/mcp-vision","owner":"groundlight","description":"Computer vision models as MCP servers","archived":false,"fork":false,"pushed_at":"2025-05-22T20:59:29.000Z","size":4081,"stargazers_count":27,"open_issues_count":0,"forks_count":2,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-06-25T23:41:29.806Z","etag":null,"topics":["computer-vision","mcp"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/groundlight.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-03T20:48:03.000Z","updated_at":"2025-06-21T20:05:06.000Z","dependencies_parsed_at":"2025-05-19T18:08:36.928Z","dependency_job_id":null,"html_url":"https://github.com/groundlight/mcp-vision","commit_stats":null,"previous_names":["groundlight/mcp-vision"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/groundlight/mcp-vision","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/groundlight%2Fmcp-vision","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/groundlight%2Fmcp-vision/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/groundlight%2Fmcp-vision/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/groundlight%2Fmcp-vision/manifests","owner_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/owners/groundlight","download_url":"https://codeload.github.com/groundlight/mcp-vision/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/groundlight%2Fmcp-vision/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32418173,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-29T06:29:02.080Z","status":"ssl_error","status_checked_at":"2026-04-29T06:29:00.631Z","response_time":110,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","mcp"],"created_at":"2025-06-25T23:39:23.339Z","updated_at":"2026-04-29T09:02:11.581Z","avatar_url":"https://github.com/groundlight.png","language":"Python","funding_links":[],"categories":["Media Processing","MCP Servers for Creative Work","📦 Other"],"sub_categories":["Image Processing","Computer Vision"],"readme":"\u003cp align=\"center\"\u003e\n\u003cimg src=\"images/image0_and_claude_zoomed_in.png\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://opensource.org/licenses/MIT\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/License-MIT-yellow?style=for-the-badge\" alt=\"License: MIT\"\u003e\n  \u003ca href=\"https://www.groundlight.ai/blog/vision-as-mcp-service\"\u003e\n    \u003cimg 
src=\"https://img.shields.io/badge/Read%20More-Blog-orange?style=for-the-badge\"  alt=\"Read More\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n# mcp-vision by \u003cimg src=\"images/gl_logo.png\" height=25\u003e\n\nA Model Context Protocol (MCP) server exposing HuggingFace computer vision models such as zero-shot object detection as tools, enhancing the vision capabilities of large language or vision-language models.\n\nThis repo is in active development. See below for details of currently available tools.\n\n## Installation\n\nClone the repo:\n```bash\ngit clone git@github.com:groundlight/mcp-vision.git\n```\n\nBuild a local docker image:\n```bash\ncd mcp-vision\nmake build-docker\n```\n\n## Configuring Claude Desktop\n\nAdd this to your `claude_desktop_config.json`:\n\nIf your local environment has access to an NVIDIA GPU:\n```json\n\"mcpServers\": {\n  \"mcp-vision\": {\n    \"command\": \"docker\",\n    \"args\": [\"run\", \"-i\", \"--rm\", \"--runtime=nvidia\", \"--gpus\", \"all\", \"mcp-vision\"],\n    \"env\": {}\n  }\n}\n```\nOr, CPU only:\n```json\n\"mcpServers\": {\n  \"mcp-vision\": {\n    \"command\": \"docker\",\n    \"args\": [\"run\", \"-i\", \"--rm\", \"mcp-vision\"],\n    \"env\": {}\n  }\n}\n```\nWhen running on CPU, the default large-size object detection model may take a long time to load and run inference. Consider using a smaller model as `DEFAULT_OBJDET_MODEL` (you can tell Claude directly to use a specific model too). \n\n**(Beta)** It is possible to run the public docker image directly without building locally; however, the download time may interfere with Claude's loading of the server. \n```json\n\"mcpServers\": {\n  \"mcp-vision\": {\n    \"command\": \"docker\",\n    \"args\": [\"run\", \"-i\", \"--rm\", \"--runtime=nvidia\", \"--gpus\", \"all\", \"groundlight/mcp-vision:latest\"],\n    \"env\": {}\n  }\n}\n```\n\n## Tools\nThe following tools are currently available through the mcp-vision server:\n\n1. 
**locate_objects**\n- Description: Detect and locate objects in an image using one of the zero-shot object detection pipelines available \nthrough HuggingFace (list for reference [https://huggingface.co/models?pipeline_tag=zero-shot-object-detection\u0026sort=trending]). \n- Input: `image_path` (string) URL or file path, `candidate_labels` (list of strings) list of possible objects to detect, `hf_model` (optional string), will use `\"google/owlvit-large-patch14\"` by default, which could be slow on a non-GPU machine\n- Returns: List of dicts in HF object-detection format\n\n2. **zoom_to_object**\n- Description: Zoom into an object in the image, allowing you to analyze it more closely. Crop image to the object bounding box and return the cropped image. If many objects are present in the image, will return the 'best' one as represented by object score.\n- Input: `image_path` (string) URL or file path, `label` (string) object label to find and zoom and crop to, `hf_model` (optional), will use `\"google/owlvit-large-patch14\"` by default, which could be slow on a non-GPU machine\n- Returns: MCPImage or None\n\n\n## Example in blog post and video\n\nRun Claude Desktop with Claude Sonnet 3.7 and `mcp-vision` configured as an MCP server in `claude_desktop_config.json`. \n\nThe prompt used in the example video and blog post was: \n```\nFrom the information on that advertising board, what is the type of this shop?\nOptions:\nThe shop is a yoga studio.\nThe shop is a cafe.\nThe shop is a seven-eleven.\nThe shop is a milk tea shop.\n```\nThe image is the first image in the V*Bench/GPT4V-hard dataset and can be found here: https://huggingface.co/datasets/craigwu/vstar_bench/blob/main/GPT4V-hard/0.JPG (use the download link). 
\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"images/claude_with_zoom_tool_large_font.webp\"\u003e\n\u003c/p\u003e\n\nNote: \n- If you upload the image directly into the conversation with Claude instead of providing a download link, it will not be able to call the tools and will attempt to answer directly. \n- On accounts that have web search enabled, Claude will prefer to use web search over local MCP tools AFAIK. Disable web search for best results. \n\n## Development\n\nRun locally using the \u003ca href=\"https://github.com/astral-sh/uv\"\u003e`uv`\u003c/a\u003e package manager:\n```bash\nuv sync\nuv run python mcp_vision\n```\n\nBuild the Docker image locally:\n```bash\nmake build-docker\n```\n\nRun the Docker image locally:\n```bash\nmake run-docker-cpu\n```\nor \n```bash\nmake run-docker-gpu\n```\n\n[Groundlight Internal] Push the Docker image to Docker Hub (requires Docker Hub credentials):\n```bash\nmake push-docker\n```\n\n## Troubleshooting\n\nIf Claude Desktop is failing to connect to `mcp-vision`:\n- Check the configuration is correct (CPU vs GPU)\n- Developer options may need to be enabled in Claude Desktop\n- Depending on the size of the model(s) used, give it a few minutes to download them from HuggingFace on first opening Claude Desktop. Once downloaded, the server will respond and Claude will connect.\n\nOn accounts that have web search enabled, Claude will prefer to use web search over local MCP tools AFAIK. Disable web search for best results. \n\n## TODO\n- Host best models online instead of requiring local download\n- Add more tools\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgroundlight%2Fmcp-vision","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgroundlight%2Fmcp-vision","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgroundlight%2Fmcp-vision/lists"}