{"id":51343719,"url":"https://github.com/kitdevua/video-vision-mcp","last_synced_at":"2026-07-02T10:00:39.627Z","repository":{"id":368480779,"uuid":"1285285448","full_name":"KitDevUA/video-vision-mcp","owner":"KitDevUA","description":"MCP server that turns any video (file, URL, or Jira attachment) into frames + transcript for Claude Code","archived":false,"fork":false,"pushed_at":"2026-06-30T18:27:29.000Z","size":169,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-30T20:13:56.834Z","etag":null,"topics":["ai","anthropic","claude","claude-code","ffmpeg","gemini","jira","llm","mcp","mcp-server","model-context-protocol","python","transcription","video","video-analysis","whisper"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KitDevUA.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-30T16:47:56.000Z","updated_at":"2026-06-30T18:27:35.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/KitDevUA/video-vision-mcp","commit_stats":null,"previous_names":["kitdevua/video-vision-mcp"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/KitDevUA/video-vision-mcp","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KitDevUA%2Fvideo-vision-mcp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KitDevUA%2Fvideo-vision-mcp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KitDevUA%2Fvideo-vision-mcp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KitDevUA%2Fvideo-vision-mcp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/KitDevUA","download_url":"https://codeload.github.com/KitDevUA/video-vision-mcp/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KitDevUA%2Fvideo-vision-mcp/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35041999,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-02T02:00:06.368Z","response_time":173,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","anthropic","claude","claude-code","ffmpeg","gemini","jira","llm","mcp","mcp-server","model-context-protocol","python","transcription","video","video-analysis","whisper"],"created_at":"2026-07-02T10:00:26.111Z","updated_at":"2026-07-02T10:00:39.622Z","avatar_url":"https://github.com/KitDevUA.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# video-vision-mcp\n\n[![CI](https://github.com/KitDevUA/video-vision-mcp/actions/workflows/ci.yml/badge.svg)](https://github.com/KitDevUA/video-vision-mcp/actions/workflows/ci.yml)\n[![PyPI](https://img.shields.io/pypi/v/video-vision-mcp)](https://pypi.org/project/video-vision-mcp/)\n![Python](https://img.shields.io/badge/python-3.10%2B-blue)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)\n\n\u003c!-- mcp-name: io.github.KitDevUA/video-vision-mcp --\u003e\n\nAn MCP server that gives Claude Code the ability to **analyze any video** —\na local file or a URL — through one set of tools.\n\nClaude can't watch video natively (only text + the first frame of an image).\nThis server converts a video into **sampled frame images + an audio transcript**,\nor — when a Gemini key is present — a **native Gemini analysis** of the whole video.\n\nIt is **standalone**: give it a ready video (a local path or a direct URL) and it\ndoes the rest. It does not connect to Jira/Slack/etc. If a video lives behind an\nintegration, fetch it with that integration first (download to a file or get a\ndirect URL), then hand the `file_path` or `url` to this server.\n\n\u003e Scenario: a Jira bug ticket has only a screen-recording, no text. Your Jira MCP\n\u003e downloads the attachment to a temp file → `analyze_video file_path=/tmp/bug.mp4`\n\u003e → you see the frames + transcript (or Gemini's analysis) and can reason about the bug.\n\n## Three backend tiers (auto-selected)\n\n| Tier | Needs | What it does |\n|---|---|---|\n| **1 — local** (default) | nothing | `ffmpeg` frames + `whisper.cpp` transcript. Free, fully local, always works. |\n| **2 — cloud ASR** | `OPENAI_API_KEY` or `GROQ_API_KEY` | Local frames, but transcription via OpenAI Whisper / Groq for higher quality. |\n| **3 — native Gemini** | `GEMINI_API_KEY` | Gemini ingests the whole video (visual + audio) in one call, with MM:SS timestamps. Default when the key is set. |\n\nPrecedence: **Gemini \u003e OpenAI \u003e Groq \u003e local.** Set `VIDEO_MCP_DISABLE_GEMINI=true`\nto force tiers 1/2 even with a Gemini key. The backend used is named in every result.\n\n**Privacy:** tier 1 never uploads anything. Tiers 2/3 print a one-time notice in\nthe session the first time video content is sent to a third party.\n\n## Tools\n\n- `analyze_video` — frames + transcript + metadata (the main tool). `frame_interval`\n  sets seconds between frames (default 1.0; e.g. 0.5/0.25/0.1 denser, 2/5 sparser).\n- `get_video_transcript_only` — transcript text only.\n- `extract_frames_at` — frames at specific timestamps (`\"00:42\"`, `\"1:05\"`, `12.5`).\n- `list_recent_analyses` — cached analyses + backend used.\n\n## Install\n\nRequires **Python ≥ 3.10**. A single install pulls everything — backends, plus the\nffmpeg and whisper.cpp dependencies. **Nothing is ever installed globally on your\nmachine** (no brew/apt/winget, no sudo).\n\n### Use it (recommended)\n\nWith [uv](https://docs.astral.sh/uv/) you don't install it explicitly — `uvx` runs\nthe published package on demand (see [Register in Claude Code](#register-in-claude-code)).\nTo install into an environment instead:\n\n```bash\nuv pip install video-vision-mcp     # or: pip install video-vision-mcp\n```\n\n### From source (development)\n\n```bash\ngit clone https://github.com/KitDevUA/video-vision-mcp.git\ncd video-vision-mcp\nuv venv \u0026\u0026 source .venv/bin/activate\nuv pip install -e \".[dev]\"          # all backends bundled\n```\n\n### Dependencies — fully self-contained\n\n- **ffmpeg / ffprobe**: if they are already on your `PATH`, those system binaries\n  are used. Otherwise the bundled `static-ffmpeg` package supplies them (fetched\n  once into its own local cache — never a system-wide install).\n- **whisper.cpp** (tier 1 transcription): shipped as the bundled `pywhispercpp`\n  binding (prebuilt wheels; builds from source only if no wheel exists for your\n  platform/Python). A `whisper-cli` already on `PATH` is used if present.\n- **whisper model**: the ggml model (`base` by default) downloads from Hugging\n  Face into the cache on first transcription. Override with\n  `VIDEO_MCP_WHISPER_MODEL` (`tiny`/`base`/`small`/`medium`/`large-v3`) or\n  `VIDEO_MCP_WHISPER_MODEL_PATH`.\n- **cloud-only**: set `OPENAI_API_KEY` / `GROQ_API_KEY` (tier 2) or\n  `GEMINI_API_KEY` (tier 3); whisper.cpp is then never invoked.\n\n## Configure\n\n```bash\ncp env.example .env\n# edit .env — nothing is required for tier 1\n```\n\nSee `env.example` for every variable — all optional (API keys and tuning). Tier 1\nneeds none.\n\n## Register in Claude Code\n\nAdd to your project `.mcp.json` (or global config) — see `.mcp.json.example`:\n\n```json\n{\n  \"mcpServers\": {\n    \"video-vision\": {\n      \"command\": \"uvx\",\n      \"args\": [\"video-vision-mcp\"],\n      \"env\": { \"VIDEO_MCP_ENV\": \"/abs/path/to/.env\" }\n    }\n  }\n}\n```\n\n`uvx` downloads and runs the published package automatically — no manual install\nstep. `VIDEO_MCP_ENV` is optional (tier 1 needs no keys); point it at your `.env`\nif you use the cloud backends. For local development against a checkout, use\n`\"args\": [\"--from\", \"/abs/path/to/video-vision-mcp\", \"video-vision-mcp\"]` instead.\nRestart Claude Code; the `video-vision` tools then appear.\n\n## Cache\n\nResults are cached at `~/.cache/video-vision-mcp/` keyed by **(file hash,\nbackend, frame interval)** — re-analyzing the same video is instant, and\nswitching backends or intervals keeps each result separately. Downloaded URLs and\nwhisper models live under the same dir. Override with `VIDEO_MCP_CACHE_DIR`.\n\nCached analyses and downloaded videos older than `VIDEO_MCP_CACHE_TTL_HOURS`\n(default **24**) are pruned on startup and skipped on read; set `0` to keep them\nforever. Whisper models are never pruned (expensive to re-download).\n\n## Using it with an integration (e.g. Jira, Slack)\n\nThis server is deliberately standalone — it never talks to Jira, Slack, or any\nother service. When a video lives behind an integration, let that integration's\nMCP fetch it, then pass the result here:\n\n1. The integration MCP downloads the attachment to a local file (or gives a\n   direct, publicly reachable URL — an authenticated API URL won't work with `url`).\n2. Call `analyze_video file_path=\u003cdownloaded file\u003e` (or `url=\u003cdirect link\u003e`).\n\nThis keeps auth and service-specific logic where it belongs, and lets one video\ntool serve every source.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkitdevua%2Fvideo-vision-mcp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkitdevua%2Fvideo-vision-mcp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkitdevua%2Fvideo-vision-mcp/lists"}