{"id":29196652,"url":"https://github.com/vchitect/shotbench","last_synced_at":"2026-02-08T15:03:18.918Z","repository":{"id":302058741,"uuid":"1008264134","full_name":"Vchitect/ShotBench","owner":"Vchitect","description":null,"archived":false,"fork":false,"pushed_at":"2025-06-30T11:19:26.000Z","size":110790,"stargazers_count":6,"open_issues_count":2,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-30T11:21:24.264Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Vchitect.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-25T09:23:40.000Z","updated_at":"2025-06-30T11:19:29.000Z","dependencies_parsed_at":"2025-06-30T11:32:45.114Z","dependency_job_id":null,"html_url":"https://github.com/Vchitect/ShotBench","commit_stats":null,"previous_names":["vchitect/shotbench"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Vchitect/ShotBench","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Vchitect%2FShotBench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Vchitect%2FShotBench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Vchitect%2FShotBench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Vchitect%2FShotBench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Vchitect","download_url":"https://codeload.github.com/Vchitect/ShotBench/tar.g
z/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Vchitect%2FShotBench/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266614394,"owners_count":23956362,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-23T02:00:09.312Z","response_time":66,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-07-02T06:08:00.739Z","updated_at":"2026-02-08T15:03:13.792Z","avatar_url":"https://github.com/Vchitect.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models\n\n\u003cp align=\"center\"\u003e\n    \u003ca href='https://github.com/Alexios-hub' target='_blank'\u003eHongbo Liu\u003c/a\u003e\u003csup\u003e1, 3*\u003c/sup\u003e,\u0026emsp;\n    \u003ca href='https://github.com/hejingwenhejingwen' target='_blank'\u003eJingwen He\u003c/a\u003e\u003csup\u003e2, 3*\u003c/sup\u003e,\u0026emsp;\n    \u003ca href='https://github.com/MQN-80' target='_blank'\u003eYi Jin\u003c/a\u003e\u003csup\u003e1\u003c/sup\u003e,\u0026emsp;\n    \u003ca href='https://zhengdian1.github.io/' target='_blank'\u003eDian Zheng\u003c/a\u003e\u003csup\u003e3\u003c/sup\u003e,\u0026emsp;\n    \u003ca href='https://scholar.google.com/citations?hl=zh-CN\u0026user=kMui170AAAAJ' target='_blank'\u003eYuhao Dong\u003c/a\u003e\u003csup\u003e4\u003c/sup\u003e,\u0026emsp;\n    
\u003ca href='https://github.com/zhangfan-p' target='_blank'\u003eFan Zhang\u003c/a\u003e\u003csup\u003e3\u003c/sup\u003e,\u0026emsp;\n    \u003ca href='https://ziqihuangg.github.io/' target='_blank'\u003eZiqi Huang\u003c/a\u003e\u003csup\u003e4\u003c/sup\u003e,\u0026emsp;\n    \u003ca href='https://scholar.google.com/citations?user=EgfF_CEAAAAJ\u0026hl=en' target='_blank'\u003eYinan He\u003c/a\u003e\u003csup\u003e3\u003c/sup\u003e,\u0026emsp;\n    \u003ca href='https://yg256li.github.io/' target='_blank'\u003eYangguang Li\u003c/a\u003e\u003csup\u003e3\u003c/sup\u003e,\u0026emsp;\n    \u003ca href='https://dblp.org/pid/98/120-1.html' target='_blank'\u003eWeichao Chen\u003c/a\u003e\u003csup\u003e1\u003c/sup\u003e,\u0026emsp;\n    \u003ca href='https://mmlab.siat.ac.cn/yuqiao' target='_blank'\u003eYu Qiao\u003c/a\u003e\u003csup\u003e3\u003c/sup\u003e,\u0026emsp;\n    \u003ca href='https://wlouyang.github.io/' target='_blank'\u003eWanli Ouyang\u003c/a\u003e\u003csup\u003e2\u003c/sup\u003e,\u0026emsp;\n    \u003ca href='https://orcid.org/0000-0002-4301-394X' target='_blank'\u003eShengjie Zhao\u003c/a\u003e\u003csup\u003e1\u0026dagger;\u003c/sup\u003e,\u0026emsp;\n    \u003ca href='https://liuziwei7.github.io/' target='_blank'\u003eZiwei Liu\u003c/a\u003e\u003csup\u003e4\u0026dagger;\u003c/sup\u003e\u0026emsp;\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  (* equal contributions) \u0026nbsp;\u0026nbsp; († corresponding authors)\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003csup\u003e1\u003c/sup\u003e Tongji University \u0026emsp;\n  \u003csup\u003e2\u003c/sup\u003e The Chinese University of Hong Kong \u0026emsp;\u003cbr\u003e\n  \u003csup\u003e3\u003c/sup\u003e Shanghai Artificial Intelligence Laboratory \u0026emsp;\n  \u003csup\u003e4\u003c/sup\u003e S-Lab, Nanyang Technological University\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://arxiv.org/abs/2506.21356\"\u003e\n    \u003cimg 
src=\"https://img.shields.io/badge/Paper-arXiv%3A2506.21356-B31B1B?logo=arxiv\" alt=\"Paper\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://huggingface.co/datasets/Vchitect/ShotBench\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Dataset-HuggingFace-orange?logo=huggingface\" alt=\"Dataset\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://huggingface.co/collections/Vchitect/shot-vl-685e541cdc5583148b36c12f\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Model-ShotVL-green\" alt=\"Model\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://vchitect.github.io/ShotBench-project/\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Project\u0026nbsp;Page-Website-lightgrey?logo=googlechrome\" alt=\"Project Page\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://www.youtube.com/watch?v=MJBJlJEsPFM\"\u003e\n    \u003cimg src=\"assets/shotbench_demo.gif\" alt=\"ShotBench Demo (click to play)\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\n## 🎬 Overview\n- We introduce **ShotBench**, a comprehensive benchmark for evaluating VLMs’ understanding of cinematic language. It comprises over 3.5 k expert-annotated QA pairs derived from images and video clips of over 200 critically acclaimed films (predominantly Oscar-nominated), covering eight distinct cinematography dimensions. This provides a rigorous new standard for assessing fine-grained visual comprehension in film.\n- We conducted an extensive evaluation of 24 leading VLMs, including prominent open-source and proprietary models, on ShotBench. Our results reveal a critical performance gap: even the most capable model, GPT-4o, achieves less than 60 % average accuracy. 
This systematically quantifies the current limitations of VLMs in genuine cinematographic comprehension.\n- To address the identified limitations and facilitate future research, we constructed **ShotQA**, the first large-scale multimodal dataset for cinematography understanding, containing approximately 70 k high-quality QA pairs. Leveraging ShotQA, we developed **ShotVL**, a novel VLM trained using Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO). ShotVL significantly surpasses all tested open-source and proprietary models, establishing a new **state-of-the-art** on ShotBench.\n\n## 🔥 News\n- **2025-07-7** Release **Evaluation** code.\n- **2025-07-2** Release [**ShotQA-70k**](https://huggingface.co/datasets/Vchitect/ShotQA) dataset.\n- **2025-06-27** Release [**ShotBench**](https://huggingface.co/datasets/Vchitect/ShotBench) **test** split.  \n- **2025-06-27** Release our paper: [**ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models**](https://arxiv.org/abs/2506.21356).  
\n- **2025-06-27** Release[ **ShotVL-7B**](https://huggingface.co/Vchitect/ShotVL-7B) and [**ShotVL-3B**](https://huggingface.co/Vchitect/ShotVL-3B), these models are currently SOTA VLMs on cinematography understanding.\n\n## Installation\n\n```shell\nconda create -n shotbench python=3.10\nconda activate shotbench\npip install -r requirements.txt\n```\n\n## Evaluation\n\n### 1.Preparing ShotBench Test Data\n\n```shell\nmkdir -p evaluation/data \u0026\u0026 cd evaluation/data\nhuggingface-cli download --repo-type dataset Vchitect/ShotBench --local-dir ShotBench\ncd ShotBench\ntar -xvf images.tar\ntar -xvf videos.tar\ncd ../../../\n```\n\n### 2.Run Evaluation Code\n\nEvaluate ShotVL-3B with 4 GPUs:\n\n```shell\naccelerate launch --num_processes 4 evaluation/shotvl/evaluate.py --model ShotVL-3B --reasoning --output-dir eval_results\n```\n\nEvaluate ShotVL-7B with 4 GPUs:\n\n```shell\naccelerate launch --num_processes 4 evaluation/shotvl/evaluate.py --model ShotVL-7B --output-dir eval_results\n```\n\n### 3.Calculate Metrics\n\n```shell\nOPENAI_API_KEY=YOUR_OPENAI_APIKEY python evaluation/calculate_scores.py --prediction_path OUTPUT_FILE_PATH\n```\n\n## Evaluation Results\n\n\u003cdiv align=\"center\"\u003e\n\u003ctable\u003e\n  \u003ccaption\u003e\n    \u003csmall\u003e\n      Abbreviations:\u0026nbsp;\n      SS = \u003cem\u003eShot\u0026nbsp;Size\u003c/em\u003e,\u0026nbsp;\n      SF = \u003cem\u003eShot\u0026nbsp;Framing\u003c/em\u003e,\u0026nbsp;\n      CA = \u003cem\u003eCamera\u0026nbsp;Angle\u003c/em\u003e,\u0026nbsp;\n      LS = \u003cem\u003eLens\u0026nbsp;Size\u003c/em\u003e,\u0026nbsp;\n      LT = \u003cem\u003eLighting\u0026nbsp;Type\u003c/em\u003e,\u0026nbsp;\n      LC = \u003cem\u003eLighting\u0026nbsp;Conditions\u003c/em\u003e,\u0026nbsp;\n      SC = \u003cem\u003eShot\u0026nbsp;Composition\u003c/em\u003e,\u0026nbsp;\n      CM = \u003cem\u003eCamera\u0026nbsp;Movement\u003c/em\u003e.\u0026nbsp;\n      \u003cu\u003eUnderline\u003c/u\u003e marks previous 
best in each group.\u003cbr\u003e\n      \u003cstrong\u003eOur \u003cem\u003eShotVL\u003c/em\u003e models establish new SOTA.\u003c/strong\u003e\n    \u003c/small\u003e\n  \u003c/caption\u003e\u003cthead\u003e\n    \u003ctr\u003e\n      \u003cth\u003eModels\u003c/th\u003e\u003cth\u003eSS\u003c/th\u003e\u003cth\u003eSF\u003c/th\u003e\u003cth\u003eCA\u003c/th\u003e\u003cth\u003eLS\u003c/th\u003e\u003cth\u003eLT\u003c/th\u003e\n      \u003cth\u003eLC\u003c/th\u003e\u003cth\u003eSC\u003c/th\u003e\u003cth\u003eCM\u003c/th\u003e\u003cth\u003eAvg\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\u003ctbody\u003e\n  \u003ctr\u003e\u003cth colspan=\"10\"\u003e\u003cem\u003eOpen-Sourced\u0026nbsp;VLMs\u003c/em\u003e\u003c/th\u003e\u003c/tr\u003e\n                            \u003ctr\u003e\u003ctd\u003eQwen2.5-VL-3B-Instruct\u003c/td\u003e\u003ctd\u003e54.6\u003c/td\u003e\u003ctd\u003e56.6\u003c/td\u003e\u003ctd\u003e43.1\u003c/td\u003e\u003ctd\u003e36.6\u003c/td\u003e\u003ctd\u003e59.3\u003c/td\u003e\u003ctd\u003e45.1\u003c/td\u003e\u003ctd\u003e41.5\u003c/td\u003e\u003ctd\u003e31.9\u003c/td\u003e\u003ctd\u003e46.1\u003c/td\u003e\u003c/tr\u003e\n                            \u003ctr\u003e\u003ctd\u003eQwen2.5-VL-7B-Instruct\u003c/td\u003e\u003ctd\u003e69.1\u003c/td\u003e\u003ctd\u003e73.5\u003c/td\u003e\u003ctd\u003e53.2\u003c/td\u003e\u003ctd\u003e47.0\u003c/td\u003e\u003ctd\u003e60.5\u003c/td\u003e\u003ctd\u003e47.4\u003c/td\u003e\u003ctd\u003e49.9\u003c/td\u003e\u003ctd\u003e30.2\u003c/td\u003e\u003ctd\u003e53.8\u003c/td\u003e\u003c/tr\u003e\n                            \u003ctr\u003e\u003ctd\u003eLLaVA-NeXT-Video-7B\u003c/td\u003e\u003ctd\u003e35.9\u003c/td\u003e\u003ctd\u003e37.1\u003c/td\u003e\u003ctd\u003e32.5\u003c/td\u003e\u003ctd\u003e27.8\u003c/td\u003e\u003ctd\u003e50.9\u003c/td\u003e\u003ctd\u003e31.7\u003c/td\u003e\u003ctd\u003e28.0\u003c/td\u003e\u003ctd\u003e31.3\u003c/td\u003e\u003ctd\u003e34.4\u003c/td\u003e\u003c/tr\u003e\n                           
 \u003ctr\u003e\u003ctd\u003eLLaVA-Video-7B-Qwen2\u003c/td\u003e\u003ctd\u003e56.9\u003c/td\u003e\u003ctd\u003e65.4\u003c/td\u003e\u003ctd\u003e45.1\u003c/td\u003e\u003ctd\u003e36.0\u003c/td\u003e\u003ctd\u003e63.5\u003c/td\u003e\u003ctd\u003e45.4\u003c/td\u003e\u003ctd\u003e37.4\u003c/td\u003e\u003ctd\u003e35.3\u003c/td\u003e\u003ctd\u003e48.1\u003c/td\u003e\u003c/tr\u003e\n                            \u003ctr\u003e\u003ctd\u003eLLaVA-Onevision-Qwen2-7B-Ov-Chat\u003c/td\u003e\u003ctd\u003e58.4\u003c/td\u003e\u003ctd\u003e71.0\u003c/td\u003e\u003ctd\u003e52.3\u003c/td\u003e\u003ctd\u003e38.7\u003c/td\u003e\u003ctd\u003e59.5\u003c/td\u003e\u003ctd\u003e44.9\u003c/td\u003e\u003ctd\u003e50.9\u003c/td\u003e\u003ctd\u003e39.7\u003c/td\u003e\u003ctd\u003e51.9\u003c/td\u003e\u003c/tr\u003e\n                            \u003ctr\u003e\u003ctd\u003eInternVL2.5-8B\u003c/td\u003e\u003ctd\u003e56.3\u003c/td\u003e\u003ctd\u003e70.3\u003c/td\u003e\u003ctd\u003e50.8\u003c/td\u003e\u003ctd\u003e41.1\u003c/td\u003e\u003ctd\u003e60.2\u003c/td\u003e\u003ctd\u003e45.1\u003c/td\u003e\u003ctd\u003e50.1\u003c/td\u003e\u003ctd\u003e33.6\u003c/td\u003e\u003ctd\u003e50.9\u003c/td\u003e\u003c/tr\u003e\n                            \u003ctr\u003e\u003ctd\u003eInternVL3-2B\u003c/td\u003e\u003ctd\u003e56.3\u003c/td\u003e\u003ctd\u003e56.0\u003c/td\u003e\u003ctd\u003e44.4\u003c/td\u003e\u003ctd\u003e34.6\u003c/td\u003e\u003ctd\u003e56.8\u003c/td\u003e\u003ctd\u003e44.6\u003c/td\u003e\u003ctd\u003e43.0\u003c/td\u003e\u003ctd\u003e38.1\u003c/td\u003e\u003ctd\u003e46.7\u003c/td\u003e\u003c/tr\u003e\n                            \u003ctr\u003e\u003ctd\u003eInternVL3-8B\u003c/td\u003e\u003ctd\u003e62.1\u003c/td\u003e\u003ctd\u003e65.8\u003c/td\u003e\u003ctd\u003e46.8\u003c/td\u003e\u003ctd\u003e42.9\u003c/td\u003e\u003ctd\u003e58.0\u003c/td\u003e\u003ctd\u003e44.3\u003c/td\u003e\u003ctd\u003e46.8\u003c/td\u003e\u003ctd\u003e44.2\u003c/td\u003e\u003ctd\u003e51.4\u003c/td\u003e\u003c/tr\u003e\n            
                \u003ctr\u003e\u003ctd\u003eInternVL3-14B\u003c/td\u003e\u003ctd\u003e59.6\u003c/td\u003e\u003ctd\u003e82.2\u003c/td\u003e\u003ctd\u003e55.4\u003c/td\u003e\u003ctd\u003e40.7\u003c/td\u003e\u003ctd\u003e61.7\u003c/td\u003e\u003ctd\u003e44.6\u003c/td\u003e\u003ctd\u003e51.1\u003c/td\u003e\u003ctd\u003e38.2\u003c/td\u003e\u003ctd\u003e54.2\u003c/td\u003e\u003c/tr\u003e\n                            \u003ctr\u003e\u003ctd\u003eInternlm-xcomposer2d5-7B\u003c/td\u003e\u003ctd\u003e51.1\u003c/td\u003e\u003ctd\u003e71.0\u003c/td\u003e\u003ctd\u003e39.8\u003c/td\u003e\u003ctd\u003e32.7\u003c/td\u003e\u003ctd\u003e59.3\u003c/td\u003e\u003ctd\u003e35.7\u003c/td\u003e\u003ctd\u003e35.7\u003c/td\u003e\u003ctd\u003e38.8\u003c/td\u003e\u003ctd\u003e45.5\u003c/td\u003e\u003c/tr\u003e\n                            \u003ctr\u003e\u003ctd\u003eOvis2-8B\u003c/td\u003e\u003ctd\u003e35.9\u003c/td\u003e\u003ctd\u003e37.1\u003c/td\u003e\u003ctd\u003e32.5\u003c/td\u003e\u003ctd\u003e27.8\u003c/td\u003e\u003ctd\u003e50.9\u003c/td\u003e\u003ctd\u003e31.7\u003c/td\u003e\u003ctd\u003e28.0\u003c/td\u003e\u003ctd\u003e35.3\u003c/td\u003e\u003ctd\u003e34.9\u003c/td\u003e\u003c/tr\u003e\n                            \u003ctr\u003e\u003ctd\u003eVILA1.5-3B\u003c/td\u003e\u003ctd\u003e33.4\u003c/td\u003e\u003ctd\u003e44.9\u003c/td\u003e\u003ctd\u003e32.1\u003c/td\u003e\u003ctd\u003e28.6\u003c/td\u003e\u003ctd\u003e50.6\u003c/td\u003e\u003ctd\u003e35.7\u003c/td\u003e\u003ctd\u003e28.4\u003c/td\u003e\u003ctd\u003e21.5\u003c/td\u003e\u003ctd\u003e34.4\u003c/td\u003e\u003c/tr\u003e\n                            \u003ctr\u003e\u003ctd\u003eVILA1.5-8B\u003c/td\u003e\u003ctd\u003e40.6\u003c/td\u003e\u003ctd\u003e44.5\u003c/td\u003e\u003ctd\u003e39.1\u003c/td\u003e\u003ctd\u003e29.7\u003c/td\u003e\u003ctd\u003e48.9\u003c/td\u003e\u003ctd\u003e32.9\u003c/td\u003e\u003ctd\u003e34.4\u003c/td\u003e\u003ctd\u003e36.9\u003c/td\u003e\u003ctd\u003e38.4\u003c/td\u003e\u003c/tr\u003e\n                      
      \u003ctr\u003e\u003ctd\u003eVILA1.5-13B\u003c/td\u003e\u003ctd\u003e36.7\u003c/td\u003e\u003ctd\u003e54.6\u003c/td\u003e\u003ctd\u003e40.7\u003c/td\u003e\u003ctd\u003e34.8\u003c/td\u003e\u003ctd\u003e52.8\u003c/td\u003e\u003ctd\u003e35.4\u003c/td\u003e\u003ctd\u003e34.2\u003c/td\u003e\u003ctd\u003e31.3\u003c/td\u003e\u003ctd\u003e40.1\u003c/td\u003e\u003c/tr\u003e\n                            \u003ctr\u003e\u003ctd\u003eInstructblip-vicuna-7B\u003c/td\u003e\u003ctd\u003e27.0\u003c/td\u003e\u003ctd\u003e27.9\u003c/td\u003e\u003ctd\u003e34.5\u003c/td\u003e\u003ctd\u003e29.4\u003c/td\u003e\u003ctd\u003e44.4\u003c/td\u003e\u003ctd\u003e29.7\u003c/td\u003e\u003ctd\u003e27.1\u003c/td\u003e\u003ctd\u003e25.0\u003c/td\u003e\u003ctd\u003e30.6\u003c/td\u003e\u003c/tr\u003e\n                            \u003ctr\u003e\u003ctd\u003eInstructblip-vicuna-13B\u003c/td\u003e\u003ctd\u003e26.8\u003c/td\u003e\u003ctd\u003e29.2\u003c/td\u003e\u003ctd\u003e27.9\u003c/td\u003e\u003ctd\u003e28.0\u003c/td\u003e\u003ctd\u003e39.0\u003c/td\u003e\u003ctd\u003e24.0\u003c/td\u003e\u003ctd\u003e27.1\u003c/td\u003e\u003ctd\u003e22.0\u003c/td\u003e\u003ctd\u003e28.0\u003c/td\u003e\u003c/tr\u003e\n                            \u003ctr\u003e\u003ctd\u003eInternVL2.5-38B\u003c/td\u003e\u003ctd\u003e67.8\u003c/td\u003e\u003ctd\u003e\u003cu\u003e85.4\u003c/u\u003e\u003c/td\u003e\u003ctd\u003e55.4\u003c/td\u003e\u003ctd\u003e41.7\u003c/td\u003e\u003ctd\u003e61.7\u003c/td\u003e\u003ctd\u003e48.9\u003c/td\u003e\u003ctd\u003e52.4\u003c/td\u003e\u003ctd\u003e44.0\u003c/td\u003e\u003ctd\u003e57.2\u003c/td\u003e\u003c/tr\u003e\n                            
\u003ctr\u003e\u003ctd\u003eInternVL3-38B\u003c/td\u003e\u003ctd\u003e68.0\u003c/td\u003e\u003ctd\u003e84.0\u003c/td\u003e\u003ctd\u003e51.9\u003c/td\u003e\u003ctd\u003e43.6\u003c/td\u003e\u003ctd\u003e64.4\u003c/td\u003e\u003ctd\u003e46.9\u003c/td\u003e\u003ctd\u003e54.7\u003c/td\u003e\u003ctd\u003e44.6\u003c/td\u003e\u003ctd\u003e57.3\u003c/td\u003e\u003c/tr\u003e\n                            \u003ctr\u003e\u003ctd\u003eQwen2.5-VL-32B-Instruct\u003c/td\u003e\u003ctd\u003e62.3\u003c/td\u003e\u003ctd\u003e76.6\u003c/td\u003e\u003ctd\u003e51.0\u003c/td\u003e\u003ctd\u003e48.3\u003c/td\u003e\u003ctd\u003e61.7\u003c/td\u003e\u003ctd\u003e44.0\u003c/td\u003e\u003ctd\u003e52.2\u003c/td\u003e\u003ctd\u003e43.8\u003c/td\u003e\u003ctd\u003e55.0\u003c/td\u003e\u003c/tr\u003e\n                            \u003ctr\u003e\u003ctd\u003eQwen2.5-VL-72B-Instruct\u003c/td\u003e\u003ctd\u003e\u003cu\u003e75.1\u003c/u\u003e\u003c/td\u003e\u003ctd\u003e82.9\u003c/td\u003e\u003ctd\u003e56.7\u003c/td\u003e\u003ctd\u003e46.8\u003c/td\u003e\u003ctd\u003e59.0\u003c/td\u003e\u003ctd\u003e\u003cu\u003e49.4\u003c/u\u003e\u003c/td\u003e\u003ctd\u003e54.1\u003c/td\u003e\u003ctd\u003e\u003cu\u003e48.9\u003c/u\u003e\u003c/td\u003e\u003ctd\u003e59.1\u003c/td\u003e\u003c/tr\u003e\n                            \u003ctr\u003e\u003ctd\u003eInternVL3-78B\u003c/td\u003e\u003ctd\u003e69.7\u003c/td\u003e\u003ctd\u003e80.0\u003c/td\u003e\u003ctd\u003e54.5\u003c/td\u003e\u003ctd\u003e44.0\u003c/td\u003e\u003ctd\u003e\u003cu\u003e65.5\u003c/u\u003e\u003c/td\u003e\u003ctd\u003e47.4\u003c/td\u003e\u003ctd\u003e51.8\u003c/td\u003e\u003ctd\u003e44.4\u003c/td\u003e\u003ctd\u003e57.2\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003cth colspan=\"10\"\u003e\u003cem\u003eProprietary\u0026nbsp;VLMs\u003c/em\u003e\u003c/th\u003e\u003c/tr\u003e\n                            
\u003ctr\u003e\u003ctd\u003eGemini-2.0-flash\u003c/td\u003e\u003ctd\u003e48.9\u003c/td\u003e\u003ctd\u003e75.5\u003c/td\u003e\u003ctd\u003e44.6\u003c/td\u003e\u003ctd\u003e31.9\u003c/td\u003e\u003ctd\u003e62.2\u003c/td\u003e\u003ctd\u003e48.9\u003c/td\u003e\u003ctd\u003e52.4\u003c/td\u003e\u003ctd\u003e47.4\u003c/td\u003e\u003ctd\u003e51.5\u003c/td\u003e\u003c/tr\u003e\n                            \u003ctr\u003e\u003ctd\u003eGemini-2.5-flash-preview-04-17\u003c/td\u003e\u003ctd\u003e57.7\u003c/td\u003e\u003ctd\u003e82.9\u003c/td\u003e\u003ctd\u003e51.4\u003c/td\u003e\u003ctd\u003e43.8\u003c/td\u003e\u003ctd\u003e65.2\u003c/td\u003e\u003ctd\u003e45.7\u003c/td\u003e\u003ctd\u003e45.9\u003c/td\u003e\u003ctd\u003e43.5\u003c/td\u003e\u003ctd\u003e54.5\u003c/td\u003e\u003c/tr\u003e\n                            \u003ctr\u003e\u003ctd\u003eGPT-4o\u003c/td\u003e\u003ctd\u003e69.3\u003c/td\u003e\u003ctd\u003e83.1\u003c/td\u003e\u003ctd\u003e\u003cu\u003e58.2\u003c/u\u003e\u003c/td\u003e\u003ctd\u003e\u003cu\u003e48.9\u003c/u\u003e\u003c/td\u003e\u003ctd\u003e63.2\u003c/td\u003e\u003ctd\u003e48.0\u003c/td\u003e\u003ctd\u003e\u003cu\u003e55.2\u003c/u\u003e\u003c/td\u003e\u003ctd\u003e48.3\u003c/td\u003e\u003ctd\u003e\u003cu\u003e59.3\u003c/u\u003e\u003c/td\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003cth colspan=\"10\"\u003e\u003cem\u003eOurs\u003c/em\u003e\u003c/th\u003e\u003c/tr\u003e\n\u003ctr\u003e\n  \u003ctd\u003eShotVL-3B\n    \u003ca href=\"https://huggingface.co/Vchitect/ShotVL-3B\"\u003e\n      \u003cimg src=\"https://img.shields.io/badge/Model-HF-yellow?logo=huggingface\" alt=\"HF\"\u003e\n    \u003c/a\u003e\n  \u003c/td\u003e\n  \u003ctd\u003e77.9\u003c/td\u003e\u003ctd\u003e85.6\u003c/td\u003e\u003ctd\u003e68.8\u003c/td\u003e\u003ctd\u003e59.3\u003c/td\u003e\u003ctd\u003e65.7\u003c/td\u003e\n  \u003ctd\u003e53.1\u003c/td\u003e\u003ctd\u003e57.4\u003c/td\u003e\u003ctd\u003e51.7\u003c/td\u003e\u003ctd\u003e65.1\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n  
\u003ctd\u003eShotVL-7B\n    \u003ca href=\"https://huggingface.co/Vchitect/ShotVL-7B\"\u003e\n      \u003cimg src=\"https://img.shields.io/badge/Model-HF-yellow?logo=huggingface\" alt=\"HF\"\u003e\n    \u003c/a\u003e\n  \u003c/td\u003e\n  \u003ctd\u003e81.2\u003c/td\u003e\u003ctd\u003e90.1\u003c/td\u003e\u003ctd\u003e78.0\u003c/td\u003e\u003ctd\u003e68.5\u003c/td\u003e\u003ctd\u003e70.1\u003c/td\u003e\n  \u003ctd\u003e64.3\u003c/td\u003e\u003ctd\u003e45.7\u003c/td\u003e\u003ctd\u003e62.9\u003c/td\u003e\u003ctd\u003e70.1\u003c/td\u003e\n\u003c/tr\u003e  \u003c/tbody\u003e\n\u003c/table\u003e\u003c/div\u003e\n\n## Open-Sourcing Plan\n\n- [ ] Release Training code.\n- [x] Release Evaluation code.\n- [x] Release **ShotQA-70k** dataset.\n- [x] Release **ShotBench** test set.\n- [x] Release **ShotVL** models.\n\n## BibTeX\n\n```\n@misc{\n      liu2025shotbench,\n      title={ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models}, \n      author={Hongbo Liu and Jingwen He and Yi Jin and Dian Zheng and Yuhao Dong and Fan Zhang and Ziqi Huang and Yinan He and Yangguang Li and Weichao Chen and Yu Qiao and Wanli Ouyang and Shengjie Zhao and Ziwei Liu},\n      year={2025},\n      eprint={2506.21356},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https://arxiv.org/abs/2506.21356}, \n    }\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvchitect%2Fshotbench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvchitect%2Fshotbench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvchitect%2Fshotbench/lists"}