{"id":20161906,"url":"https://github.com/ailab-cvc/seed-bench","last_synced_at":"2025-05-16T11:06:28.315Z","repository":{"id":184751710,"uuid":"671103085","full_name":"AILab-CVC/SEED-Bench","owner":"AILab-CVC","description":"(CVPR2024)A benchmark for evaluating Multimodal LLMs using multiple-choice questions.","archived":false,"fork":false,"pushed_at":"2025-01-14T08:22:43.000Z","size":27170,"stargazers_count":337,"open_issues_count":21,"forks_count":13,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-12T08:25:49.444Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AILab-CVC.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"License.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-07-26T14:43:46.000Z","updated_at":"2025-04-09T06:59:41.000Z","dependencies_parsed_at":null,"dependency_job_id":"957ef9ce-1cc3-4803-9c56-f7568286f26c","html_url":"https://github.com/AILab-CVC/SEED-Bench","commit_stats":null,"previous_names":["ailab-cvc/seed-bench"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FSEED-Bench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FSEED-Bench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FSEED-Bench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FSEED-Bench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AI
Lab-CVC","download_url":"https://codeload.github.com/AILab-CVC/SEED-Bench/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254518383,"owners_count":22084374,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-14T00:21:43.421Z","updated_at":"2025-05-16T11:06:23.306Z","avatar_url":"https://github.com/AILab-CVC.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SEED-Bench: Benchmarking Multimodal Large Language Models\r\n[SEED-Bench-H](https://github.com/AILab-CVC/SEED-Bench/blob/main/SEED-Bench-H/SEED-Bench-H.pdf)\r\n\r\n[SEED-Bench-2-Plus Arxiv](https://arxiv.org/abs/2404.16790)\r\n\r\n[SEED-Bench-2 Arxiv](https://arxiv.org/abs/2311.17092)\r\n\r\n[SEED-Bench-1 Arxiv](https://arxiv.org/abs/2307.16125)\r\n\r\n \u003cimg src=\"https://github.com/AILab-CVC/SEED-Bench/blob/main/figs/seed-bench-2.jpg\" width = \"600\"  alt=\"image name\" align=center /\u003e\r\n \r\n SEED-Bench-H is a comprehensive integration of the previous SEED-Bench series (SEED-Bench, SEED-Bench-2, SEED-Bench-2-Plus), with additional evaluation dimensions. 
\r\n It consists of 28K multiple-choice questions with precise human annotations, spanning 34 dimensions, including the evaluation of both text and image generation.\r\n\r\n SEED-Bench-2-Plus comprises 2.3K multiple-choice questions with precise human annotations, spanning three broad categories: Charts, Maps, and Webs, each of which covers a wide spectrum of text-rich scenarios in the real world.\r\n\r\n SEED-Bench-2 comprises 24K multiple-choice questions with accurate human annotations, spanning 27 dimensions, including the evaluation of both text and image generation.\r\n \r\n SEED-Bench-1 consists of 19K multiple-choice questions with accurate human annotations, covering 12 evaluation dimensions, including both spatial and temporal understanding.\r\n## News\r\n**[2025.1.14]** [SEED-Bench](https://arxiv.org/abs/2307.16125) has been included in the [OpenCompass-Dataset Community](https://hub.opencompass.org.cn/dataset-detail/SEED-Bench), thanks to [OpenCompass](https://opencompass.org.cn/home).\r\n\r\n**[2024.7.11]** [SEED-Bench-H](https://github.com/AILab-CVC/SEED-Bench/blob/main/SEED-Bench-H/SEED-Bench-H.pdf), [SEED-Bench-2-Plus](https://arxiv.org/abs/2404.16790), [SEED-Bench-2](https://arxiv.org/abs/2311.17092), and [SEED-Bench-1](https://arxiv.org/abs/2307.16125) data is released on [ModelScope](https://modelscope.cn/organization/TencentARC?tab=dataset), thanks to [ModelScope Community](https://modelscope.cn).\r\n\r\n**[2024.6.18]** [SEED-Bench-2](https://arxiv.org/abs/2311.17092) can be evaluated on [VLMEvalKit](https://github.com/open-compass/VLMEvalKit), thanks to [kennymckormick](https://github.com/kennymckormick).\r\n\r\n**[2024.5.30]** We released [SEED-Bench-H](https://github.com/AILab-CVC/SEED-Bench/blob/main/SEED-Bench-H/SEED-Bench-H.pdf), which is a comprehensive integration of the previous SEED-Bench series ([SEED-Bench](https://arxiv.org/abs/2311.17092), [SEED-Bench-2](https://arxiv.org/abs/2311.17092), 
[SEED-Bench-2-Plus](https://arxiv.org/abs/2404.16790)), with additional evaluation dimensions. The additional evaluation dimensions include Image to Latex, Visual Story Comprehension, Few-shot Segmentation, Few-shot Keypoint, Few-shot Depth, and Few-shot Object Detection. Please refer to [SEED-Bench-H](https://github.com/AILab-CVC/SEED-Bench/blob/main/SEED-Bench-H/SEED-Bench-H.pdf) for details. The corresponding dataset is released on [SEED-Bench-H](https://huggingface.co/datasets/AILab-CVC/SEED-Bench-H).\r\n\r\n**[2024.5.25]** [SEED-Bench-2-Plus](https://arxiv.org/abs/2404.16790) can be evaluated on [VLMEvalKit](https://github.com/open-compass/VLMEvalKit), thanks to [kennymckormick](https://github.com/kennymckormick).\r\n\r\n**[2024.4.26]** We are excited to announce the release of [SEED-Bench-2-Plus](https://arxiv.org/abs/2404.16790), a benchmark specifically designed for text-rich visual comprehension. The accompanying dataset is released on [SEED-Bench-2-Plus](https://huggingface.co/datasets/AILab-CVC/SEED-Bench-2-plus).\r\n\r\n**[2024.4.23]** We are pleased to share the comprehensive evaluation results for [Gemini-Vision-Pro](https://gemini.google.com/) and [Claude-3-Opus](https://www.anthropic.com/news/claude-3-family) on [SEED-Bench-1](https://arxiv.org/abs/2307.16125) and [SEED-Bench-2](https://arxiv.org/abs/2311.17092). You can access detailed performance on the [SEED-Bench Leaderboard](https://huggingface.co/spaces/AILab-CVC/SEED-Bench_Leaderboard). Please note that for [Gemini-Vision-Pro](https://gemini.google.com/) we only report task performance when the model responds with at least 50% valid data in the task. \r\n\r\n**[2024.2.27]** [SEED-Bench](https://arxiv.org/abs/2311.17092) has been accepted to **CVPR 2024**.\r\n\r\n**[2023.12.18]** We have released the comprehensive evaluation results for [GPT-4V](https://openai.com/research/gpt-4v-system-card) on [SEED-Bench-1](https://arxiv.org/abs/2307.16125) and [SEED-Bench-2](https://arxiv.org/abs/2311.17092). 
These can be accessed at [GPT-4V for SEED-Bench-1](https://github.com/AILab-CVC/SEED-Bench/blob/main/evaluate_result/SEED-Bench-1/GPT-4V.json) and [GPT-4V for SEED-Bench-2](https://github.com/AILab-CVC/SEED-Bench/blob/main/evaluate_result/SEED-Bench-2/GPT-4V.json). If you're interested, please feel free to take a look.\r\n\r\n**[2023.12.4]** We have updated the [SEED-Bench Leaderboard](https://huggingface.co/spaces/AILab-CVC/SEED-Bench_Leaderboard) for [SEED-Bench-2](https://arxiv.org/abs/2311.17092). Additionally, we have updated the evaluation results for [GPT-4V](https://openai.com/research/gpt-4v-system-card) on both [SEED-Bench-1](https://arxiv.org/abs/2307.16125) and [SEED-Bench-2](https://arxiv.org/abs/2311.17092). If you are interested, please visit the [SEED-Bench Leaderboard](https://huggingface.co/spaces/AILab-CVC/SEED-Bench_Leaderboard) for more details.\r\n\r\n**[2023.11.30]** We have updated the SEED-Bench-v1 JSON (after manually screening the multiple-choice questions for videos) and provided the corresponding video frames for easier testing. Please refer to [SEED-Bench](https://huggingface.co/datasets/AILab-CVC/SEED-Bench) for more information.\r\n\r\n**[2023.11.27]** [SEED-Bench-2](https://arxiv.org/abs/2311.17092) is released! Data and evaluation code are available now.\r\n\r\n**[2023.9.9]** We are actively looking for self-motivated interns. Please feel free to reach out if you are interested.\r\n\r\n**[2023.8.16]** [SEED-Bench Leaderboard](https://huggingface.co/spaces/AILab-CVC/SEED-Bench_Leaderboard) is released! You can upload your model's results now.\r\n\r\n**[2023.7.30]** [SEED-Bench](https://arxiv.org/abs/2307.16125) is released! 
Data and evaluation code are available now.\r\n\r\n## Leaderboard\r\nWelcome to the [SEED-Bench Leaderboard](https://huggingface.co/spaces/AILab-CVC/SEED-Bench_Leaderboard)!\r\n\r\n### Leaderboard Submission\r\n\r\nYou can now submit your model results to the [SEED-Bench Leaderboard](https://huggingface.co/spaces/AILab-CVC/SEED-Bench_Leaderboard). Use our evaluation code to generate 'results.json' in the 'results' folder, as shown below.\r\n\r\n```shell\r\npython eval.py --model instruct_blip --anno_path SEED-Bench.json --output-dir results --task all\r\n```\r\n\r\nThen upload 'results.json' to the [SEED-Bench Leaderboard](https://huggingface.co/spaces/AILab-CVC/SEED-Bench_Leaderboard).\r\n\r\nAfter submitting, please press the refresh button to get the latest results.\r\n\r\n## Data Preparation\r\n\r\nYou can download the SEED-Bench data from the Hugging Face repositories [SEED-Bench](https://huggingface.co/datasets/AILab-CVC/SEED-Bench), [SEED-Bench-2](https://huggingface.co/datasets/AILab-CVC/SEED-Bench-2), [SEED-Bench-2-Plus](https://huggingface.co/datasets/AILab-CVC/SEED-Bench-2-plus), and [SEED-Bench-H](https://huggingface.co/datasets/AILab-CVC/SEED-Bench-H).\r\nYou can also download the data from [ModelScope](https://modelscope.cn/organization/TencentARC?tab=dataset).\r\nPlease refer to [DATASET.md](DATASET.md) for data preparation.\r\n\r\n## Installation\r\n\r\nPlease refer to [INSTALL.md](INSTALL.md).\r\n\r\n## Run Evaluation\r\n\r\nPlease refer to [EVALUATION.md](EVALUATION.md).\r\n\r\n## License\r\nSEED-Bench is released under Apache License Version 2.0.\r\n\r\n## Declaration\r\n\r\n### SEED-Bench-2-Plus\r\nData Sources: Data from the internet under CC-BY licenses.\r\n\r\nPlease contact us if you believe any data infringes upon your rights, and we will remove it.\r\n\r\n### SEED-Bench-2\r\nData Sources:\r\n- Dimensions 1-9, 23 (In-Context Captioning): Conceptual Captions Dataset (https://ai.google.com/research/ConceptualCaptions/) under its license 
(https://github.com/google-research-datasets/conceptual-captions/blob/master/LICENSE). Copyright belongs to the original dataset owner.\r\n- Dimension 9 (Text Recognition): ICDAR2003 (http://www.imglab.org/db/index.html), ICDAR2013(https://rrc.cvc.uab.es/?ch=2), IIIT5k(https://cvit.iiit.ac.in/research/projects/cvit-projects/the-iiit-5k-word-dataset), and SVT(http://vision.ucsd.edu/~kai/svt/). Copyright belongs to the original dataset owner.\r\n- Dimension 10 (Celebrity Recognition): MME (https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation) and MMBench (https://github.com/open-compass/MMBench) under MMBench license (https://github.com/open-compass/MMBench/blob/main/LICENSE). Copyright belongs to the original dataset owners.\r\n- Dimension 11 (Landmark Recognition): Google Landmark Dataset v2 (https://github.com/cvdfoundation/google-landmark) under CC-BY licenses without ND restrictions.\r\n- Dimension 12 (Chart Understanding): PlotQA (https://github.com/NiteshMethani/PlotQA) under its license (https://github.com/NiteshMethani/PlotQA/blob/master/LICENSE).\r\n- Dimension 13 (Visual Referring Expression): VCR (http://visualcommonsense.com) under its license (http://visualcommonsense.com/license/).\r\n- Dimension 14 (Science Knowledge): ScienceQA (https://github.com/lupantech/ScienceQA) under its license (https://github.com/lupantech/ScienceQA/blob/main/LICENSE-DATA).\r\n- Dimension 15 (Emotion Recognition): FER2013 (https://www.kaggle.com/competitions/challenges-in-representation-learning-facial-expression-recognition-challenge/data) under its license (https://www.kaggle.com/competitions/challenges-in-representation-learning-facial-expression-recognition-challenge/rules#7-competition-data).\r\n- Dimension 16 (Visual Mathematics): MME (https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation) and data from the internet under CC-BY licenses.\r\n- Dimension 17 (Difference Spotting): MIMICIT 
(https://github.com/Luodian/Otter/blob/main/mimic-it/README.md) under its license (https://github.com/Luodian/Otter/tree/main/mimic-it#eggs).\r\n- Dimension 18 (Meme Comprehension): Data from the internet under CC-BY licenses.\r\n- Dimension 19 (Global Video Understanding): Charades (https://prior.allenai.org/projects/charades) under its license (https://prior.allenai.org/projects/data/charades/license.txt). SEED-Bench-2 provides 8 frames per video.\r\n- Dimensions 20-22 (Action Recognition, Action Prediction, Procedure Understanding): Something-Something v2 (https://developer.qualcomm.com/software/ai-datasets/something-something), Epic-Kitchen 100 (https://epic-kitchens.github.io/2023), and Breakfast (https://serre-lab.clps.brown.edu/resource/breakfast-actions-dataset/). SEED-Bench-2 provides 8 frames per video.\r\n- Dimension 24 (Interleaved Image-Text Analysis): Data from the internet under CC-BY licenses.\r\n- Dimension 25 (Text-to-Image Generation): CC-500 (https://github.com/weixi-feng/Structured-Diffusion-Guidance) and ABC-6k (https://github.com/weixi-feng/Structured-Diffusion-Guidance) under their license (https://github.com/weixi-feng/Structured-Diffusion-Guidance/blob/master/LICENSE), with images generated by Stable-Diffusion-XL (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) under its license (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md).\r\n- Dimension 26 (Next Image Prediction): Epic-Kitchen 100 (https://epic-kitchens.github.io/2023) under its license (https://creativecommons.org/licenses/by-nc/4.0/).\r\n- Dimension 27 (Text-Image Creation): Data from the internet under CC-BY licenses.\r\n\r\nPlease contact us if you believe any data infringes upon your rights, and we will remove it.\r\n\r\n### SEED-Bench-1\r\nFor the images of SEED-Bench-1, we use the data from Conceptual Captions Dataset (https://ai.google.com/research/ConceptualCaptions/)\r\nfollowing its license 
(https://github.com/google-research-datasets/conceptual-captions/blob/master/LICENSE).\r\nTencent does not hold the copyright for these images; the copyright belongs to the original owner of the Conceptual Captions Dataset. \r\n\r\nFor the videos of SEED-Bench-1, we use the data from Something-Something v2 (https://developer.qualcomm.com/software/ai-datasets/something-something),\r\nEpic-Kitchen 100 (https://epic-kitchens.github.io/2023) and \r\nBreakfast (https://serre-lab.clps.brown.edu/resource/breakfast-actions-dataset/). We only provide the video names. Please download the videos from their official websites.\r\n\r\n\r\n## Citing\r\nIf you find this repository helpful, please consider citing it:\r\n```\r\n@article{li2024seed2plus,\r\n  title={SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension},\r\n  author={Li, Bohao and Ge, Yuying and Chen, Yi and Ge, Yixiao and Zhang, Ruimao and Shan, Ying},\r\n  journal={arXiv preprint arXiv:2404.16790},\r\n  year={2024}\r\n}\r\n\r\n@article{li2023seed2,\r\n  title={SEED-Bench-2: Benchmarking Multimodal Large Language Models},\r\n  author={Li, Bohao and Ge, Yuying and Ge, Yixiao and Wang, Guangzhi and Wang, Rui and Zhang, Ruimao and Shan, Ying},\r\n  journal={arXiv preprint arXiv:2311.17092},\r\n  year={2023}\r\n}\r\n\r\n@article{li2023seed,\r\n  title={Seed-bench: Benchmarking multimodal llms with generative comprehension},\r\n  author={Li, Bohao and Wang, Rui and Wang, Guangzhi and Ge, Yuying and Ge, Yixiao and Shan, Ying},\r\n  journal={arXiv preprint arXiv:2307.16125},\r\n  year={2023}\r\n}\r\n```\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Failab-cvc%2Fseed-bench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Failab-cvc%2Fseed-bench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Failab-cvc%2Fseed-bench/lists"}