{"id":32514162,"url":"https://github.com/vchitect/uni-mmmu","last_synced_at":"2025-10-27T23:29:16.264Z","repository":{"id":318905930,"uuid":"1071928122","full_name":"Vchitect/Uni-MMMU","owner":"Vchitect","description":null,"archived":false,"fork":false,"pushed_at":"2025-10-16T05:11:52.000Z","size":1113,"stargazers_count":8,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-16T11:09:02.240Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Vchitect.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-08T02:43:28.000Z","updated_at":"2025-10-16T07:12:31.000Z","dependencies_parsed_at":"2025-10-17T05:41:49.353Z","dependency_job_id":null,"html_url":"https://github.com/Vchitect/Uni-MMMU","commit_stats":null,"previous_names":["vchitect/uni-mmmu"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/Vchitect/Uni-MMMU","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Vchitect%2FUni-MMMU","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Vchitect%2FUni-MMMU/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Vchitect%2FUni-MMMU/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Vchitect%2FUni-MMMU/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/own
ers/Vchitect","download_url":"https://codeload.github.com/Vchitect/Uni-MMMU/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Vchitect%2FUni-MMMU/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281361396,"owners_count":26487881,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-27T02:00:05.855Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-10-27T23:29:09.697Z","updated_at":"2025-10-27T23:29:16.257Z","avatar_url":"https://github.com/Vchitect.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Paper](https://img.shields.io/badge/cs.CV-Paper-b31b1b?logo=arxiv\u0026logoColor=red)](https://arxiv.org/abs/xxxx)\n[![Project Page](https://img.shields.io/badge/Evaluation-Website-green?logo=googlechrome\u0026logoColor=green)](https://vchitect.github.io/Uni-MMMU-Project/)\n\n\u003cdiv align=\"center\"\u003e\n\u003ch1\u003eUni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark\n\u003c/h1\u003e\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n    \u003ca href=\"https://github.com/Jacky-hate\" target=\"_blank\"\u003eKai Zou\u003c/a\u003e*,\n    \u003ca href=\"https://ziqihuangg.github.io/\" target=\"_blank\"\u003eZiqi Huang\u003c/a\u003e*,\n    \u003ca 
href=\"https://scholar.google.com/citations?hl=zh-CN\u0026user=kMui170AAAAJ\" target=\"_blank\"\u003e Yuhao Dong\u003c/a\u003e*,\n    \u003ca href=\"https://shulin16.github.io/\" target=\"_blank\"\u003eShulin Tian\u003c/a\u003e,\n    \u003ca href=\"https://zhengdian1.github.io/\" target=\"_blank\"\u003eDian Zheng\u003c/a\u003e,\n    \u003ca href=\"https://github.com/Alexios-hub\" target=\"_blank\"\u003eHongbo Liu\u003c/a\u003e,\n    \u003ca href=\"https://github.com/hejingwenhejingwen\" target=\"_blank\"\u003eJingwen He\u003c/a\u003e,\n    \u003ca href=\"https://scholar.google.com/citations?user=kReWULQAAAAJ\u0026hl=en\u0026oi=ao\" target=\"_blank\"\u003eBin Liu\u003c/a\u003e\u003csup\u003e+\u003c/sup\u003e,\n    \u003ca href=\"http://mmlab.siat.ac.cn/yuqiao/index.html\" target=\"_blank\"\u003eYu Qiao\u003c/a\u003e\u003csup\u003e+\u003c/sup\u003e,\n    \u003ca href=\"https://liuziwei7.github.io/\" target=\"_blank\"\u003eZiwei Liu\u003c/a\u003e\u003csup\u003e+\u003c/sup\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n     * equal contributions \u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp; \n     + corresponding authors\n\u003c/div\u003e\n\n\n\n\u003ca name=\"overview\"\u003e\u003c/a\u003e\n## :mega: Overview\n\nUnified multimodal models aim to jointly enable visual understanding and generation, yet current benchmarks rarely examine their true integration. Existing evaluations either treat the two abilities in isolation or overlook tasks that inherently couple them. To address this gap, we present Uni-MMMU, a comprehensive and discipline-aware benchmark that systematically unfolds the bidirectional synergy between generation and understanding across eight reasoning-centric domains, including science, coding, mathematics, and puzzles. Each task is bidirectionally coupled, demanding models to (i) leverage conceptual understanding to guide precise visual synthesis, or (ii) utilize generation as a cognitive scaffold for analytical reasoning. 
Uni-MMMU incorporates verifiable intermediate reasoning steps, unique ground truths, and a reproducible scoring protocol for both textual and visual outputs. Through extensive evaluation of state-of-the-art unified, generation-only, and understanding-only models, we reveal substantial performance disparities and cross-modal dependencies, offering new insights into **when and how** these abilities reinforce one another, and establishing a reliable foundation for advancing unified models.\n\n![Framework](./assets/images/all.svg)\n\n\n**Overview of Uni-MMMU.** Eight tasks are grouped into two paradigms: generation aids understanding (Maze, Sliding, Geometry, Jigsaw) and understanding guides generation (Science: Physics/Chemistry/Biology; Code Rendering). Each task reports dual-channel scores (text + image).\n\n\n\n\u003ca name=\"installation\"\u003e\u003c/a\u003e\n## :hammer: Installation\n\n1. Clone the repository.\n\n```bash\ngit clone https://github.com/Vchitect/Uni-MMMU.git\ncd Uni-MMMU\n```\n\n2. Install the environment.\n```bash\nconda update -n base -c defaults conda\nconda create -n ummmu python==3.10 -y\nconda activate ummmu\n\npip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu118\npip install -r requirements.txt\n```\n\n3. Download the dataset.\n```bash\ngit clone https://huggingface.co/datasets/Vchitect/Uni-MMMU-Eval\ncd Uni-MMMU-Eval\ntar -xvf data.tar -C /path/to/Uni-MMMU\n```\n\n\n\u003ca name=\"usage\"\u003e\u003c/a\u003e\n## Usage\n\n\n### Sampling\n\n- Please refer to `./sample_code_example` for details.\n- All sampled data will be written to `./outputs/model_name`.\n\n\n### Evaluation\n\n\n#### Command\n\n```bash\npython eval_ummmu.py --model_name model_to_be_eval\n```\n- Note: This evaluation requires Qwen2.5-VL-72B and Qwen3-32B as evaluators. 
We recommend running this on a system with at least A100 80GB GPUs to ensure sufficient memory and performance.\n\n\n\n## Citation\n\nIf you find our repo useful for your research, please consider citing our paper:\n\n```bibtex\n\n@misc{zou2025unimmmumassivemultidisciplinemultimodal,\n      title={Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark}, \n      author={Kai Zou and Ziqi Huang and Yuhao Dong and Shulin Tian and Dian Zheng and Hongbo Liu and Jingwen He and Bin Liu and Yu Qiao and Ziwei Liu},\n      year={2025},\n      eprint={2510.13759},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https://arxiv.org/abs/2510.13759}, \n}\n```\n\n## Related Links\n\nOur related projects: [VBench](https://github.com/Vchitect/VBench), [Awesome Evaluation of Visual Generation](https://github.com/ziqihuangg/Awesome-Evaluation-of-Visual-Generation)\n\n```bibtex\n@InProceedings{huang2023vbench,\n    title={{VBench}: Comprehensive Benchmark Suite for Video Generative Models},\n    author={Huang, Ziqi and He, Yinan and Yu, Jiashuo and Zhang, Fan and Si, Chenyang and Jiang, Yuming and Zhang, Yuanhan and Wu, Tianxing and Jin, Qingyang and Chanpaisit, Nattapol and Wang, Yaohui and Chen, Xinyuan and Wang, Limin and Lin, Dahua and Qiao, Yu and Liu, Ziwei},\n    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},\n    year={2024}\n}\n\n@article{huang2024vbench++,\n    title={{VBench++}: Comprehensive and Versatile Benchmark Suite for Video Generative Models},\n    author={Huang, Ziqi and Zhang, Fan and Xu, Xiaojie and He, Yinan and Yu, Jiashuo and Dong, Ziyue and Ma, Qianli and Chanpaisit, Nattapol and Si, Chenyang and Jiang, Yuming and Wang, Yaohui and Chen, Xinyuan and Chen, Ying-Cong and Wang, Limin and Lin, Dahua and Qiao, Yu and Liu, Ziwei},\n    journal={arXiv preprint arXiv:2411.13503},\n    year={2024}\n}\n\n@article{zheng2025vbench2,\n    title={{VBench-2.0}: Advancing Video Generation Benchmark 
Suite for Intrinsic Faithfulness},\n    author={Zheng, Dian and Huang, Ziqi and Liu, Hongbo and Zou, Kai and He, Yinan and Zhang, Fan and Zhang, Yuanhan and He, Jingwen and Zheng, Wei-Shi and Qiao, Yu and Liu, Ziwei},\n    journal={arXiv preprint arXiv:2503.21755},\n    year={2025}\n}\n@InProceedings{zhang2024evaluationagent,\n    title = {Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models},\n    author = {Zhang, Fan and Tian, Shulin and Huang, Ziqi and Qiao, Yu and Liu, Ziwei},\n    booktitle={Annual Meeting of the Association for Computational Linguistics (ACL), 2025},\n    year = {2024}\n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvchitect%2Funi-mmmu","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvchitect%2Funi-mmmu","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvchitect%2Funi-mmmu/lists"}