https://github.com/tencentarc/gvt
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
Last synced: about 1 year ago
- Host: GitHub
- URL: https://github.com/tencentarc/gvt
- Owner: TencentARC
- License: apache-2.0
- Created: 2023-05-19T06:36:35.000Z (about 3 years ago)
- Default Branch: master
- Last Pushed: 2023-06-27T03:44:20.000Z (about 3 years ago)
- Last Synced: 2025-03-21T13:23:03.391Z (over 1 year ago)
- Language: Python
- Homepage:
- Size: 5.65 MB
- Stars: 58
- Watchers: 7
- Forks: 0
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# GVT: Good Visual Tokenizer for LLMs
This repo contains the assets for our paper [What Makes for Good Visual Tokenizers for Large Language Models?](https://arxiv.org/abs/2305.12223).
## Model
Model details are provided in [gvt](./gvt/).
## GVTBench
We provide the Object Counting (OC) and Multi-Class Identification (MCI) tasks, built on the MS-COCO and VCR datasets, in [GVTBench](./GVTBench/).
## Acknowledgement
Our work is built on [VLMo](https://github.com/microsoft/unilm/tree/master/vlmo), [LAVIS](https://github.com/salesforce/LAVIS), [EVA](https://github.com/baaivision/EVA), and [Vicuna](https://github.com/lm-sys/FastChat).
Thanks for their great work!
## Citation
If you find this work useful, please cite:
```bibtex
@misc{wang2023gvt,
title={What Makes for Good Visual Tokenizers for Large Language Models?},
author={Guangzhi Wang and Yixiao Ge and Xiaohan Ding and Mohan Kankanhalli and Ying Shan},
year={2023},
eprint={2305.12223},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```