https://github.com/tencentarc/gvt

Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".

# GVT: Good Visual Tokenizer for LLMs
This repo contains the assets for our paper [What Makes for Good Visual Tokenizers for Large Language Models?](https://arxiv.org/abs/2305.12223)

## Model
We provide related details in [gvt](./gvt/).

## GVTBench
We provide the Object Counting (OC) and Multi-Class Identification (MCI) tasks, built on the MS-COCO and VCR datasets, in [GVTBench](./GVTBench/).

## Acknowledgement
Our work is built on [VLMo](https://github.com/microsoft/unilm/tree/master/vlmo), [LAVIS](https://github.com/salesforce/LAVIS), [EVA](https://github.com/baaivision/EVA), and [Vicuna](https://github.com/lm-sys/FastChat).

Thanks for their great work!

## Citation
If you find this work useful, please cite:
```
@misc{wang2023gvt,
  title={What Makes for Good Visual Tokenizers for Large Language Models?},
  author={Guangzhi Wang and Yixiao Ge and Xiaohan Ding and Mohan Kankanhalli and Ying Shan},
  year={2023},
  eprint={2305.12223},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```