https://github.com/tencentarc/gvt

Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".

Host: GitHub
URL: https://github.com/tencentarc/gvt
Owner: TencentARC
License: apache-2.0
Created: 2023-05-19T06:36:35.000Z (about 3 years ago)
Default Branch: master
Last Pushed: 2023-06-27T03:44:20.000Z (about 3 years ago)
Last Synced: 2025-03-21T13:23:03.391Z (over 1 year ago)
Language: Python
Homepage:
Size: 5.65 MB
Stars: 58
Watchers: 7
Forks: 0
Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# GVT: Good Visual Tokenizer for LLMs

This repo contains the assets for our paper [What Makes for Good Visual Tokenizers for Large Language Models?](https://arxiv.org/abs/2305.12223)

## Model

Model details are provided in [gvt](./gvt/).

## GVTBench

We provide the Object Counting (OC) and Multi-Class Identification (MCI) tasks, built on the MS-COCO and VCR datasets, in [GVTBench](./GVTBench/).

## Acknowledgement

Our work is built on [VLMo](https://github.com/microsoft/unilm/tree/master/vlmo), [LAVIS](https://github.com/salesforce/LAVIS), [EVA](https://github.com/baaivision/EVA), and [Vicuna](https://github.com/lm-sys/FastChat).

Thanks for their great work!

## Citation

If you find this work useful, please cite:

```
@misc{wang2023gvt,
      title={What Makes for Good Visual Tokenizers for Large Language Models?},
      author={Guangzhi Wang and Yixiao Ge and Xiaohan Ding and Mohan Kankanhalli and Ying Shan},
      year={2023},
      eprint={2305.12223},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
