https://github.com/showlab/visincontext
Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
- Host: GitHub
- URL: https://github.com/showlab/visincontext
- Owner: showlab
- Created: 2024-06-03T15:41:44.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-30T03:57:48.000Z (over 1 year ago)
- Last Synced: 2025-04-01T17:24:48.337Z (12 months ago)
- Topics: efficient, in-context-learning, llm, mllm
- Language: Python
- Homepage: https://fingerrec.github.io/visincontext/
- Size: 1010 KB
- Stars: 14
- Watchers: 2
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
# VisInContext
[Arxiv](https://arxiv.org/abs/2406.02547)

- VisInContext is an easy way to extend the in-context text length in multi-modal learning.
- This work is also complementary to existing approaches for extending in-context text length, such as FlashAttention and Memory Transformers.
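The idea is that long in-context text can be rendered into images and consumed as visual tokens, which compress many characters into a fixed per-image token budget. The sketch below illustrates the arithmetic only; all constants (characters per text token, characters per rendered image, visual tokens per image) are illustrative assumptions, not the paper's measured values.

```python
import math

def visual_token_cost(num_chars: int,
                      chars_per_text_token: float = 4.0,
                      chars_per_image: int = 512,
                      tokens_per_image: int = 64) -> tuple[int, int]:
    """Compare the sequence-length cost of a passage fed as text tokens
    vs. rendered into images and fed as visual tokens.

    All defaults are illustrative assumptions: ~4 characters per text
    token, and a renderer that fits `chars_per_image` characters into
    one image costing `tokens_per_image` visual tokens.
    """
    text_tokens = math.ceil(num_chars / chars_per_text_token)
    num_images = math.ceil(num_chars / chars_per_image)
    visual_tokens = num_images * tokens_per_image
    return text_tokens, visual_tokens

# A 16k-character in-context passage:
text_cost, visual_cost = visual_token_cost(16_000)
# text: 4000 tokens; visual: 32 images x 64 tokens = 2048 tokens
```

Under these assumptions the same passage occupies roughly half the sequence length when consumed as visual tokens, which is the kind of saving that lets more in-context examples fit in a fixed context window.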
## Install
```
pip install -r requirement.txt
```
For H100 GPUs, install the additional dependencies:
```
pip install -r requirements_h100.txt
```
## Dataset Preparation
See [DATASET.md](DATASET.md).
## Pre-training
See [PRETRAIN.md](PRETRAIN.md).
## Few-shot Evaluation
See [EVALUATION.md](EVALUATION.md).
## Citation
If you find our work helpful, please consider citing:
```
@article{wang2024visincontext,
title={Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning},
author={Wang, Alex Jinpeng and Li, Linjie and Lin, Yiqi and Li, Min and Wang, Lijuan and Shou, Mike Zheng},
journal={NeurIPS},
year={2024}
}
```
## Contact
Email: awinyimgprocess at gmail dot com
## Acknowledgement
Thanks to these great works:
[Open-flamingo](https://github.com/mlfoundations/open_flamingo), [Open-CLIP](https://github.com/mlfoundations/open_clip) and [WebDataset](https://github.com/webdataset/webdataset).