# EVE Series: Encoder-Free VLMs from BAAI



- **2024/05**: [EVEv1](https://github.com/baaivision/EVE/blob/main/EVEv1/README.md) - Unveiling Encoder-Free Vision-Language Models (NeurIPS 2024, Spotlight)

- **2025/02**: [EVEv2](https://github.com/baaivision/EVE/blob/main/EVEv2/README.md) - EVEv2: Improved Baselines for Encoder-Free Vision-Language Models (arXiv 2025)

## 💡 Motivation

- **Can we remove the vision encoder from VLMs?**

- **How to transfer an LLM to an encoder-free VLM efficiently and stably?**

- **How to bridge the performance gap between encoder-free and encoder-based VLMs?**

## 📜 News
[2025/02/09] 🔥🔥🔥 The [paper](https://arxiv.org/abs/2502.06788), [weights](https://huggingface.co/BAAI/EVE-7B-HD-v2.0), and [code](https://github.com/baaivision/EVE/blob/main/EVEv2/README.md) of **EVEv2** are released! 💥
[2024/09/26] Our **EVE** has been accepted to **NeurIPS 2024** as a **spotlight**! 💥
[2024/06/18] The [paper](https://arxiv.org/abs/2406.11832), [weights](https://huggingface.co/BAAI/EVE-7B-HD-v1.0), and [code](https://github.com/baaivision/EVE/blob/main/EVEv1/README.md) of **EVE** are released! 💥

## 💡 Highlights
- 🔥 **Superior Capability:** *An original encoder-free* LVLM supporting *arbitrary* image aspect ratios, outperforming its encoder-free counterparts and approaching existing *modular encoder-based* LVLMs.

- 🔥 **Data Efficiency:** Pre-trained solely on *<100M* publicly *available* samples filtered and recaptioned from OpenImages, SAM, LAION, and Datacomp.

- 🔥 **Pioneering Route:** We aim to provide an *efficient*, *transparent*, and *practical* training strategy and procedure for developing pure decoder-only architectures across modalities.

## ✒️ Citation
If **EVE** is helpful for your research, please consider giving it a **star** ⭐ and a **citation** 📝:
```bibtex
@article{diao2024EVE,
  title={Unveiling Encoder-Free Vision-Language Models},
  author={Diao, Haiwen and Cui, Yufeng and Li, Xiaotong and Wang, Yueze and Lu, Huchuan and Wang, Xinlong},
  journal={arXiv preprint arXiv:2406.11832},
  year={2024}
}
```

```bibtex
@article{diao2025EVEv2,
  title={EVEv2: Improved Baselines for Encoder-Free Vision-Language Models},
  author={Diao, Haiwen and Li, Xiaotong and Cui, Yufeng and Wang, Yueze and Deng, Haoge and Pan, Ting and Wang, Wenxuan and Lu, Huchuan and Wang, Xinlong},
  journal={arXiv preprint arXiv:2502.06788},
  year={2025}
}
```

## 📄 License
This project is licensed under the terms of the [LICENSE](https://github.com/baaivision/EVE/blob/main/LICENSE) file.