https://github.com/baaivision/eve
EVE Series: Encoder-Free Vision-Language Models from BAAI
EVE Series: Encoder-Free Vision-Language Models from BAAI
- Host: GitHub
- URL: https://github.com/baaivision/eve
- Owner: baaivision
- License: mit
- Created: 2024-06-14T06:10:16.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-03-01T00:41:37.000Z (2 months ago)
- Last Synced: 2025-04-03T20:09:23.881Z (26 days ago)
- Topics: clip, encoder-free-vlm, instruction-following, large-language-models, llm, mllm, multimodal-large-language-models, vision-language-models, vlm
- Language: Python
- Homepage:
- Size: 6.95 MB
- Stars: 315
- Watchers: 10
- Forks: 8
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# EVE Series: Encoder-Free VLMs from BAAI
- **2024/05**: [EVEv1](https://github.com/baaivision/EVE/blob/main/EVEv1/README.md) - Unveiling Encoder-Free Vision-Language Models (NeurIPS 2024, Spotlight)
- **2024/11**: [EVEv2](https://github.com/baaivision/EVE/blob/main/EVEv2/README.md) - EVEv2: Improved Baselines for Encoder-Free Vision-Language Models (ArXiv 2025)
## 💡 Motivation
- **Can we remove vision encoder from VLMs?**
- **How to transfer an LLM to an encoder-free VLM efficiently and stably?**
- **How to bridge the performance gap between encoder-free and encoder-based VLMs?**

## 📜 News
[2025/02/09] 🔥🔥🔥 The [paper](https://arxiv.org/abs/2502.06788), [weights](https://huggingface.co/BAAI/EVE-7B-HD-v2.0), and [code](https://github.com/baaivision/EVE/blob/main/EVEv2/README.md) of **EVEv2** are released! 💥
[2024/09/26] Our **EVE** has been accepted by **NeurIPS 2024** (**spotlight**)! 💥
[2024/06/18] The [paper](https://arxiv.org/abs/2406.11832), [weights](https://huggingface.co/BAAI/EVE-7B-HD-v1.0), and [code](https://github.com/baaivision/EVE/blob/main/EVEv1/README.md) of **EVE** are released! 💥

## 💡 Highlights
- 🔥 **Superior Capability:** *A pioneering encoder-free* LVLM that supports *arbitrary* image aspect ratios, outperforming its encoder-free counterparts and approaching existing *modular encoder-based* LVLMs.
- 🔥 **Data Efficiency:** Pre-trained solely on *<100M* publicly available samples, filtered and recaptioned from OpenImages, SAM, LAION, and Datacomp.
- 🔥 **Pioneering Route:** We attempt to provide an *efficient*, *transparent*, and *practical* training strategy and procedure for developing a pure decoder-only architecture across modalities.
## ✒️ Citation
If **EVE** is helpful for your research, please consider giving it a **star** ⭐ and a **citation** 📝:
```bibtex
@article{diao2024EVE,
title={Unveiling Encoder-Free Vision-Language Models},
author={Diao, Haiwen and Cui, Yufeng and Li, Xiaotong and Wang, Yueze and Lu, Huchuan and Wang, Xinlong},
journal={arXiv preprint arXiv:2406.11832},
year={2024}
}
```

```bibtex
@article{diao2025EVEv2,
title={EVEv2: Improved Baselines for Encoder-Free Vision-Language Models},
author={Diao, Haiwen and Li, Xiaotong and Cui, Yufeng and Wang, Yueze and Deng, Haoge and Pan, Ting and Wang, Wenxuan and Lu, Huchuan and Wang, Xinlong},
journal={arXiv preprint arXiv:2502.06788},
year={2025}
}
```

## 📄 License
This project is licensed under the [MIT License](https://github.com/baaivision/EVE/blob/main/LICENSE).