https://github.com/thisisiron/llava-pool
🌋 A flexible framework for training and configuring Vision-Language Models
- Host: GitHub
- URL: https://github.com/thisisiron/llava-pool
- Owner: thisisiron
- Created: 2024-11-13T00:07:43.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-03-14T15:01:09.000Z (2 months ago)
- Last Synced: 2025-03-14T16:22:11.087Z (2 months ago)
- Topics: llava, multimodal-large-language-models, vision-language-model, vlm
- Language: Python
- Homepage:
- Size: 1.51 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
README
# LLaVA-Pool
This project provides functionality for training and configuring Vision-Language Models (VLMs).
## Features
- Open Vision-Language Models: e.g., Qwen2-VL, Pixtral, Llama 3.2 Vision
- Training methods for VLMs: Pre-Training and Supervised Fine-Tuning

## Install
To install LLaVA-Pool, run the following commands in order. The flash-attn package is required for FlashAttention GPU kernels, which improve training performance.
```
git clone https://github.com/thisisiron/LLaVA-Pool.git
cd LLaVA-Pool
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
```
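If the build succeeds, a quick way to confirm the environment is to check that PyTorch sees the GPU and that flash-attn imports cleanly. This is only a minimal sanity-check sketch, not part of the project's official instructions:

```
# Optional sanity check after installation.
import torch

print("CUDA available:", torch.cuda.is_available())

try:
    import flash_attn  # FlashAttention kernels used for faster attention
    print("flash-attn version:", flash_attn.__version__)
except ImportError:
    print("flash-attn is not installed; attention falls back to the default implementation.")
```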
## Data Preparation

Prepare your training data before launching a run (a hypothetical example of one possible format is sketched below).
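The expected dataset format is not spelled out here. Since the repository is built on LLaMA-Factory (see References), a conversation-style entry with image paths is one plausible shape; the field names (`messages`, `images`) and the `<image>` placeholder below are illustrative assumptions, not a format confirmed by this README:

```
# Hypothetical multimodal training sample, loosely modeled on a
# LLaMA-Factory-style conversation dataset. All field names are assumptions.
import json

sample = {
    "messages": [
        {"role": "user", "content": "<image>What is shown in this picture?"},
        {"role": "assistant", "content": "A volcano erupting at night."},
    ],
    "images": ["data/images/volcano.jpg"],  # hypothetical image path
}

print(json.dumps([sample], indent=2, ensure_ascii=False))
```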
## Pre-training Model

Magma (Multimodal AI Generation and Model Architecture) is a pre-trained model that can be used for various tasks and is designed to be flexible and adaptable to different use cases. Select the components based on your needs: for example, to use Qwen2.5 as the LLM and SigLIP as the vision model, modify the Magma model config accordingly.
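The Magma config format is not shown in this README, so the snippet below is only a rough sketch of pairing a Qwen2.5 LLM with a SigLIP vision encoder. The two `AutoConfig` calls use real Hugging Face `transformers` APIs and public model IDs, but the composed dictionary at the end is hypothetical, not the actual Magma schema:

```
# Illustrative only: how a Magma-style config *might* pair an LLM with a
# vision encoder. The dictionary layout is hypothetical.
from transformers import AutoConfig

llm_config = AutoConfig.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
vision_config = AutoConfig.from_pretrained("google/siglip-so400m-patch14-384")

magma_config = {  # hypothetical composition, not the documented Magma schema
    "llm": llm_config.to_dict(),
    "vision_tower": vision_config.to_dict(),
    "projector": {"type": "mlp", "hidden_size": llm_config.hidden_size},
}
```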
## SFT Model List

| Model | Converter |
| --- | --- |
| Qwen2-VL | qwen2_vl |
| Qwen2.5-VL | qwen2_vl |
| Llama 3.2 Vision | llama3.2_vision |
| Pixtral | pixtral |
| InternVL2.5 | internvl2_5 |
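How a converter is selected in practice is not documented here; the snippet below only illustrates where a converter name from the table might appear in a training configuration. Every key is a placeholder, not a documented LLaVA-Pool option:

```
# Hypothetical SFT configuration showing where a converter name from the
# table above might be used. All keys are placeholders.
sft_config = {
    "model_name_or_path": "Qwen/Qwen2-VL-7B-Instruct",  # example model ID
    "converter": "qwen2_vl",   # converter matching the model family
    "dataset": "train.json",   # e.g. the sample sketched in Data Preparation
}
```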
## License

This project is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for more details.
## References

This repository was built based on LLaMA-Factory.

- LLaMA-Factory
- LLaVA-NeXT