# LLaVA-Pool
**A Comprehensive Framework for Training and Fine-tuning Vision-Language Models**
## 📖 Overview
LLaVA-Pool is a powerful and flexible framework designed for training and fine-tuning Vision-Language Models (VLMs). It provides a unified interface for working with various state-of-the-art VLMs, supporting both pre-training and supervised fine-tuning workflows. With LLaVA-Pool, you can easily customize and optimize VLMs for your specific multimodal tasks.
## ✨ Key Features
- **Multiple Model Support**: Compatible with leading VLMs including Qwen2-VL, Qwen2.5-VL, Llama 3.2 Vision, Pixtral, and InternVL2.5
- **Flexible Training Methods**: Support for pre-training, supervised fine-tuning (SFT), and direct preference optimization (DPO)
- **Efficient Data Processing**: Streamlined data loading and processing pipelines for multimodal datasets
- **Distributed Training**: Built-in support for multi-GPU and multi-node training
- **Customizable Configuration**: YAML-based configuration system for easy experiment management
- **Inference Tools**: Ready-to-use inference scripts for model evaluation and deployment
## 🛠️ Installation
To install LLaVA-Pool, run the following commands:
```bash
git clone https://github.com/thisisiron/LLaVA-Pool.git
cd LLaVA-Pool
pip install -r requirements.txt
```
For optimal performance with GPU acceleration, we recommend installing flash-attention:
```bash
pip install flash-attn --no-build-isolation
```
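Before launching a multi-GPU job, it can be worth confirming that PyTorch sees your GPUs and that flash-attention imports correctly. The snippet below is only an optional sanity check, not part of LLaVA-Pool:
```python
# Optional environment check (not part of LLaVA-Pool): confirms that PyTorch
# can see CUDA devices and that flash-attn was built and installed correctly.
import torch

print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
print(f"Visible GPUs: {torch.cuda.device_count()}")

try:
    import flash_attn
    print(f"flash-attn {flash_attn.__version__} is installed")
except ImportError:
    print("flash-attn not found; attention will fall back to a standard implementation")
```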
## 📊 Supported Models
| Model | Converter | Description |
| --- | --- | --- |
| Qwen2-VL | qwen2_vl | Qwen2's vision-language model |
| Qwen2.5-VL | qwen2_vl | Qwen2.5's vision-language model |
| Llama 3.2 Vision | llama3.2_vision | Meta's Llama 3.2 with vision capabilities |
| Pixtral | pixtral | Pixtral vision-language model |
| InternVL2.5 | internvl2_5 | Version 2.5 of the InternVL series |
## 📚 Data Preparation
LLaVA-Pool supports various data formats for training and fine-tuning. The data directory structure should be organized as follows:
```
data/
├── dataset_config.json # Configuration for datasets
├── demo.json # Example data format
└── demo_data/ # Example images
├── image1.jpg
├── image2.jpg
└── ...
```
The `dataset_config.json` file defines the datasets to be used for training. Each dataset should follow the format specified in `demo.json`, which includes conversations and image references.
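The authoritative schema is whatever `demo.json` ships with; as a rough illustration, multimodal SFT data in the LLaVA/ShareGPT style usually pairs conversation turns with an image list. The Python sketch below writes one such sample; every field name (`conversations`, `from`, `value`, `images`) is an assumption borrowed from that convention, so check `demo.json` and `dataset_config.json` for the exact keys LLaVA-Pool expects:
```python
# Illustrative only: one training sample in a LLaVA/ShareGPT-style layout.
# The field names are assumptions; demo.json defines the real schema.
import json
from pathlib import Path

sample = {
    "conversations": [
        {"from": "human", "value": "<image>\nWhat is shown in this chart?"},
        {"from": "gpt", "value": "A bar chart comparing quarterly revenue."},
    ],
    # Image paths are typically given relative to the data/ directory.
    "images": ["demo_data/image1.jpg"],
}

Path("data").mkdir(exist_ok=True)
with open("data/my_dataset.json", "w", encoding="utf-8") as f:
    json.dump([sample], f, ensure_ascii=False, indent=2)
```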
## 🚀 Training
### Pre-training
For pre-training a vision-language model, you can use the provided configuration files in the `examples` directory:
```bash
export PYTHONPATH=src:$PYTHONPATH
torchrun --nnodes 1 --nproc_per_node 4 --master_port 20001 src/llavapool/run.py examples/pretrain_config.yaml
```
### Supervised Fine-tuning (SFT)
For fine-tuning a pre-trained model on your specific task:
```bash
export PYTHONPATH=src:$PYTHONPATH
torchrun --nnodes 1 --nproc_per_node 4 --master_port 20001 src/llavapool/run.py examples/qwen2vl_full_sft.yaml
```
### Direct Preference Optimization (DPO)
To align your model with human preferences using DPO:
```bash
export PYTHONPATH=src:$PYTHONPATH
torchrun --nnodes 1 --nproc_per_node 4 --master_port 20001 src/llavapool/run.py examples/qwen2vl_dpo.yaml
```
DPO fine-tuning requires a dataset that pairs each prompt with a preferred and a rejected response. Learning from these preference pairs aligns the model with human judgments and improves response quality.
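The exact preference-data schema is likewise defined by the repository's dataset configuration; purely as an illustration, a single preference record in the common DPO convention pairs one prompt with a `chosen` and a `rejected` response. All field names below are assumptions, not LLaVA-Pool's documented format:
```python
# Illustrative DPO preference record. "chosen"/"rejected" follow the common
# DPO convention; verify the actual keys against demo.json and the docs.
import json

preference_sample = {
    "conversations": [
        {"from": "human", "value": "<image>\nSummarize this document."}
    ],
    "chosen": {"from": "gpt", "value": "A concise, accurate summary of the page."},
    "rejected": {"from": "gpt", "value": "An off-topic or hallucinated answer."},
    "images": ["demo_data/image2.jpg"],
}

print(json.dumps(preference_sample, indent=2, ensure_ascii=False))
```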
You can customize the training parameters by modifying the YAML configuration files in the `examples` directory.
## 🔍 Inference
To run inference with a trained model:
```bash
python infer.py --model_path /path/to/your/model
```
This will launch a Gradio interface for interactive testing of your model.
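For scripted evaluation without the Gradio UI, a checkpoint fine-tuned from Qwen2-VL can also be loaded directly with Hugging Face Transformers. The following is a minimal sketch (assuming a Qwen2-VL-based `output_dir` and a recent `transformers` release), not an interface provided by LLaVA-Pool itself:
```python
# Minimal sketch: query a Qwen2-VL-based checkpoint with Hugging Face
# Transformers (assumes a recent transformers release). Not a LLaVA-Pool API.
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_path = "/path/to/your/model"  # e.g. the output_dir of your SFT run
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_path)

image = Image.open("data/demo_data/image1.jpg")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, not the prompt.
answer = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```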
## 📋 Configuration
LLaVA-Pool uses YAML configuration files to define training parameters. Here's an example configuration for fine-tuning Qwen2-VL:
```yaml
### model
model_name_or_path: "Qwen/Qwen2-VL-7B-Instruct"
### method
stage: sft
do_train: true
finetuning_type: full
freeze_vision_tower: false
train_mm_proj_only: false
deepspeed: scripts/deepspeed/zero3.json
### dataset
dataset: aihub_table,aihub_chart,aihub_math,aihub_ocr,docvqa,arxivqa,ocr-vqa-200k,figureqa
template: qwen2_vl
cutoff_len: 32768
overwrite_cache: true
preprocessing_num_workers: 80
### output
output_dir: output/qwen2_vl-7b/full/ko-en_docai_qwen2vl_train-vit_cl8192_wodocvqa25k
logging_steps: 100
save_steps: 20000
overwrite_output_dir: true
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-5
num_train_epochs: 1
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
```
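With the four-GPU `torchrun` launch shown above, this configuration gives an effective batch size of 32 per optimizer step (1 sample per device × 8 gradient accumulation steps × 4 GPUs); adjust `per_device_train_batch_size` and `gradient_accumulation_steps` together if you change the GPU count.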
## 🤝 Contributing
Contributions to LLaVA-Pool are welcome! Please feel free to submit a Pull Request.
## 📄 License
This project is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for more details.
## 🔗 References
This repository was built with inspiration from:
- [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory)
- [LLaVA-NeXT](https://github.com/haotian-liu/LLaVA)
- [InternVL](https://github.com/OpenGVLab/InternVL)
## 📧 Contact
For questions or feedback, please open an issue on GitHub.