https://github.com/thisisiron/QFormer_Pretraining
Implementation of Qformer pre-training
- Host: GitHub
- URL: https://github.com/thisisiron/QFormer_Pretraining
- Owner: thisisiron
- License: mit
- Created: 2024-09-24T14:30:09.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-11-18T13:32:56.000Z (6 months ago)
- Last Synced: 2025-03-16T06:43:29.187Z (2 months ago)
- Topics: blip-2, blip2, qformer, vision-language-model, vlm
- Language: Python
- Homepage:
- Size: 141 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Pre-training Q-Former
This repository contains code for pre-training Q-Former using the transformers library. It also supports converting pre-trained LAVIS BLIP-2 models to the PyTorch transformers format.

### Features
- Pre-train Q-Former from scratch using the transformers library
- Convert LAVIS BLIP-2 Q-Former weights to the transformers format
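For orientation, below is a minimal sketch of instantiating a Q-Former with the transformers library's `Blip2QFormerConfig` and `Blip2QFormerModel`. The hyperparameter values and tensor shapes are illustrative placeholders (following BLIP-2's defaults), not necessarily this repository's settings, and the repository's own training script may differ:

```python
import torch
from transformers import Blip2QFormerConfig, Blip2QFormerModel

# Illustrative config; values follow the BLIP-2 defaults (ViT-g features are 1408-d),
# used here only as placeholders.
config = Blip2QFormerConfig(
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    encoder_hidden_size=1408,      # width of the frozen image encoder
    cross_attention_frequency=2,   # cross-attend to image features every 2 layers
)
qformer = Blip2QFormerModel(config)

# 32 learned query tokens, as in BLIP-2 (randomly initialized here).
query_tokens = torch.nn.Parameter(torch.zeros(1, 32, config.hidden_size))

# Stand-in for frozen vision-encoder output: (batch, num_patches, encoder_hidden_size).
batch_size, num_patches = 2, 257
image_embeds = torch.randn(batch_size, num_patches, config.encoder_hidden_size)
image_mask = torch.ones(batch_size, num_patches, dtype=torch.long)

outputs = qformer(
    query_embeds=query_tokens.expand(batch_size, -1, -1),
    encoder_hidden_states=image_embeds,
    encoder_attention_mask=image_mask,
)
print(outputs.last_hidden_state.shape)  # torch.Size([2, 32, 768])
```

The query tokens stand in for BLIP-2's learned queries that cross-attend to frozen image features; in actual pre-training they are optimized jointly with the Q-Former under the image-text contrastive, image-text matching, and image-grounded text generation objectives.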
## Usage

### From LAVIS BLIP-2
To run the script for pre-training Q-Former from LAVIS, use the following command:
```
```
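Once a BLIP-2 checkpoint is in transformers format, its Q-Former and query tokens can be accessed directly. A minimal sketch, using the public Salesforce/blip2-opt-2.7b checkpoint from the Hugging Face Hub as a stand-in; a locally converted directory produced by this repository's script could presumably be passed instead (assumption, not verified against the script's output layout):

```python
from transformers import Blip2Model

# Placeholder checkpoint ID: the public Salesforce BLIP-2 release on the Hub.
model = Blip2Model.from_pretrained("Salesforce/blip2-opt-2.7b")

qformer = model.qformer            # Blip2QFormerModel holding the Q-Former weights
query_tokens = model.query_tokens  # learned query embeddings, shape (1, num_query_tokens, hidden_size)
print(type(qformer).__name__, tuple(query_tokens.shape))
```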
## Citation

```bibtex
@article{blip2,
title={BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models},
author={Junnan Li and Dongxu Li and Silvio Savarese and Steven Hoi},
journal={arXiv:2301.12597},
year={2023}
}
```