https://github.com/BAAI-WuDao/BriVL

Bridging Vision and Language Model

Host: GitHub
URL: https://github.com/BAAI-WuDao/BriVL
Owner: BAAI-WuDao
License: mit
Created: 2021-08-12T08:18:22.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2023-03-27T14:05:55.000Z (over 2 years ago)
Last Synced: 2024-11-14T13:35:27.250Z (about 1 year ago)
Language: Python
Homepage:
Size: 313 KB
Stars: 279
Watchers: 4
Forks: 31
Open Issues: 13
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

ai-game-devtools - BriVL

README

# BriVL

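BriVL (Bridging Vision and Language Model) is the first large-scale, general-purpose Chinese image-text multimodal pre-trained model. It performs strongly on image-text retrieval, outperforming other common multimodal pre-trained models from the same period (for example UNITER and CLIP).

BriVL paper: [WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training](https://arxiv.org/abs/2103.06561)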

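# Use Cases

Example use cases: image-to-text retrieval, text-to-image retrieval, image captioning, zero-shot image classification, and providing input features for other downstream multimodal tasks.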
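The retrieval use cases above reduce to nearest-neighbour search in a shared embedding space. The following is a minimal sketch, assuming the embeddings have already been extracted as plain lists of floats. `cosine_similarity` and `rank_candidates` are hypothetical helper names, not part of the BriVL code base, and the toy vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_candidates(query, candidates):
    """Return candidate indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Toy 3-dimensional embeddings (illustrative values only, not model output).
text_query = [1.0, 0.0, 0.0]
image_embeddings = [
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.9, 0.1, 0.0],   # close to the query
    [-1.0, 0.0, 0.0],  # opposite to the query
]
ranking = rank_candidates(text_query, image_embeddings)
```

The same ranking works in both directions (text to image and image to text). Zero-shot classification can be treated as ranking a set of class-name text embeddings against a single image embedding.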

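# Technical Features

1. BriVL uses contrastive learning to map images and text into a shared feature space, which helps bridge the gap between image features and text features.

2. Built on the assumption that vision and language are only weakly correlated, it can capture abstract connections between images and text in addition to understanding descriptive captions.

3. The image encoder and the text encoder can run independently of each other, which simplifies deployment in production environments.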
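To make the contrastive objective in item 1 concrete, here is a minimal sketch of a generic symmetric InfoNCE-style loss over a batch of paired embeddings. It is an in-batch formulation written in plain Python for readability, not BriVL's training code. The function names and the temperature value of 0.07 are assumptions chosen for illustration.

```python
import math


def log_softmax_row(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_sum for v in row]


def symmetric_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """InfoNCE-style loss over a batch of paired, L2-normalised embeddings.

    Pair i (image_emb[i], text_emb[i]) is the positive; every other
    image-text combination in the batch acts as a negative.
    """
    n = len(image_emb)
    # Similarity matrix: logits[i][j] = <image_i, text_j> / temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in text_emb]
        for img in image_emb
    ]
    # Image-to-text direction: each row should peak on the diagonal.
    i2t = -sum(log_softmax_row(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: same computation on the transposed matrix.
    transposed = [list(col) for col in zip(*logits)]
    t2i = -sum(log_softmax_row(transposed[i])[i] for i in range(n)) / n
    return (i2t + t2i) / 2


# Correctly paired embeddings give a low loss; swapped pairs give a high one.
aligned = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Minimising this loss pulls matching image and text embeddings together and pushes mismatched ones apart, which is what places both modalities in one feature space.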

# Downloads

| Model     | Language | Parameters | File                                                         |
| --------- | -------- | ---------- | ------------------------------------------------------------ |
| BriVL-1.0 | Chinese  | 1 billion  | [BriVL-1.0-5500w.tar](https://wudaoai.cn/model/detail/BriVL) |

# Using BriVL

### Environment Setup

```
# Requirements
lmdb==0.99
timm==0.4.12
easydict==1.9
pandas==1.2.4
jsonlines==2.0.0
tqdm==4.60.0
torchvision==0.9.1
numpy==1.20.2
torch==1.8.1
transformers==4.5.1
msgpack_numpy==0.4.7.1
msgpack_python==0.5.6
Pillow==8.3.1
PyYAML==5.4.1
```

The dependencies are listed in requirements.txt and can be installed with the following command:

```
pip install -r requirements.txt
```
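Because every dependency is pinned to an exact version, it can be useful to check what is actually installed before running the model. The following is a minimal sketch using only the standard library. `parse_pins` and `find_mismatches` are hypothetical helper names, not part of the repository.

```python
from importlib import metadata


def parse_pins(lines):
    """Parse 'name==version' lines, skipping blanks, comments, and other forms."""
    pins = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed_or_None)} for packages that differ."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Illustrative input; in practice pass the lines of requirements.txt.
sample = ["# requirements", "torch==1.8.1", "", "timm==0.4.12"]
pins = parse_pins(sample)
```

To check a real environment, open `requirements.txt`, pass the file object to `parse_pins`, and print the result of `find_mismatches`; an empty dictionary means every pinned version matches.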

### Object Detection Tool for BriVL

BriVL's bounding box extraction tool is available here: [BBox-extractor](https://github.com/chuhaojin/BriVL-BUA-applications/tree/master/bbox_extractor).

### Feature Extraction and Retrieval Results

```
cd evaluation/
bash test_xyb.sh
```
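Retrieval results are commonly summarised with Recall@K, the fraction of queries whose correct match appears among the top K ranked candidates. Whether `test_xyb.sh` reports this exact metric is an assumption; the sketch below only illustrates the computation, and `recall_at_k` is a hypothetical helper name with made-up rankings as input.

```python
def recall_at_k(ranked_lists, ground_truth, k):
    """Fraction of queries whose correct item appears in the top-k results.

    ranked_lists[i] holds the candidate ids for query i, best first;
    ground_truth[i] is the single correct candidate id for query i.
    """
    hits = sum(
        1 for ranked, gt in zip(ranked_lists, ground_truth) if gt in ranked[:k]
    )
    return hits / len(ground_truth)


# Three toy queries, each with three ranked candidates (illustrative only).
ranked = [[2, 0, 1], [1, 2, 0], [0, 1, 2]]
truth = [0, 1, 2]
r1 = recall_at_k(ranked, truth, 1)
r2 = recall_at_k(ranked, truth, 2)
```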

### Data Description

Three example image-text pairs are included:

```
./data/imgs    # images go here
./data/jsonls  # image-text pair descriptions go here
```
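The `jsonls` directory name suggests JSON Lines files, with one JSON object per line. A minimal sketch of reading such a file with the standard library follows. The field names `image_id` and `sentences` are hypothetical and used only for illustration; check the example files in `./data/jsonls` for the real layout.

```python
import json


def read_jsonl(lines):
    """Parse an iterable of JSON Lines strings into a list of dicts."""
    records = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            records.append(json.loads(line))
    return records


# Hypothetical record layout; the real field names in ./data/jsonls may differ.
sample_lines = [
    '{"image_id": "0001.jpg", "sentences": ["a cat on a sofa"]}',
    '',
    '{"image_id": "0002.jpg", "sentences": ["a city street at night"]}',
]
records = read_jsonl(sample_lines)
```

An open file object can be passed in place of `sample_lines`, since iterating over a text file yields one line at a time.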

# Citing BriVL

```
@article{DBLP:journals/corr/abs-2103-06561,
  author    = {Yuqi Huo and
               Manli Zhang and
               Guangzhen Liu and
               Haoyu Lu and
               Yizhao Gao and
               Guoxing Yang and
               Jingyuan Wen and
               Heng Zhang and
               Baogui Xu and
               Weihao Zheng and
               Zongzheng Xi and
               Yueqian Yang and
               Anwen Hu and
               Jinming Zhao and
               Ruichen Li and
               Yida Zhao and
               Liang Zhang and
               Yuqing Song and
               Xin Hong and
               Wanqing Cui and
               Dan Yang Hou and
               Yingyan Li and
               Junyi Li and
               Peiyu Liu and
               Zheng Gong and
               Chuhao Jin and
               Yuchong Sun and
               Shizhe Chen and
               Zhiwu Lu and
               Zhicheng Dou and
               Qin Jin and
               Yanyan Lan and
               Wayne Xin Zhao and
               Ruihua Song and
               Ji{-}Rong Wen},
  title     = {WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training},
  journal   = {CoRR},
  volume    = {abs/2103.06561},
  year      = {2021},
  url       = {https://arxiv.org/abs/2103.06561},
  archivePrefix = {arXiv},
  eprint    = {2103.06561},
  timestamp = {Tue, 03 Aug 2021 12:35:30 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2103-06561.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
