https://github.com/swhl/tinygpt-v

A cleaned-up study version. A blog depends on this repository, so it must not be deleted.

Host: GitHub
URL: https://github.com/swhl/tinygpt-v
Owner: SWHL
Created: 2024-01-10T01:37:25.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-01-12T11:01:47.000Z (over 2 years ago)
Last Synced: 2025-01-25T05:41:29.172Z (over 1 year ago)
Language: Python
Homepage:
Size: 4.69 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

#### What does the QFormer do?

The QFormer comes from the BLIP-2 paper, where it bridges the gap between the frozen image encoder and the frozen LLM.

It is initialized from BERT.

#### Inference architecture diagram

```mermaid
flowchart LR
A(Image) --> B(blip2_image_eval) --> C(QFormer) --> D(Linear)
D --> E(Linear) --> F(get_context_emb)
```
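
To make the tail of the diagram concrete, here is a minimal shape-level sketch of the two `Linear` projections followed by concatenation with text embeddings. Every size and name in it (`NUM_QUERIES`, `QFORMER_DIM`, `MID_DIM`, `LLM_DIM`, `project_image_tokens`) is an illustrative assumption rather than a value read from the repository, and random tensors stand in for the real QFormer output and token embeddings.

```python
import torch
import torch.nn as nn

# Illustrative sizes only; these are assumptions, not the repository's values.
NUM_QUERIES, QFORMER_DIM, MID_DIM, LLM_DIM = 32, 768, 1024, 512

proj1 = nn.Linear(QFORMER_DIM, MID_DIM)  # first Linear in the diagram
proj2 = nn.Linear(MID_DIM, LLM_DIM)      # second Linear in the diagram


def project_image_tokens(qformer_out: torch.Tensor) -> torch.Tensor:
    """Map QFormer outputs (B, Q, QFORMER_DIM) into the LLM space (B, Q, LLM_DIM)."""
    return proj2(proj1(qformer_out))


# Random stand-in for the QFormer output of a single image.
qformer_out = torch.randn(1, NUM_QUERIES, QFORMER_DIM)
img_emb = project_image_tokens(qformer_out)

# Random stand-ins for the text embeddings before and after the image slot.
before = torch.randn(1, 5, LLM_DIM)
after = torch.randn(1, 7, LLM_DIM)

# Concatenate along the sequence axis: 5 + 32 + 7 = 44 positions.
context = torch.cat([before, img_emb, after], dim=1)
print(context.shape)  # torch.Size([1, 44, 512])
```

The image tokens can only be concatenated with text embeddings along `dim=1` once the projections have brought them to the LLM's hidden size, which is why the diagram places the linear layers before `get_context_emb`.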

#### prompt

```text
Give the following image: <Img>ImageContent</Img>. You will be able to see the image once I provide it to you. Please answer my questions.
```

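Fusion method:

First, convert the image into embeddings. Then convert each part of the prompt other than the image placeholder into embeddings, in order.

Finally, mix the two to obtain the final embedding sequence.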

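```python
import torch


def get_context_emb(self, prompt, img_list):
    """Build the LLM input by mixing text-segment and image embeddings."""
    device = img_list[0].device
    # Split on the image placeholder (`<ImageHere>` in the upstream MiniGPT-4 code).
    prompt_segs = prompt.split("<ImageHere>")
    assert (
        len(prompt_segs) == len(img_list) + 1
    ), "Unmatched numbers of image placeholders and images."
    seg_tokens = [
        self.llama_tokenizer(seg, return_tensors="pt", add_special_tokens=i == 0)
        .to(device)
        .input_ids  # only add bos to the first seg
        for i, seg in enumerate(prompt_segs)
    ]
    seg_embs = [self.embed_tokens(seg_t) for seg_t in seg_tokens]
    # TODO: step through in a debugger to confirm exactly how these are mixed.
    # Each text segment is paired with the image that follows it; the last
    # text segment is appended on its own.
    mixed_embs = [emb for pair in zip(seg_embs[:-1], img_list) for emb in pair] + [
        seg_embs[-1]
    ]
    # Concatenate along the sequence dimension.
    mixed_embs = torch.cat(mixed_embs, dim=1)
    return mixed_embs
```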
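
The mixing order produced by the `zip` comprehension in `get_context_emb` is easier to see with plain labels instead of tensors. The sketch below uses a hypothetical helper, `interleave`, that applies the same comprehension to strings; it only illustrates the ordering and is not code from the repository.

```python
def interleave(text_segs, images):
    """Interleave N+1 text segments with N images: t0, i0, t1, i1, ..., tN."""
    assert len(text_segs) == len(images) + 1, "need one more segment than images"
    # Pair each text segment (except the last) with the image that follows it,
    # then flatten the pairs into a single list.
    mixed = [item for pair in zip(text_segs[:-1], images) for item in pair]
    # The final text segment has no image after it, so append it on its own.
    mixed.append(text_segs[-1])
    return mixed


order = interleave(["seg0", "seg1", "seg2"], ["img0", "img1"])
print(order)  # ['seg0', 'img0', 'seg1', 'img1', 'seg2']
```

With real tensors, each element has shape `(batch, seq_len_i, hidden)`, so `torch.cat(..., dim=1)` joins them into one sequence in exactly this order.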
