https://github.com/swhl/tinygpt-v
Cleaned-up study version. A blog depends on this repository, so it must not be deleted.
- Host: GitHub
- URL: https://github.com/swhl/tinygpt-v
- Owner: SWHL
- Created: 2024-01-10T01:37:25.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-01-12T11:01:47.000Z (over 2 years ago)
- Last Synced: 2025-01-25T05:41:29.172Z (over 1 year ago)
- Language: Python
- Homepage:
- Size: 4.69 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
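#### What does QFormer do?
QFormer comes from the BLIP-2 paper, where it bridges the gap between the frozen image encoder and the frozen LLM.
It is initialized from BERT.
#### Inference architecture diagram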
```mermaid
flowchart LR
A(Image) --> B(blip2_image_eval) --> C(QFormer) --> D(Linear)
D --> E(Linear) --> F(get_context_emb)
```
#### prompt
```text
Give the following image: <Img>ImageContent</Img>. You will be able to see the image once I provide it to you. Please answer my questions.
```
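Fusion method:
First convert the image into embeddings, then convert every part of the prompt other than the image placeholder into embeddings, in order.
Finally, interleave the two to obtain the final embedding sequence.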
```python
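import torch


def get_context_emb(self, prompt, img_list):
    # Method of the model class: relies on self.llama_tokenizer and self.embed_tokens.
    device = img_list[0].device
    # Split the prompt at every image placeholder; N images yield N + 1 text segments.
    prompt_segs = prompt.split("<ImageHere>")
    assert (
        len(prompt_segs) == len(img_list) + 1
    ), "Unmatched numbers of image placeholders and images."
    seg_tokens = [
        self.llama_tokenizer(seg, return_tensors="pt", add_special_tokens=i == 0)
        .to(device)
        .input_ids  # only add bos to the first seg
        for i, seg in enumerate(prompt_segs)
    ]
    seg_embs = [self.embed_tokens(seg_t) for seg_t in seg_tokens]
    # TODO: confirm by debugging exactly how the pieces are mixed together.
    # As written, zip interleaves them as [seg0, img0, seg1, img1, ..., segN].
    mixed_embs = [emb for pair in zip(seg_embs[:-1], img_list) for emb in pair] + [
        seg_embs[-1]
    ]
    # Concatenate along dim 1 (the sequence dimension).
    mixed_embs = torch.cat(mixed_embs, dim=1)
    return mixed_embs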
```
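
The interleaving step in `get_context_emb` can be checked without loading a model. The following is a minimal sketch, not the repository's code. It assumes the image placeholder string is `<ImageHere>` and uses plain strings as stand-ins for embedding tensors. The names `interleave` and `IMAGE_PLACEHOLDER` are hypothetical and exist only for this illustration.

```python
from itertools import chain

# Assumption: the placeholder that marks where an image goes in the prompt.
IMAGE_PLACEHOLDER = "<ImageHere>"


def interleave(segments, images):
    """Return [seg0, img0, seg1, img1, ..., segN] as one flat list."""
    if len(segments) != len(images) + 1:
        raise ValueError("Unmatched numbers of image placeholders and images.")
    # zip pairs each leading segment with the image that follows it;
    # the final segment has no image after it, so it is appended separately.
    paired = chain.from_iterable(zip(segments[:-1], images))
    return list(paired) + [segments[-1]]


prompt = f"Describe {IMAGE_PLACEHOLDER} and compare it with {IMAGE_PLACEHOLDER} please."
segments = prompt.split(IMAGE_PLACEHOLDER)
images = ["IMG_EMB_0", "IMG_EMB_1"]  # stand-ins for image embedding tensors

print(interleave(segments, images))
```

In the real method each list element is a tensor rather than a string, and `torch.cat(mixed_embs, dim=1)` then joins them along the sequence dimension, so the image embeddings sit in the sequence at the positions where the placeholders appeared in the prompt.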