https://github.com/shreydan/visiongpt2

Combining ViT and GPT-2 for image captioning. Trained on MS-COCO. The model was implemented mostly from scratch.

Host: GitHub
URL: https://github.com/shreydan/visiongpt2
Owner: shreydan
Created: 2023-09-28T19:36:07.000Z
Default Branch: main
Last Pushed: 2023-10-02T14:40:53.000Z
Last Synced: 2024-03-15T14:16:28.883Z
Topics: gpt, image-captioning, multimodal, pytorch, transformers, vit
Language: Jupyter Notebook
Homepage: https://www.kaggle.com/code/shreydan/visiongpt2-image-captioning-pytorch
Size: 289 KB
Stars: 6
Watchers: 2
Forks: 0
Open Issues: 0