https://github.com/shreydan/visiongpt2
Combining ViT and GPT-2 for image captioning. Trained on MS-COCO. The model was implemented mostly from scratch.
- Host: GitHub
- URL: https://github.com/shreydan/visiongpt2
- Owner: shreydan
- Created: 2023-09-28T19:36:07.000Z
- Default Branch: main
- Last Pushed: 2023-10-02T14:40:53.000Z
- Last Synced: 2024-03-15T14:16:28.883Z
- Topics: gpt, image-captioning, multimodal, pytorch, transformers, vit
- Language: Jupyter Notebook
- Homepage: https://www.kaggle.com/code/shreydan/visiongpt2-image-captioning-pytorch
- Size: 289 KB
- Stars: 6
- Watchers: 2
- Forks: 0
- Open Issues: 0