https://github.com/FeiElysia/awesome-zero-shot-captioning
A curated list of zero-shot captioning papers
List: awesome-zero-shot-captioning
- Host: GitHub
- URL: https://github.com/FeiElysia/awesome-zero-shot-captioning
- Owner: FeiElysia
- Created: 2022-10-26T07:44:38.000Z
- Default Branch: main
- Last Pushed: 2023-08-26T08:17:12.000Z
- Last Synced: 2024-05-21T00:18:42.251Z
- Topics: captioning, image-to-text, video-to-text, zero-shot
- Size: 15.6 KB
- Stars: 15
- Watchers: 2
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- ultimate-awesome - awesome-zero-shot-captioning - A curated list of zero-shot captioning papers. (Other Lists / Monkey C Lists)
README
# Awesome Zero-shot Captioning [![Awesome](https://awesome.re/badge.svg)](https://awesome.re)
A curated list of zero-shot captioning papers, covering both image-to-text and video-to-text generation. Maintained by Junjie Fei ([email protected]). Last updated on 2023/08/26.
## Image-to-Text Generation
### 2023
- **[Transferable Decoding with Visual Entities for Zero-Shot Image Captioning](https://arxiv.org/abs/2307.16525)** [![Star](https://img.shields.io/github/stars/FeiElysia/ViECap.svg?style=social&label=Star)](https://github.com/FeiElysia/ViECap)
```ICCV 2023``` [[paper]](https://arxiv.org/abs/2307.16525) [[code]](https://github.com/FeiElysia/ViECap)
- **[ConZIC: Controllable Zero-Shot Image Captioning by Sampling-Based Polishing](https://openaccess.thecvf.com/content/CVPR2023/html/Zeng_ConZIC_Controllable_Zero-Shot_Image_Captioning_by_Sampling-Based_Polishing_CVPR_2023_paper.html)** [![Star](https://img.shields.io/github/stars/joeyz0z/ConZIC.svg?style=social&label=Star)](https://github.com/joeyz0z/ConZIC)
```CVPR 2023``` [[paper]](https://openaccess.thecvf.com/content/CVPR2023/html/Zeng_ConZIC_Controllable_Zero-Shot_Image_Captioning_by_Sampling-Based_Polishing_CVPR_2023_paper.html) [[code]](https://github.com/joeyz0z/ConZIC) [[demo]](https://huggingface.co/spaces/jiaqingj/ConZIC)
- **[DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training](https://arxiv.org/abs/2303.03032)** [![Star](https://img.shields.io/github/stars/dhg-wei/DeCap.svg?style=social&label=Star)](https://github.com/dhg-wei/DeCap)
```ICLR 2023``` [[paper]](https://arxiv.org/abs/2303.03032) [[code]](https://github.com/dhg-wei/DeCap)
- **[MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning](https://aclanthology.org/2023.acl-long.664/)** [![Star](https://img.shields.io/github/stars/yangbang18/MultiCapCLIP.svg?style=social&label=Star)](https://github.com/yangbang18/MultiCapCLIP)
```ACL 2023``` [[paper]](https://aclanthology.org/2023.acl-long.664/) [[code]](https://github.com/yangbang18/MultiCapCLIP)
- **[ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles](https://arxiv.org/abs/2306.16649)** [![Star](https://img.shields.io/github/stars/ImKeTT/ZeroGen.svg?style=social&label=Star)](https://github.com/ImKeTT/ZeroGen)
```arXiv 2023``` [[paper]](https://arxiv.org/abs/2306.16649) [[code]](https://github.com/ImKeTT/ZeroGen)
- **[Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning](https://arxiv.org/abs/2302.04858)**
```arXiv 2023``` [[paper]](https://arxiv.org/abs/2302.04858)
- **[Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models](https://arxiv.org/abs/2305.18010)**
```arXiv 2023``` [[paper]](https://arxiv.org/abs/2305.18010) [[project page]](https://mzhaoshuai.github.io/RLCF/)
- **[CoBIT: A Contrastive Bi-directional Image-Text Generation Model](https://arxiv.org/abs/2303.13455)**
```arXiv 2023``` [[paper]](https://arxiv.org/abs/2303.13455)
### 2022
- **[ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic](https://openaccess.thecvf.com/content/CVPR2022/html/Tewel_ZeroCap_Zero-Shot_Image-to-Text_Generation_for_Visual-Semantic_Arithmetic_CVPR_2022_paper.html)** [![Star](https://img.shields.io/github/stars/YoadTew/zero-shot-image-to-text.svg?style=social&label=Star)](https://github.com/YoadTew/zero-shot-image-to-text)
```CVPR 2022``` [[paper]](https://openaccess.thecvf.com/content/CVPR2022/html/Tewel_ZeroCap_Zero-Shot_Image-to-Text_Generation_for_Visual-Semantic_Arithmetic_CVPR_2022_paper.html) [[code]](https://github.com/YoadTew/zero-shot-image-to-text) [[demo]](https://replicate.com/yoadtew/zero-shot-image-to-text)
- **[Text-Only Training for Image Captioning using Noise-Injected CLIP](https://arxiv.org/abs/2211.00575)** [![Star](https://img.shields.io/github/stars/DavidHuji/CapDec.svg?style=social&label=Star)](https://github.com/DavidHuji/CapDec)
```EMNLP 2022``` [[paper]](https://arxiv.org/abs/2211.00575) [[code]](https://github.com/DavidHuji/CapDec)
- **[Visual Information Guided Zero-Shot Paraphrase Generation](https://arxiv.org/abs/2201.09107)**
```COLING 2022``` [[paper]](https://arxiv.org/abs/2201.09107)
- **[Language Models Can See: Plugging Visual Controls in Text Generation](https://arxiv.org/abs/2205.02655)** [![Star](https://img.shields.io/github/stars/yxuansu/MAGIC.svg?style=social&label=Star)](https://github.com/yxuansu/MAGIC)
```arXiv 2022``` [[paper]](https://arxiv.org/abs/2205.02655) [[code]](https://github.com/yxuansu/MAGIC) [[demo]](https://replicate.com/yxuansu/magic/examples)
- **[Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language](https://arxiv.org/abs/2204.00598)**
```arXiv 2022``` [[paper]](https://arxiv.org/abs/2204.00598) [[code]](https://github.com/google-research/google-research/tree/master/socraticmodels) [[project page]](https://socraticmodels.github.io/#code)
- **[Large-Scale Bidirectional Training for Zero-Shot Image Captioning](https://arxiv.org/abs/2211.06774)**
```arXiv 2022``` [[paper]](https://arxiv.org/abs/2211.06774)
### Before 2022
- **[Image Captioning with Unseen Objects](https://arxiv.org/abs/1908.00047)**
```BMVC 2019``` [[paper]](https://arxiv.org/abs/1908.00047)
## Video-to-Text Generation
### 2023
- **[Zero-Shot Dense Video Captioning by Jointly Optimizing Text and Moment](https://arxiv.org/abs/2307.02682)**
```arXiv 2023``` [[paper]](https://arxiv.org/abs/2307.02682)
### 2022
- **[Zero-Shot Video Captioning with Evolving Pseudo-Tokens](https://arxiv.org/abs/2207.11100)** [![Star](https://img.shields.io/github/stars/YoadTew/zero-shot-video-to-text.svg?style=social&label=Star)](https://github.com/YoadTew/zero-shot-video-to-text)
```arXiv 2022``` [[paper]](https://arxiv.org/abs/2207.11100) [[code]](https://github.com/YoadTew/zero-shot-video-to-text)
### Before 2022
- **[Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning](https://ojs.aaai.org/index.php/AAAI/article/view/4926)**
```AAAI 2019``` [[paper]](https://ojs.aaai.org/index.php/AAAI/article/view/4926)