https://github.com/FeiElysia/awesome-zero-shot-captioning

A curated list of zero-shot captioning papers

# Awesome Zero-shot Captioning [![Awesome](https://awesome.re/badge.svg)](https://awesome.re)

A curated list of zero-shot captioning papers, covering image-to-text and video-to-text generation. Maintained by Junjie Fei. Most recently updated on 2023/08/26.

## Image-to-Text Generation

### 2023

- **[Transferable Decoding with Visual Entities for Zero-Shot Image Captioning](https://arxiv.org/abs/2307.16525)** [![Star](https://img.shields.io/github/stars/FeiElysia/ViECap.svg?style=social&label=Star)](https://github.com/FeiElysia/ViECap)

```ICCV 2023``` [[paper]](https://arxiv.org/abs/2307.16525) [[code]](https://github.com/FeiElysia/ViECap)

- **[ConZIC: Controllable Zero-Shot Image Captioning by Sampling-Based Polishing](https://openaccess.thecvf.com/content/CVPR2023/html/Zeng_ConZIC_Controllable_Zero-Shot_Image_Captioning_by_Sampling-Based_Polishing_CVPR_2023_paper.html)** [![Star](https://img.shields.io/github/stars/joeyz0z/ConZIC.svg?style=social&label=Star)](https://github.com/joeyz0z/ConZIC)

```CVPR 2023``` [[paper]](https://openaccess.thecvf.com/content/CVPR2023/html/Zeng_ConZIC_Controllable_Zero-Shot_Image_Captioning_by_Sampling-Based_Polishing_CVPR_2023_paper.html) [[code]](https://github.com/joeyz0z/ConZIC) [[demo]](https://huggingface.co/spaces/jiaqingj/ConZIC)

- **[DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training](https://arxiv.org/abs/2303.03032)** [![Star](https://img.shields.io/github/stars/dhg-wei/DeCap.svg?style=social&label=Star)](https://github.com/dhg-wei/DeCap)

```ICLR 2023``` [[paper]](https://arxiv.org/abs/2303.03032) [[code]](https://github.com/dhg-wei/DeCap)

- **[MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning](https://aclanthology.org/2023.acl-long.664/)** [![Star](https://img.shields.io/github/stars/yangbang18/MultiCapCLIP.svg?style=social&label=Star)](https://github.com/yangbang18/MultiCapCLIP)

```ACL 2023``` [[paper]](https://aclanthology.org/2023.acl-long.664/) [[code]](https://github.com/yangbang18/MultiCapCLIP)

- **[ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles](https://arxiv.org/abs/2306.16649)** [![Star](https://img.shields.io/github/stars/ImKeTT/ZeroGen.svg?style=social&label=Star)](https://github.com/ImKeTT/ZeroGen)

```arXiv 2023``` [[paper]](https://arxiv.org/abs/2306.16649) [[code]](https://github.com/ImKeTT/ZeroGen)

- **[Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning](https://arxiv.org/abs/2302.04858)**

```arXiv 2023``` [[paper]](https://arxiv.org/abs/2302.04858)

- **[Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models](https://arxiv.org/abs/2305.18010)**

```arXiv 2023``` [[paper]](https://arxiv.org/abs/2305.18010) [[project page]](https://mzhaoshuai.github.io/RLCF/)

- **[CoBIT: A Contrastive Bi-directional Image-Text Generation Model](https://arxiv.org/abs/2303.13455)**

```arXiv 2023``` [[paper]](https://arxiv.org/abs/2303.13455)

### 2022

- **[ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic](https://openaccess.thecvf.com/content/CVPR2022/html/Tewel_ZeroCap_Zero-Shot_Image-to-Text_Generation_for_Visual-Semantic_Arithmetic_CVPR_2022_paper.html)** [![Star](https://img.shields.io/github/stars/YoadTew/zero-shot-image-to-text.svg?style=social&label=Star)](https://github.com/YoadTew/zero-shot-image-to-text)

```CVPR 2022``` [[paper]](https://openaccess.thecvf.com/content/CVPR2022/html/Tewel_ZeroCap_Zero-Shot_Image-to-Text_Generation_for_Visual-Semantic_Arithmetic_CVPR_2022_paper.html) [[code]](https://github.com/YoadTew/zero-shot-image-to-text) [[demo]](https://replicate.com/yoadtew/zero-shot-image-to-text)

- **[Text-Only Training for Image Captioning using Noise-Injected CLIP](https://arxiv.org/abs/2211.00575)** [![Star](https://img.shields.io/github/stars/DavidHuji/CapDec.svg?style=social&label=Star)](https://github.com/DavidHuji/CapDec)

```EMNLP 2022``` [[paper]](https://arxiv.org/abs/2211.00575) [[code]](https://github.com/DavidHuji/CapDec)

- **[Visual Information Guided Zero-Shot Paraphrase Generation](https://arxiv.org/abs/2201.09107)**

```COLING 2022``` [[paper]](https://arxiv.org/abs/2201.09107)

- **[Language Models Can See: Plugging Visual Controls in Text Generation](https://arxiv.org/abs/2205.02655)** [![Star](https://img.shields.io/github/stars/yxuansu/MAGIC.svg?style=social&label=Star)](https://github.com/yxuansu/MAGIC)

```arXiv 2022``` [[paper]](https://arxiv.org/abs/2205.02655) [[code]](https://github.com/yxuansu/MAGIC) [[demo]](https://replicate.com/yxuansu/magic/examples)

- **[Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language](https://arxiv.org/abs/2204.00598)**

```arXiv 2022``` [[paper]](https://arxiv.org/abs/2204.00598) [[code]](https://github.com/google-research/google-research/tree/master/socraticmodels) [[project page]](https://socraticmodels.github.io/#code)

- **[Large-Scale Bidirectional Training for Zero-Shot Image Captioning](https://arxiv.org/abs/2211.06774)**

```arXiv 2022``` [[paper]](https://arxiv.org/abs/2211.06774)

### Before 2022

- **[Image Captioning with Unseen Objects](https://arxiv.org/abs/1908.00047)**

```BMVC 2019``` [[paper]](https://arxiv.org/abs/1908.00047)

## Video-to-Text Generation

### 2023

- **[Zero-Shot Dense Video Captioning by Jointly Optimizing Text and Moment](https://arxiv.org/abs/2307.02682)**

```arXiv 2023``` [[paper]](https://arxiv.org/abs/2307.02682)

### 2022

- **[Zero-Shot Video Captioning with Evolving Pseudo-Tokens](https://arxiv.org/abs/2207.11100)** [![Star](https://img.shields.io/github/stars/YoadTew/zero-shot-video-to-text.svg?style=social&label=Star)](https://github.com/YoadTew/zero-shot-video-to-text)

```arXiv 2022``` [[paper]](https://arxiv.org/abs/2207.11100) [[code]](https://github.com/YoadTew/zero-shot-video-to-text)

### Before 2022

- **[Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning](https://ojs.aaai.org/index.php/AAAI/article/view/4926)**

```AAAI 2019``` [[paper]](https://ojs.aaai.org/index.php/AAAI/article/view/4926)