https://github.com/RyanLiut/awesome-diverse-captioning

Some papers about *diverse* image (a few videos) captioning

# Awesome-Diverse-Captioning

A curated list of papers on diverse captioning, mainly for images, sometimes for video, and occasionally for text generation. Broadly, diverse visual captioning covers both diverse caption sets (one-to-many) and distinctive captions (a single caption), with or without explicit control signals. Dense video captioning is excluded because it has become its own subarea of video captioning. More detailed tags will be added later. Feel free to contact me with any comments.

## Paper List

## 2022

1. [A Well-Composed Text is Half Done! Composition Sampling for Diverse Conditional Generation](https://arxiv.org/abs/2203.15108)

*Shashi Narayan, Gonçalo Simões, Yao Zhao, Joshua Maynez, Dipanjan Das, Michael Collins, Mirella Lapata (Google)*

`ACL 2022` [[partial code](https://github.com/google-research/language/tree/master/language/frost)]

`conditional` `metrics` `decoding sampling`

2. [Hierarchical Sketch Induction for Paraphrase Generation](https://arxiv.org/abs/2203.03463)

*Tom Hosking, Hao Tang, Mirella Lapata*

`ACL 2022`

`controllable` `VAEs`

3. Generating Scientific Definitions with Controllable Complexity

*Tal August, Katharina Reinecke, Noah A. Smith*

`ACL 2022`

`controllable` `definition modeling`

4. [CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation](https://arxiv.org/abs/2204.00862)

*Pei Ke, Hao Zhou, Yankai Lin, Peng Li, Jie Zhou, Xiaoyan Zhu, Minlie Huang*

`ACL 2022`

`controllable` `metric`

5. Controllable Dictionary Example Generation: Generating Example Sentences for Specific Targeted Audiences

*Xingwei He, Siu Ming Yiu*

`ACL 2022`

`controllable`

6. [Show, Tell and Rephrase: Diverse Video Captioning via Two-Stage Progressive Training](https://ieeexplore.ieee.org/abstract/document/9999037)

*Zhu Liu, Teng Wang, Jinrui Zhang, Feng Zheng, Wenhao Jiang, Ke Lu*

`TMM 2022`

`diversity` `metric`

## 2021

1. [Human-Like Controllable Image Captioning With Verb-Specific Semantic Roles](https://openaccess.thecvf.com/content/CVPR2021/papers/Chen_Human-Like_Controllable_Image_Captioning_With_Verb-Specific_Semantic_Roles_CVPR_2021_paper.pdf)

*Long Chen, Zhihong Jiang, Jun Xiao, Wei Liu*

`CVPR 2021`

`controllable`

2. Towards Accurate Text-Based Image Captioning With Content Diversity Exploration

*Guanghui Xu, Shuaicheng Niu, Mingkui Tan, Yucheng Luo, Qing Du, Qi Wu*

`CVPR 2021`

3. [Open-Book Video Captioning With Retrieve-Copy-Generate Network](https://openaccess.thecvf.com/content/CVPR2021/papers/Zhang_Open-Book_Video_Captioning_With_Retrieve-Copy-Generate_Network_CVPR_2021_paper.pdf)

*Ziqi Zhang, Zhongang Qi, Chunfeng Yuan, Ying Shan, Bing Li, Ying Deng, Weiming Hu*

`CVPR 2021`

4. [Question-controlled Text-aware Image Captioning](https://arxiv.org/abs/2108.02059)

*Anwen Hu, Shizhe Chen, Qin Jin*

`ACM MM 2021`

`controllable`

5. [O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning (Short)](https://web.pkusz.edu.cn/adsp/files/2021/07/ACL2021_O2NA.pdf)

*Fenglin Liu, Xuancheng Ren, Xian Wu, Bang Yang, Shen Ge, Yuexian Zou, Xu Sun*

`ACL 2021`

`controllable`

6. [Control Image Captioning Spatially and Temporally](https://aclanthology.org/2021.acl-long.157.pdf)

*Kun Yan, Lei Ji, Huaishao Luo, Ming Zhou, Nan Duan, Shuai Ma*

`ACL 2021`

`controllable (mouse traces)`

7. [Understanding Guided Image Captioning Performance across Domains](https://arxiv.org/abs/2012.02339)

*Edwin G. Ng, Bo Pang, Piyush Sharma, Radu Soricut*

`CoNLL 2021`

`controllable (semantic label)`

8. [Partial Off-Policy Learning: Balance Accuracy and Diversity for Human-Oriented Image Captioning](https://openaccess.thecvf.com/content/ICCV2021/papers/Shi_Partial_Off-Policy_Learning_Balance_Accuracy_and_Diversity_for_Human-Oriented_Image_ICCV_2021_paper.pdf)

*Jiahe Shi, Yali Li, Shengjin Wang*

`ICCV 2021`

`controllable`

## 2020

1. LNFMM: [Latent Normalizing Flows for Many-to-Many Cross Domain Mappings](https://openreview.net/pdf?id=SJxE8erKDH)

*Shweta Mahajan, Iryna Gurevych, Stefan Roth*

`ICLR 2020` [[pytorch-code](https://github.com/visinf/lnfmm)] [[openreview](https://openreview.net/forum?id=SJxE8erKDH)]

2. [Diverse Image Captioning with Context-Object Split Latent Spaces](https://arxiv.org/abs/2011.00966)

*Shweta Mahajan and Stefan Roth*

`NeurIPS 2020` [[pytorch-code](https://github.com/visinf/cos-cvae)] [[review](https://papers.nips.cc/paper/2020/file/24bea84d52e6a1f8025e313c2ffff50a-Review.html)]

`diversity`

3. [On Diversity in Image Captioning: Metrics and Methods](https://doi.ieeecomputersociety.org/10.1109/TPAMI.2020.3013834)

*Qingzhong Wang, Jia Wan, Antoni B. Chan*

`TPAMI 2020` [[pytorch-code](https://github.com/qingzwang/DiverseImageCaptioning)]

`survey` `diversity` `metrics`

4. [Improving Image Captioning Evaluation by Considering Inter References Variance](https://www.aclweb.org/anthology/2020.acl-main.93.pdf)

*Yanzhi Yi, Hangyu Deng, Jinglu Hu*

`ACL 2020` [[code](https://github.com/ck0123/improved-bertscore-for-image-captioning-evaluation)]

`metrics`

5. [Better Captioning with Sequence-Level Exploration](https://openaccess.thecvf.com/content_CVPR_2020/papers/Chen_Better_Captioning_With_Sequence-Level_Exploration_CVPR_2020_paper.pdf)

*Jia Chen, Qin Jin*

`CVPR 2020` [[video](https://www.youtube.com/watch?v=d-IVBVci3Nk)]

`diversity`

## 2019

1. POS: [Fast, Diverse and Accurate Image Captioning Guided by Part-Of-Speech](https://openaccess.thecvf.com/content_CVPR_2019/papers/Deshpande_Fast_Diverse_and_Accurate_Image_Captioning_Guided_by_Part-Of-Speech_CVPR_2019_paper.pdf)

*Aditya Deshpande, Jyoti Aneja, Liwei Wang, Alexander Schwing, David Forsyth*

`CVPR 2019`

`diversity` `controllable`

2. [Generating Diverse and Descriptive Image Captions Using Visual Paraphrases](https://openaccess.thecvf.com/content_ICCV_2019/papers/Liu_Generating_Diverse_and_Descriptive_Image_Captions_Using_Visual_Paraphrases_ICCV_2019_paper.pdf)

*Lixin Liu, Jiajun Tang, Xiaojun Wan, Zongming Guo*

`ICCV 2019`

`descriptiveness`

3. [Controllable Video Captioning With POS Sequence Guidance Based on Gated Fusion Network](https://arxiv.org/abs/1908.10072)

*Bairui Wang, Lin Ma, Wei Zhang, Wenhao Jiang, Jingwen Wang, Wei Liu*

`ICCV 2019` [[pytorch-code](https://github.com/vsislab/Controllable_XGating)]

`controllable`

4. VSSI-cap: Variational Structured Semantic Inference for Diverse Image Captioning

*Fuhai Chen, Rongrong Ji, Jiayi Ji, Xiaoshuai Sun, Baochang Zhang, Xuri Ge, Yongjian Wu, Feiyue Huang, Yan Wang*

`NeurIPS 2019`

`diversity` `VAE` `discriminativeness`

5. [Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions](https://arxiv.org/pdf/1811.10652.pdf)

*Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara*

`CVPR 2019` [[pytorch-code](https://github.com/aimagelab/show-control-and-tell)]

`controllable`

6. [Intention Oriented Image Captions with Guiding Objects](https://openaccess.thecvf.com/content_CVPR_2019/papers/Zheng_Intention_Oriented_Image_Captions_With_Guiding_Objects_CVPR_2019_paper.pdf)

*Yue Zheng, Yali Li and Shengjin Wang*

`CVPR 2019` [[unfinished-code](https://github.com/google-research-datasets/T2-Guiding)]

`controllable (object labels)`

7. [Towards Diverse and Accurate Image Captions via Reinforcing Determinantal Point Process](https://arxiv.org/abs/1908.04919)

*Qingzhong Wang, Antoni B. Chan*

`arXiv 2019` [[pytorch-code](https://github.com/qingzwang/DiverseImageCaptioning)]

8. [Curiosity-driven Reinforcement Learning for Diverse Visual Paragraph Generation](https://arxiv.org/pdf/1908.00169)

*Yadan Luo, Zi Huang, Zheng Zhang, Ziwei Wang, Jingjing Li, Yang Yang*

`ACM MM 2019`

9. [Engaging Image Captioning via Personality](https://openaccess.thecvf.com/content_CVPR_2019/papers/Shuster_Engaging_Image_Captioning_via_Personality_CVPR_2019_paper.pdf)

*Kurt Shuster, Samuel Humeau, Hexiang Hu, Antoine Bordes, Jason Weston*

`CVPR 2019` [[Openreview for ICLR 19](https://openreview.net/forum?id=HJN6DiAcKQ)]

10. [Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning](https://openaccess.thecvf.com/content_ICCV_2019/papers/Aneja_Sequential_Latent_Spaces_for_Modeling_the_Intention_During_Diverse_Image_ICCV_2019_paper.pdf)

*Jyoti Aneja, Harsh Agrawal, Dhruv Batra, Alexander Schwing*

`ICCV 2019`

`diversity` `VAE`

11. [Describing Like Humans: On Diversity in Image Captioning](https://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_Describing_Like_Humans_On_Diversity_in_Image_Captioning_CVPR_2019_paper.pdf)

*Qingzhong Wang, Antoni B. Chan*

`CVPR 2019`

12. [MSCap: Multi-Style Image Captioning With Unpaired Stylized Text](https://openaccess.thecvf.com/CVPR2019_search)

*Longteng Guo, Jing Liu, Peng Yao, Jiangwei Li, Hanqing Lu*

`CVPR 2019`

## 2018

1. [GroupCap: Group-Based Image Captioning With Structured Relevance and Diversity Constraints](https://openaccess.thecvf.com/content_cvpr_2018/html/Chen_GroupCap_Group-Based_Image_CVPR_2018_paper.html)

*Fuhai Chen, Rongrong Ji, Xiaoshuai Sun, Yongjian Wu, Jinsong Su*

`CVPR 2018`

2. [A Neural Compositional Paradigm for Image Captioning](https://papers.nips.cc/paper/2018/file/8bf1211fd4b7b94528899de0a43b9fb3-Paper.pdf)

*Bo Dai, Sanja Fidler, Dahua Lin*

`NeurIPS 2018` [[lua-code](https://github.com/doubledaibo/compcaption_neurips2018)] [[open review](https://openreview.net/forum?id=SJxyZ81IYQ)]

3. [Diverse and Coherent Paragraph Generation from Images](https://openaccess.thecvf.com/content_ECCV_2018/papers/Moitreya_Chatterjee_Diverse_and_Coherent_ECCV_2018_paper.pdf)

*Moitreya Chatterjee and Alexander G. Schwing*

`ECCV 2018` [[pytorch-code](https://github.com/metro-smiles/CapG_RevG_Code)]

4. [Categorizing Concepts With Basic Level for Vision-to-Language](http://openaccess.thecvf.com/content_cvpr_2018/papers/Wang_Categorizing_Concepts_With_CVPR_2018_paper.pdf)

*Hanzhang Wang, Hanli Wang, Kaisheng Xu*

`CVPR 2018`

## 2017

1. [Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training](https://openaccess.thecvf.com/content_ICCV_2017/papers/Shetty_Speaking_the_Same_ICCV_2017_paper.pdf)

*Rakshith Shetty, Marcus Rohrbach, Lisa Anne Hendricks, Mario Fritz, Bernt Schiele*

`ICCV 2017`

`diversity` `GAN`

2. [Towards Diverse and Natural Image Descriptions via a Conditional GAN](https://openaccess.thecvf.com/content_ICCV_2017/papers/Dai_Towards_Diverse_and_ICCV_2017_paper.pdf)

*Bo Dai, Sanja Fidler, Raquel Urtasun, Dahua Lin*

`ICCV 2017` [[video](https://www.youtube.com/watch?v=Xnk1bjZCEYo)]

`GAN`

3. [Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space](https://proceedings.neurips.cc/paper/2017/hash/4b21cf96d4cf612f239a6c322b10c8fe-Abstract.html)

*Liwei Wang, Alexander Schwing, Svetlana Lazebnik*

`NeurIPS 2017` [[Review](https://papers.nips.cc/paper/2017/file/4b21cf96d4cf612f239a6c322b10c8fe-Reviews.html)]

`diversity` `VAE`

4. [Weakly Supervised Dense Video Captioning](https://openaccess.thecvf.com/content_cvpr_2017/papers/Shen_Weakly_Supervised_Dense_CVPR_2017_paper.pdf)

*Zhiqiang Shen, Jianguo Li, Zhou Su, Minjun Li, Yurong Chen, Yu-Gang Jiang, Xiangyang Xue*

`CVPR 2017`

`VAE`

5. [From Deterministic to Generative: Multimodal Stochastic RNNs for Video Captioning](https://ieeexplore.ieee.org/document/8438512)

*Jingkuan Song, Yuyu Guo, Lianli Gao, Xuelong Li, Alan Hanjalic, Heng Tao Shen*

`IEEE Trans Neural Netw Learn Syst 2017`

`VAE`

## 2016

1. [Diverse Image Captioning via GroupTalk](https://www.ijcai.org/Proceedings/16/Papers/420.pdf)

*Zhuhao Wang, Fei Wu, Weiming Lu, Jun Xiao, Xi Li, Zitong Zhang, Yueting Zhuang*

`IJCAI 2016`

2. [Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models](https://arxiv.org/abs/1610.02424)

*Ashwin K. Vijayakumar, Michael Cogswell, Ramprasaath R. Selvaraju, Qing Sun, Stefan Lee, David J. Crandall, Dhruv Batra*

`CoRR 2016` [[lua-code](https://github.com/Cloud-CV/diverse-beam-search)] [[demo](http://dbs.cloudcv.org/captioning)] [[openreview from ICLR'17](https://openreview.net/forum?id=HJV1zP5xg&noteId=ryZU_K87x)]

## 2015

1. [Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)](https://arxiv.org/pdf/1412.6632.pdf)

*Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan Yuille*

`ICLR 2015` [[code:TF-mRNN](https://github.com/mjhucla/TF-mRNN)] [[code:mRNN-CR](https://github.com/mjhucla/mRNN-CR)]

`diversity` `consensus re-ranking`

## Main Reference

https://openaccess.thecvf.com/menu

https://openreview.net/