https://github.com/hymie122/RAG-Survey

Collecting awesome papers of RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".

# Retrieval-Augmented Generation for AI-Generated Content: A Survey
This repo collects and categorizes papers about RAG following the taxonomy of our survey paper: [*Retrieval-Augmented Generation for AI-Generated Content: A Survey*](https://arxiv.org/abs/2402.19473). Given the rapid growth of this field, we will continue to update both the [paper](https://arxiv.org/abs/2402.19473) and this repo.

# Overview


# Catalogue
## Methods Taxonomy
### RAG Foundations


- Query-based RAG

[REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909)

[Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection](https://arxiv.org/abs/2310.11511)

[REPLUG: Retrieval-Augmented Black-Box Language Models](https://arxiv.org/abs/2301.12652)

[In-Context Retrieval-Augmented Language Models](https://arxiv.org/abs/2302.00083)

[When Language Model Meets Private Library](https://arxiv.org/abs/2210.17236)

[DocPrompting: Generating Code by Retrieving the Docs](https://openreview.net/pdf?id=ZTCxT2t2Ru)

[Retrieval-Based Prompt Selection for Code-Related Few-Shot Learning](https://doi.org/10.1109/ICSE48619.2023.00205)

[InferFix: End-to-End Program Repair with LLMs](https://doi.org/10.1145/3611643.3613892)

[Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models](https://proceedings.mlr.press/v202/huang23i.html)

[ReACC: A Retrieval-Augmented Code Completion Framework](https://doi.org/10.18653/v1/2022.acl-long.431)

[Uni-Parser: Unified Semantic Parser for Question Answering on Knowledge Base and Database](https://doi.org/10.18653/v1/2022.emnlp-main.605)

[RNG-KBQA: Generation Augmented Iterative Ranking for Knowledge Base Question Answering](https://doi.org/10.18653/v1/2022.acl-long.417)

[End-to-End Case-Based Reasoning for Commonsense Knowledge Base Completion](https://doi.org/10.18653/v1/2023.eacl-main.255)

[Combining Transfer Learning with In-Context Learning using Blackbox LLMs for Zero-Shot Knowledge Base Question Answering](https://doi.org/10.48550/arXiv.2311.08894)

[GeneGPT: Augmenting Large Language Models with Domain Tools for Improved Access to Biomedical Information](https://arxiv.org/abs/2304.09667)

[Retrieval-Augmented Large Language Models for Adolescent Idiopathic Scoliosis Patients in Shared Decision-Making](https://dl.acm.org/doi/10.1145/3584371.3612956)

[RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval](https://link.springer.com/chapter/10.1007/978-3-030-58598-3_15)

[Instance-Conditioned GAN](https://proceedings.neurips.cc/paper/2021/file/e7ac288b0f2d41445904d071ba37aaff-Paper.pdf)

[Retrieval-Augmented Score Distillation for Text-to-3D Generation](https://doi.org/10.48550/arXiv.2402.02972)

- Latent Representation-based RAG

[Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering](https://doi.org/10.18653/v1/2021.eacl-main.74)

[BashExplainer: Retrieval-Augmented Bash Code Comment Generation based on Fine-tuned CodeBERT](https://doi.org/10.1109/ICSME55016.2022.00016)

[EditSum: A Retrieve-and-Edit Framework for Source Code Summarization](https://doi.org/10.1109/ASE51524.2021.9678724)

[Retrieve and Refine: Exemplar-based Neural Comment Generation](https://arxiv.org/abs/2010.04459)

[RACE: Retrieval-Augmented Commit Message Generation](https://doi.org/10.18653/v1/2022.emnlp-main.372)

[UniK-QA: Unified Representations of Structured and Unstructured Knowledge for Open-Domain Question Answering](https://doi.org/10.18653/v1/2022.findings-naacl.115)

[A Retrieve-and-Edit Framework for Predicting Structured Outputs](https://proceedings.neurips.cc/paper/2018/hash/cd17d3ce3b64f227987cd92cd701cc58-Abstract.html)

[DecAF: Joint Decoding of Answers and Logical Forms for Question Answering over Knowledge Bases](https://openreview.net/pdf?id=XHc5zRPxqV9)

[Bridging the KB-Text Gap: Leveraging Structured Knowledge-aware Pre-training for KBQA](https://doi.org/10.1145/3583780.3615150)

[Knowledge-Driven CoT: Exploring Faithful Reasoning in LLMs for Knowledge-intensive Question Answering](https://doi.org/10.48550/arXiv.2308.13259)

[Retrieval-Enhanced Generative Model for Large-Scale Knowledge Graph Completion](https://doi.org/10.1145/3539618.3592052)

[Case-based Reasoning for Natural Language Queries over Knowledge Bases](https://doi.org/10.18653/v1/2021.emnlp-main.755)

[A Protein-Ligand Interaction-focused 3D Molecular Generative Framework for Generalizable Structure-based Drug Design](https://chemrxiv.org/engage/chemrxiv/article-details/6482d9dbbe16ad5c57af1937)

[Improving Language Models by Retrieving from Trillions of Tokens](https://proceedings.mlr.press/v162/borgeaud22a.html)

[ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model](https://doi.org/10.1109/ICCV51070.2023.00040)

[Memorizing Transformers](https://openreview.net/forum?id=TrjbxzRcnf-)

[Audio Captioning using Pre-Trained Large-Scale Language Model Guided by Audio-based Similar Caption Retrieval](https://arxiv.org/abs/2012.07331)

[Retrieval Augmented Convolutional Encoder-decoder Networks for Video Captioning](https://doi.org/10.1145/3539225)

[Retrieval-Augmented Egocentric Video Captioning](https://doi.org/10.48550/arXiv.2401.00789)

[Re-Imagen: Retrieval-Augmented Text-to-Image Generator](https://arxiv.org/abs/2209.14491)

[KNN-Diffusion: Image Generation via Large-Scale Retrieval](https://arxiv.org/abs/2204.02849)

[Retrieval-Augmented Diffusion Models](https://proceedings.neurips.cc/paper_files/paper/2022/file/62868cc2fc1eb5cdf321d05b4b88510c-Paper-Conference.pdf)

[Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models](https://arxiv.org/abs/2207.13038)

[Memory-Driven Text-to-Image Generation](https://arxiv.org/abs/2208.07022)

[Mention Memory: Incorporating Textual Knowledge into Transformers through Entity Mention Attention](https://arxiv.org/abs/2110.06176)

[Unlimiformer: Long-Range Transformers with Unlimited Length Input](https://doi.org/10.48550/arXiv.2305.01625)

[Entities as Experts: Sparse Memory Access with Entity Supervision](https://arxiv.org/abs/2004.07202)

[AMD: Anatomical Motion Diffusion with Interpretable Motion Decomposition and Fusion](https://arxiv.org/abs/2312.12763)

[Retrieval-Augmented Text-to-Audio Generation](https://doi.org/10.48550/arXiv.2309.08051)

[Concept-Aware Video Captioning: Describing Videos With Effective Prior Information](https://doi.org/10.1109/TIP.2023.3307969)

- Logit-based RAG

[Generalization through Memorization: Nearest Neighbor Language Models](https://openreview.net/forum?id=HklBjCEKvH)

[Syntax-Aware Retrieval Augmented Code Generation](https://aclanthology.org/2023.findings-emnlp.90)

[Memory-Augmented Image Captioning](https://aaai.org/papers/01317-memory-augmented-image-captioning/)

[Retrieval-Based Neural Source Code Summarization](https://doi.org/10.1145/3377811.3380383)

[Efficient Nearest Neighbor Language Models](https://doi.org/10.18653/v1/2021.emnlp-main.461)

[Nonparametric Masked Language Modeling](https://doi.org/10.18653/v1/2023.findings-acl.132)

[EditSum: A Retrieve-and-Edit Framework for Source Code Summarization](https://doi.org/10.1109/ASE51524.2021.9678724)


- Speculative RAG

[REST: Retrieval-Based Speculative Decoding](https://doi.org/10.48550/arXiv.2311.08252)

[GPTCache](https://github.com/zilliztech/GPTCache)

[Copy Is All You Need](https://arxiv.org/abs/2307.06962)

[Retrieval Is Accurate Generation](https://arxiv.org/abs/2402.17532)

### RAG Enhancements


- Input Enhancement

- Query Transformations

[Query2doc: Query Expansion with Large Language Models](https://aclanthology.org/2023.emnlp-main.585)

[Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models](https://openreview.net/forum?id=vDvFT7IX4O)

[Precise Zero-Shot Dense Retrieval without Relevance Labels](https://doi.org/10.18653/v1/2023.acl-long.99)

[RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation](https://arxiv.org/pdf/2404.00610)

[Dynamic Contexts for Generating Suggestion Questions in RAG Based Conversational Systems](https://arxiv.org/pdf/2403.11413)

- Data Augmentation

[LESS: Selecting Influential Data for Targeted Instruction Tuning](https://arxiv.org/abs/2402.04333)

[Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models](https://proceedings.mlr.press/v202/huang23i.html)

[Telco-RAG: Navigating the Challenges of Retrieval-Augmented Language Models for Telecommunications](https://arxiv.org/pdf/2404.15939)

- Retriever Enhancement

- Recursive Retrieve

[Query Expansion by Prompting Large Language Models](https://doi.org/10.48550/arXiv.2305.03653)

[RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation](https://arxiv.org/abs/2403.05313)

[ReAct: Synergizing Reasoning and Acting in Language Models](https://arxiv.org/abs/2210.03629)

[Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903)

[Large Language Models Know Your Contextual Search Intent: A Prompting Framework for Conversational Search](https://aclanthology.org/2023.findings-emnlp.86)

[ActiveRAG: Revealing the Treasures of Knowledge via Active Learning](https://arxiv.org/abs/2402.13547)

[Retrieval-Augmented Thought Process as Sequential Decision Making](https://arxiv.org/abs/2402.07812)

[In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss](https://arxiv.org/abs/2402.10790v1)

[Lost in the Middle: How Language Models Use Long Contexts](https://arxiv.org/abs/2307.03172)



- Chunk Optimization

[LlamaIndex](https://github.com/jerryjliu/llama_index)

[RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval](https://arxiv.org/pdf/2401.18059.pdf)

[Prompt-RAG: Pioneering Vector Embedding-Free Retrieval-Augmented Generation in Niche Domains, Exemplified by Korean Medicine](https://arxiv.org/pdf/2401.11246)

[Question-Based Retrieval using Atomic Units for Enterprise RAG](https://arxiv.org/pdf/2405.12363)

- Finetune Retriever

[C-Pack: Packaged Resources To Advance General Chinese Embedding](https://arxiv.org/abs/2309.07597)

[BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation](https://arxiv.org/abs/2402.03216)

[LM-Cocktail: Resilient Tuning of Language Models via Model Merging](https://arxiv.org/abs/2311.13534)

[Retrieve Anything To Augment Large Language Models](https://arxiv.org/abs/2310.07554)

[REPLUG: Retrieval-Augmented Black-Box Language Models](https://arxiv.org/abs/2301.12652)

[When Language Model Meets Private Library](https://doi.org/10.18653/v1/2022.findings-emnlp.21)

[EditSum: A Retrieve-and-Edit Framework for Source Code Summarization](https://doi.org/10.1109/ASE51524.2021.9678724)

[Synchromesh: Reliable Code Generation from Pre-trained Language Models](https://openreview.net/forum?id=KmtVD97J43e)

[Retrieval Augmented Convolutional Encoder-decoder Networks for Video Captioning](https://doi.org/10.1145/3539225)

[Reinforcement Learning for Optimizing RAG for Domain Chatbots](https://arxiv.org/abs/2401.06800)

- Hybrid Retrieve

[RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair](https://doi.org/10.1145/3611643.3616256)

[ReACC: A Retrieval-Augmented Code Completion Framework](https://doi.org/10.18653/v1/2022.acl-long.431)

[Retrieval-Based Neural Source Code Summarization](https://doi.org/10.1145/3377811.3380383)

[BashExplainer: Retrieval-Augmented Bash Code Comment Generation based on Fine-tuned CodeBERT](https://doi.org/10.1109/ICSME55016.2022.00016)

[Retrieval-Augmented Score Distillation for Text-to-3D Generation](https://doi.org/10.48550/arXiv.2402.02972)

[Corrective Retrieval Augmented Generation](https://arxiv.org/abs/2401.15884)

[Retrieval Augmented Generation with Rich Answer Encoding](https://aclanthology.org/2023.ijcnlp-main.65.pdf)

[UniMS-RAG: A Unified Multi-source Retrieval-Augmented Generation for Personalized Dialogue Systems](https://arxiv.org/abs/2401.13256)

[You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval](https://arxiv.org/pdf/2403.07222v1)

[Blended RAG: Improving RAG (Retriever-Augmented Generation) Accuracy with Semantic Search and Hybrid Query-Based Retrievers](https://arxiv.org/pdf/2404.07220)

- Re-ranking

[Re2G: Retrieve, Rerank, Generate](https://doi.org/10.18653/v1/2022.naacl-main.194)

[Passage Re-ranking with BERT](http://arxiv.org/abs/1901.04085)

[AceCoder: Utilizing Existing Code to Enhance Code Generation](https://arxiv.org/abs/2303.17780)

[XRICL: Cross-lingual Retrieval-Augmented In-Context Learning for Cross-lingual Text-to-SQL Semantic Parsing](https://doi.org/10.18653/v1/2022.findings-emnlp.384)

[A Fine-tuning Enhanced RAG System with Quantized Influence Measure as AI Judge](https://arxiv.org/abs/2402.17081v1)

[UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers](https://arxiv.org/pdf/2303.00807.pdf)

[Learning to Retrieve In-Context Examples for Large Language Models](https://arxiv.org/pdf/2307.07164.pdf)

[The Chronicles of RAG: The Retriever, the Chunk and the Generator](https://arxiv.org/pdf/2401.07883.pdf)

[Enhancing LLM Factual Accuracy with RAG to Counter Hallucinations: A Case Study on Domain-Specific Queries in Private Knowledge-Bases](https://arxiv.org/pdf/2403.10446)

- Retrieval Transformation

[Learning to Filter Context for Retrieval-Augmented Generation](https://arxiv.org/abs/2311.08377)

[FiD-Light: Efficient and Effective Retrieval-Augmented Text Generation](https://arxiv.org/abs/2209.14290)

[GAR-meets-RAG Paradigm for Zero-Shot Information Retrieval](https://arxiv.org/abs/2310.20158)

- Others

[Pinecone](https://www.pinecone.io)

[Generate rather than Retrieve: Large Language Models are Strong Context Generators](https://arxiv.org/abs/2209.10063)

[Generator-Retriever-Generator: A Novel Approach to Open-Domain Question Answering](https://arxiv.org/abs/2307.11278)

[Multi-Head RAG: Solving Multi-Aspect Problems with LLMs](https://arxiv.org/pdf/2406.05085)

- Generator Enhancement

- Prompt Engineering

[Prompt Engineering Guide](https://github.com/dair-ai/Prompt-Engineering-Guide)

[Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models](https://doi.org/10.48550/arXiv.2310.06117)

[Active Prompting with Chain-of-Thought for Large Language Models](https://doi.org/10.48550/arXiv.2302.12246)

[Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](http://papers.nips.cc/paper_files/paper/2022/hash/9d5609613524ecf4f15af0f7b31abca4-Abstract-Conference.html)

[LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models](https://aclanthology.org/2023.emnlp-main.825)

[Lost in the Middle: How Language Models Use Long Contexts](https://doi.org/10.48550/arXiv.2307.03172)

[ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model](https://doi.org/10.1109/ICCV51070.2023.00040)

[Automatic Semantic Augmentation of Language Model Prompts (for Code Summarization)](https://arxiv.org/abs/2304.06815)

[Retrieval-Based Prompt Selection for Code-Related Few-Shot Learning](https://doi.org/10.1109/ICSE48619.2023.00205)

[XRICL: Cross-lingual Retrieval-Augmented In-Context Learning for Cross-lingual Text-to-SQL Semantic Parsing](https://doi.org/10.18653/v1/2022.findings-emnlp.384)

[Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models](https://proceedings.mlr.press/v202/huang23i.html)

- Decoding Tuning

[InferFix: End-to-End Program Repair with LLMs](https://doi.org/10.1145/3611643.3613892)

[Synchromesh: Reliable Code Generation from Pre-trained Language Models](https://openreview.net/forum?id=KmtVD97J43e)


- Finetune Generator

[Improving Language Models by Retrieving from Trillions of Tokens](https://proceedings.mlr.press/v162/borgeaud22a.html)

[When Language Model Meets Private Library](https://doi.org/10.18653/v1/2022.findings-emnlp.21)

[CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis](https://arxiv.org/abs/2203.13474)

[Concept-Aware Video Captioning: Describing Videos With Effective Prior Information](https://doi.org/10.1109/TIP.2023.3307969)

[Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation](https://doi.org/10.48550/arXiv.2307.06940)

[LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685)

[Retrieval-Augmented Score Distillation for Text-to-3D Generation](https://doi.org/10.48550/arXiv.2402.02972)

- Result Enhancement

- Rewrite Output

[Automated Code Editing with Search-Generate-Modify](https://doi.org/10.48550/arXiv.2306.06490)

[Repair Is Nearly Generation: Multilingual Program Repair with LLMs](https://doi.org/10.1609/aaai.v37i4.25642)

[Case-based Reasoning for Natural Language Queries over Knowledge Bases](https://doi.org/10.18653/v1/2021.emnlp-main.755)

- RAG Pipeline Enhancement

- Adaptive Retrieval

- Rule-Based

[Active Retrieval Augmented Generation](https://arxiv.org/abs/2305.06983)

[Efficient Nearest Neighbor Language Models](https://doi.org/10.18653/v1/2021.emnlp-main.461)

[Generalization through Memorization: Nearest Neighbor Language Models](https://arxiv.org/abs/1911.00172)

[Nonparametric Masked Language Modeling](https://arxiv.org/abs/2212.01349)

[When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories](https://doi.org/10.18653/v1/2023.acl-long.546)

[How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering](https://doi.org/10.1162/tacl_a_00407)

[Large Language Models Struggle to Learn Long-Tail Knowledge](https://proceedings.mlr.press/v202/kandpal23a.html)

- Model-Based

[Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection](https://doi.org/10.48550/arXiv.2310.11511)

[Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation](https://doi.org/10.48550/arXiv.2307.11019)

[Self-Knowledge Guided Retrieval Augmentation for Large Language Models](https://aclanthology.org/2023.findings-emnlp.691)

[Retrieve Only When It Needs: Adaptive Retrieval Augmentation for Hallucination Mitigation in Large Language Models](https://arxiv.org/abs/2402.10612)

[Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity](https://arxiv.org/abs/2403.14403)

- Iterative RAG

[RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation](https://aclanthology.org/2023.emnlp-main.151)

[Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy](https://aclanthology.org/2023.findings-emnlp.620)

[Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training](https://arxiv.org/abs/2010.12688)

## Applications Taxonomy



### RAG for Text
- Question Answering

[Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering](https://doi.org/10.18653/v1/2021.eacl-main.74)

[REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909)

[Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training](https://doi.org/10.18653/v1/2021.naacl-main.278)

[Atlas: Few-shot Learning with Retrieval Augmented Language Models](http://jmlr.org/papers/v24/23-0037.html)

[Improving Language Models by Retrieving from Trillions of Tokens](https://proceedings.mlr.press/v162/borgeaud22a.html)

[Self-Knowledge Guided Retrieval Augmentation for Large Language Models](https://aclanthology.org/2023.findings-emnlp.691)

[Knowledge-Augmented Language Model Prompting for Zero-Shot Knowledge Graph Question Answering](https://doi.org/10.48550/arXiv.2306.04136)

[Think-on-Graph: Deep and Responsible Reasoning of Large Language Model with Knowledge Graph](https://doi.org/10.48550/arXiv.2307.07697)

[Nonparametric Masked Language Modeling](https://doi.org/10.18653/v1/2023.findings-acl.132)

[CL-ReLKT: Cross-lingual Language Knowledge Transfer for Multilingual Retrieval Question Answering](https://doi.org/10.18653/v1/2022.findings-naacl.165)

[One Question Answering Model for Many Languages with Cross-lingual Dense Passage Retrieval](https://proceedings.neurips.cc/paper/2021/hash/3df07fdae1ab273a967aaa1d355b8bb6-Abstract.html)

[Entities as Experts: Sparse Memory Access with Entity Supervision](https://arxiv.org/abs/2004.07202)

[When to Read Documents or QA History: On Unified and Selective Open-domain QA](https://doi.org/10.18653/v1/2023.findings-acl.401)

[Enhancing LLM Intelligence with ARM-RAG: Auxiliary Rationale Memory for Retrieval Augmented Generation](https://arxiv.org/abs/2311.04177)

[DISC-LawLLM: Fine-tuning Large Language Models for Intelligent Legal Service](https://arxiv.org/pdf/2309.11325.pdf)

- Fact Verification

[CONCRETE: Improving Cross-lingual Fact-checking with Cross-lingual Retrieval](https://aclanthology.org/2022.coling-1.86)

[Stochastic RAG: End-to-End Retrieval-Augmented Generation through Expected Utility Maximization](https://arxiv.org/pdf/2405.02816)

- Commonsense Reasoning

[KG-BART: Knowledge Graph-Augmented BART for Generative Commonsense Reasoning](https://doi.org/10.1609/aaai.v35i7.16796)

[What Evidence Do Language Models Find Convincing?](https://arxiv.org/abs/2402.11782v1)

[Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language Models](https://arxiv.org/abs/2310.04027)

- Human-Machine Conversation

[Grounded Conversation Generation as Guided Traverses in Commonsense Knowledge Graphs](https://doi.org/10.18653/v1/2020.acl-main.184)

[Skeleton-to-Response: Dialogue Generation Guided by Retrieval Memory](https://doi.org/10.18653/v1/n19-1124)

[Internet-Augmented Dialogue Generation](https://doi.org/10.18653/v1/2022.acl-long.579)

[BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage](https://doi.org/10.48550/arXiv.2208.03188)

[A Model of Cross-Lingual Knowledge-Grounded Response Generation for Open-Domain Dialogue Systems](https://doi.org/10.18653/v1/2021.findings-emnlp.33)

[From Classification to Generation: Insights into Crosslingual Retrieval Augmented ICL](https://openreview.net/forum?id=KLPLCXo4aD)

[Cross-Lingual Retrieval Augmented Prompt for Low-Resource Languages](https://aclanthology.org/2023.findings-acl.528/)

[Citation-Enhanced Generation for LLM-based Chatbot](https://arxiv.org/pdf/2402.16063v1.pdf)

[KAUCUS: Knowledge Augmented User Simulators for Training Language Model Assistants](https://aclanthology.org/2024.scichat-1.5/)

- Neural Machine Translation

[Neural Machine Translation with Monolingual Translation Memory](https://doi.org/10.18653/v1/2021.acl-long.567)

[Nearest Neighbor Machine Translation](https://openreview.net/forum?id=7wCBOfJ8hJM)

[Training Language Models with Memory Augmentation](https://doi.org/10.18653/v1/2022.emnlp-main.382)

- Event Extraction

[Retrieval-Augmented Generative Question Answering for Event Argument Extraction](https://doi.org/10.18653/v1/2022.emnlp-main.307)

- Summarization

[Retrieval-Augmented Multilingual Keyphrase Generation with Retriever-Generator Iterative Training](https://doi.org/10.18653/v1/2022.findings-naacl.92)

[Unlimiformer: Long-Range Transformers with Unlimited Length Input](https://doi.org/10.48550/arXiv.2305.01625)

[Retrieval-based Full-length Wikipedia Generation for Emergent Events](https://arxiv.org/abs/2402.18264v1)

[RIGHT: Retrieval-augmented Generation for Mainstream Hashtag Recommendation](https://arxiv.org/abs/2312.10466)

[M-RAG: Reinforcing Large Language Model Performance through Retrieval-Augmented Generation with Multiple Partitions](https://arxiv.org/pdf/2405.16420)

### RAG for Code
- Code Generation

[Retrieval-Based Neural Code Generation](https://doi.org/10.18653/v1/d18-1111)

[Retrieval Augmented Code Generation and Summarization](https://doi.org/10.18653/v1/2021.findings-emnlp.232)

[When Language Model Meets Private Library](https://doi.org/10.18653/v1/2022.findings-emnlp.21)

[Language Models of Code are Few-Shot Commonsense Learners](https://doi.org/10.18653/v1/2022.emnlp-main.90)

[DocPrompting: Generating Code by Retrieving the Docs](https://openreview.net/pdf?id=ZTCxT2t2Ru)

[CodeT5+: Open Code Large Language Models for Code Understanding and Generation](https://aclanthology.org/2023.emnlp-main.68)

[AceCoder: Utilizing Existing Code to Enhance Code Generation](https://arxiv.org/abs/2303.17780)

[Syntax-Aware Retrieval Augmented Code Generation](https://aclanthology.org/2023.findings-emnlp.90)

[A^3-CodGen: A Repository-Level Code Generation Framework for Code Reuse with Local-Aware, Global-Aware, and Third-Party-Library-Aware](https://arxiv.org/abs/2312.05772)

[SkCoder: A Sketch-based Approach for Automatic Code Generation](https://ieeexplore.ieee.org/abstract/document/10172719)

[CodeGen4Libs: A Two-Stage Approach for Library-Oriented Code Generation](https://ieeexplore.ieee.org/abstract/document/10298327)

[ToolCoder: Teach Code Generation Models to use API search tools](https://arxiv.org/abs/2305.04032)

[CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges](https://arxiv.org/abs/2401.07339)

[RRGcode: Deep hierarchical search-based code generation](https://www.sciencedirect.com/science/article/pii/S0164121224000256)

[Code Search Is All You Need? Improving Code Suggestions with Code Search](https://www.computer.org/csdl/proceedings-article/icse/2024/021700a857/1V5BkjI3196)

[ARKS: Active Retrieval in Knowledge Soup for Code Generation](https://arxiv.org/abs/2402.12317)

- Code Summary

[Retrieval-Based Neural Source Code Summarization](https://doi.org/10.1145/3377811.3380383)

[Retrieve and Refine: Exemplar-based Neural Comment Generation](https://doi.org/10.1145/3324884.3416578)

[EditSum: A Retrieve-and-Edit Framework for Source Code Summarization](https://doi.org/10.1109/ASE51524.2021.9678724)

[Retrieval-Augmented Generation for Code Summarization via Hybrid GNN](https://openreview.net/forum?id=zv-typ1gPxA)

[Context-aware Retrieval-based Deep Commit Message Generation](https://dl.acm.org/doi/abs/10.1145/3464689)

[RACE: Retrieval-augmented Commit Message Generation](https://doi.org/10.18653/v1/2022.emnlp-main.372)

[BashExplainer: Retrieval-Augmented Bash Code Comment Generation based on Fine-tuned CodeBERT](https://doi.org/10.1109/ICSME55016.2022.00016)

[Retrieval-Based Transformer Pseudocode Generation](https://www.mdpi.com/2227-7390/10/4/604)

[A Simple Retrieval-based Method for Code Comment Generation](https://ieeexplore.ieee.org/abstract/document/9825803)

[READSUM: Retrieval-Augmented Adaptive Transformer for Source Code Summarization](https://ieeexplore.ieee.org/abstract/document/10113620)

[Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization](https://arxiv.org/abs/2305.11074)

[Automatic Semantic Augmentation of Language Model Prompts (for Code Summarization)](https://arxiv.org/abs/2304.06815)

[Cross-Modal Retrieval-Enhanced Code Summarization based on Joint Learning for Retrieval and Generation](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4724884)

[Automatic Smart Contract Comment Generation via Large Language Models and In-Context Learning](https://www.sciencedirect.com/science/article/pii/S0950584924000107)

[UniLog: Automatic Logging via LLM and In-Context Learning](https://dl.acm.org/doi/abs/10.1145/3597503.3623326)

- Code Completion

[A Retrieve-and-Edit Framework for Predicting Structured Outputs](https://proceedings.neurips.cc/paper_files/paper/2018/hash/cd17d3ce3b64f227987cd92cd701cc58-Abstract.html)

[Generating Code with the Help of Retrieved Template Functions and Stack Overflow Answers](https://arxiv.org/abs/2104.05310)

[ReACC: A Retrieval-Augmented Code Completion Framework](https://doi.org/10.18653/v1/2022.acl-long.431)

[Domain Adaptive Code Completion via Language Models and Decoupled Domain Databases](https://ieeexplore.ieee.org/abstract/document/10298575)

[RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation](https://aclanthology.org/2023.emnlp-main.151)

[CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file Context](https://doi.org/10.48550/arXiv.2212.10007)

[RepoFusion: Training Code Models to Understand Your Repository](https://arxiv.org/abs/2306.10998)

[Revisiting and Improving Retrieval-Augmented Deep Assertion Generation](https://ieeexplore.ieee.org/abstract/document/10298588)

[De-Hallucinator: Iterative Grounding for LLM-Based Code Completion](https://arxiv.org/abs/2401.01701)

[REPOFUSE: Repository-Level Code Completion with Fused Dual Context](https://arxiv.org/abs/2402.14323)

- Automatic Program Repair

[Repair Is Nearly Generation: Multilingual Program Repair with LLMs](https://doi.org/10.1609/aaai.v37i4.25642)

[Retrieval-Based Prompt Selection for Code-Related Few-Shot Learning](https://doi.org/10.1109/ICSE48619.2023.00205)

[InferFix: End-to-End Program Repair with LLMs](https://doi.org/10.1145/3611643.3613892)

[RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair](https://dl.acm.org/doi/abs/10.1145/3611643.3616256)

[Automated Code Editing with Search-Generate-Modify](https://arxiv.org/abs/2306.06490)

[RTLFixer: Automatically Fixing RTL Syntax Errors with Large Language Models](https://arxiv.org/abs/2311.16543)

- Text-to-SQL and Code-based Semantic Parsing

[XRICL: Cross-lingual Retrieval-Augmented In-Context Learning for Cross-lingual Text-to-SQL Semantic Parsing](https://doi.org/10.18653/v1/2022.findings-emnlp.384)

[Synchromesh: Reliable Code Generation from Pre-trained Language Models](https://openreview.net/forum?id=KmtVD97J43e)

[Evaluating the Impact of Model Scale for Compositional Generalization in Semantic Parsing](https://aclanthology.org/2022.emnlp-main.624/)

[RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL](https://ojs.aaai.org/index.php/AAAI/article/view/26535)

[Leveraging Code to Improve In-context Learning for Semantic Parsing](https://arxiv.org/abs/2311.09519)

[ReFSQL: A Retrieval-Augmentation Framework for Text-to-SQL Generation](https://aclanthology.org/2023.findings-emnlp.48/)

[Enhancing Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies](https://aclanthology.org/2023.findings-emnlp.996/)

[Selective Demonstrations for Cross-domain Text-to-SQL](https://aclanthology.org/2023.findings-emnlp.944/)

[Multi-Hop Table Retrieval for Open-Domain Text-to-SQL](https://arxiv.org/abs/2402.10666)

[CodeS: Towards Building Open-source Language Models for Text-to-SQL](https://arxiv.org/abs/2402.16347)

- Others

[De-fine: Decomposing and Refining Visual Programs with Auto-Feedback](https://arxiv.org/abs/2311.12890)

[Leveraging Training Data in Few-Shot Prompting for Numerical Reasoning](https://arxiv.org/abs/2305.18170)

[Retrieval-Augmented Code Generation for Universal Information Extraction](https://arxiv.org/abs/2311.02962)

[E&V: Prompting Large Language Models to Perform Static Analysis by Pseudo-code Execution and Verification](https://arxiv.org/abs/2312.08477)

[Lessons from Building StackSpot AI: A Contextualized AI Coding Assistant](https://arxiv.org/abs/2311.18450)

[Testing the Limits: Unusual Text Inputs Generation for Mobile App Crash Detection with Large Language Model](https://arxiv.org/abs/2310.15657)

### RAG for Audio
- Audio Generation

[Retrieval-Augmented Text-to-Audio Generation](https://doi.org/10.48550/arXiv.2309.08051)

[Large-Scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation](https://doi.org/10.1109/ICASSP49357.2023.10095969)

[Make-An-Audio: Text-to-Audio Generation with Prompt-Enhanced Diffusion Models](https://proceedings.mlr.press/v202/huang23i.html)

- Audio Captioning

[RECAP: Retrieval-Augmented Audio Captioning](https://doi.org/10.48550/arXiv.2309.09836)

[Audio Captioning using Pre-Trained Large-Scale Language Model Guided by Audio-based Similar Caption Retrieval](https://arxiv.org/abs/2012.07331)

[Large-Scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation](https://doi.org/10.1109/ICASSP49357.2023.10095969)

[CNN architectures for large-scale audio classification](https://doi.org/10.1109/ICASSP.2017.7952132)

[Natural language supervision for general-purpose audio representations](https://ieeexplore.ieee.org/abstract/document/10448504)

[Weakly-supervised Automated Audio Captioning via text only training](https://arxiv.org/abs/2309.12242)

[Training Audio Captioning Models without Audio](https://ieeexplore.ieee.org/abstract/document/10448115)

### RAG for Image
- Image Generation

[RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval](https://arxiv.org/abs/2007.08513)

[Instance-Conditioned GAN](https://arxiv.org/abs/2109.05070)

[Memory-driven text-to-image generation](https://arxiv.org/abs/2208.07022)

[Re-Imagen: Retrieval-Augmented Text-to-Image Generator](https://arxiv.org/abs/2209.14491)

[KNN-Diffusion: Image Generation via Large-Scale Retrieval](https://arxiv.org/abs/2204.02849)

[Retrieval-Augmented Diffusion Models](https://arxiv.org/abs/2204.11824)

[Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models](https://arxiv.org/abs/2207.13038)

[X&Fuse: Fusing Visual Information in Text-to-Image Generation](https://arxiv.org/abs/2303.01000)

[Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs](https://arxiv.org/abs/2401.11708)

- Image Captioning

[Memory-augmented image captioning](https://ojs.aaai.org/index.php/AAAI/article/view/16220)

[Retrieval-enhanced adversarial training with dynamic memory-augmented attention for image paragraph captioning](https://www.sciencedirect.com/science/article/pii/S0950705120308595)

[Retrieval-Augmented Transformer for Image Captioning](https://arxiv.org/abs/2207.13162)

[Retrieval-augmented image captioning](https://arxiv.org/abs/2302.08268)

[REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory](https://arxiv.org/abs/2212.05221)

[SmallCap: Lightweight Image Captioning Prompted With Retrieval Augmentation](https://arxiv.org/abs/2209.15323)

[Cross-Modal Retrieval and Semantic Refinement for Remote Sensing Image Captioning](https://www.mdpi.com/2072-4292/16/1/196)

- Others

[An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA](https://ojs.aaai.org/index.php/AAAI/article/view/20215)

[Retrieval augmented visual question answering with outside knowledge](https://aclanthology.org/2022.emnlp-main.772/)

[Augmenting transformers with KNN-based composite memory for dialog](https://doi.org/10.1162/tacl_a_00356)

[Maria: A visual experience powered conversational agent](https://aclanthology.org/2021.acl-long.435/)

[Neural machine translation with phrase-level universal visual representations](https://aclanthology.org/2022.acl-long.390/)

### RAG for Video
- Video Captioning

[Incorporating Background Knowledge into Video Description Generation](https://aclanthology.org/D18-1433/)

[Retrieval Augmented Convolutional Encoder-decoder Networks for Video Captioning](https://doi.org/10.1145/3539225)

[Concept-Aware Video Captioning: Describing Videos With Effective Prior Information](https://doi.org/10.1109/TIP.2023.3307969)

[Retrieval-Augmented Egocentric Video Captioning](https://arxiv.org/abs/2401.00789)

- Video QA & Dialogue

[Memory augmented deep recurrent neural network for video question answering](https://doi.org/10.1109/TNNLS.2019.2938015)

[Retrieving-to-answer: Zero-shot video question answering with frozen large language models](https://openaccess.thecvf.com/content/ICCV2023W/MMFM/html/Pan_Retrieving-to-Answer_Zero-Shot_Video_Question_Answering_with_Frozen_Large_Language_Models_ICCVW_2023_paper.html)

[TVQA+: Spatio-Temporal Grounding for Video Question Answering](https://aclanthology.org/2020.acl-main.730/)

[VGNMN: Video-Grounded Neural Module Networks for Video-Grounded Dialogue Systems](https://aclanthology.org/2022.naacl-main.247/)

- Others

[Language models with image descriptors are strong few-shot video-language learners](https://proceedings.neurips.cc/paper_files/paper/2022/hash/381ceeae4a1feb1abc59c773f7e61839-Abstract-Conference.html)

[RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model](https://arxiv.org/abs/2402.10828)

[Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation](https://doi.org/10.48550/arXiv.2307.06940)

[Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval](https://doi.org/10.1109/ICCV48922.2021.00175)

### RAG for 3D
- Text-to-3D

[ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model](https://doi.org/10.1109/ICCV51070.2023.00040)

[AMD: Anatomical Motion Diffusion with Interpretable Motion Decomposition and Fusion](https://arxiv.org/abs/2312.12763)

[Retrieval-Augmented Score Distillation for Text-to-3D Generation](https://doi.org/10.48550/arXiv.2402.02972)

### RAG for Knowledge
- Knowledge Base Question Answering

[ReTraCk: A Flexible and Efficient Framework for Knowledge Base Question Answering](https://doi.org/10.18653/v1/2021.acl-demo.39)

[Unseen Entity Handling in Complex Question Answering over Knowledge Base via Language Generation](https://aclanthology.org/2021.findings-emnlp.50/)

[Case-based Reasoning for Natural Language Queries over Knowledge Bases](https://doi.org/10.18653/v1/2021.emnlp-main.755)

[Logical Form Generation via Multi-task Learning for Complex Question Answering over Knowledge Bases](https://aclanthology.org/2022.coling-1.145)

[Uni-Parser: Unified Semantic Parser for Question Answering on Knowledge Base and Database](https://aclanthology.org/2022.emnlp-main.605/)

[RNG-KBQA: Generation Augmented Iterative Ranking for Knowledge Base Question Answering](https://aclanthology.org/2022.acl-long.417/)

[TIARA: Multi-grained Retrieval for Robust Question Answering over Large Knowledge Base](https://aclanthology.org/2022.emnlp-main.555/)

[DecAF: Joint Decoding of Answers and Logical Forms for Question Answering over Knowledge Bases](https://openreview.net/forum?id=XHc5zRPxqV9)

[End-to-end Case-Based Reasoning for Commonsense Knowledge Base Completion](https://aclanthology.org/2023.eacl-main.255/)

[Bridging the KB-Text Gap: Leveraging Structured Knowledge-aware Pre-training for KBQA](https://dl.acm.org/doi/abs/10.1145/3583780.3615150)

[Knowledge-Driven CoT: Exploring Faithful Reasoning in LLMs for Knowledge-intensive Question Answering](https://arxiv.org/abs/2308.13259)

[Few-shot Transfer Learning for Knowledge Base Question Answering: Fusing Supervised Models with In-Context Learning](https://arxiv.org/abs/2311.08894)

[FC-KBQA: A Fine-to-Coarse Composition Framework for Knowledge Base Question Answering](https://aclanthology.org/2023.acl-long.57/)

[Knowledge-Augmented Language Model Prompting for Zero-Shot Knowledge Graph Question Answering](https://aclanthology.org/2023.nlrse-1.7/)

[Knowledge Graph-augmented Language Models for Complex Question Answering](https://aclanthology.org/2023.nlrse-1.1/)

[Retrieve-Rewrite-Answer: A KG-to-Text Enhanced LLMs Framework for Knowledge Graph Question Answering](https://arxiv.org/abs/2309.11206)

[Distribution Shifts Are Bottlenecks: Extensive Evaluation for Grounding Language Models to Knowledge Bases](https://aclanthology.org/2024.eacl-srw.7/)

[Probing Structured Semantics Understanding and Generation of Language Models via Question Answering](https://arxiv.org/abs/2401.05777)

[Keqing: Knowledge-based Question Answering is A Nature Chain-of-Thought mentor of LLMs](https://arxiv.org/abs/2401.00426)

[Interactive-KBQA: Multi-Turn Interactions for Knowledge Base Question Answering with Large Language Models](https://arxiv.org/abs/2402.15131)

- Knowledge-augmented Open-domain Question Answering

[UniK-QA: Unified Representations of Structured and Unstructured Knowledge for Open-Domain Question Answering](https://aclanthology.org/2022.findings-naacl.115/)

[KG-FiD: Infusing Knowledge Graph in Fusion-in-Decoder for Open-Domain Question Answering](https://aclanthology.org/2022.acl-long.340/)

[Empowering Language Models with Knowledge Graph Reasoning for Open-Domain Question Answering](https://aclanthology.org/2022.emnlp-main.650/)

[Grape: Knowledge Graph Enhanced Passage Reader for Open-domain Question Answering](https://aclanthology.org/2022.findings-emnlp.13/)

[Enhancing Multi-modal Multi-hop Question Answering via Structured Knowledge and Unified Retrieval-Generation](https://dl.acm.org/doi/abs/10.1145/3581783.3611964)

[DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text](https://arxiv.org/abs/2310.20170)

[KnowledGPT: Enhancing Large Language Models with Retrieval and Storage Access on Knowledge Bases](https://arxiv.org/abs/2308.11761)

[Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering](https://arxiv.org/abs/2403.02966)

[Two-stage Generative Question Answering on Temporal Knowledge Graph Using Large Language Models](https://arxiv.org/abs/2402.16568)

[KnowledgeNavigator: Leveraging Large Language Models for Enhanced Reasoning over Knowledge Graph](https://arxiv.org/abs/2312.15880)

[GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning](https://arxiv.org/pdf/2405.20139)

- Table Question Answering

[NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned](https://proceedings.mlr.press/v133/min21a.html)

[Dual Reader-Parser on Hybrid Textual and Tabular Evidence for Open Domain Question Answering](https://aclanthology.org/2021.acl-long.315/)

[End-to-End Table Question Answering via Retrieval-Augmented Generation](https://arxiv.org/abs/2203.16714)

[OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering](https://aclanthology.org/2022.naacl-main.68/)

[Reasoning over Hybrid Chain for Table-and-Text Open Domain Question Answering](https://www.ijcai.org/proceedings/2022/0629.pdf)

[Conversational Question Answering on Heterogeneous Sources](https://dl.acm.org/doi/abs/10.1145/3477495.3531815)

[Open-domain Question Answering via Chain of Reasoning over Heterogeneous Knowledge](https://aclanthology.org/2022.findings-emnlp.392/)

[StructGPT: A General Framework for Large Language Model to Reason over Structured Data](https://aclanthology.org/2023.emnlp-main.574/)

[cTBLS: Augmenting Large Language Models with Conversational Tables](https://aclanthology.org/2023.nlp4convai-1.6/)

[RINK: Reader-Inherited Evidence Reranker for Table-and-Text Open Domain Question Answering](https://ojs.aaai.org/index.php/AAAI/article/view/26577)

[Localize, Retrieve and Fuse: A Generalized Framework for Free-Form Question Answering over Tables](https://aclanthology.org/2023.findings-ijcnlp.1/)

[Exploring the Impact of Table-to-Text Methods on Augmenting LLM-based Question Answering with Domain Hybrid Data](https://arxiv.org/abs/2402.12869)

[ERATTA: Extreme RAG for Table To Answers with Large Language Models](https://arxiv.org/pdf/2405.03963)

- Others

[Improving Knowledge-Aware Dialogue Response Generation by Using Human-Written Prototype Dialogues](https://aclanthology.org/2020.findings-emnlp.126/)

[Knowledge Graph-Augmented Language Models for Knowledge-Grounded Dialogue Generation](https://arxiv.org/abs/2305.18846)

[RHO: Reducing Hallucination in Open-domain Dialogues with Knowledge Grounding](https://aclanthology.org/2023.findings-acl.275/)

[Retrieval-Enhanced Generative Model for Large-Scale Knowledge Graph Completion](https://doi.org/10.1145/3539618.3592052)

[Knowledge-Augmented Large Language Models for Personalized Contextual Query Suggestion](https://arxiv.org/abs/2311.06318)

[G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering](https://arxiv.org/abs/2402.07630)

[RAG-based Explainable Prediction of Road Users Behaviors for Automated Driving using Knowledge Graphs and Large Language Models](https://arxiv.org/pdf/2405.00449)

[HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models](https://arxiv.org/pdf/2405.14831)

### RAG for Science
- Drug Discovery

[Retrieval-based controllable molecule generation](https://arxiv.org/abs/2208.11126)

[Prompt-based 3d molecular diffusion models for structure-based drug design](https://openreview.net/forum?id=FWsGuAFn3n)

- Biomedical Informatics Enhancement

[PoET: A generative model of protein families as sequences-of-sequences](https://proceedings.neurips.cc/paper_files/paper/2023/hash/f4366126eba252699b280e8f93c0ab2f-Abstract-Conference.html)

[Retrieval-augmented large language models for adolescent idiopathic scoliosis patients in shared decision-making](https://dl.acm.org/doi/abs/10.1145/3584371.3612956)

[BioReader: a Retrieval-Enhanced Text-to-Text Transformer for Biomedical Literature](https://aclanthology.org/2022.emnlp-main.390/)

[Writing by Memorizing: Hierarchical Retrieval-based Medical Report Generation](https://arxiv.org/abs/2106.06471)

[From RAG to QA-RAG: Integrating Generative AI for Pharmaceutical Regulatory Compliance Process](https://arxiv.org/abs/2402.01717)

[RAG-RLRC-LaySum at BioLaySumm: Integrating Retrieval-Augmented Generation and Readability Control for Layman Summarization of Biomedical Texts](https://arxiv.org/pdf/2405.13179)

- Math Applications

[Retrieval-augmented Generation to Improve Math Question-Answering: Trade-offs Between Groundedness and Human Preference](https://arxiv.org/abs/2310.03184)

[LeanDojo: Theorem Proving with Retrieval-Augmented Language Models](https://proceedings.neurips.cc/paper_files/paper/2023/hash/4441469427094f8873d0fecb0c4e1cee-Abstract-Datasets_and_Benchmarks.html)

## Benchmark
[Benchmarking Large Language Models in Retrieval-Augmented Generation](https://doi.org/10.48550/arXiv.2309.01431)

[CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models](https://doi.org/10.48550/arXiv.2401.17043)

[ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems](https://doi.org/10.48550/arXiv.2311.09476)

[RAGAS: Automated Evaluation of Retrieval Augmented Generation](https://doi.org/10.48550/arXiv.2309.15217)

[KILT: a Benchmark for Knowledge Intensive Language Tasks](https://arxiv.org/abs/2009.02252)

## Citation
If you find this work useful, please cite our paper:
```bibtex
@article{zhao2024retrieval,
title={Retrieval-Augmented Generation for AI-Generated Content: A Survey},
author={Zhao, Penghao and Zhang, Hailin and Yu, Qinhan and Wang, Zhengren and Geng, Yunteng and Fu, Fangcheng and Yang, Ling and Zhang, Wentao and Cui, Bin},
journal={arXiv preprint arXiv:2402.19473},
year={2024}
}
```