Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/hymie122/RAG-Survey
Collecting awesome papers of RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".
aigc diffusion-models llm multimodality rag survey
- Host: GitHub
- URL: https://github.com/hymie122/RAG-Survey
- Owner: hymie122
- Created: 2024-02-23T09:47:08.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-08-20T11:52:42.000Z (5 months ago)
- Last Synced: 2024-08-20T13:53:56.908Z (5 months ago)
- Topics: aigc, diffusion-models, llm, multimodality, rag, survey
- Homepage:
- Size: 6.49 MB
- Stars: 1,016
- Watchers: 23
- Forks: 78
- Open Issues: 1
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- StarryDivineSky - hymie122/RAG-Survey - Based, model-based), iterative RAG. (A01_Text Generation_Text Dialogue / Large Language Dialogue Models and Data)
- awesome-llm-and-aigc - hymie122/RAG-Survey: Collecting awesome papers of RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey". (**[arXiv 2024](https://arxiv.org/abs/2402.19473)**). WeChat official account "数智笔记": "[2024 Latest Survey of Retrieval-Augmented Generation (RAG)](https://mp.weixin.qq.com/s/F-shRy1m7wQIS87ujOS7Dw)". (Summary)
README
# Retrieval-Augmented Generation for AI-Generated Content: A Survey
This repo collects and categorizes papers about RAG according to our survey paper: [*Retrieval-Augmented Generation for AI-Generated Content: A Survey*](https://arxiv.org/abs/2402.19473). Considering the rapid growth of this field, we will continue to update both the [paper](https://arxiv.org/abs/2402.19473) and this repo.
# Overview
# Catalogue
## Methods Taxonomy
### RAG Foundations
- Query-based RAG
[REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909)
[Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection](https://arxiv.org/abs/2310.11511)
[REPLUG: Retrieval-Augmented Black-Box Language Models](https://arxiv.org/abs/2301.12652)
[In-Context Retrieval-Augmented Language Models](https://arxiv.org/abs/2302.00083)
[When Language Model Meets Private Library](https://arxiv.org/abs/2210.17236)
[DocPrompting: Generating Code by Retrieving the Docs](https://openreview.net/pdf?id=ZTCxT2t2Ru)
[Retrieval-based prompt selection for code-related few-shot learning](https://doi.org/10.1109/ICSE48619.2023.00205)
[Inferfix: End-to-end program repair with llms](https://doi.org/10.1145/3611643.3613892)
[Make-an-audio: Text-to-audio generation with prompt-enhanced diffusion models](https://proceedings.mlr.press/v202/huang23i.html)
[Reacc: A retrieval-augmented code completion framework](https://doi.org/10.18653/v1/2022.acl-long.431)
[Uni-parser: Unified semantic parser for question answering on knowledge base and database](https://doi.org/10.18653/v1/2022.emnlp-main.605)
[RNG-KBQA: generation augmented iterative ranking for knowledge base question answering](https://doi.org/10.18653/v1/2022.acl-long.417)
[End-to-end case-based reasoning for commonsense knowledge base completion](https://doi.org/10.18653/v1/2023.eacl-main.255)
[Combining transfer learning with in-context learning using blackbox llms for zero-shot knowledge base question answering](https://doi.org/10.48550/arXiv.2311.08894)
[Genegpt: Augmenting large language models with domain tools for improved access to biomedical information](https://arxiv.org/abs/2304.09667)
[Retrieval-augmented large language models for adolescent idiopathic scoliosis patients in shared decision-making](https://dl.acm.org/doi/10.1145/3584371.3612956)
[Retrievegan: Image synthesis via differentiable patch retrieval](https://link.springer.com/chapter/10.1007/978-3-030-58598-3_15)
[Instance-conditioned gan](https://proceedings.neurips.cc/paper/2021/file/e7ac288b0f2d41445904d071ba37aaff-Paper.pdf)
[Retrieval-Augmented Score Distillation for Text-to-3D Generation](https://doi.org/10.48550/arXiv.2402.02972)
- Latent Representation-based RAG
[Leveraging passage retrieval with generative models for open domain question answering](https://doi.org/10.18653/v1/2021.eacl-main.74)
[Bashexplainer: Retrieval-augmented bash code comment generation based on finetuned codebert](https://doi.org/10.1109/ICSME55016.2022.00016)
[EditSum: A Retrieve-and-Edit Framework for Source Code Summarization](https://doi.org/10.1109/ASE51524.2021.9678724)
[Retrieve and Refine: Exemplar-based Neural Comment Generation](https://arxiv.org/abs/2010.04459)
[RACE: retrieval-augmented commit message generation](https://doi.org/10.18653/v1/2022.emnlp-main.372)
[Unik-qa: Unified representations of structured and unstructured knowledge for open-domain question answering](https://doi.org/10.18653/v1/2022.findings-naacl.115)
[A Retrieve-and-Edit Framework for Predicting Structured Outputs](https://proceedings.neurips.cc/paper/2018/hash/cd17d3ce3b64f227987cd92cd701cc58-Abstract.html)
[DecAF: Joint Decoding of Answers and Logical Forms for Question Answering over Knowledge Bases](https://openreview.net/pdf?id=XHc5zRPxqV9)
[Bridging the kb-text gap: Leveraging structured knowledge-aware pre-training for KBQA](https://doi.org/10.1145/3583780.3615150)
[Knowledge-driven cot: Exploring faithful reasoning in llms for knowledge-intensive question answering](https://doi.org/10.48550/arXiv.2308.13259)
[Retrieval-enhanced generative model for large-scale knowledge graph completion](https://doi.org/10.1145/3539618.3592052)
[Case-based reasoning for natural language queries over knowledge bases](https://doi.org/10.18653/v1/2021)
[A Protein-Ligand Interaction-focused 3D Molecular Generative Framework for Generalizable Structure-based Drug Design](https://chemrxiv.org/engage/chemrxiv/article-details/6482d9dbbe16ad5c57af1937)
[Improving language models by retrieving from trillions of tokens](https://proceedings.mlr.press/v162/borgeaud22a.html)
[Remodiffuse: Retrieval-augmented motion diffusion model](https://doi.org/10.1109/ICCV51070.2023.00040)
[Memorizing transformers](https://openreview.net/forum?id=TrjbxzRcnf-)
[Audio captioning using pre-trained large-scale language model guided by audio-based similar caption retrieval](https://arxiv.org/abs/2012.07331)
[Retrieval augmented convolutional encoder-decoder networks for video captioning](https://doi.org/10.1145/3539225)
[Retrieval-augmented egocentric video captioning](https://doi.org/10.48550/arXiv.2401.00789)
[Re-imagen: Retrieval-augmented text-to-image generator](https://arxiv.org/abs/2209.14491)
[Knn-diffusion: Image generation via large-scale retrieval](https://arxiv.org/abs/2204.02849)
[Retrieval-augmented diffusion models](https://proceedings.neurips.cc/paper_files/paper/2022/file/62868cc2fc1eb5cdf321d05b4b88510c-Paper-Conference.pdf)
[Text-guided synthesis of artistic images with retrieval-augmented diffusion models](https://arxiv.org/abs/2207.13038)
[Memory-driven text-to-image generation](https://arxiv.org/abs/2208.07022)
[Mention memory: incorporating textual knowledge into transformers through entity mention attention](https://arxiv.org/abs/2110.06176)
[Unlimiformer: Long-range transformers with unlimited length input](https://doi.org/10.48550/arXiv.2305.01625)
[Entities as experts: Sparse memory access with entity supervision](https://arxiv.org/abs/2004.07202)
[Amd: Anatomical motion diffusion with interpretable motion decomposition and fusion](https://arxiv.org/abs/2312.12763)
[Retrieval-augmented text-to-audio generation](https://doi.org/10.48550/arXiv.2309.08051)
[Concept-aware video captioning: Describing videos with effective prior information](https://doi.org/10.1109/TIP.2023.3307969)
- Logit-based RAG
[Generalization through memorization: Nearest neighbor language models](https://openreview.net/forum?id=HklBjCEKvH)
[Syntax-Aware Retrieval Augmented Code Generation](https://aclanthology.org/2023.findings-emnlp.90)
[Memory-augmented image captioning](https://aaai.org/papers/01317-memory-augmented-image-captioning/)
[Retrieval-based neural source code summarization](https://doi.org/10.1145/3377811.3380383)
[Efficient nearest neighbor language models](https://doi.org/10.18653/v1/2021.emnlp-main.461)
[Nonparametric masked language modeling](https://doi.org/10.18653/v1/2023.findings-acl.132)
[Editsum: A retrieve-and-edit framework for source code summarization](https://doi.org/10.1109/ASE51524.2021.9678724)
- Speculative RAG
[REST: Retrieval-Based Speculative Decoding](https://doi.org/10.48550/arXiv.2311.08252)
[GPTCache](https://github.com/zilliztech/GPTCache)
[COPY IS ALL YOU NEED](https://arxiv.org/abs/2307.06962)
[RETRIEVAL IS ACCURATE GENERATION](https://arxiv.org/abs/2402.17532)
### RAG Enhancements
- Input Enhancement
- Query Transformations
[Query2doc: Query Expansion with Large Language Models](https://aclanthology.org/2023.emnlp-main.585)
[Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models](https://openreview.net/forum?id=vDvFT7IX4O)
[Precise Zero-Shot Dense Retrieval without Relevance Labels](https://doi.org/10.18653/v1/2023.acl-long.99)
[RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation](https://arxiv.org/pdf/2404.00610)
[Dynamic Contexts for Generating Suggestion Questions in RAG Based Conversational Systems](https://arxiv.org/pdf/2403.11413)
- Data Augmentation
[LESS: selecting influential data for targeted instruction tuning](https://arxiv.org/abs/2402.04333)
[Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models](https://proceedings.mlr.press/v202/huang23i.html)
[Telco-RAG: Navigating the challenges of retrieval-augmented language models for telecommunications](https://arxiv.org/pdf/2404.15939)
- Retriever Enhancement
- Recursive Retrieve
[Query Expansion by Prompting Large Language Models](https://doi.org/10.48550/arXiv.2305.03653)
[Rat: Retrieval augmented thoughts elicit context-aware reasoning in long-horizon generation](https://arxiv.org/abs/2403.05313)
[React: Synergizing reasoning and acting in language models](https://arxiv.org/abs/2210.03629)
[Chain-of-thought prompting elicits reasoning in large language models](https://arxiv.org/abs/2201.11903)
[Large Language Models Know Your Contextual Search Intent: A Prompting Framework for Conversational Search](https://aclanthology.org/2023.findings-emnlp.86)
[ACTIVERAG: Revealing the Treasures of Knowledge via Active Learning](https://arxiv.org/abs/2402.13547)
[Retrieval-Augmented Thought Process as Sequential Decision Making](https://arxiv.org/abs/2402.07812)
[In search of needles in a 10m haystack: Recurrent memory finds what llms miss](https://arxiv.org/abs/2402.10790v1)
[Lost in the middle: How language models use long contexts](https://arxiv.org/abs/2307.03172)
- Chunk Optimization
[LlamaIndex](https://github.com/jerryjliu/llama_index)
[RAPTOR: RECURSIVE ABSTRACTIVE PROCESSING FOR TREE-ORGANIZED RETRIEVAL](https://arxiv.org/pdf/2401.18059.pdf)
[Prompt-RAG: Pioneering Vector Embedding-Free Retrieval-Augmented Generation in Niche Domains, Exemplified by Korean Medicine](https://arxiv.org/pdf/2401.11246)
[Question-Based Retrieval using Atomic Units for Enterprise RAG](https://arxiv.org/pdf/2405.12363)
- Finetune Retriever
[C-Pack: Packaged Resources To Advance General Chinese Embedding](https://arxiv.org/abs/2309.07597)
[BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation](https://arxiv.org/abs/2402.03216)
[LM-Cocktail: Resilient Tuning of Language Models via Model Merging](https://arxiv.org/abs/2311.13534)
[Retrieve Anything To Augment Large Language Models](https://arxiv.org/abs/2310.07554)
[Replug: Retrieval-augmented black-box language models](https://arxiv.org/abs/2301.12652)
[When Language Model Meets Private Library](https://doi.org/10.18653/v1/2022.findings-emnlp.21)
[EditSum: A Retrieve-and-Edit Framework for Source Code Summarization](https://doi.org/10.1109/ASE51524.2021.9678724)
[Synchromesh: Reliable Code Generation from Pre-trained Language Models](https://openreview.net/forum?id=KmtVD97J43e)
[Retrieval Augmented Convolutional Encoder-decoder Networks for Video Captioning](https://doi.org/10.1145/3539225)
[Reinforcement learning for optimizing RAG for domain chatbots](https://arxiv.org/abs/2401.06800)
- Hybrid Retrieve
[RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair](https://doi.org/10.1145/3611643.3616256)
[ReACC: A Retrieval-Augmented Code Completion Framework](https://doi.org/10.18653/v1/2022.acl-long.431)
[Retrieval-based neural source code summarization](https://doi.org/10.1145/3377811.3380383)
[BashExplainer: Retrieval-Augmented Bash Code Comment Generation based on Fine-tuned CodeBERT](https://doi.org/10.1109/ICSME55016.2022.00016)
[Retrieval-Augmented Score Distillation for Text-to-3D Generation](https://doi.org/10.48550/arXiv.2402.02972)
[Corrective Retrieval Augmented Generation](https://arxiv.org/abs/2401.15884)
[Retrieval augmented generation with rich answer encoding](https://aclanthology.org/2023.ijcnlp-main.65.pdf)
[Unims-rag: A unified multi-source retrieval-augmented generation for personalized dialogue systems](https://arxiv.org/abs/2401.13256)
[You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval](https://arxiv.org/pdf/2403.07222v1)
[Blended RAG: Improving RAG (Retriever-Augmented Generation) Accuracy with Semantic Search and Hybrid Query-Based Retrievers](https://arxiv.org/pdf/2404.07220)
- Re-ranking
[Re2G: Retrieve, Rerank, Generate](https://doi.org/10.18653/v1/2022.naacl-main.194)
[Passage Re-ranking with BERT](http://arxiv.org/abs/1901.04085)
[AceCoder: Utilizing Existing Code to Enhance Code Generation](https://arxiv.org/abs/2303.17780)
[XRICL: Cross-lingual Retrieval-Augmented In-Context Learning for Cross-lingual Text-to-SQL Semantic Parsing](https://doi.org/10.18653/v1/2022.findings-emnlp.384)
[A Fine-tuning Enhanced RAG System with Quantized Influence Measure as AI Judge](https://arxiv.org/abs/2402.17081v1)
[UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers](https://arxiv.org/pdf/2303.00807.pdf)
[Learning to Retrieve In-Context Examples for Large Language Models](https://arxiv.org/pdf/2307.07164.pdf)
[The Chronicles of RAG: The Retriever, the Chunk and the Generator](https://arxiv.org/pdf/2401.07883.pdf)
[Enhancing LLM Factual Accuracy with RAG to Counter Hallucinations: A Case Study on Domain-Specific Queries in Private Knowledge-Bases](https://arxiv.org/pdf/2403.10446)
- Retrieval Transformation
[Learning to filter context for retrieval-augmented generation](https://arxiv.org/abs/2311.08377)
[Fid-light: Efficient and effective retrieval-augmented text generation](https://arxiv.org/abs/2209.14290)
[Gar-meets-rag paradigm for zero-shot information retrieval](https://arxiv.org/abs/2310.20158)
- Others
[PineCone](https://www.pinecone.io)
[Generate rather than retrieve: Large language models are strong context generators](https://arxiv.org/abs/2209.10063)
[Generator-retriever-generator: A novel approach to open-domain question answering](https://arxiv.org/abs/2307.11278)
[Multi-Head RAG: Solving Multi-Aspect Problems with LLMs](https://arxiv.org/pdf/2406.05085)
- Generator Enhancement
- Prompt Engineering
[Prompt Engineering Guide](https://github.com/dair-ai/Prompt-Engineering-Guide)
[Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models](https://doi.org/10.48550/arXiv.2310.06117)
[Active Prompting with Chain-of-Thought for Large Language Models](https://doi.org/10.48550/arXiv.2302.12246)
[Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](http://papers.nips.cc/paper_files/paper/2022/hash/9d5609613524ecf4f15af0f7b31abca4-Abstract-Conference.html)
[LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models](https://aclanthology.org/2023.emnlp-main.825)
[Lost in the Middle: How Language Models Use Long Contexts](https://doi.org/10.48550/arXiv.2307.03172)
[ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model](https://doi.org/10.1109/ICCV51070.2023.00040)
[Automatic Semantic Augmentation of Language Model Prompts (for Code Summarization)](https://arxiv.org/abs/2304.06815)
[Retrieval-Based Prompt Selection for Code-Related Few-Shot Learning](https://doi.org/10.1109/ICSE48619.2023.00205)
[XRICL: Cross-lingual Retrieval-Augmented In-Context Learning for Cross-lingual Text-to-SQL Semantic Parsing](https://doi.org/10.18653/v1/2022.findings-emnlp.384)
[Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models](https://proceedings.mlr.press/v202/huang23i.html)
- Decoding Tuning
[InferFix: End-to-End Program Repair with LLMs](https://doi.org/10.1145/3611643.3613892)
[Synchromesh: Reliable Code Generation from Pre-trained Language Models](https://openreview.net/forum?id=KmtVD97J43e)
- Finetune Generator
[Improving Language Models by Retrieving from Trillions of Tokens](https://proceedings.mlr.press/v162/borgeaud22a.html)
[When Language Model Meets Private Library](https://doi.org/10.18653/v1/2022.findings-emnlp.21)
[CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis](https://arxiv.org/abs/2203.13474)
[Concept-Aware Video Captioning: Describing Videos With Effective Prior Information](https://doi.org/10.1109/TIP.2023.3307969)
[Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation](https://doi.org/10.48550/arXiv.2307.06940)
[Lora: Low-rank adaptation of large language models](https://arxiv.org/abs/2106.09685)
[Retrieval-Augmented Score Distillation for Text-to-3D Generation](https://doi.org/10.48550/arXiv.2402.02972)
- Result Enhancement
- Rewrite Output
[Automated Code Editing with Search-Generate-Modify](https://doi.org/10.48550/arXiv.2306.06490)
[Repair Is Nearly Generation: Multilingual Program Repair with LLMs](https://doi.org/10.1609/aaai.v37i4.25642)
[Case-based Reasoning for Natural Language Queries over Knowledge Bases](https://doi.org/10.18653/v1/2021.emnlp-main.755)
- RAG Pipeline Enhancement
- Adaptive Retrieval
- Rule-Based
[Active retrieval augmented generation](https://arxiv.org/abs/2305.06983)
[Efficient Nearest Neighbor Language Models](https://doi.org/10.18653/v1/2021.emnlp-main.461)
[Generalization through Memorization: Nearest Neighbor Language Models](https://arxiv.org/abs/1911.00172)
[Nonparametric masked language modeling](https://arxiv.org/abs/2212.01349)
[When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories](https://doi.org/10.18653/v1/2023.acl-long.546)
[How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering](https://doi.org/10.1162/tacl_a_00407)
[Large Language Models Struggle to Learn Long-Tail Knowledge](https://proceedings.mlr.press/v202/kandpal23a.html)
- Model-Based
[Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection](https://doi.org/10.48550/arXiv.2310.11511)
[Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation](https://doi.org/10.48550/arXiv.2307.11019)
[Self-Knowledge Guided Retrieval Augmentation for Large Language Models](https://aclanthology.org/2023.findings-emnlp.691)
[Retrieve only when it needs: Adaptive retrieval augmentation for hallucination mitigation in large language models](https://arxiv.org/abs/2402.10612)
[Adaptive-rag: Learning to adapt retrieval-augmented large language models through question complexity](https://arxiv.org/abs/2403.14403)
- Iterative RAG
[RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation](https://aclanthology.org/2023.emnlp-main.151)
[Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy](https://aclanthology.org/2023.findings-emnlp.620)
[Knowledge graph based synthetic corpus generation for knowledge-enhanced language model pre-training](https://arxiv.org/abs/2010.12688)
## Applications Taxonomy
### RAG for Text
- Question Answering
[Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering](https://doi.org/10.18653/v1/2021.eacl-main.74)
[REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909)
[Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training](https://doi.org/10.18653/v1/2021.naacl-main.278)
[Atlas: Few-shot Learning with Retrieval Augmented Language Models](http://jmlr.org/papers/v24/23-0037.html)
[Improving Language Models by Retrieving from Trillions of Tokens](https://proceedings.mlr.press/v162/borgeaud22a.html)
[Self-Knowledge Guided Retrieval Augmentation for Large Language Models](https://aclanthology.org/2023.findings-emnlp.691)
[Knowledge-Augmented Language Model Prompting for Zero-Shot Knowledge Graph Question Answering](https://doi.org/10.48550/arXiv.2306.04136)
[Think-on-Graph: Deep and Responsible Reasoning of Large Language Model with Knowledge Graph](https://doi.org/10.48550/arXiv.2307.07697)
[Nonparametric Masked Language Modeling](https://doi.org/10.18653/v1/2023.findings-acl.132)
[CL-ReLKT: Cross-lingual Language Knowledge Transfer for Multilingual Retrieval Question Answering](https://doi.org/10.18653/v1/2022.findings-naacl.165)
[One Question Answering Model for Many Languages with Cross-lingual Dense Passage Retrieval](https://proceedings.neurips.cc/paper/2021/hash/3df07fdae1ab273a967aaa1d355b8bb6-Abstract.html)
[Entities as Experts: Sparse Memory Access with Entity Supervision](https://arxiv.org/abs/2004.07202)
[When to Read Documents or QA History: On Unified and Selective Open-domain QA](https://doi.org/10.18653/v1/2023.findings-acl.401)
[Enhancing LLM Intelligence with ARM-RAG: Auxiliary Rationale Memory for Retrieval Augmented Generation](https://arxiv.org/abs/2311.04177)
[DISC-LawLLM: Fine-tuning Large Language Models for Intelligent Legal Service](https://arxiv.org/pdf/2309.11325.pdf)
- Fact verification
[CONCRETE: Improving Cross-lingual Fact-checking with Cross-lingual Retrieval](https://aclanthology.org/2022.coling-1.86)
[Stochastic RAG: End-to-End Retrieval-Augmented Generation through Expected Utility Maximization](https://arxiv.org/pdf/2405.02816)
- Commonsense Reasoning
[KG-BART: Knowledge Graph-Augmented BART for Generative Commonsense Reasoning](https://doi.org/10.1609/aaai.v35i7.16796)
[What Evidence Do Language Models Find Convincing?](https://arxiv.org/abs/2402.11782v1)
[Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language Models](https://arxiv.org/abs/2310.04027)
- Human-Machine Conversation
[Grounded Conversation Generation as Guided Traverses in Commonsense Knowledge Graphs](https://doi.org/10.18653/v1/2020.acl-main.184)
[Skeleton-to-Response: Dialogue Generation Guided by Retrieval Memory](https://doi.org/10.18653/v1/n19-1124)
[Internet-Augmented Dialogue Generation](https://doi.org/10.18653/v1/2022.acl-long.579)
[BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage](https://doi.org/10.48550/arXiv.2208.03188)
[A Model of Cross-Lingual Knowledge-Grounded Response Generation for Open-Domain Dialogue Systems](https://doi.org/10.18653/v1/2021.findings-emnlp.33)
[From Classification to Generation: Insights into Crosslingual Retrieval Augmented ICL](https://openreview.net/forum?id=KLPLCXo4aD)
[Cross-Lingual Retrieval Augmented Prompt for Low-Resource Languages](https://aclanthology.org/2023.findings-acl.528/)
[Citation-Enhanced Generation for LLM-based Chatbot](https://arxiv.org/pdf/2402.16063v1.pdf)
[KAUCUS: Knowledge Augmented User Simulators for Training Language Model Assistants](https://aclanthology.org/2024.scichat-1.5/)
- Neural Machine Translation
[Neural Machine Translation with Monolingual Translation Memory](https://doi.org/10.18653/v1/2021.acl-long.567)
[Nearest Neighbor Machine Translation](https://openreview.net/forum?id=7wCBOfJ8hJM)
[Training Language Models with Memory Augmentation](https://doi.org/10.18653/v1/2022.emnlp-main.382)
- Event Extraction
[Retrieval-Augmented Generative Question Answering for Event Argument Extraction](https://doi.org/10.18653/v1/2022.emnlp-main.307)
- Summarization
[Retrieval-Augmented Multilingual Keyphrase Generation with Retriever-Generator Iterative Training](https://doi.org/10.18653/v1/2022.findings-naacl.92)
[Unlimiformer: Long-Range Transformers with Unlimited Length Input](https://doi.org/10.48550/arXiv.2305.01625)
[Retrieval-based Full-length Wikipedia Generation for Emergent Events](https://arxiv.org/abs/2402.18264v1)
[RIGHT: Retrieval-augmented Generation for Mainstream Hashtag Recommendation](https://arxiv.org/abs/2312.10466)
[M-RAG: Reinforcing Large Language Model Performance through Retrieval-Augmented Generation with Multiple Partitions](https://arxiv.org/pdf/2405.16420)
### RAG for Code
- Code Generation
[Retrieval-Based Neural Code Generation](https://doi.org/10.18653/v1/d18-1111)
[Retrieval Augmented Code Generation and Summarization](https://doi.org/10.18653/v1/2021.findings-emnlp.232)
[When Language Model Meets Private Library](https://doi.org/10.18653/v1/2022.findings-emnlp.21)
[Language Models of Code are Few-Shot Commonsense Learners](https://doi.org/10.18653/v1/2022.emnlp-main.90)
[DocPrompting: Generating Code by Retrieving the Docs](https://openreview.net/pdf?id=ZTCxT2t2Ru)
[CodeT5+: Open Code Large Language Models for Code Understanding and Generation](https://aclanthology.org/2023.emnlp-main.68)
[AceCoder: Utilizing Existing Code to Enhance Code Generation](https://arxiv.org/abs/2303.17780)
[Syntax-Aware Retrieval Augmented Code Generation](https://aclanthology.org/2023.findings-emnlp.90)
[A^3-CodGen: A Repository-Level Code Generation Framework for Code Reuse with Local-Aware, Global-Aware, and Third-Party-Library-Aware](https://arxiv.org/abs/2312.05772)
[SkCoder: A Sketch-based Approach for Automatic Code Generation](https://ieeexplore.ieee.org/abstract/document/10172719)
[CodeGen4Libs: A Two-Stage Approach for Library-Oriented Code Generation](https://ieeexplore.ieee.org/abstract/document/10298327)
[ToolCoder: Teach Code Generation Models to use API search tools](https://arxiv.org/abs/2305.04032)
[CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges](https://arxiv.org/abs/2401.07339)
[RRGcode: Deep hierarchical search-based code generation](https://www.sciencedirect.com/science/article/pii/S0164121224000256)
[Code Search Is All You Need? Improving Code Suggestions with Code Search](https://www.computer.org/csdl/proceedings-article/icse/2024/021700a857/1V5BkjI3196)
[ARKS: Active Retrieval in Knowledge Soup for Code Generation](https://arxiv.org/abs/2402.12317)
- Code Summary
[Retrieval-based neural source code summarization](https://doi.org/10.1145/3377811.3380383)
[Retrieve and Refine: Exemplar-based Neural Comment Generation](https://doi.org/10.1145/3324884.3416578)
[EditSum: A Retrieve-and-Edit Framework for Source Code Summarization](https://doi.org/10.1109/ASE51524.2021.9678724)
[Retrieval-Augmented Generation for Code Summarization via Hybrid GNN](https://openreview.net/forum?id=zv-typ1gPxA)
[Context-aware Retrieval-based Deep Commit Message Generation](https://dl.acm.org/doi/abs/10.1145/3464689)
[RACE: Retrieval-augmented Commit Message Generation](https://doi.org/10.18653/v1/2022.emnlp-main.372)
[BashExplainer: Retrieval-Augmented Bash Code Comment Generation based on Fine-tuned CodeBERT](https://doi.org/10.1109/ICSME55016.2022.00016)
[Retrieval-Based Transformer Pseudocode Generation](https://www.mdpi.com/2227-7390/10/4/604)
[A Simple Retrieval-based Method for Code Comment Generation](https://ieeexplore.ieee.org/abstract/document/9825803)
[READSUM: Retrieval-Augmented Adaptive Transformer for Source Code Summarization](https://ieeexplore.ieee.org/abstract/document/10113620)
[Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization](https://arxiv.org/abs/2305.11074)
[Automatic Semantic Augmentation of Language Model Prompts (for Code Summarization)](https://arxiv.org/abs/2304.06815)
[Cross-Modal Retrieval-Enhanced Code Summarization based on Joint Learning for Retrieval and Generation](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4724884)
[Automatic Smart Contract Comment Generation via Large Language Models and In-Context Learning](https://www.sciencedirect.com/science/article/pii/S0950584924000107)
[UniLog: Automatic Logging via LLM and In-Context Learning](https://dl.acm.org/doi/abs/10.1145/3597503.3623326)
- Code Completion
[A Retrieve-and-Edit Framework for Predicting Structured Outputs](https://proceedings.neurips.cc/paper_files/paper/2018/hash/cd17d3ce3b64f227987cd92cd701cc58-Abstract.html)
[Generating Code with the Help of Retrieved Template Functions and Stack Overflow Answers](https://arxiv.org/abs/2104.05310)
[ReACC: A Retrieval-Augmented Code Completion Framework](https://doi.org/10.18653/v1/2022.acl-long.431)
[Domain Adaptive Code Completion via Language Models and Decoupled Domain Databases](https://ieeexplore.ieee.org/abstract/document/10298575)
[RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation](https://aclanthology.org/2023.emnlp-main.151)
[CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file Context](https://doi.org/10.48550/arXiv.2212.10007)
[RepoFusion: Training Code Models to Understand Your Repository](https://arxiv.org/abs/2306.10998)
[Revisiting and Improving Retrieval-Augmented Deep Assertion Generation](https://ieeexplore.ieee.org/abstract/document/10298588)
[De-Hallucinator: Iterative Grounding for LLM-Based Code Completion](https://arxiv.org/abs/2401.01701)
[REPOFUSE: Repository-Level Code Completion with Fused Dual Context](https://arxiv.org/abs/2402.14323)
- Automatic Program Repair
[Repair Is Nearly Generation: Multilingual Program Repair with LLMs](https://doi.org/10.1609/aaai.v37i4.25642)
[Retrieval-Based Prompt Selection for Code-Related Few-Shot Learning](https://doi.org/10.1109/ICSE48619.2023.00205)
[InferFix: End-to-End Program Repair with LLMs](https://doi.org/10.1145/3611643.3613892)
[RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair](https://dl.acm.org/doi/abs/10.1145/3611643.3616256)
[Automated Code Editing with Search-Generate-Modify](https://arxiv.org/abs/2306.06490)
[RTLFixer: Automatically Fixing RTL Syntax Errors with Large Language Models](https://arxiv.org/abs/2311.16543)
- Text-to-SQL and Code-based Semantic Parsing
[XRICL: Cross-lingual Retrieval-Augmented In-Context Learning for Cross-lingual Text-to-SQL Semantic Parsing](https://doi.org/10.18653/v1/2022.findings-emnlp.384)
[Synchromesh: Reliable Code Generation from Pre-trained Language Models](https://openreview.net/forum?id=KmtVD97J43e)
[Evaluating the Impact of Model Scale for Compositional Generalization in Semantic Parsing](https://aclanthology.org/2022.emnlp-main.624/)
[RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL](https://ojs.aaai.org/index.php/AAAI/article/view/26535)
[Leveraging Code to Improve In-context Learning for Semantic Parsing](https://arxiv.org/abs/2311.09519)
[ReFSQL: A Retrieval-Augmentation Framework for Text-to-SQL Generation](https://aclanthology.org/2023.findings-emnlp.48/)
[Enhancing Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies](https://aclanthology.org/2023.findings-emnlp.996/)
[Selective Demonstrations for Cross-domain Text-to-SQL](https://aclanthology.org/2023.findings-emnlp.944/)
[DBCopilot: Scaling Natural Language Querying to Massive Databases via Schema Routing](https://arxiv.org/abs/2312.03463)
[Multi-Hop Table Retrieval for Open-Domain Text-to-SQL](https://arxiv.org/abs/2402.10666)
[CodeS: Towards Building Open-source Language Models for Text-to-SQL](https://arxiv.org/abs/2402.16347)
- Others
[De-fine: Decomposing and Refining Visual Programs with Auto-Feedback](https://arxiv.org/abs/2311.12890)
[Leveraging training data in few-shot prompting for numerical reasoning](https://arxiv.org/abs/2305.18170)
[Retrieval-Augmented Code Generation for Universal Information Extraction](https://arxiv.org/abs/2311.02962)
[E&V: Prompting Large Language Models to Perform Static Analysis by Pseudo-code Execution and Verification](https://arxiv.org/abs/2312.08477)
[Lessons from Building StackSpot AI: A Contextualized AI Coding Assistant](https://arxiv.org/abs/2311.18450)
[Testing the Limits: Unusual Text Inputs Generation for Mobile App Crash Detection with Large Language Model](https://arxiv.org/abs/2310.15657)
### RAG for Audio
- Audio Generation
[Retrieval-Augmented Text-to-Audio Generation](https://doi.org/10.48550/arXiv.2309.08051)
[Large-Scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation](https://doi.org/10.1109/ICASSP49357.2023.10095969)
[Make-an-audio: Text-to-audio generation with prompt-enhanced diffusion models](https://proceedings.mlr.press/v202/huang23i.html)
- Audio Captioning
[RECAP: Retrieval-Augmented Audio Captioning](https://doi.org/10.48550/arXiv.2309.09836)
[Audio Captioning using Pre-Trained Large-Scale Language Model Guided by Audio-based Similar Caption Retrieval](https://arxiv.org/abs/2012.07331)
[Large-Scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation](https://doi.org/10.1109/ICASSP49357.2023.10095969)
[CNN architectures for large-scale audio classification](https://doi.org/10.1109/ICASSP.2017.7952132)
[Natural language supervision for general-purpose audio representations](https://ieeexplore.ieee.org/abstract/document/10448504)
[Weakly-supervised Automated Audio Captioning via text only training](https://arxiv.org/abs/2309.12242)
[Training Audio Captioning Models without Audio](https://ieeexplore.ieee.org/abstract/document/10448115)
### RAG for Image
- Image Generation
[RetrieveGAN: Image synthesis via differentiable patch retrieval](https://arxiv.org/abs/2007.08513)
[Instance-conditioned GAN](https://arxiv.org/abs/2109.05070)
[Memory-driven text-to-image generation](https://arxiv.org/abs/2208.07022)
[Re-imagen: Retrieval-augmented text-to-image generator](https://arxiv.org/abs/2209.14491)
[KNN-Diffusion: Image Generation via Large-Scale Retrieval](https://arxiv.org/abs/2204.02849)
[Retrieval-Augmented Diffusion Models](https://arxiv.org/abs/2204.11824)
[Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models](https://arxiv.org/abs/2207.13038)
[X&Fuse: Fusing Visual Information in Text-to-Image Generation](https://arxiv.org/abs/2303.01000)
[Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs](https://arxiv.org/abs/2401.11708)
- Image Captioning
[Memory-augmented image captioning](https://ojs.aaai.org/index.php/AAAI/article/view/16220)
[Retrieval-enhanced adversarial training with dynamic memory-augmented attention for image paragraph captioning](https://www.sciencedirect.com/science/article/pii/S0950705120308595)
[Retrieval-Augmented Transformer for Image Captioning](https://arxiv.org/abs/2207.13162)
[Retrieval-augmented image captioning](https://arxiv.org/abs/2302.08268)
[Reveal: Retrieval-augmented visual-language pre-training with multi-source multimodal knowledge memory](https://arxiv.org/abs/2212.05221)
[SmallCap: Lightweight Image Captioning Prompted With Retrieval Augmentation](https://arxiv.org/abs/2209.15323)
[Cross-Modal Retrieval and Semantic Refinement for Remote Sensing Image Captioning](https://www.mdpi.com/2072-4292/16/1/196)
- Others
[An empirical study of GPT-3 for few-shot knowledge-based VQA](https://ojs.aaai.org/index.php/AAAI/article/view/20215)
[Retrieval augmented visual question answering with outside knowledge](https://aclanthology.org/2022.emnlp-main.772/)
[Augmenting transformers with KNN-based composite memory for dialog](https://doi.org/10.1162/tacl_a_00356)
[Maria: A visual experience powered conversational agent](https://aclanthology.org/2021.acl-long.435/)
[Neural machine translation with phrase-level universal visual representations](https://aclanthology.org/2022.acl-long.390/)
### RAG for Video
- Video Captioning
[Incorporating Background Knowledge into Video Description Generation](https://aclanthology.org/D18-1433/)
[Retrieval Augmented Convolutional Encoder-decoder Networks for Video Captioning](https://doi.org/10.1145/3539225)
[Concept-Aware Video Captioning: Describing Videos With Effective Prior Information](https://doi.org/10.1109/TIP.2023.3307969)
[Retrieval-Augmented Egocentric Video Captioning](https://arxiv.org/abs/2401.00789)
- Video QA&Dialogue
[Memory augmented deep recurrent neural network for video question answering](https://doi.org/10.1109/TNNLS.2019.2938015)
[Retrieving-to-answer: Zero-shot video question answering with frozen large language models](https://openaccess.thecvf.com/content/ICCV2023W/MMFM/html/Pan_Retrieving-to-Answer_Zero-Shot_Video_Question_Answering_with_Frozen_Large_Language_Models_ICCVW_2023_paper.html)
[TVQA+: Spatio-temporal grounding for video question answering](https://aclanthology.org/2020.acl-main.730/)
[VGNMN: Video-grounded neural module networks for video-grounded dialogue systems](https://aclanthology.org/2022.naacl-main.247/)
- Others
[Language models with image descriptors are strong few-shot video-language learners](https://proceedings.neurips.cc/paper_files/paper/2022/hash/381ceeae4a1feb1abc59c773f7e61839-Abstract-Conference.html)
[RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model](https://arxiv.org/abs/2402.10828)
[Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation](https://doi.org/10.48550/arXiv.2307.06940)
[Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval](https://doi.org/10.1109/ICCV48922.2021.00175)
### RAG for 3D
- Text-to-3D
[ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model](https://doi.org/10.1109/ICCV51070.2023.00040)
[AMD: Anatomical Motion Diffusion with Interpretable Motion Decomposition and Fusion](https://arxiv.org/abs/2312.12763)
[Retrieval-Augmented Score Distillation for Text-to-3D Generation](https://doi.org/10.48550/arXiv.2402.02972)
### RAG for Knowledge
- Knowledge Base Question Answering
[ReTraCk: A Flexible and Efficient Framework for Knowledge Base Question Answering](https://doi.org/10.18653/v1/2021.acl-demo.39)
[Unseen Entity Handling in Complex Question Answering over Knowledge Base via Language Generation](https://aclanthology.org/2021.findings-emnlp.50/)
[Case-based Reasoning for Natural Language Queries over Knowledge Bases](https://doi.org/10.18653/v1/2021.emnlp-main.755)
[Logical Form Generation via Multi-task Learning for Complex Question Answering over Knowledge Bases](https://aclanthology.org/2022.coling-1.145)
[Uni-Parser: Unified Semantic Parser for Question Answering on Knowledge Base and Database](https://aclanthology.org/2022.emnlp-main.605/)
[RNG-KBQA: Generation Augmented Iterative Ranking for Knowledge Base Question Answering](https://aclanthology.org/2022.acl-long.417/)
[TIARA: Multi-grained Retrieval for Robust Question Answering over Large Knowledge Base](https://aclanthology.org/2022.emnlp-main.555/)
[DecAF: Joint Decoding of Answers and Logical Forms for Question Answering over Knowledge Bases](https://openreview.net/forum?id=XHc5zRPxqV9)
[End-to-end Case-Based Reasoning for Commonsense Knowledge Base Completion](https://aclanthology.org/2023.eacl-main.255/)
[Bridging the KB-Text Gap: Leveraging Structured Knowledge-aware Pre-training for KBQA](https://dl.acm.org/doi/abs/10.1145/3583780.3615150)
[Knowledge-Driven CoT: Exploring Faithful Reasoning in LLMs for Knowledge-intensive Question Answering](https://arxiv.org/abs/2308.13259)
[Few-shot Transfer Learning for Knowledge Base Question Answering: Fusing Supervised Models with In-Context Learning](https://arxiv.org/abs/2311.08894)
[FC-KBQA: A Fine-to-Coarse Composition Framework for Knowledge Base Question Answering](https://aclanthology.org/2023.acl-long.57/)
[Knowledge-Augmented Language Model Prompting for Zero-Shot Knowledge Graph Question Answering](https://aclanthology.org/2023.nlrse-1.7/)
[Knowledge Graph-augmented Language Models for Complex Question Answering](https://aclanthology.org/2023.nlrse-1.1/)
[Retrieve-Rewrite-Answer: A KG-to-Text Enhanced LLMs Framework for Knowledge Graph Question Answering](https://arxiv.org/abs/2309.11206)
[Distribution Shifts Are Bottlenecks: Extensive Evaluation for Grounding Language Models to Knowledge Bases](https://aclanthology.org/2024.eacl-srw.7/)
[Probing Structured Semantics Understanding and Generation of Language Models via Question Answering](https://arxiv.org/abs/2401.05777)
[Keqing: Knowledge-based Question Answering is A Nature Chain-of-Thought mentor of LLMs](https://arxiv.org/abs/2401.00426)
[Interactive-KBQA: Multi-Turn Interactions for Knowledge Base Question Answering with Large Language Models](https://arxiv.org/abs/2402.15131)
- Knowledge-augmented Open-domain Question Answering
[UniK-QA: Unified Representations of Structured and Unstructured Knowledge for Open-Domain Question Answering](https://aclanthology.org/2022.findings-naacl.115/)
[KG-FiD: Infusing Knowledge Graph in Fusion-in-Decoder for Open-Domain Question Answering](https://aclanthology.org/2022.acl-long.340/)
[Empowering Language Models with Knowledge Graph Reasoning for Open-Domain Question Answering](https://aclanthology.org/2022.emnlp-main.650/)
[Grape: Knowledge Graph Enhanced Passage Reader for Open-domain Question Answering](https://aclanthology.org/2022.findings-emnlp.13/)
[Enhancing Multi-modal Multi-hop Question Answering via Structured Knowledge and Unified Retrieval-Generation](https://dl.acm.org/doi/abs/10.1145/3581783.3611964)
[DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text](https://arxiv.org/abs/2310.20170)
[KnowledGPT: Enhancing Large Language Models with Retrieval and Storage Access on Knowledge Bases](https://arxiv.org/abs/2308.11761)
[Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering](https://arxiv.org/abs/2403.02966)
[Two-stage Generative Question Answering on Temporal Knowledge Graph Using Large Language Models](https://arxiv.org/abs/2402.16568)
[KnowledgeNavigator: Leveraging Large Language Models for Enhanced Reasoning over Knowledge Graph](https://arxiv.org/abs/2312.15880)
[GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning](https://arxiv.org/pdf/2405.20139)
- Table Question Answering
[NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned](https://proceedings.mlr.press/v133/min21a.html)
[Dual Reader-Parser on Hybrid Textual and Tabular Evidence for Open Domain Question Answering](https://aclanthology.org/2021.acl-long.315/)
[End-to-End Table Question Answering via Retrieval-Augmented Generation](https://arxiv.org/abs/2203.16714)
[OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering](https://aclanthology.org/2022.naacl-main.68/)
[Reasoning over Hybrid Chain for Table-and-Text Open Domain Question Answering](https://www.ijcai.org/proceedings/2022/0629.pdf)
[Conversational Question Answering on Heterogeneous Sources](https://dl.acm.org/doi/abs/10.1145/3477495.3531815)
[Open-domain Question Answering via Chain of Reasoning over Heterogeneous Knowledge](https://aclanthology.org/2022.findings-emnlp.392/)
[StructGPT: A General Framework for Large Language Model to Reason over Structured Data](https://aclanthology.org/2023.emnlp-main.574/)
[cTBLS: Augmenting Large Language Models with Conversational Tables](https://aclanthology.org/2023.nlp4convai-1.6/)
[RINK: Reader-Inherited Evidence Reranker for Table-and-Text Open Domain Question Answering](https://ojs.aaai.org/index.php/AAAI/article/view/26577)
[Localize, Retrieve and Fuse: A Generalized Framework for Free-Form Question Answering over Tables](https://aclanthology.org/2023.findings-ijcnlp.1/)
[Exploring the Impact of Table-to-Text Methods on Augmenting LLM-based Question Answering with Domain Hybrid Data](https://arxiv.org/abs/2402.12869)
[ERATTA: Extreme RAG for Table To Answers with Large Language Models](https://arxiv.org/pdf/2405.03963)
- Others
[Improving Knowledge-Aware Dialogue Response Generation by Using Human-Written Prototype Dialogues](https://aclanthology.org/2020.findings-emnlp.126/)
[Knowledge Graph-Augmented Language Models for Knowledge-Grounded Dialogue Generation](https://arxiv.org/abs/2305.18846)
[RHO: Reducing Hallucination in Open-domain Dialogues with Knowledge Grounding](https://aclanthology.org/2023.findings-acl.275/)
[Retrieval-Enhanced Generative Model for Large-Scale Knowledge Graph Completion](https://doi.org/10.1145/3539618.3592052)
[Knowledge-Augmented Large Language Models for Personalized Contextual Query Suggestion](https://arxiv.org/abs/2311.06318)
[G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering](https://arxiv.org/abs/2402.07630)
[RAG-based Explainable Prediction of Road Users Behaviors for Automated Driving using Knowledge Graphs and Large Language Models](https://arxiv.org/pdf/2405.00449)
[HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models](https://arxiv.org/pdf/2405.14831)
### RAG for Science
- Drug Discovery
[Retrieval-based controllable molecule generation](https://arxiv.org/abs/2208.11126)
[Prompt-based 3d molecular diffusion models for structure-based drug design](https://openreview.net/forum?id=FWsGuAFn3n)
- Biomedical Informatics Enhancement
[PoET: A generative model of protein families as sequences-of-sequences](https://proceedings.neurips.cc/paper_files/paper/2023/hash/f4366126eba252699b280e8f93c0ab2f-Abstract-Conference.html)
[Retrieval-augmented large language models for adolescent idiopathic scoliosis patients in shared decision-making](https://dl.acm.org/doi/abs/10.1145/3584371.3612956)
[BioReader: a Retrieval-Enhanced Text-to-Text Transformer for Biomedical Literature](https://aclanthology.org/2022.emnlp-main.390/)
[Writing by Memorizing: Hierarchical Retrieval-based Medical Report Generation](https://arxiv.org/abs/2106.06471)
[From RAG to QA-RAG: Integrating Generative AI for Pharmaceutical Regulatory Compliance Process](https://arxiv.org/abs/2402.01717)
[RAG-RLRC-LaySum at BioLaySumm: Integrating Retrieval-Augmented Generation and Readability Control for Layman Summarization of Biomedical Texts](https://arxiv.org/pdf/2405.13179)
- Math Applications
[Retrieval-augmented Generation to Improve Math Question-Answering: Trade-offs Between Groundedness and Human Preference](https://arxiv.org/abs/2310.03184)
[LeanDojo: Theorem Proving with Retrieval-Augmented Language Models](https://proceedings.neurips.cc/paper_files/paper/2023/hash/4441469427094f8873d0fecb0c4e1cee-Abstract-Datasets_and_Benchmarks.html)
## Benchmark
[Benchmarking Large Language Models in Retrieval-Augmented Generation](https://doi.org/10.48550/arXiv.2309.01431)
[CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models](https://doi.org/10.48550/arXiv.2401.17043)
[ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems](https://doi.org/10.48550/arXiv.2311.09476)
[RAGAS: Automated Evaluation of Retrieval Augmented Generation](https://doi.org/10.48550/arXiv.2309.15217)
[KILT: a Benchmark for Knowledge Intensive Language Tasks](https://arxiv.org/abs/2009.02252)
## Citation
If you find this work useful, please cite our paper:
```
@article{zhao2024retrieval,
title={Retrieval-Augmented Generation for AI-Generated Content: A Survey},
author={Zhao, Penghao and Zhang, Hailin and Yu, Qinhan and Wang, Zhengren and Geng, Yunteng and Fu, Fangcheng and Yang, Ling and Zhang, Wentao and Cui, Bin},
journal={arXiv preprint arXiv:2402.19473},
year={2024}
}
```