{"id":24695064,"url":"https://github.com/HITsz-TMG/awesome-llm-reader","last_synced_at":"2025-10-09T01:31:22.380Z","repository":{"id":199383647,"uuid":"702763265","full_name":"HITsz-TMG/awesome-llm-reader","owner":"HITsz-TMG","description":null,"archived":false,"fork":false,"pushed_at":"2024-08-15T06:14:21.000Z","size":21,"stargazers_count":8,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-30T09:02:02.007Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HITsz-TMG.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-10-10T01:04:46.000Z","updated_at":"2025-08-07T12:29:35.000Z","dependencies_parsed_at":"2023-10-23T09:42:02.812Z","dependency_job_id":"6d7be12d-a969-4dbc-8baf-240d2730e9b0","html_url":"https://github.com/HITsz-TMG/awesome-llm-reader","commit_stats":null,"previous_names":["hitsz-tmg/awesome-llm-reader"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/HITsz-TMG/awesome-llm-reader","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HITsz-TMG%2Fawesome-llm-reader","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HITsz-TMG%2Fawesome-llm-reader/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HITsz-TMG%2Fawesome-llm-reader/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HITsz-TMG%2Fawesome-llm-reader/manifests","owner_url":"https://re
pos.ecosyste.ms/api/v1/hosts/GitHub/owners/HITsz-TMG","download_url":"https://codeload.github.com/HITsz-TMG/awesome-llm-reader/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HITsz-TMG%2Fawesome-llm-reader/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279000725,"owners_count":26082895,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-08T02:00:06.501Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-27T00:01:44.250Z","updated_at":"2025-10-09T01:31:22.064Z","avatar_url":"https://github.com/HITsz-TMG.png","language":null,"funding_links":[],"categories":["Other Collections"],"sub_categories":["Workshops"],"readme":"#  A Repository of Retrieval-augmented LLMs\n\n\n\n\n* [2023/05] **Active Retrieval Augmented Generation.** *Zhengbao Jiang et al. arXiv.* [[paper](https://browse.arxiv.org/pdf/2305.06983.pdf)]\n\n* [2023/05] **Augmented Large Language Models with Parametric Knowledge Guiding.** *Ziyang Luo et al. arXiv.* [[paper](https://arxiv.org/pdf/2305.04757.pdf)]\n\n* [2023/05] **RET-LLM: Towards a General Read-Write Memory for Large Language Models.** *Ali Modarressi et al. arXiv.* [[paper](https://arxiv.org/pdf/2305.14322.pdf)]\n\n* [2023/05] **Query Rewriting for Retrieval-Augmented Large Language Models.** *Xinbei Ma et al. 
EMNLP.* [[paper](https://browse.arxiv.org/pdf/2305.14283.pdf)]\n\n* [2023/05] **Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy.** *Zhihong Shao et al. EMNLP.* [[paper](https://browse.arxiv.org/pdf/2305.15294.pdf)]\n\n* [2023/05] **WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences.** *Xiao Liu et al. KDD.* [[paper](https://arxiv.org/pdf/2306.07906.pdf)]\n\n* [2023/07] **Chain of Thought Prompting Elicits Knowledge Augmentation.** *Dingjun Wu et al. arXiv.* [[paper](https://arxiv.org/pdf/2307.01640.pdf)]\n\n* [2023/10] **Retrieval-Generation Synergy Augmented Large Language Models.** *Zhangyin Feng et al. arXiv.* [[paper](https://arxiv.org/pdf/2310.05149v1.pdf)]\n\n* [2023/10] **FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation.** *Tu Vu et al. arXiv.* [[paper](https://arxiv.org/pdf/2310.03214v1.pdf)]\n\n* [2023/10] **Hexa: Self-Improving for Knowledge-Grounded Dialogue System.** *Daejin Jo et al. arXiv.* [[paper](https://arxiv.org/pdf/2310.06404v1.pdf)]\n\n* [2023/10] **Retrieve Anything To Augment Large Language Models.** *Peitian Zhang et al. arXiv.* [[paper](https://arxiv.org/pdf/2310.07554.pdf)]\n\n* [2023/10] **Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection.** *Akari Asai et al. arXiv.* [[paper](https://arxiv.org/pdf/2310.11511.pdf)]\n\n* [2024/01] **DocLLM: A layout-aware generative language model for multimodal document understanding.** *Dongsheng Wang et al. arXiv.* [[paper](https://arxiv.org/pdf/2401.00908v1.pdf)]\n\n* [2024/03] **Uni-SMART: Universal Science Multimodal Analysis and Research Transformer.** *Hengxing Cai et al. arXiv.* [[paper](https://arxiv.org/pdf/2403.10301.pdf)]\n\n* [2024/03] **RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback.** *Yanming Liu et al. 
arXiv.* [[paper](https://arxiv.org/pdf/2403.06840.pdf)]\n\n## :memo: Knowledge Preprocessing\n\n* [2023/09] **PDFTriage: Question Answering over Long, Structured Documents.** *Jon Saad-Falcon et al. arXiv.* [[paper](https://browse.arxiv.org/pdf/2309.08872v1.pdf)]\n\n* [2023/10] **Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading.** *Howard Chen et al. arXiv.* [[paper](https://arxiv.org/pdf/2310.05029.pdf)]\n\n* [2023/11] **LLatrieval: LLM-Verified Retrieval for Verifiable Generation.** *Xiaonan Li et al. arXiv.* [[paper](https://arxiv.org/pdf/2311.07838v1.pdf)]\n\n* [2023/11] **Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models.** *Wenhao Yu et al. arXiv.* [[paper](https://arxiv.org/pdf/2311.09210v1.pdf)]\n\n* [2023/11] **Drilling Down into the Discourse Structure with LLMs for Long Document Question Answering.** *Inderjeet Nair et al. arXiv.* [[paper](https://arxiv.org/pdf/2311.13565v1.pdf)]\n\n* [2023/11] **Revolutionizing Retrieval-Augmented Generation with Enhanced PDF Structure Recognition.** *Demiao Lin. arXiv.* [[paper](https://arxiv.org/pdf/2401.12599.pdf)]\n\n* [2024/06] **LumberChunker: Long-Form Narrative Document Segmentation.** *André V. Duarte et al. arXiv.* [[paper](https://arxiv.org/pdf/2406.17526v1)]\n\n\n## :chart_with_upwards_trend: Evaluation\n\n* [2023/04] **Can ChatGPT-like Generative Models Guarantee Factual Accuracy? On the Mistakes of New Generation Search Engines.** *Ruochen Zhao et al. arXiv.* [[paper](https://browse.arxiv.org/pdf/2304.11076.pdf)]\n\n* [2023/06] **ToolQA: A Dataset for LLM Question Answering with External Tools.** *Yuchen Zhuang et al. arXiv.* [[paper](https://arxiv.org/pdf/2306.13304.pdf)]\n\n* [2023/09] **Evaluating Large Language Models for Document-grounded Response Generation in Information-Seeking Dialogues.** *Norbert Braunschweiler et al. 
arXiv.* [[paper](https://browse.arxiv.org/pdf/2309.11838v1.pdf)]\n\n* [2023/09] **Benchmarking Large Language Models in Retrieval-Augmented Generation.** *Jiawei Chen et al. arXiv.* [[paper](https://browse.arxiv.org/pdf/2309.01431v1.pdf)]\n\n* [2023/10] **Understanding Retrieval Augmentation for Long-Form Question Answering.** *Hung-Ting Chen et al. arXiv.* [[paper](https://arxiv.org/pdf/2310.12150.pdf)]\n\n* [2024/01] **Corrective Retrieval Augmented Generation.** *Shi-Qi Yan et al. arXiv.* [[paper](https://arxiv.org/pdf/2401.15884v1.pdf)]\n\n* [2024/01] **CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models.** *Yuanjie Lyu et al. arXiv.* [[paper](https://arxiv.org/pdf/2401.17043v1.pdf)]\n\n* [2024/04] **How faithful are RAG models? Quantifying the tug-of-war between RAG and LLMs’ internal prior.** *Kevin Wu et al. arXiv.* [[paper](https://arxiv.org/pdf/2404.10198v1.pdf)]\n\n* [2024/07] **Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach.** *Zhuowan Li et al. arXiv.* [[paper](https://arxiv.org/pdf/2407.16833v1)]\n\n\n## :rocket: Efficiency\n\n* [2022/12] **Parallel Context Windows for Large Language Models.** *Nir Ratner et al. ACL.* [[paper](https://arxiv.org/pdf/2212.10947.pdf)]\n\n* [2023/05] **Plug-and-Play Knowledge Injection for Pre-trained Language Models.** *Zhengyan Zhang et al. ACL.* [[paper](https://arxiv.org/pdf/2305.17691.pdf)]\n\n* [2023/05] **Adapting Language Models to Compress Contexts.** *Alexis Chevalier et al. arXiv.* [[paper](https://arxiv.org/pdf/2305.14788.pdf)]\n\n* [2023/07] **Thrust: Adaptively Propels Large Language Models with External Knowledge.** *Xinran Zhao et al. arXiv.* [[paper](https://arxiv.org/pdf/2307.10442.pdf)]\n\n* [2023/10] **RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation.** *Fangyuan Xu et al. 
arXiv.* [[paper](https://arxiv.org/pdf/2310.04408v1.pdf)]\n\n* [2023/10] **Compressing Context to Enhance Inference Efficiency of Large Language Models.** *Yucheng Li et al. arXiv.* [[paper](https://arxiv.org/pdf/2310.06201v1.pdf)]\n\n* [2023/10] **CacheGen: Fast Context Loading for Language Model Applications.** *Yuhan Liu et al. arXiv.* [[paper](https://arxiv.org/pdf/2310.07240.pdf)]\n\n* [2023/10] **TCRA-LLM: Token Compression Retrieval Augmented Large Language Model for Inference Cost Reduction.** *Junyi Liu et al. arXiv.* [[paper](https://arxiv.org/pdf/2310.15556v1.pdf)]\n\n* [2023/11] **Learning to Filter Context for Retrieval-Augmented Generation.** *Zhiruo Wang et al. arXiv.* [[paper](https://arxiv.org/pdf/2311.08377v1.pdf)]\n\n* [2024/02] **Generative Representational Instruction Tuning.** *Niklas Muennighoff et al. arXiv.* [[paper](https://arxiv.org/pdf/2402.09906.pdf)]\n\n* [2024/02] **A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts.** *Kuang-Huei Lee et al. arXiv.* [[paper](https://arxiv.org/pdf/2402.09727.pdf)]\n\n* [2024/02] **Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation.** *Thomas Merth et al. arXiv.* [[paper](https://arxiv.org/pdf/2404.06910.pdf)]\n\n* [2024/05] **Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection.** *Yun Zhu et al. arXiv.* [[paper](https://arxiv.org/pdf/2405.16178)]","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FHITsz-TMG%2Fawesome-llm-reader","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FHITsz-TMG%2Fawesome-llm-reader","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FHITsz-TMG%2Fawesome-llm-reader/lists"}