Awesome-LLM-Long-Context-Modeling
📰 Must-read papers and blogs on LLM-based Long Context Modeling 🔥
https://github.com/Xnhyacinth/Awesome-LLM-Long-Context-Modeling
-
11. Benchmark and Evaluation
-
11.1 LLM
- **LOT: A Story-Centric Benchmark for Evaluating Chinese Long Text Understanding and Generation.**
- ![GitHub Repo stars - coai/LOT-LongLM)
- **Long Range Arena : A Benchmark for Efficient Transformers.**
- ![GitHub Repo stars - research/long-range-arena)
- **LUQ: Long-text Uncertainty Quantification for LLMs.**
- ![GitHub Repo stars
- **Long-context LLMs Struggle with Long In-context Learning.**
- ![GitHub Repo stars - AI-Lab/LongICLBench)
- **CLAPNQ: Cohesive Long-form Answers from Passages in Natural Questions for RAG systems.**
- ![GitHub Repo stars
- **XL2Bench: A Benchmark for Extremely Long Context Understanding with Long-range Dependencies.**
- ![GitHub Repo stars - nlp/XL2Bench)
- ![GitHub Repo stars - deepmind/loft)
- **MuLD: The Multitask Long Document Benchmark.**
- ![GitHub Repo stars
- **Lost in the Middle: How Language Models Use Long Contexts.**
- ![GitHub Repo stars - liu/lost-in-the-middle)
- **L-Eval: Instituting Standardized Evaluation for Long Context Language Models.**
- ![GitHub Repo stars
- **LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding.**
- **SCROLLS: Standardized CompaRison Over Long Language Sequences.**
- ![GitHub Repo stars - nlp/scrolls)
- ![GitHub Repo stars
- **Content Reduction, Surprisal and Information Density Estimation for Long Documents.**
- ![GitHub Repo stars
- **LooGLE: Long Context Evaluation for Long-Context Language Models.**
- ![GitHub Repo stars - nlco/loogle)
- **The Impact of Reasoning Step Length on Large Language Models.**
- **DocFinQA: A Long-Context Financial Reasoning Dataset.** - Kedziorski, Viet Dac Lai, Chris Tanner.* Arxiv 2024.
- **LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents.**
- **PROXYQA: An Alternative Framework for Evaluating Long-Form Text Generation with Large Language Models.**
- **LongHealth: A Question Answering Benchmark with Long Clinical Documents.** - Baptiste Excoffier, Matthieu Ortala, Alexander Löser, Hugo JWL. Aerts, Jakob Nikolas Kather, Daniel Truhn, Keno Bressem.* Arxiv 2024.
- **Long-form evaluation of model editing.**
- **In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss.**
- **∞Bench: Extending Long Context Evaluation Beyond 100K Tokens.**
- **Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models.**
- ![GitHub Repo stars - Task-More-Tokens)
- **Evaluating Very Long-Term Conversational Memory of LLM Agents.** - Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, Yuwei Fang.* Arxiv 2024.
- ![GitHub Repo stars - research/LoCoMo)
- **Language Models as Science Tutors.** - Jie Zhu, Zhiyong Jason Ren, Sanjeev Arora, Danqi Chen.* Arxiv 2024.
- ![GitHub Repo stars - nlp/LM-Science-Tutor)
- **Needle in a Haystack - Pressure Testing LLMs.**
- ![GitHub Repo stars
- **LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K.**
- **Counting-Stars: A Simple, Efficient, and Reasonable Strategy for Evaluating Long-Context Large Language Models.**
- ![GitHub Repo stars - Stars)
- **NovelQA: A Benchmark for Long-Range Novel Question Answering.**
- ![GitHub Repo stars
- **CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models.**
- **Long-form factuality in large language models.**
- ![GitHub Repo stars - deepmind/long-form-factuality)
- **OLAPH: Improving Factuality in Biomedical Long-form Question Answering.**
- **LongEmbed: Extending Embedding Models for Long Context Retrieval.**
- ![GitHub Repo stars - pku/LongEmbed)
- **Make Your LLM Fully Utilize the Context.** - Guang Lou.* Arxiv 2024.
- ![GitHub Repo stars
- **S3Eval: A Synthetic, Scalable, Systematic Evaluation Suite for Large Language Models.**
- ![GitHub Repo stars
- **In-Context Learning with Long-Context Models: An In-Depth Exploration.**
- ![GitHub Repo stars - context-icl)
- **Many-shot Jailbreaking.**
- **DOLOMITES: Domain-Specific Long-Form Methodical Tasks.**
- **Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis.**
- **FinTextQA: A Dataset for Long-form Financial Question Answering.**
- **A Multi-Perspective Analysis of Memorization in Large Language Models.**
- ![GitHub Repo stars - lab/OLAPH)
- **Can LLMs Solve longer Math Word Problems Better?.**
- ![GitHub Repo stars - USTC/CoLeG-Math)
- **Base of RoPE Bounds Context Length.**
- **Many-shot In-Context Learning.** - Reyes, Eric Chu, Feryal Behbahani, Aleksandra Faust, Hugo Larochelle.* Arxiv 2024.
- **Long Context is Not Long at All: A Prospector of Long-Dependency Data for Large Language Models.**
- ![GitHub Repo stars
- **Language Models Need Inductive Biases to Count Inductively.**
- **An Empirical Study of Mamba-based Language Models.**
- **BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack.**
- **Can Many-Shot In-Context Learning Help Long-Context LLM Judges? See More, Judge Better!.**
- ![GitHub Repo stars
- **What Kinds of Tokens Benefit from Distant Text? An Analysis on Long Context Language Modeling.**
- ![GitHub Repo stars - compass/Ada-LEval)
- **RULER: What's the Real Context Size of Your Long-Context Language Models?.** - Ping Hsieh, Simeng Sun, Samuel Kriman, Shantanu Acharya, Dima Rekesh, Fei Jia, Boris Ginsburg.* Arxiv 2024.
- ![GitHub Repo stars
- **Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors.**
- **Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks.**
- **Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding.** - Seng Chua.* Arxiv 2024.
- **CRAG -- Comprehensive RAG Benchmark.** - tau Yih, Xin Luna Dong.* Arxiv 2024.
- **Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective.**
- **Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?.** - Wei Chang, Kelvin Guu.* Arxiv 2024.
- **Insights into LLM Long-Context Failures: When Transformers Know but Don't Tell.**
- ![GitHub Repo stars - dont-tell)
- **MedOdyssey: A Medical Domain Benchmark for Long Context Evaluation Up to 200K Tokens.**
- ![GitHub Repo stars - fans/MedOdyssey)
- **USDC: A Dataset of $\underline{U}$ser $\underline{S}$tance and $\underline{D}$ogmatism in Long $\underline{C}$onversations.**
- **One Thousand and One Pairs: A "novel" challenge for long-context language models.**
- ![GitHub Repo stars
- **LongIns: A Challenging Long-context Instruction-based Exam for LLMs.**
- **Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA.**
- ![GitHub Repo stars
- **Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective.**
- **Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?.** - Wei Chang, Kelvin Guu.* Arxiv 2024.
- **Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization.** - Yu Hsieh, Yung-Sung Chuang, Chun-Liang Li, Zifeng Wang, Long T. Le, Abhishek Kumar, James Glass, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister.* Arxiv 2024.
- ![GitHub Repo stars
- **VERISCORE: Evaluating the factuality of verifiable claims in long-form text generation.**
- ![GitHub Repo stars - Song/VeriScore)
- **ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models.**
- ![GitHub Repo stars
- **Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP.**
- **Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems.** - Sheng Wu.* Arxiv 2024.
- ![GitHub Repo stars - of-a-haystack)
- **LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs.** - Wei Lee.* Arxiv 2024.
- **What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices.**
- **Retrieval Or Holistic Understanding? Dolce: Differentiate Our Long Context Evaluation Tasks.**
- **A Controlled Study on Long Context Extension and Generalization in LLMs.**
- ![GitHub Repo stars
- **RAD-Bench: Evaluating Large Language Models Capabilities in Retrieval Augmented Dialogues.** - Lin Kuo, Feng-Ting Liao, Mu-Wei Hsieh, Fu-Chieh Chang, Po-Chun Hsu, Da-Shan Shiu.* Arxiv 2024.
- ![GitHub Repo stars - Bench)
- **Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation.**
- **Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries.** - Baptiste Lespiau, Nithya Attaluri, Kate Olszewska.* Arxiv 2024.
- **DetectiveQA: Evaluating Long-Context Reasoning on Detective Novels.**
- **LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA.**
- ![GitHub Repo stars
- **Entity-Level Sentiment: More than the Sum of Its Parts.**
- **Evaluating Language Model Context Windows: A "Working Memory" Test and Inference-time Correction.**
- **RAG vs. Long Context: Examining Frontier Large Language Models for Environmental Review Document Comprehension.**
- **Attribute or Abstain: Large Language Models as Long Document Assistants.**
- ![GitHub Repo stars - attribute-or-abstain)
- **How Well Can a Long Sequence Model Model Long Sequences? Comparing Architectural Inductive Biases on Long-Context Abilities.**
- **DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems.**
- ![GitHub Repo stars - Zou/DocBench)
- **KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches.** - Neng Chuang, Songchen Li, Guanchu Wang, Duy Le, Hongye Jin, Vipin Chaudhary, Zhaozhuo Xu, Zirui Liu, Xia Hu.* Arxiv 2024.
- ![GitHub Repo stars
- **NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?.**
- ![GitHub Repo stars - compass/opencompass)
- **RAG-QA Arena: Evaluating Domain Robustness for Long-form Retrieval Augmented Question Answering.**
- ![GitHub Repo stars - qa-arena)
- **Attention Is All You Need But You Don't Need All Of It For Inference of Large Language Models.** - S Dovonon, Jean Kaddour, Pasquale Minervini.* ICML 2024 TF2M workshop.
- **WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries.**
- **Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach.**
- **Evaluating Long Range Dependency Handling in Code Generation Models using Multi-Step Key Retrieval.**
- ![GitHub Repo stars - key-retrieval-code-tasks)
- **LongLaMP: A Benchmark for Personalized Long-form Text Generation.**
- ![Static Badge - benchmark.github.io/)
- **Stress-Testing Long-Context Language Models with Lifelong ICL and Task Haystack.**
- ![GitHub Repo stars - USC/Lifelong-ICL)
- **Long Input Benchmark for Russian Analysis.**
- **CoverBench: A Challenging Benchmark for Complex Claim Verification.** - David, Uri Shaham, Amir Feder, Mor Geva, Dror Marcus, Avi Caciularu.* Arxiv 2024.
- **HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models.**
- ![GitHub Repo stars - bench)
- **Multilingual Evaluation of Long Context Retrieval and Reasoning.**
- **L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding?**
- ![GitHub Repo stars - CITEEVAL)
- **HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly.**
- ![GitHub Repo stars - nlp/HELMET)
- **MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs.** - Peng Lim, Caiming Xiong, Doyen Sahoo.* Arxiv 2024.
- **Hyper-multi-step: The Truth Behind Difficult Long-context Tasks.**
- ![GitHub Repo stars
- **Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data.**
- ![GitHub Repo stars
- **How much do contextualized representations encode long-range context?.** - Ping Hsieh.* Arxiv 2024.
- **LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory.** - Wei Chang, Dong Yu.* Arxiv 2024.
- ![GitHub Repo stars
- **When Attention Sink Emerges in Language Models: An Empirical View.**
- ![GitHub Repo stars - sg/Attention-Sink)
- **LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs.**
- **Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models.**
- ![GitHub Repo stars - needle-in-a-haystack)
- **ETHIC: Evaluating Large Language Models on Long-Context Tasks with High Information Coverage.**
- ![GitHub Repo stars - lab/ETHIC)
- **Long2RAG: Evaluating Long-Context & Long-Form Retrieval-Augmented Generation with Key Point Recall.**
- **Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key.**
- **Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs.**
- ![GitHub Repo stars - thu/LongPiBench)
- **Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation.**
- ![GitHub Repo stars
- **A Benchmark for Long-Form Medical Question Answering.**
- ![GitHub Repo stars - ai/medical-eval-sphere)
- **Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows.**
- ![GitHub Repo stars - ai/Spider2)
- **DENIAHL: In-Context Features Influence LLM Needle-In-A-Haystack Abilities.**
- **LIFBench: Evaluating the Instruction Following Performance and Stability of Large Language Models in Long-Context Scenarios.**
- **Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?.**
- ![GitHub Repo stars - roberts1/needle-threading)
- **BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models.** - Rong Wen.* Arxiv 2023.
- **Retrieval meets Long Context Large Language Models.**
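Many of the retrieval-style benchmarks above (Needle in a Haystack, NeedleBench, BABILong, Counting-Stars) share one core recipe: hide a "needle" fact at a controlled depth inside long filler text, then check whether the model can retrieve it. A minimal sketch of that recipe; `query_model` is a hypothetical placeholder for an actual LLM call, not an API from any of these papers:

```python
# Minimal needle-in-a-haystack probe: splice a "needle" fact into filler
# text at a chosen depth, then grade the model's answer by substring match.
# `query_model` is a hypothetical stand-in for a real LLM API call.

def build_haystack(filler: str, needle: str, depth: float, target_chars: int) -> str:
    """Repeat filler up to target length and insert the needle at `depth` (0..1)."""
    text = (filler * (target_chars // len(filler) + 1))[:target_chars]
    pos = int(len(text) * depth)
    return text[:pos] + " " + needle + " " + text[pos:]

def score_retrieval(answer: str, expected: str) -> bool:
    """Lenient substring check, as in simple needle-in-a-haystack harnesses."""
    return expected.lower() in answer.lower()

filler = "The sky was grey and the meeting ran long. "
needle = "The secret code is 7421."
context = build_haystack(filler, needle, depth=0.5, target_chars=2000)
assert needle in context
# answer = query_model(context + "\nWhat is the secret code?")
# hit = score_retrieval(answer, "7421")
```

Real harnesses sweep both context length and needle depth to produce the familiar retrieval heatmaps.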
-
11.2 MLLM
- **MileBench: Benchmarking MLLMs in Long Context.**
- **MovieSum: An Abstractive Summarization Dataset for Movie Screenplays.**
- ![GitHub Repo stars
- **SEED-Story: Multimodal Long Story Generation with Large Language Model.**
- ![GitHub Repo stars - Story)
- **Many-Shot In-Context Learning in Multimodal Foundation Models.**
- ![GitHub Repo stars
- **MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding.**
- ![GitHub Repo stars
- **RepoQA: Evaluating Long Context Code Understanding.**
- ![GitHub Repo stars
- **Short Film Dataset (SFD): A Benchmark for Story-Level Video Understanding.**
- ![GitHub Repo stars
- **Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts.**
- **MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations.** - Gang Jiang, Jiaqi Wang, Yixin Cao, Aixin Sun.* Arxiv 2024.
- ![GitHub Repo stars - Doc)
- **Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models.**
- ![GitHub Repo stars - ML-Lab/multimodal-needle-in-a-haystack)
- **Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Knowledge.** - Jun Lee, Dokyong Lee, Junyoung Youn, Kyeongjin Oh, Byungsoo Ko, Jonghwan Hyeon, Ho-Jin Choi.* Arxiv 2024.
- ![GitHub Repo stars
- **SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers.**
- ![GitHub Repo stars
- **mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval.**
- **InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.**
- ![GitHub Repo stars - XComposer)
- **LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding.**
- ![GitHub Repo stars
- **LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations.**
- ![GitHub Repo stars - deepmind/lm_act)
- **LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos.**
- **M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework.**
-
2. Efficient Attention
-
2.2 Linear Attention
- **Softmax Attention with Constant Cost per Token.**
- ![GitHub Repo stars
- **Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length.**
- **Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention.**
- **Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective.**
- ![GitHub Repo stars
- **Attention as an RNN.**
- **You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet.**
- ![GitHub Repo stars
- **When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models.**
- ![GitHub Repo stars
- **Learning to (Learn at Test Time): RNNs with Expressive Hidden States.**
- ![GitHub Repo stars
- **Gated Slot Attention for Efficient Linear-Time Sequence Modeling.**
- ![GitHub Repo stars
- **Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention.**
- ![GitHub Repo stars - transformers)
- **Masked language modeling for proteins via linearly scalable long-context transformers.**
- ![GitHub Repo stars - attention-transformer)
- **Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations.**
- **Rethinking attention with performers.**
- ![GitHub Repo stars - pytorch)
- **Linformer: Self-attention with linear complexity.**
- **Random Feature Attention.**
- ![GitHub Repo stars - ARK/RFA)
- **Luna: Linear unified nested attention.**
- ![GitHub Repo stars - transformer)
- **Fnet: Mixing tokens with fourier transforms.** *James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.* Arxiv 2021.
- ![GitHub Repo stars
- **Gated Linear Attention Transformers with Hardware-Efficient Training.**
- ![GitHub Repo stars
- **Latent Attention for Linear Time Transformers.**
- ![GitHub Repo stars
- **Linear Attention Sequence Parallelism.**
- ![GitHub Repo stars
- **Simple linear attention language models balance the recall-throughput tradeoff.**
- ![GitHub Repo stars
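The linear-attention papers above share one core identity: with a positive feature map φ, softmax attention's (QKᵀ)V can be reassociated as φ(Q)(φ(K)ᵀV), which for causal decoding becomes a constant-size recurrent state updated token by token. A minimal numpy sketch of that reordering (the elu+1 feature map follows "Transformers are RNNs"; everything else is simplified for illustration):

```python
import numpy as np

def elu_plus_one(x):
    # Positive feature map phi from "Transformers are RNNs": elu(x) + 1.
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_linear_attention(Q, K, V):
    """O(n) causal attention: keep a running state S = sum_j phi(k_j) v_j^T
    and normalizer z = sum_j phi(k_j), updated one token at a time."""
    Qf, Kf = elu_plus_one(Q), elu_plus_one(K)
    n, d = Q.shape
    S = np.zeros((d, V.shape[1]))   # (d_k, d_v) recurrent state
    z = np.zeros(d)                 # running normalizer
    out = np.empty_like(V)
    for t in range(n):
        S += np.outer(Kf[t], V[t])
        z += Kf[t]
        out[t] = (Qf[t] @ S) / (Qf[t] @ z + 1e-6)
    return out

def causal_softmax_attention(Q, K, V):
    # Quadratic reference: causally masked softmax attention.
    scores = Q @ K.T
    mask = np.tril(np.ones(scores.shape, dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, 6, 4))
lin = causal_linear_attention(Q, K, V)
# At t=0 only one token is visible, so both variants must return V[0].
assert np.allclose(lin[0], causal_softmax_attention(Q, K, V)[0], atol=1e-4)
```

The output is an approximation of softmax attention, not an exact match, which is the accuracy/efficiency trade-off most of the papers in this subsection study.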
-
2.3 Hierarchical Attention
- **Neural Legal Judgment Prediction in English.**
- ![GitHub Repo stars
- **Hierarchical Neural Network Approaches for Long Document Classification.**
- **Hi-transformer: Hierarchical interactive transformer for efficient and effective long document modeling.** ACL-IJCNLP 2021.
- **ERNIE-Sparse: Learning Hierarchical Efficient Transformer Through Regularized Self-Attention.**
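The hierarchical models above follow a common two-level pattern: encode each fixed-length segment locally, then run (much cheaper) attention over segment summaries instead of full n² token-to-token attention. An illustrative numpy sketch of that structure, with a mean-pool standing in for the real learned segment encoder:

```python
import numpy as np

def mean_pool_encode(chunk):
    # Stand-in for a segment encoder: here it just mean-pools token vectors.
    return chunk.mean(axis=0)

def hierarchical_scores(doc, chunk_len):
    """Two-level attention: tokens interact within their chunk (cost n*c),
    while chunk summaries attend globally (cost (n/c)^2)."""
    chunks = [doc[i:i + chunk_len] for i in range(0, len(doc), chunk_len)]
    summaries = np.stack([mean_pool_encode(c) for c in chunks])
    s = summaries @ summaries.T              # global attention over summaries
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)

doc = np.random.default_rng(1).standard_normal((12, 8))
w = hierarchical_scores(doc, chunk_len=4)
assert w.shape == (3, 3)
```

With n = 12 and c = 4 the global step touches only a 3×3 score matrix; that quadratic-in-(n/c) cost is the saving the papers above exploit.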
-
2.4 IO-Aware Attention
- **FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness.**
- **Self-attention Does Not Need O(n^2) Memory.**
- **Faster Causal Attention Over Large Sequences Through Sparse Flash Attention.**
- **FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning.**
- **Efficient Memory Management for Large Language Model Serving with PagedAttention.**
- ![GitHub Repo stars - project/vllm)
- ![GitHub Repo stars - AILab/flash-attention)
- **TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer.**
- ![GitHub Repo stars
- ![GitHub Repo stars - attention)
- ![GitHub Repo stars
- **Efficient LLM Inference with Kcache.**
- **You Only Cache Once: Decoder-Decoder Architectures for Language Models.**
- ![GitHub Repo stars
- **Fast Transformer Decoding: One Write-Head is All You Need.**
- **GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints.** *Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, Sumit Sanghai.* Arxiv 2023.
- **DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model.** *DeepSeek-AI.* Arxiv 2024.
- ![GitHub Repo stars - ai/DeepSeek-V2)
- **Layer-Condensed KV Cache for Efficient Inference of Large Language Models.**
- ![GitHub Repo stars
- **Reducing Transformer Key-Value Cache Size with Cross-Layer Attention.**
- ![GitHub Repo stars
- **PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference.**
- ![GitHub Repo stars
- **Unlocking Data-free Low-bit Quantization with Matrix Decomposition for KV Cache Compression.** *Peiyu Liu, Ze-Feng Gao, Wayne Xin Zhao, Yipeng Ma, Tao Wang, Ji-Rong Wen.* Arxiv 2024.
- **MiniCache: KV Cache Compression in Depth Dimension for Large Language Models.**
- **PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling.**
- **Effectively Compress KV Heads for LLM.**
- **A Simple and Effective L2 Norm-Based Strategy for KV Cache Compression.**
- **Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference.**
- ![GitHub Repo stars - han-lab/Quest)
- **Attention Score is not All You Need for Token Importance Indicator in KV Cache Reduction: Value Also Matters.**
- **CItruS: Chunked Instruction-aware State Eviction for Long Sequence Modeling.** - Antoine Rondeau, Yang Gao, Jackie Chi Kit Cheung.* Arxiv 2024.
- **D2O: Dynamic Discriminative Operations for Efficient Generative Inference of Large Language Models.**
- **LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference.**
- ![GitHub Repo stars - M)
- **Training-Free Exponential Extension of Sliding Window Context with Cascading KV Cache.**
- **QuickLLaMA: Query-aware Inference Acceleration for Large Language Models.**
- ![GitHub Repo stars - research/Q-LLM)
- **MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention.** - Yew Lin, Yuqing Yang, Lili Qiu.* Arxiv 2024.
- ![GitHub Repo stars
- **Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks.**
- **Optimizing KV Cache Eviction in LLMs: Adaptive Allocation for Enhanced Budget Utilization.**
- **Beyond KV Caching: Shared Attention for Efficient LLMs.**
- **PQCache: Product Quantization-based KVCache for Long Context LLM Inference.**
- **LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference.**
- **Farewell to Length Extrapolation, a Training-Free Infinite Context with Finite Attention Scope.**
- **RazorAttention: Efficient KV Cache Compression Through Retrieval Heads.**
- **FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision.**
- **ThinK: Thinner Key Cache by Query-Driven Pruning.**
- **A2SF: Accumulative Attention Scoring with Forgetting Factor for Token Pruning in Transformer Decoder.** *Hyun-rae Jo, Dongkun Shin.* Arxiv 2024.
- ![GitHub Repo stars - Notation/A2SF)
- **Cross-layer Attention Sharing for Large Language Models.**
- **NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time.**
- ![GitHub Repo stars - NACL)
- **Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters.**
- **MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding.** - Hsu Yen, Beidi Chen.* Arxiv 2024.
- ![GitHub Repo stars - AI-Lab/MagicDec/)
- **CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios.**
- ![GitHub Repo stars
- **RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval.**
- **InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference.**
- **CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs.**
- **Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction.** - Phi Nguyen, Yingyu Liang, Shafiq Joty.* Arxiv 2024.
- ![GitHub Repo stars
- **Inference-Friendly Models With MixAttention.**
- **KV-Compress: Paged KV-Cache Compression with Variable Compression Rates per Attention Head.**
- ![GitHub Repo stars - kvcompress)
- **Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads.**
- ![GitHub Repo stars
- **InfiniPot: Infinite Context Processing on Memory-Constrained LLMs.**
- ![GitHub Repo stars
- **UNComp: Uncertainty-Aware Long-Context Compressor for Efficient Large Language Model Inference.**
- **LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy.**
- ![GitHub Repo stars
- **MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection.**
- **DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads.**
- ![GitHub Repo stars - han-lab/duo-attention)
- **In-context KV-Cache Eviction for LLMs via Attention-Gate.**
- **SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction.**
- ![GitHub Repo stars - sg/SimLayerKV)
- **EPIC: Efficient Position-Independent Context Caching for Serving Large Language Models.**
- **A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference.**
- **MagicPIG: LSH Sampling for Efficient LLM Generation.**
- **KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing.**
- ![GitHub Repo stars - AI-Lab/MagicPIG)
- **Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning.**
- **Long Sequence Modeling with Attention Tensorization: From Sequence to Tensor Learning.**
- **ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference.** *Hanshi Sun, Li-Wen Chang, Wenlei Bao, Size Zheng, Ningxin Zheng, Xin Liu, Harry Dong, Yuejie Chi, Beidi Chen.* Arxiv 2024.
- ![GitHub Repo stars
- **BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference.**
- ![GitHub Repo stars - llm)
- **Lossless KV Cache Compression to 2%.**
- **VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration.**
- **TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection.**
- ![GitHub Repo stars - attention)
- **Squeezed Attention: Accelerating Long Context Length LLM Inference.**
- ![GitHub Repo stars
- **Star Attention: Efficient LLM Inference over Long Sequences.**
- ![GitHub Repo stars - attention)
- **When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training.**
- ![GitHub Repo stars
- **Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache.**
- ![GitHub Repo stars
- **ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression.**
- **Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity.**
- **AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning.**
- ![GitHub Repo stars - Lab/AIM)
- **Recycled Attention: Efficient inference for long-context language models.**
- **SnapKV: LLM Knows What You are Looking for Before Generation.**
- **ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition.**
- **MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression.**
- **SparseAccelerate: Efficient Long-Context Inference for Mid-Range GPUs.**
- **Cross-Self KV Cache Pruning for Efficient Vision-Language Inference.**
- **Ltri-LLM: Streaming Long Context Inference for LLMs with Training-Free Dynamic Triangular Attention Pattern.**
- **BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching.**
- **Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient Generative Inference.**
- **Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification.**
- ![GitHub Repo stars
- **XKV: Personalized KV Cache Memory Reduction for Long-Context LLM Inference.**
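A recurring primitive across the KV-cache papers above is a budgeted eviction policy: keep the cache at a fixed size regardless of sequence length by deciding which key/value pairs to drop. As one concrete illustration, a StreamingLLM-style "attention sink plus sliding window" rule (the class and parameter names here are ours, not from any of the papers):

```python
import numpy as np
from collections import deque

class SinkWindowKVCache:
    """Illustrative eviction policy: always retain the first `n_sink` tokens
    ("attention sinks") plus a sliding window of the most recent `window`
    tokens; everything in between is evicted."""
    def __init__(self, n_sink=4, window=8):
        self.n_sink = n_sink
        self.sink = []
        self.recent = deque(maxlen=window)

    def append(self, k, v):
        if len(self.sink) < self.n_sink:
            self.sink.append((k, v))
        else:
            self.recent.append((k, v))  # deque drops the oldest automatically

    def keys_values(self):
        kept = self.sink + list(self.recent)
        return np.stack([k for k, _ in kept]), np.stack([v for _, v in kept])

cache = SinkWindowKVCache(n_sink=4, window=8)
for t in range(100):
    cache.append(np.full(4, t, dtype=float), np.full(4, t, dtype=float))
K, V = cache.keys_values()
assert len(K) == 12  # 4 sinks + 8 recent tokens, however long the stream
```

After 100 tokens the cache still holds only 12 entries: positions 0–3 (sinks) and 92–99 (window), which is the constant-memory behavior these methods trade against recall of the evicted middle.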
-
2.1 Sparse Attention
- **Generating Long Sequences with Sparse Transformers.**
- ![GitHub Repo stars
- **Blockwise Self-Attention for Long Document Understanding.** *Jiezhong Qiu, Hao Ma, Omer Levy, Wen-tau Yih, Sinong Wang, Jie Tang.* EMNLP 2020.
- **Longformer: The Long-Document Transformer.**
- ![GitHub Repo stars
- **ETC: Encoding Long and Structured Inputs in Transformers.**
- **Big Bird: Transformers for Longer Sequences.**
- ![GitHub Repo stars - research/bigbird)
- **Reformer: The efficient transformer.**
- ![GitHub Repo stars - pytorch)
- **Sparse Sinkhorn Attention.** *Yi Tay, Dara Bahri, Liu Yang, Donald Metzler, Da-Cheng Juan.* ICML 2020.
- ![GitHub Repo stars - transformer)
- **Sparse and continuous attention mechanisms.**
- **Efficient Content-Based Sparse Attention with Routing Transformers.**
- ![GitHub Repo stars - transformer)
- **LongT5: Efficient text-to-text transformer for long sequences.** *Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.* NAACL 2022.
- ![GitHub Repo stars - research/longt5)
- **Efficient Long-Text Understanding with Short-Text Models.**
- ![GitHub Repo stars
- **Parallel Context Windows for Large Language Models.** - Brown, Yoav Shoham.* ACL 2023.
- ![GitHub Repo stars - Context-Windows)
- **Unlimiformer: Long-Range Transformers with Unlimited Length Input.**
- ![GitHub Repo stars
- **Landmark Attention: Random-Access Infinite Context Length for Transformers.**
- ![GitHub Repo stars - attention)
- **LONGNET: Scaling Transformers to 1,000,000,000 Tokens.**
- ![GitHub Repo stars
- **Blockwise Parallel Transformer for Long Context Large Models.**
- ![GitHub Repo stars
- **MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers.**
- ![GitHub Repo stars - pytorch)
- **Adapting Language Models to Compress Contexts.**
- ![GitHub Repo stars - nlp/AutoCompressors)
- **Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers.**
- **Sparse Token Transformer with Attention Back Tracking.**
- **Empower Your Model with Longer and Better Context Comprehension.**
- **Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers.**
- **Long-range Language Modeling with Self-retrieval.**
- **Max-Margin Token Selection in Attention Mechanism.**
- ![GitHub Repo stars - transition)
- **Ring Attention with Blockwise Transformers for Near-Infinite Context.**
- **Efficient Streaming Language Models with Attention Sinks.**
- ![GitHub Repo stars - han-lab/streaming-llm)
- **HyperAttention: Long-context Attention in Near-Linear Time.**
- **Fovea Transformer: Efficient Long-Context Modeling with Structured Fine-to-Coarse Attention.**
- ![GitHub Repo stars - Transformer)
- **Training-Free Long-Context Scaling of Large Language Models.**
- ![GitHub Repo stars
- **LongHeads: Multi-Head Attention is Secretly a Long Context Processor.**
- **Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention.**
- ![GitHub Repo stars
- **Sequence can Secretly Tell You What to Discard.**
- **SinkLoRA: Enhanced Efficiency and Chat Capabilities for Long-Context Large Language Models.**
- ![GitHub Repo stars - GT-86/SinkLoRA)
- **HiP Attention: Sparse Sub-Quadratic Attention with Hierarchical Attention Pruning.**
- **Taking a Deep Breath: Enhancing Language Modeling of Large Language Models with Sentinel Tokens.**
- **Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers.**
- **Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention.**
- **Neurocache: Efficient Vector Retrieval for Long-range Language Modeling.**
- ![GitHub Repo stars
- **Weighted Grouped Query Attention in Transformers.**
- **TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention.**
- ![GitHub Repo stars
- **Selective Attention Improves Transformer.**
- ![GitHub Repo stars
- **FltLM: An Integrated Long-Context Large Language Model for Effective Context Filtering and Understanding.**
- **Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix.**
- **Extra Global Attention Designation Using Keyword Detection in Sparse Transformer Architectures.**
- **Selective Attention: Enhancing Transformer through Principled Context Control.** - Chowdhury, Jiasi Chen, Samet Oymak.* NeurIPS 2024.
- **SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs.** - Hay So, Ting Cao, Fan Yang, Mao Yang.* Arxiv 2024.
- ![GitHub Repo stars
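Most entries in this subsection instantiate a fixed sparsity pattern over the attention score matrix rather than computing all n² pairs. A small, purely illustrative sketch of the Longformer-style pattern (a local band plus a few global tokens):

```python
import numpy as np

def sliding_window_mask(n, window, n_global=0):
    """Boolean attention mask: each token attends to neighbors within
    `window` positions, and the first `n_global` tokens attend (and are
    attended to) globally. True = attention allowed."""
    idx = np.arange(n)
    mask = np.abs(idx[:, None] - idx[None, :]) <= window  # local band
    mask[:n_global, :] = True   # global tokens see everything
    mask[:, :n_global] = True   # everyone sees the global tokens
    return mask

m = sliding_window_mask(n=8, window=1, n_global=1)
assert m[0].all() and m[:, 0].all()          # token 0 is global
assert m[5, 4] and m[5, 6] and not m[5, 2]   # band of width 1 elsewhere
```

For fixed window size the number of allowed pairs grows linearly in n, which is the asymptotic saving that block-sparse kernels then realize in hardware.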
-
5. Length Extrapolation
- **Attention Alignment and Flexible Positional Embeddings Improve Transformer Length Extrapolation.** *Ta-Chung Chi, Ting-Han Fan, Alexander I. Rudnicky.* Arxiv 2023.
- ![GitHub Repo stars - Alignment-Transformer-Length-Extrapolation)
- **CoCA: Fusing position embedding with Collinear Constrained Attention for fine-tuning free context window extending.**
- ![GitHub Repo stars - ai/Collinear-Constrained-Attention)
- **Structured Packing in LLM Training Improves Long Context Utilization.**
- **RoFormer: Enhanced Transformer with Rotary Position Embedding.**
- ![GitHub Repo stars
- **Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation.**
- ![GitHub Repo stars
- **KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation.** *Ta-Chung Chi, Ting-Han Fan, Peter J. Ramadge, Alexander I. Rudnicky.* Arxiv 2022.
- ![GitHub Repo stars
- **A Length-Extrapolatable Transformer.**
- **Dissecting Transformer Length Extrapolation via the Lens of Receptive Field Analysis.** *Ta-Chung Chi, Ting-Han Fan, Alexander I. Rudnicky, Peter J. Ramadge.* ACL 2023.
- **Randomized Positional Encodings Boost Length Generalization of Transformers.** *Anian Ruoss, Grégoire Delétang, Tim Genewein, Jordi Grau-Moya, Róbert Csordás, Mehdi Bennani, Shane Legg, Joel Veness.* ACL 2023.
- ![GitHub Repo stars - deepmind/randomized_positional_encodings)
- **The Impact of Positional Encoding on Length Generalization in Transformers.**
- ![GitHub Repo stars - NLP/length-generalization)
- **Focused Transformer: Contrastive Training for Context Scaling.**
- ![GitHub Repo stars
- **Extending Context Window of Large Language Models via Positional Interpolation.**
- **Exploring Transformer Extrapolation.**
- ![GitHub Repo stars
- **LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models.**
- ![GitHub Repo stars - Infinite)
- **Scaling Laws of RoPE-based Extrapolation.**
- **YaRN: Efficient Context Window Extension of Large Language Models.**
- ![GitHub Repo stars
- **PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training.**
- ![GitHub Repo stars - pku/PoSE)
- **LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models.**
- ![GitHub Repo stars - research/LongLoRA)
- **LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning.** - Yuan Chang, Huiyuan Chen, Xia Hu.* Arxiv 2024.
- **Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache.**
- **Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models.**
- **Extending LLMs' Context Window with 100 Samples.**
- ![GitHub Repo stars - NLP/Entropy-ABF)
- **E^2-LLM: Efficient and Extreme Length Extension of Large Language Models.**
- **With Greater Text Comes Greater Necessity: Inference-Time Training Helps Long Text Generation.**
- ![GitHub Repo stars - LoRA)
- **Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation.**
- ![GitHub Repo stars
- **Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens.**
- ![GitHub Repo stars - gram)
- **LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens.**
- **Data Engineering for Scaling Language Models to 128K Context.**
- ![GitHub Repo stars - Context-Data-Engineering)
- **Long-Context Language Modeling with Parallel Context Encoding.**
- ![GitHub Repo stars - nlp/CEPE)
- **CLEX: Continuous Length Extrapolation for Large Language Models.**
- **LongRoPE: Extending LLM ContextWindow Beyond 2 Million Tokens.**
- ![GitHub Repo stars - Context-Data-Engineering)
- **Transformers Can Achieve Length Generalization But Not Robustly.**
- ![GitHub Repo stars - nlp/CEPE)
- **CLEX: Continuous Length Extrapolation for Large Language Models.**
- ![GitHub Repo stars - NLP-SG/CLEX)
- **Resonance RoPE: Improving Context Length Generalization of Large Language Models.**
- ![GitHub Repo stars - NLP-SG/CLEX)
- **Resonance RoPE: Improving Context Length Generalization of Large Language Models.**
- ![GitHub Repo stars
- ![GitHub Repo stars
- **Can't Remember Details in Long Documents? You Need Some R&R.**
- **Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding.**
- **InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory.**
- **Naive Bayes-based Context Extension for Large Language Models.**
- **In-Context Pretraining: Language Modeling Beyond Document Boundaries.** - tau Yih, Mike Lewis.* ICLR 2024 Spotlight.
- **Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient Generative Inference.**
- **Effective Long-Context Scaling of Foundation Models.**
- **Fewer Truncations Improve Language Modeling.**
- **Length Generalization of Causal Transformers without Position Encoding.**
- **Extending Llama-3's Context Ten-Fold Overnight.**
- **Long Context Alignment with Short Instructions and Synthesized Positions.**
- **xLSTM: Extended Long Short-Term Memory.**
- **DAPE: Data-Adaptive Positional Encoding for Length Extrapolation.**
- **Contextual Position Encoding: Learning to Count What's Important.**
- **Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model.**
- **Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure.**
- **LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models.**
- **Never Miss A Beat: An Efficient Recipe for Context Window Extension of Large Language Models with Consistent "Middle" Enhancement.**
- **3D-RPE: Enhancing Long-Context Modeling Through 3D Rotary Position Encoding.**
- **Mixture of In-Context Experts Enhance LLMs' Long Context Awareness.**
- **Explicitly Encoding Structural Symmetry is Key to Length Generalization in Arithmetic Tasks.**
- **Human-like Episodic Memory for Infinite Context LLMs.** - Ammar, Jun Wang.* Arxiv 2024.
- **FocusLLM: Scaling LLM's Context by Parallel Decoding.**
- **Scaling Granite Code Models to 128K Context.** - Hong Dang, Yan Koyfman, Atin Sood, Rogerio Feris, Nirmit Desai, David D. Cox, Ruchir Puri, Rameswar Panda.* Arxiv 2024.
- **ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities.**
- **Efficient Solutions For An Intriguing Failure of LLMs: Long Context Window Does Not Mean LLMs Can Analyze Long Sequences Flawlessly.**
- **LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models.** - Kiong Ng, Zhiwei Jiang, Bryan Hooi.* Arxiv 2024.
- **E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning.**
- **Untie the Knots: An Efficient Data Augmentation Strategy for Long-Context Pre-Training in Language Models.**
- **PEAR: Position-Embedding-Agnostic Attention Re-weighting Enhances Retrieval-Augmented Generation with Zero Inference Overhead.**
- **Efficient Long-range Language Modeling with Self-supervised Causal Retrieval.**
- **A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts.**
- **Extending Context Window of Large Language Models from a Distributional Perspective.**
- **How to Train Long-Context Language Models (Effectively).**
- **Differential Transformer.**
- **DAPE V2: Process Attention Score as Feature Map for Length Extrapolation.**
- **Why Does the Effective Context Length of LLMs Fall Short?.**
- **LOGO -- Long cOntext aliGnment via efficient preference Optimization.**
- **Selecting Influential Samples for Long Context Alignment via Homologous Models' Guidance and Contextual Awareness Measurement.**
- **Two are better than one: Context window extension with multi-grained self-injection.**
- **LongReward: Improving Long-context Large Language Models with AI Feedback.**
- **HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation.**
- **Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count.**
- **Large Language Models Can Self-Improve in Long-context Reasoning.**
- **Circuit Complexity Bounds for RoPE-based Transformer Architecture.**
- **Transformers Can Do Arithmetic with the Right Embeddings.**
- **What is Wrong with Perplexity for Long-context Language Modeling?.**
- **Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models.**
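Many of the entries above (Self-Extend, LongRoPE, CLEX, and others) extend a RoPE-based model's context window by remapping unseen positions back into the trained range. A minimal, hypothetical sketch of linear position interpolation; the dimensions and `scale` factor are illustrative, not any specific paper's recipe:

```python
def rope_angles(position, dim=8, base=10000.0, scale=1.0):
    """Rotary angles for one token position.

    `scale` > 1 compresses positions (position interpolation), so a
    model trained on, say, 4K positions can address 4K * scale tokens
    without seeing out-of-range rotation angles.
    """
    pos = position / scale  # map the extended position into the trained range
    return [pos / (base ** (2 * i / dim)) for i in range(dim // 2)]

# With scale=4, extended position 8192 lands on trained position 2048.
assert rope_angles(8192, scale=4.0) == rope_angles(2048, scale=1.0)
```

The training-free methods above differ mainly in how this remapping is chosen (uniform, frequency-dependent, or learned), not in the mechanism itself.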
-
-
9. Compress
- **Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs.**
- **LLoCO: Learning Long Contexts Offline.**
- **In-Context Learning State Vector with Inner and Momentum Optimization.**
- **Learning to Compress Prompt in Natural Language Formats.** - Neng Chuang, Tianwei Xing, Chia-Yuan Chang, Zirui Liu, Xun Chen, Xia Hu.* Arxiv 2024.
- **Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference.**
- **LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression.** - Yew Lin, H. Vicky Zhao, Lili Qiu, Dongmei Zhang.* Arxiv 2024.
- **PCToolkit: A Unified Plug-and-Play Prompt Compression Toolkit of Large Language Models.**
- **Compressed Context Memory for Online Language Model Interaction.** - Hyun Kim, Junyoung Yeom, Sangdoo Yun, Hyun Oh Song.* ICLR 2024.
- **Compressing Long Context for Enhancing RAG with AMR-based Concept Distillation.**
- **Improving Long Text Understanding with Knowledge Distilled from Summarization Model.**
- **Compressing Large Language Models by Streamlining the Unimportant Layer.**
- **PROMPT-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression.**
- **Training LLMs over Neurally Compressed Text.** - Dickstein, Noah Constant.* Arxiv 2024.
- **Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models.**
- **Adapting LLMs for Efficient Context Processing through Soft Prompt Compression.**
- **OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning.**
- **Feature-based Low-Rank Compression of Large Language Models via Bayesian Optimization.**
- **XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference.**
- **In-context Autoencoder for Context Compression in a Large Language Model.** - Qing Chen, Furu Wei.* ICLR 2024.
- **Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs.**
- **Recurrent Context Compression: Efficiently Expanding the Context Window of LLM.**
- **LoCoCo: Dropping In Convolutions for Long Context Compression.**
- **Compressing Context to Enhance Inference Efficiency of Large Language Models.**
- **Adapting Language Models to Compress Contexts.**
- **LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models.** - Yew Lin, Yuqing Yang, Lili Qiu.* Arxiv 2023.
- **LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression.** - Yew Lin, Yuqing Yang, Lili Qiu.* Arxiv 2023.
- **System 2 Attention (is something you might need too).**
- **DSFormer: Effective Compression of Text-Transformers by Dense-Sparse Weight Factorization.**
- **Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon.**
- **Flexibly Scaling Large Language Models Contexts Through Extensible Tokenization.**
- **Say More with Less: Understanding Prompt Learning Behaviors through Gist Compression.**
- **Imagination Augmented Generation: Learning to Imagine Richer Context for Question Answering over Large Language Models.**
- **Your Transformer is Secretly Linear.**
- **xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token.** - Qing Chen, Furu Wei, Huishuai Zhang, Dongyan Zhao.* Arxiv 2024.
- **SelfCP: Compressing Long Prompt to 1/12 Using the Frozen Large Language Model Itself.**
- **Compressing Lengthy Context With UltraGist.**
- **Evaluating Zero-Shot Long-Context LLM Compression.**
- **InstructCMP: Length Control in Sentence Compression through Instruction-based Large Language Models.** - Do, Jingun Kwon, Hidetaka Kamigaito, Manabu Okumura.* Arxiv 2024.
- **AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration.** - Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, Song Han.* MLSys 2024 Best Paper Award.
- **In-Context Former: Lightning-fast Compressing Context for Large Language Model.**
- **UIO-LLMs: Unbiased Incremental Optimization for Long-Context LLMs.**
- **PromptIntern: Saving Inference Costs by Internalizing Recurrent Prompt during Large Language Model Fine-tuning.**
- **Concise and Precise Context Compression for Tool-Using Language Models.**
- **Context Embeddings for Efficient Answer Generation in RAG.**
- **Fundamental Limits of Prompt Compression: A Rate-Distortion Framework for Black-Box Language Models.**
- **QUITO: Accelerating Long-Context Reasoning through Query-Guided Context Compression.**
- **SentenceVAE: Faster, Longer and More Accurate Inference with Next-sentence Prediction for Large Language Models.**
- **QUITO-X: An Information Bottleneck-based Compression Algorithm with Cross-Attention.**
- **AdaComp: Extractive Context Compression with Adaptive Predictor for Retrieval-Augmented Large Language Models.**
- **Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference.**
- **Characterizing Prompt Compression Methods for Long Context Inference.**
- **Familiarity-aware Evidence Compression for Retrieval Augmented Generation.**
- **TACO-RL: Task Aware Prompt Compression Optimization with Reinforcement Learning.**
- **Parse Trees Guided LLM Prompt Compression.**
- **FineZip: Pushing the Limits of Large Language Models for Practical Lossless Text Compression.**
- **Perception Compressor: A training-free prompt compression method in long context scenarios.** - Tao Zheng.* Arxiv 2024.
- **From Reading to Compressing: Exploring the Multi-document Reader for Prompt Compression.**
- **Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability.** - Yan Yeung.* EMNLP 2024.
- **Style-Compress: An LLM-Based Prompt Compression Framework Considering Task-Specific Styles.**
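Most of the extractive prompt-compression papers above share one skeleton: score tokens (or sentences) for importance, then keep only the highest-scoring fraction in original order. A toy sketch of that skeleton; the hand-written scores stand in for the learned estimators (e.g. a small LM's token perplexities) these methods actually use:

```python
def compress_prompt(tokens, scores, keep_ratio=0.5):
    """Keep the top `keep_ratio` fraction of tokens by score,
    preserving their original order in the prompt."""
    k = max(1, int(len(tokens) * keep_ratio))
    keep = set(sorted(range(len(tokens)), key=lambda i: -scores[i])[:k])
    return [t for i, t in enumerate(tokens) if i in keep]

tokens = ["the", "quick", "brown", "fox", "jumps"]
scores = [0.1, 0.9, 0.8, 0.95, 0.2]  # illustrative importances
assert compress_prompt(tokens, scores, keep_ratio=0.6) == ["quick", "brown", "fox"]
```

Soft-compression methods (gist tokens, In-context Autoencoder, xRAG) differ in that they replace spans with learned embeddings rather than deleting tokens outright.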
-
-
3. Recurrent Transformers
- **Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention.**
- **Transformer-XL: Attentive language models beyond a fixed-length context.**
- **Compressive Transformers for Long-Range Sequence Modelling.**
- **Memformer: The memory-augmented transformer.**
- **ERNIE-Doc: A Retrospective Long-Document Modeling Transformer.** - IJCNLP 2021.
- **Memorizing Transformers.**
- **Recurrent Attention Networks for Long-text Modeling.**
- **RWKV: Reinventing RNNs for the Transformer Era.** - Jie Zhu.* Arxiv 2023.
- **Segmented Recurrent Transformer: An Efficient Sequence-to-Sequence Model.**
- **Scaling Transformer to 1M tokens and beyond with RMT.**
- **Block-Recurrent Transformers.**
- **TRAMS: Training-free Memory Selection for Long-range Language Modeling.**
- **Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models.** - Muraru, Albert Gu, Ruba Haroun, Leonard Berrada, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, Arnaud Doucet, David Budden, Yee Whye Teh, Razvan Pascanu, Nando De Freitas, Caglar Gulcehre.* Arxiv 2024.
- **Extensible Embedding: A Flexible Multipler For LLM's Context Length.**
- **Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence.** - Jie Zhu.* Arxiv 2024.
- **Linearizing Large Language Models.**
- **RecurrentGemma: Moving Past Transformers for Efficient Open Language Models.** - Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Armand Joulin, Noah Fiedel, Evan Senter, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, David Budden, Arnaud Doucet, Sharad Vikram, Adam Paszke, Trevor Gale, Sebastian Borgeaud, Charlie Chen, Andy Brock, Antonia Paterson, Jenny Brennan, Meg Risdal, Raj Gundluru, Nesh Devanathan, Paul Mooney, Nilay Chauhan, Phil Culliton, Luiz Gustavo Martins, Elisa Bandy, David Huntsperger, Glenn Cameron, Arthur Zucker, Tris Warkentin, Ludovic Peran, Minh Giang, Zoubin Ghahramani, Clément Farabet, Koray Kavukcuoglu, Demis Hassabis, Raia Hadsell, Yee Whye Teh, Nando de Freitas.* Arxiv 2024.
- **VisualRWKV: Exploring Recurrent Neural Networks for Visual Language Models.**
- **Just read twice: closing the recall gap for recurrent language models.**
- **GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression.**
- **Associative Recurrent Memory Transformer.**
- **Analysis of Argument Structure Constructions in a Deep Recurrent Language Model.**
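The segment-recurrent models above (Transformer-XL, RMT, Block-Recurrent Transformers, and relatives) share one outer loop: split the long sequence into fixed-size segments and carry a memory state across them. A schematic sketch in which a trivial running sum stands in for the model's forward pass:

```python
def process_long_sequence(tokens, segment_len, step, memory):
    """Run a segment-recurrent model over a long token sequence.

    `step(segment, memory) -> (outputs, new_memory)` stands in for one
    forward pass of a transformer that reads and writes memory tokens.
    """
    outputs = []
    for start in range(0, len(tokens), segment_len):
        segment = tokens[start:start + segment_len]
        out, memory = step(segment, memory)
        outputs.extend(out)
    return outputs, memory

# Toy step: memory is the running total of everything seen so far,
# and each output token can "see" the memory from earlier segments.
def step(segment, memory):
    return [memory] * len(segment), memory + sum(segment)

outs, mem = process_long_sequence([1, 2, 3, 4, 5], segment_len=2, step=step, memory=0)
assert mem == 15  # memory accumulated across all three segments
```

Compute per step stays bounded by `segment_len`, which is why these models scale to sequences far beyond the attention window.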
-
-
4. State Space Models
- **Mamba: Linear-Time Sequence Modeling with Selective State Spaces.**
- **MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts.**
- **MambaByte: Token-free Selective State Space Model.**
- **LOCOST: State-Space Models for Long Document Abstractive Summarization.**
- **State Space Models as Foundation Models: A Control Theoretic Overview.**
- **Jamba: A Hybrid Transformer-Mamba Language Model.** - Shwartz, Omri Abend, Raz Alon, Tomer Asida, Amir Bergman, Roman Glozman, Michael Gokhman, Avashalom Manevich, Nir Ratner, Noam Rozen, Erez Shwartz, Mor Zusman, Yoav Shoham.* Arxiv 2024.
- **Robustifying State-space Models for Long Sequences via Approximate Diagonalization.**
- **Zamba: A Compact 7B SSM Hybrid Model.**
- **Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality.**
- **Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling.**
- **B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory.**
- **MambaForGCN: Enhancing Long-Range Dependency with State Space Model and Kolmogorov-Arnold Networks for Aspect-Based Sentiment Analysis.**
- **Discrete Diffusion Language Model for Long Text Summarization.**
- **ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2.**
- **Jamba-1.5: Hybrid Transformer-Mamba Models at Scale.**
- **SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models.**
- **ReMamba: Equip Mamba with Effective Long-Sequence Modeling.**
- **Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling.**
- **Taipan: Efficient and Expressive State Space Language Models with Selective Attention.**
- **Rethinking Token Reduction for State Space Models.**
- **Attamba: Attending To Multi-Token States.**
- **Gated Delta Networks: Improving Mamba2 with Delta Rule.**
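At the core of the state space models listed above is a linear recurrence over the sequence. A one-dimensional sketch of that recurrence; real models use learned, multi-channel parameters (input-dependent, i.e. "selective", in Mamba-style models) and compute it with a parallel scan rather than a Python loop:

```python
def ssm_scan(inputs, a, b, c):
    """Sequential scan of a 1-D linear state space model:
        h_t = a * h_{t-1} + b * x_t
        y_t = c * h_t
    Constant state size gives O(n) time and O(1) memory per step,
    versus attention's O(n^2) over the full context.
    """
    h, ys = 0.0, []
    for x in inputs:
        h = a * h + b * x
        ys.append(c * h)
    return ys

# An impulse decays geometrically through the state: a controls how
# long past inputs are remembered.
assert ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0) == [2.0, 1.0, 0.5]
```

The fixed-size state `h` is also why papers above study "state collapse": everything the model retains about the prefix must fit in it.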
-
-
7. RAG and ICL
- **Feature-Adaptive and Data-Scalable In-Context Learning.**
- ![GitHub Repo stars - ICL)
- **KG-RAG: Bridging the Gap Between Knowledge and Creativity.**
- ![GitHub Repo stars - RAG)
- **HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models.**
- ![GitHub Repo stars - NLP-Group/HippoRAG)
- **Implicit In-context Learning.**
- ![GitHub Repo stars
- **Are Long-LLMs A Necessity For Long-Context Tasks?.**
- **Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading.**
- **Attendre: Wait To Attend By Retrieval With Evicted Queries in Memory-Based Transformers for Long Context Processing.**
- **BGE Landmark Embedding: A Chunking-Free Embedding Method For Retrieval Augmented Long-Context Large Language Models.**
- **Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity.**
- ![GitHub Repo stars - RAG)
- **RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation.** - Min Chan, Chunpu Xu, Ruibin Yuan, Hongyin Luo, Wei Xue, Yike Guo, Jie Fu.* Arxiv 2024.
- ![GitHub Repo stars - RAG)
- **Improving Retrieval Augmented Open-Domain Question-Answering with Vectorized Contexts.**
- **Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation.**
- **Multi-view Content-aware Indexing for Long Document Retrieval.**
- **Retrieval Head Mechanistically Explains Long-Context Factuality.**
- ![GitHub Repo stars
- **FlashBack: Efficient Retrieval-Augmented Language Modeling for Long Context Inference.**
- **MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery.**
- **You Only Use Reactive Attention Slice For Long Context Retrieval.**
- ![GitHub Repo stars
- **SMART-RAG: Selection using Determinantal Matrices for Augmented Retrieval.**
- **Lighter And Better: Towards Flexible Context Adaptation For Retrieval Augmented Generation.**
- **Bridging Context Gaps: Leveraging Coreference Resolution for Long Contextual Understanding.**
- **ALR2: A Retrieve-then-Reason Framework for Long-context Question Answering.**
- **Inference Scaling for Long-Context Retrieval Augmented Generation.**
- **GARLIC: LLM-Guided Dynamic Progress Control with Hierarchical Weighted Graph for Long Document QA.**
- **Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG.**
- **Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models.**
- **SEGMENT+: Long Text Processing with Short-Context Language Models.**
- ![GitHub Repo stars - 9/segmentplus)
- **Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs.**
- ![GitHub Repo stars - uiuc/GoR)
- **In Defense of RAG in the Era of Long-Context Language Models.**
- **ChuLo: Chunk-Level Key Information Representation for Long Document Processing.**
- **TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text.**
- **LLM×MapReduce: Simplified Long-Sequence Processing using Large Language Models.**
- ![GitHub Repo stars
- **Enhancing Long Context Performance in LLMs Through Inner Loop Query Mechanism.**
- **LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering.**
- ![GitHub Repo stars
- **Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection.** - Chen Gu, Caitlin Sikora, Ho Ko, Yinxiao Liu, Chu-Cheng Lin, Lei Shu, Liangchen Luo, Lei Meng, Bang Liu, Jindong Chen.* Arxiv 2024.
- **Is In-Context Learning Sufficient for Instruction Following in LLMs?.**
- ![GitHub Repo stars - epfl/icl-alignment)
- **FragRel: Exploiting Fragment-level Relations in the External Memory of Large Language Models.**
- **Multi-Head RAG: Solving Multi-Aspect Problems with LLMs.**
- ![GitHub Repo stars
- **Demonstration Notebook: Finding the Most Suited In-Context Learning Example from Interactions.**
- **Retrieval Meets Reasoning: Dynamic In-Context Editing for Long-Text Understanding.**
- **FoRAG: Factuality-optimized Retrieval Augmented Generation for Web-enhanced Long-form Question Answering.**
- **Can Few-shot Work in Long-Context? Recycling the Context to Generate Demonstrations.**
- **LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs.**
- ![GitHub Repo stars - AI-Lab/LongRAG)
- **Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning.**
- **From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data.**
- **Memory3: Language Modeling with Explicit Memory.**
- **Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting.** - Yu Lee, Tomas Pfister.* Arxiv 2024.
- **Retrieve, Summarize, Plan: Advancing Multi-hop Question Answering with an Iterative Approach.**
- **R^2AG: Incorporating Retrieval Information into Retrieval Augmented Generation.**
- ![GitHub Repo stars
- **Making Long-Context Language Models Better Multi-Hop Reasoners.**
- ![GitHub Repo stars - Lab/LongContextReasoner)
- **Large Language Models Know What Makes Exemplary Contexts.**
- ![GitHub Repo stars - ICL)
- **RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation.**
- ![GitHub Repo stars - science/RAGChecker)
- **Writing in the Margins: Better Inference Pattern for Long Context Retrieval.**
- ![GitHub Repo stars - in-the-margins)
- **MemLong: Memory-Augmented Retrieval for Long Text Modeling.**
- **Reducing Distraction in Long-Context Language Models by Focused Learning.**
6. Long Term Memory
- **Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System.**
- ![GitHub Repo stars
- **MemoryBank: Enhancing Large Language Models with Long-Term Memory.**
- ![GitHub Repo stars - SiliconFriend)
- **Improve Long-term Memory Learning Through Rescaling the Error Temporally.**
- **Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models.**
- **Empowering Working Memory for Large Language Model Agents.**
- **Evolving Large Language Model Assistant with Long-Term Conditional Memory.**
- **A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts.** - Huei Lee, Xinyun Chen, Hiroki Furuta, John Canny, Ian Fischer.* Arxiv 2024.
- **Steering Conversational Large Language Models for Long Emotional Support Conversations.**
- **SPAR: Personalized Content-Based Recommendation via Long Engagement Attention.** - Mageed, Sinong Wang, Rong Jin, Sem Park, Ning Yao, Bo Long.* Arxiv 2024.
- **Compress to Impress: Unleashing the Potential of Compressive Memory in Real-World Long-Term Conversations.**
- ![GitHub Repo stars
- **Prompts As Programs: A Structure-Aware Approach to Efficient Compile-Time Prompt Optimization.**
- ![GitHub Repo stars
- **HMT: Hierarchical Memory Transformer for Long Context Language Processing.**
- ![GitHub Repo stars - pytorch)
- **SirLLM: Streaming Infinite Retentive LLM.**
- ![GitHub Repo stars
- **Toward Conversational Agents with Context and Time Sensitive Long-term Memory.**
- ![GitHub Repo stars
- **Position Debiasing Fine-Tuning for Causal Perception in Long-Term Dialogue.** - Ling Mao, Wenfeng Xie, Dangyang Chen.* Arxiv 2024.
- **Enhancing Long-Term Memory using Hierarchical Aggregate Tree for Retrieval Augmented Generation.**
- **Commonsense-augmented Memory Construction and Management in Long-term Conversations via Context-aware Persona Refinement.** - iunn Ong, Seoyeon Kim, Dongha Lee, Jinyoung Yeo.* Arxiv 2024.
- **HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model.**
- ![GitHub Repo stars
- **CreDes: Causal Reasoning Enhancement and Dual-End Searching for Solving Long-Range Reasoning Problems using LLMs.**
- **StreamingDialogue: Prolonged Dialogue Learning via Long Context Compression with Minimal Losses.** - Nan Li, Quan Tu, Cunli Mao, Zhengtao Yu, Ji-Rong Wen, Rui Yan.* Arxiv 2024.
12. Long Text Generation
- ![GitHub Repo stars
- **LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs.**
- **Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key.**
- ![GitHub Repo stars
- **Integrating Planning into Single-Turn Long-Form Text Generation.**
- **Large Language Models Still Exhibit Bias in Long Text.**
- **LoGU: Long-form Generation with Uncertainty Expressions.**
- ![GitHub Repo stars
- **LongGenBench: Long-context Generation Benchmark.**
- **Language Models can Self-Lengthen to Generate Long Texts.**
- ![GitHub Repo stars - Lengthen)
- **Suri: Multi-constraint Instruction Following for Long-form Text Generation.**
8. Agent
- **A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis.**
- **PEARL: Prompting Large Language Models to Plan and Execute Actions Over Long Documents.**
- **LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration.**
- ![GitHub Repo stars
- **AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents.**
- ![GitHub Repo stars - Austin-RPL/amago)
- **Chain of Agents: Large Language Models Collaborating on Long-Context Tasks.**
- **GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models.**
- **Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks.**
- **Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks.**
- ![GitHub Repo stars - VL/Optimus-1)
10. Long Video and Image
- **LongVILA: Scaling Long-Context Visual Language Models for Long Videos.**
- **DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework.**
- **Bridging Episodes and Semantics: A Novel Framework for Long-Form Video Understanding.** - Fong Yeh, Min-Hung Chen, Hung-Ting Su, Winston H. Hsu, Shang-Hong Lai.* ECCV 2024 Workshop.
- **EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture.**
- ![GitHub Repo stars - apps/EasyAnimate)
- **VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos.** - Eskin, Jaehong Yoon, Feng Cheng, Gedas Bertasius, Mohit Bansal.* Arxiv 2024.
- **PostDoc: Generating Poster from a Long Multimodal Document Using Deep Submodular Optimization.**
- **Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies.** - Ting Su, Chun-Tong Chao, Ya-Ching Hsu, Xudong Lin, Yulei Niu, Hung-Yi Lee, Winston H. Hsu.* Arxiv 2024.
- ![GitHub Repo stars
- **Towards Event-oriented Long Video Understanding.** - Rong Wen.* Arxiv 2024.
- ![GitHub Repo stars - Bench)
- **An End-to-End Speech Summarization Using Large Language Model.**
- **OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding.**
- **MATE: Meet At The Embedding -- Connecting Images with Long Texts.**
- **mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models.**
- ![GitHub Repo stars - PLUG/mPLUG-Owl)
- **KeyVideoLLM: Towards Large-scale Video Keyframe Selection.**
- **Rethinking Visual Dependency in Long-Context Reasoning for Large Vision-Language Models.**
- **SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation.** - Wei Chang, Lingjie Li, Kevin Lin, Chung-Ching Lin, Jianfeng Wang, Zhengyuan Yang, Yingnian Wu, Lijuan Wang.* Arxiv 2024.
- ![GitHub Repo stars - vgen/slowfast-vgen)
- **LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation.**
- ![GitHub Repo stars
- **ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos.**
- ![GitHub Repo stars
- **VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges.**
- ![GitHub Repo stars - nlco/VideoLLaMB)
- **Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation.**
- **LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture.**
- ![GitHub Repo stars
- **VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models.**
- **T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs.**
13. Blogs
- **Extending Context is Hard…but not Impossible†.**
- **NTK-Aware Scaled RoPE.**
- **The Secret Sauce behind 100K context window in LLMs: all tricks in one place.**
- **Transformer Upgrade Path: 7. Length Extrapolation and Local Attention.**
- **Transformer Upgrade Path: 9. A New Approach to Global Length Extrapolation.**
- **Transformer Upgrade Path: 12. ReRoPE for Unbounded Extrapolation.**
- **Transformer Upgrade Path: 14. When HWFA Meets ReRoPE.**
- **Transformer Upgrade Path: 15. Key Normalization Aids Length Extrapolation.**
- **Transformer Upgrade Path: 16. A Retrospective on Length Extrapolation Techniques.**
- **The Tug-of-War Between Cache and Performance: From MHA, MQA and GQA to MLA.**
- **Towards 100x Speedup: Full Stack Transformer Inference Optimization.**
- **2024.5 A Side-by-Side Comparison of the Long Context of Various LLMs (128k articles).**
- ![GitHub Repo stars
- **2024.5 A Side-by-Side Comparison of the Long Context of Various LLMs (32k articles).**
- **Transformer Upgrade Path: 18. Design Principles for the RoPE Base.**
- **Generalizing an LLM from 8k to 1M Context using Qwen-Agent.**
- **FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision.**
- **Introducing RAG 2.0.**
- **How Do Language Models put Attention Weights over Long Context?.**
- **An open-source and open-access RAG platform.**
- **Many-shot Jailbreaking.**
- **Full Stack Transformer Inference Optimization Season 2: Deploying Long-Context Models.**
1. Survey Papers
- **Length Extrapolation of Transformers: A Survey from the Perspective of Position Encoding.**
- **Neural Natural Language Processing for Long Texts: A Survey of the State-of-the-Art.**
- **Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey.**
- ![GitHub Repo stars - llms-learning)
- **The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey.**
- **Efficient Transformers: A Survey.**
- **A Survey on Long Text Modeling with Transformers.**
- **A Survey on Efficient Inference for Large Language Models.** - Ping Zhang, Yuhan Dong, Yu Wang.* Arxiv 2024.
- **State Space Model for New-Generation Network Alternative to Transformers: A Survey.**
- ![GitHub Repo stars - AHU/Mamba_State_Space_Model_Paper_List)
- **A Survey on RAG Meets LLMs: Towards Retrieval-Augmented Large Language Models.** - Seng Chua, Qing Li.* Arxiv 2024.
- **Evaluation of Retrieval-Augmented Generation: A Survey.**
- ![GitHub Repo stars - RAG-Evaluation)
- **The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving.**
- **Keep the Cost Down: A Review on Methods to Optimize LLM' s KV-Cache Consumption.**
- ![GitHub Repo stars - charlie/Awesome-KV-Cache)
- **Contextual Compression in Retrieval-Augmented Generation for Large Language Models: A Survey.**
- ![GitHub Repo stars - Compression)
- **Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely.**
- **Prompt Compression for Large Language Models: A Survey.**
Month Papers
- IdentifyMe: A Challenging Long-Context Mention Resolution Benchmark
- Star Attention: Efficient LLM Inference over Long Sequences
- Squeezed Attention: Accelerating Long Context Length LLM Inference
- Large Language Models Can Self-Improve in Long-context Reasoning
- Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows
- Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation
- LIFBench: Evaluating the Instruction Following Performance and Stability of Large Language Models in Long-Context Scenarios
- LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation
- TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
- M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework
- Recycled Attention: Efficient inference for long-context language models
- Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
- What is Wrong with Perplexity for Long-context Language Modeling?
- Language Models can Self-Lengthen to Generate Long Texts
- Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification
Week Papers
- AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning
- ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression
- Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity
- LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations
- Transformers Can Do Arithmetic with the Right Embeddings
- Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count
- DENIAHL: In-Context Features Influence LLM Needle-In-A-Haystack Abilities
- T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs
- LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos
- Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache
- Star Attention: Efficient LLM Inference over Long Sequences
- Attamba: Attending To Multi-Token States
- ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos
- A Benchmark for Long-Form Medical Question Answering
- When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training