Awesome-LLM-Long-Context-Modeling

📰 Must-read papers and blogs on LLM-based Long Context Modeling 🔥
https://github.com/Xnhyacinth/Awesome-LLM-Long-Context-Modeling


2. Efficient Attention
- 2.1 Sparse Attention
  - **Generating Long Sequences with Sparse Transformers.**
  - **Blockwise self-attention for long document understanding.** - tau Yih, Sinong Wang, Jie Tang.* EMNLP 2020.
  - **ETC: Encoding Long and Structured Inputs in Transformers.**
  - **Big Bird: Transformers for Longer Sequences.**
  - **Reformer: The efficient transformer.**
  - **Sparse Sinkhorn Attention.** - Cheng Juan.* ICML 2020.
  - **Sparse and continuous attention mechanisms.**
  - **Efficient Long-Text Understanding with Short-Text Models.**
  - **Parallel Context Windows for Large Language Models.** - Brown, Yoav Shoham.* ACL 2023.
  - **LONGNET: Scaling Transformers to 1,000,000,000 Tokens.**
  - **Adapting Language Models to Compress Contexts.**
  - **Blockwise Parallel Transformer for Long Context Large Models.**
  - **MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers.**
  - **Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers.**
  - **Long-range Language Modeling with Self-retrieval.**
  - **Max-Margin Token Selection in Attention Mechanism.**
  - **Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers.**
  - **Sparse Token Transformer with Attention Back Tracking.**
  - **LongHeads: Multi-Head Attention is Secretly a Long Context Processor.**
  - **Training-Free Long-Context Scaling of Large Language Models.**
  - **Sequence can Secretly Tell You What to Discard.**
  - **FltLM: An Intergrated Long-Context Large Language Model for Effective Context Filtering and Understanding.**
  - **Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix.**
  - **Extra Global Attention Designation Using Keyword Detection in Sparse Transformer Architectures.**
  - **SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs.** - Hay So, Ting Cao, Fan Yang, Mao Yang.* Arxiv 2024.
  - **Selective Attention: Enhancing Transformer through Principled Context Control.** - Chowdhury, Jiasi Chen, Samet Oymak.* NeurIPS 2024.
  - **MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression.**
- 2.4 IO-Aware Attention
  - **TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer.**
  - **Efficient LLM Inference with Kcache.**
  - **You Only Cache Once: Decoder-Decoder Architectures for Language Models.**
  - **Fast Transformer Decoding: One Write-Head is All You Need.**
  - **Layer-Condensed KV Cache for Efficient Inference of Large Language Models.**
  - **PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference.**
  - **Unlocking Data-free Low-bit Quantization with Matrix Decomposition for KV Cache Compression.** - Feng Gao, Wayne Xin Zhao, Yipeng Ma, Tao Wang, Ji-Rong Wen.* Arxiv 2024.
  - **MiniCache: KV Cache Compression in Depth Dimension for Large Language Models.**
  - **Effectively Compress KV Heads for LLM.**
  - **Beyond KV Caching: Shared Attention for Efficient LLMs.**
  - **Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters.**
  - **UNComp: Uncertainty-Aware Long-Context Compressor for Efficient Large Language Model Inference.**
  - **LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy.**
  - **In-context KV-Cache Eviction for LLMs via Attention-Gate.**
  - **Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient Generative Inference.**
  - **ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression.**
  - **ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition.**
  - **SnapKV: LLM Knows What You are Looking for Before Generation.**
  - **Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification.**
  - **Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity.**
  - **AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning.**
  - **Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs.**
- 2.2 Linear Attention
  - **Softmax Attention with Constant Cost per Token.**
  - **Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length.**
  - **Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective.**
  - **Gated Slot Attention for Efficient Linear-Time Sequence Modeling.**
  - **Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention.**
  - **Luna: Linear unified nested attention.**
  - **Fnet: Mixing tokens with fourier transforms.** - Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.* Arxiv 2021.
  - **Gated Linear Attention Transformers with Hardware-Efficient Training.**
- 2.3 Hierarchical Attention
📜 Papers
- 2. Efficient Attention
  - **HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading.**
  - **Unshackling Context Length: An Efficient Selective Attention Approach through Query-Key Compression.**
  - **KVCrush: Key value cache size-reduction using similarity in head-behaviour.**
  - **Slim attention: cut your context memory in half without loss of accuracy -- K-cache is all you need for MHA.** [![GitHub Repo stars](https://img.shields.io/github/stars/OpenMachine-ai/transformer-tricks)](https://github.com/OpenMachine-ai/transformer-tricks)
  - **Softmax Attention with Constant Cost per Token.**
  - **Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length.**
  - **Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective.**
  - **Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention.**
  - **Attention as an RNN.**
  - **You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet.**
  - **Learning to (Learn at Test Time): RNNs with Expressive Hidden States.** [test-time-training/ttt-lm-pytorch](https://github.com/test-time-training/ttt-lm-pytorch)
  - **Gated Slot Attention for Efficient Linear-Time Sequence Modeling.** [sustcsonglin/flash-linear-attention](https://github.com/sustcsonglin/flash-linear-attention)
  - **Neural Legal Judgment Prediction in English.**
  - **Hi-transformer: Hierarchical interactive transformer for efficient and effective long document modeling.** - IJCNLP 2021
  - **Erniesparse: Learning hierarchical efficient transformer through regularized self-attention.**
  - **Self-attention Does Not Need O(n^2) Memory.**
  - **Faster Causal Attention Over Large Sequences Through Sparse Flash Attention.**
  - **FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness.** [![GitHub Repo stars](https://img.shields.io/github/stars/Dao-AILab/flash-attention)](https://github.com/Dao-AILab/flash-attention)
  - **FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning.** [![GitHub Repo stars](https://img.shields.io/github/stars/Dao-AILab/flash-attention)](https://github.com/Dao-AILab/flash-attention)
  - **TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer.**
  - **Efficient LLM Inference with Kcache.**
  - **You Only Cache Once: Decoder-Decoder Architectures for Language Models.**
  - **Fast Transformer Decoding: One Write-Head is All You Need.**
  - **Layer-Condensed KV Cache for Efficient Inference of Large Language Models.**
  - **GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints.** - Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, Sumit Sanghai._ Arxiv 2023.
  - **Reducing Transformer Key-Value Cache Size with Cross-Layer Attention.**
  - **PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference.**
  - **Unlocking Data-free Low-bit Quantization with Matrix Decomposition for KV Cache Compression.** - Feng Gao, Wayne Xin Zhao, Yipeng Ma, Tao Wang, Ji-Rong Wen._ Arxiv 2024.
  - **MiniCache: KV Cache Compression in Depth Dimension for Large Language Models.**
  - **PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling.**
  - **Effectively Compress KV Heads for LLM.**
  - **Beyond KV Caching: Shared Attention for Efficient LLMs.**
  - **A Simple and Effective L2 Norm-Based Strategy for KV Cache Compression.**
  - **Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks.**
  - **Optimizing KV Cache Eviction in LLMs: Adaptive Allocation for Enhanced Budget Utilization.**
  - **PQCache: Product Quantization-based KVCache for Long Context LLM Inference.**
  - **LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference.**
  - **Farewell to Length Extrapolation, a Training-Free Infinite Context with Finite Attention Scope.**
  - **RazorAttention: Efficient KV Cache Compression Through Retrieval Heads.**
  - **FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision.**
  - **ThinK: Thinner Key Cache by Query-Driven Pruning.**
  - **A2SF: Accumulative Attention Scoring with Forgetting Factor for Token Pruning in Transformer Decoder.** - rae Jo, Dongkun Shin._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/Dirac-Notation/A2SF)](https://github.com/Dirac-Notation/A2SF)
  - **Cross-layer Attention Sharing for Large Language Models.**
  - **NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time.** - NACL)
  - **Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters.**
  - **CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios.**
  - **UNComp: Uncertainty-Aware Long-Context Compressor for Efficient Large Language Model Inference.**
  - **LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy.**
  - **In-context KV-Cache Eviction for LLMs via Attention-Gate.**
  - **A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference.**
  - **MagicPIG: LSH Sampling for Efficient LLM Generation.** [![GitHub Repo stars](https://img.shields.io/github/stars/Infini-AI-Lab/MagicPIG)](https://github.com/Infini-AI-Lab/MagicPIG)
  - **Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning.**
  - **MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection.**
  - **EPIC: Efficient Position-Independent Context Caching for Serving Large Language Models.**
  - **VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration.**
  - **Star Attention: Efficient LLM Inference over Long Sequences.** [![GitHub Repo stars](https://img.shields.io/github/stars/NVIDIA/Star-Attention)](https://github.com/NVIDIA/Star-Attention)
  - **When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training.**
  - **Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity.**
  - **Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache.**
  - **Squeezed Attention: Accelerating Long Context Length LLM Inference.**
  - **TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection.**
  - **Generating Long Sequences with Sparse Transformers.**
  - **Blockwise self-attention for long document understanding.** - tau Yih, Sinong Wang, Jie Tang._ EMNLP 2020. [![GitHub Repo stars](https://img.shields.io/github/stars/xptree/BlockBERT)](https://github.com/xptree/BlockBERT)
  - **Longformer: The Long-Document Transformer.**
  - **ETC: Encoding Long and Structured Inputs in Transformers.**
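  - **Big Bird: Transformers for Longer Sequences.** [![GitHub Repo stars](https://img.shields.io/github/stars/google-research/bigbird)](https://github.com/google-research/bigbird)
  - **Reformer: The efficient transformer.** [![GitHub Repo stars](https://img.shields.io/github/stars/lucidrains/reformer-pytorch)](https://github.com/lucidrains/reformer-pytorch)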
  - **Sparse Sinkhorn Attention.** - Cheng Juan._ ICML 2020. [![GitHub Repo stars](https://img.shields.io/github/stars/lucidrains/sinkhorn-transformer)](https://github.com/lucidrains/sinkhorn-transformer)
  - **Sparse and continuous attention mechanisms.**
  - **Efficient Content-Based Sparse Attention with Routing Transformers.** [![GitHub Repo stars](https://img.shields.io/github/stars/lucidrains/routing-transformer)](https://github.com/lucidrains/routing-transformer)
  - **Efficient Long-Text Understanding with Short-Text Models.**
  - **Parallel Context Windows for Large Language Models.** - Brown, Yoav Shoham._ ACL 2023. [![GitHub Repo stars](https://img.shields.io/github/stars/AI21Labs/Parallel-Context-Windows)](https://github.com/AI21Labs/Parallel-Context-Windows)
  - **LongT5: Efficient text-to-text transformer for long sequences.** - Hsuan Sung, Yinfei Yang._ NAACL 2022. [![GitHub Repo stars](https://img.shields.io/github/stars/google-research/longt5)](https://github.com/google-research/longt5)
  - **Unlimiformer: Long-Range Transformers with Unlimited Length Input.**
  - **LONGNET: Scaling Transformers to 1,000,000,000 Tokens.**
  - **Blockwise Parallel Transformer for Long Context Large Models.** - Parallel-Transformer)](https://github.com/lhao499/llm_large_context)
  - **MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers.** [![GitHub Repo stars](https://img.shields.io/github/stars/lucidrains/MEGABYTE-pytorch)](https://github.com/lucidrains/MEGABYTE-pytorch)
  - **Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers.**
  - **Sparse Token Transformer with Attention Back Tracking.**
  - **Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers.**
  - **Long-range Language Modeling with Self-retrieval.**
  - **Max-Margin Token Selection in Attention Mechanism.**
  - **Ring Attention with Blockwise Transformers for Near-Infinite Context.**
  - **Empower Your Model with Longer and Better Context Comprehension.** [![GitHub Repo stars](https://img.shields.io/github/stars/yileijin/attention-transition)](https://github.com/yileijin/attention-transition)
  - **HyperAttention: Long-context Attention in Near-Linear Time.**
  - **Training-Free Long-Context Scaling of Large Language Models.**
  - **LongHeads: Multi-Head Attention is Secretly a Long Context Processor.**
  - **Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention.**
  - **Sequence can Secretly Tell You What to Discard.**
  - **HiP Attention: Sparse Sub-Quadratic Attention with Hierarchical Attention Pruning.**
  - **Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention.**
  - **Taking a Deep Breath: Enhancing Language Modeling of Large Language Models with Sentinel Tokens.**
  - **MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression.**
  - **Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers.**
  - **Selective Attention Improves Transformer.**
  - **FltLM: An Intergrated Long-Context Large Language Model for Effective Context Filtering and Understanding.**
  - **Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix.**
  - **Extra Global Attention Designation Using Keyword Detection in Sparse Transformer Architectures.**
  - **Selective Attention: Enhancing Transformer through Principled Context Control.** - Chowdhury, Jiasi Chen, Samet Oymak._ NeurIPS 2024.
  - **Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention.** [![GitHub Repo stars](https://img.shields.io/github/stars/idiap/fast-transformers)](https://github.com/idiap/fast-transformers)
  - **SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs.** - Hay So, Ting Cao, Fan Yang, Mao Yang._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/microsoft/SeerAttention)](https://github.com/microsoft/SeerAttention)
  - **Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations.**
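  - **Rethinking attention with performers.** [![GitHub Repo stars](https://img.shields.io/github/stars/lucidrains/performer-pytorch)](https://github.com/lucidrains/performer-pytorch)
  - **Random Feature Attention.** [![GitHub Repo stars](https://img.shields.io/github/stars/Noahs-ARK/RFA)](https://github.com/Noahs-ARK/RFA)
  - **Luna: Linear unified nested attention.** [![GitHub Repo stars](https://img.shields.io/github/stars/sooftware/luna-transformer)](https://github.com/sooftware/luna-transformer)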
  - **Fnet: Mixing tokens with fourier transforms.** - Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon._ Arxiv 2021. [![GitHub Repo stars](https://img.shields.io/github/stars/jaketae/fnet)](https://github.com/jaketae/fnet)
  - **Gated Linear Attention Transformers with Hardware-Efficient Training.**
  - **Latent Attention for Linear Time Transformers.**
  - **Simple linear attention language models balance the recall-throughput tradeoff.**
  - **Linear Attention Sequence Parallelism.**
  - **HashEvict: A Pre-Attention KV Cache Eviction Strategy using Locality-Sensitive Hashing.** - Anne Hartley, Brian Gravelle, Furong Huang, Cornelia Fermüller, Yiannis Aloimonos._ Arxiv 2024.
  - **LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid.** [OpenSparseLLMs/Linear-MoE](https://github.com/OpenSparseLLMs/Linear-MoE)
  - **Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving.** [![GitHub Repo stars](https://img.shields.io/github/stars/LLMkvsys/rethink-kv-compression)](https://github.com/LLMkvsys/rethink-kv-compression)
  - **SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching.**
  - **LagKV: Lag-Relative Information of the KV Cache Tells Which Tokens Are Important.** [![GitHub Repo stars](https://img.shields.io/github/stars/AI-Lab-China-Merchants-Bank/LagKV)](https://github.com/AI-Lab-China-Merchants-Bank/LagKV)
  - **X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression.**
  - **KV-Distill: Nearly Lossless Learnable Context Compression for LLMs.** [![GitHub Repo stars](https://img.shields.io/github/stars/vnchari/kv-distill)](https://github.com/vnchari/kv-distill)
  - **Radar: Fast Long-Context Decoding for Any Transformer.**
  - **LLMs Know What to Drop: Self-Attention Guided KV Cache Eviction for Efficient Long-Context Inference.**
  - **PowerAttention: Exponentially Scaling of Receptive Fields for Effective Sparse Attention.**
  - **RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression.** - An Tsai, Zhiding Yu, Alexey Tumanov._ Arxiv 2025.
  - **Tensor Product Attention Is All You Need.** - Chih Yao._ Arxiv 2025. [![GitHub Repo stars](https://img.shields.io/github/stars/tensorgi/T6)](https://github.com/tensorgi/T6)
  - **ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition.**
  - **Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification.**
  - **CAKE: Cascading and Adaptive KV Cache Eviction with Layer Preferences.**
  - **Cross-Self KV Cache Pruning for Efficient Vision-Language Inference.**
  - **Ltri-LLM: Streaming Long Context Inference for LLMs with Training-Free Dynamic Triangular Attention Pattern.**
  - **Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models.** - Hao Chiang, Zhongxin Guo, Chen Lin, Kun Kuang, Wenjie Li, Yelong Shen, Jian Jiao, Peng Cheng, Mao Yang._ Arxiv 2025.
  - **Twilight: Adaptive Attention Sparsity with Hierarchical Top-p Pruning.**
  - **Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference.** [![GitHub Repo stars](https://img.shields.io/github/stars/mit-han-lab/Quest)](https://github.com/mit-han-lab/Quest)
  - **Attention Score is not All You Need for Token Importance Indicator in KV Cache Reduction: Value Also Matters.**
  - **CItruS: Chunked Instruction-aware State Eviction for Long Sequence Modeling.** - Antoine Rondeau, Yang Gao, Jackie Chi Kit Cheung._ Arxiv 2024.
  - **D2O: Dynamic Discriminative Operations for Efficient Generative Inference of Large Language Models.**
  - **LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference.** [![GitHub Repo stars](https://img.shields.io/github/stars/SUSTechBruce/LOOK-M)](https://github.com/SUSTechBruce/LOOK-M)
  - **Training-Free Exponential Extension of Sliding Window Context with Cascading KV Cache.**
  - **QuickLLaMA: Query-aware Inference Acceleration for Large Language Models.** [![GitHub Repo stars](https://img.shields.io/github/stars/dvlab-research/Q-LLM)](https://github.com/dvlab-research/Q-LLM)
  - **MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention.** - Yew Lin, Yuqing Yang, Lili Qiu._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/microsoft/MInference)](https://github.com/microsoft/MInference)
  - **MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding.** - Hsu Yen, Beidi Chen._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/Infini-AI-Lab/MagicDec)](https://github.com/Infini-AI-Lab/MagicDec/)
  - **RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval.**
  - **InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference.**
  - **CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs.**
  - **Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction.** - Phi Nguyen, Yingyu Liang, Shafiq Joty._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/SalesforceAIResearch/GemFilter)](https://github.com/SalesforceAIResearch/GemFilter)
  - **Inference-Friendly Models With MixAttention.**
  - **KV-Compress: Paged KV-Cache Compression with Variable Compression Rates per Attention Head.** [![GitHub Repo stars](https://img.shields.io/github/stars/IsaacRe/vllm-kvcompress)](https://github.com/IsaacRe/vllm-kvcompress)
  - **Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads.**
  - **InfiniPot: Infinite Context Processing on Memory-Constrained LLMs.**
  - **DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads.** [![GitHub Repo stars](https://img.shields.io/github/stars/mit-han-lab/duo-attention)](https://github.com/mit-han-lab/duo-attention)
  - **SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction.** [![GitHub Repo stars](https://img.shields.io/github/stars/sail-sg/SimLayerKV)](https://github.com/sail-sg/SimLayerKV)
  - **KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing.**
  - **Lossless KV Cache Compression to 2%.**
  - **Long Sequence Modeling with Attention Tensorization: From Sequence to Tensor Learning.**
  - **ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference.** - Wen Chang, Wenlei Bao, Size Zheng, Ningxin Zheng, Xin Liu, Harry Dong, Yuejie Chi, Beidi Chen._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/bytedance/ShadowKV)](https://github.com/bytedance/ShadowKV)
  - **BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference.** [![GitHub Repo stars](https://img.shields.io/github/stars/JunqiZhao888/buzz-llm)](https://github.com/JunqiZhao888/buzz-llm)
  - **Recycled Attention: Efficient inference for long-context language models.** [![GitHub Repo stars](https://img.shields.io/github/stars/carriex/recycled-attention)](https://github.com/carriex/recycled-attention)
  - **Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference.**
  - **TokenButler: Token Importance is Predictable.** - Chih Chang, Nilesh Jain, Mohamed S. Abdelfattah._ Arxiv 2025. [![GitHub Repo stars](https://img.shields.io/github/stars/abdelfattah-lab/TokenButler)](https://github.com/abdelfattah-lab/TokenButler)
  - **SnapKV: LLM Knows What You are Looking for Before Generation.**
  - **Core Context Aware Attention for Long Context Language Modeling.**
  - **AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning.** [![GitHub Repo stars](https://img.shields.io/github/stars/LaVi-Lab/AIM)](https://github.com/LaVi-Lab/AIM)
  - **ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression.**
  - **BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching.**
  - **DynamicKV: Task-Aware Adaptive KV Cache Compression for Long Context LLMs.**
  - **AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference.**
  - **TreeKV: Smooth Key-Value Cache Compression with Tree Structures.**
  - **LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention.** [![GitHub Repo stars](https://img.shields.io/github/stars/mit-han-lab/omniserve)](https://github.com/mit-han-lab/omniserve)
  - **BalanceKV: KV Cache Compression through Discrepancy Theory.**
  - **TransMLA: Multi-Head Latent Attention Is All You Need.**
  - **Top-Theta Attention: Sparsifying Transformers by Compensated Thresholding.** [![GitHub Repo stars](https://img.shields.io/github/stars/kostyanoob/top-theta-attention)](https://github.com/kostyanoob/top-theta-attention)
  - **Online Scheduling for LLM Inference with KV Cache Constraints.**
  - **KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference.** - Ling Zhen, Wulong Liu, Yiwu Yao, Sinno Jialin Pan, Mingxuan Yuan._ Arxiv 2025.
  - **Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective.**
  - **Inference-time sparse attention with asymmetric indexing.** - Emmanuel Mazaré, Gergely Szilvasy, Maria Lomeli, Francisco Massa, Naila Murray, Hervé Jégou, Matthijs Douze._ Arxiv 2025.
  - **MoBA: Mixture of Block Attention for Long-Context LLMs.**
  - **A2ATS: Retrieval-Based KV Cache Reduction via Windowed Rotary Position Embedding and Query-Aware Vector Quantization.**
  - **SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator.**
  - **ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty.**
  - **EMS: Adaptive Evict-then-Merge Strategy for Head-wise KV Cache Compression Based on Global-Local Importance.**
  - **XKV: Personalized KV Cache Memory Reduction for Long-Context LLM Inference.**
  - **SparseAccelerate: Efficient Long-Context Inference for Mid-Range GPUs.**
  - **SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs.** - Hong Deng, Jing Han._ Arxiv 2025.
  - **KVShare: Semantic-Aware Key-Value Cache Sharing for Efficient Large Language Model Inference.**
  - **xKV: Cross-Layer SVD for KV-Cache Compression.** - Chih Chang, Chien-Yu Lin, Yash Akhauri, Wei-Cheng Lin, Kai-Chiang Wu, Luis Ceze, Mohamed S. Abdelfattah._ Arxiv 2025. [![GitHub Repo stars](https://img.shields.io/github/stars/abdelfattah-lab/xKV)](https://github.com/abdelfattah-lab/xKV)
  - **WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference.**
  - **BitDecoding: Unlocking Tensor Cores for Long-Context LLMs Decoding with Low-Bit KV Cache.** [![GitHub Repo stars](https://img.shields.io/github/stars/DD-DuDa/BitDecoding)](https://github.com/DD-DuDa/BitDecoding)
  - **Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression.**
  - **Efficient Content-Based Sparse Attention with Routing Transformers.** [![GitHub Repo stars](https://img.shields.io/github/stars/lucidrains/routing-transformer)](https://github.com/lucidrains/routing-transformer)
  - **LongT5: Efficient text-to-text transformer for long sequences.** - Hsuan Sung, Yinfei Yang._ NAACL 2022. [![GitHub Repo stars](https://img.shields.io/github/stars/google-research/longt5)](https://github.com/google-research/longt5)
  - **Unlimiformer: Long-Range Transformers with Unlimited Length Input.**
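  - **Landmark Attention: Random-Access Infinite Context Length for Transformers.** [![GitHub Repo stars](https://img.shields.io/github/stars/epfml/landmark-attention)](https://github.com/epfml/landmark-attention)
  - **Empower Your Model with Longer and Better Context Comprehension.** [![GitHub Repo stars](https://img.shields.io/github/stars/yileijin/attention-transition)](https://github.com/yileijin/attention-transition)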
  - **Ring Attention with Blockwise Transformers for Near-Infinite Context.**
  - **Efficient Streaming Language Models with Attention Sinks.** [![GitHub Repo stars](https://img.shields.io/github/stars/mit-han-lab/streaming-llm)](https://github.com/mit-han-lab/streaming-llm)
  - **HyperAttention: Long-context Attention in Near-Linear Time.**
  - **Fovea Transformer: Efficient Long-Context Modeling with Structured Fine-to-Coarse Attention.** [![GitHub Repo stars](https://img.shields.io/github/stars/ZiweiHe/Fovea-Transformer)](https://github.com/ZiweiHe/Fovea-Transformer)
  - **Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention.**
  - **SinkLoRA: Enhanced Efficiency and Chat Capabilities for Long-Context Large Language Models.** [![GitHub Repo stars](https://img.shields.io/github/stars/Dexter-GT-86/SinkLoRA)](https://github.com/Dexter-GT-86/SinkLoRA)
  - **HiP Attention: Sparse Sub-Quadratic Attention with Hierarchical Attention Pruning.**
  - **Taking a Deep Breath: Enhancing Language Modeling of Large Language Models with Sentinel Tokens.**
  - **Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers.**
  - **Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention.**
  - **Neurocache: Efficient Vector Retrieval for Long-range Language Modeling.**
  - **Weighted Grouped Query Attention in Transformers.**
  - **TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention.**
  - **Cost-Optimal Grouped-Query Attention for Long-Context LLMs.** [![GitHub Repo stars](https://img.shields.io/github/stars/THUNLP/cost-optimal-gqa)](https://github.com/THUNLP/cost-optimal-gqa)
  - **Masked language modeling for proteins via linearly scalable long-context transformers.**
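  - **Linformer: Self-attention with linear complexity.** [![GitHub Repo stars](https://img.shields.io/github/stars/lucidrains/linear-attention-transformer)](https://github.com/lucidrains/linear-attention-transformer)
  - **Random Feature Attention.** [![GitHub Repo stars](https://img.shields.io/github/stars/Noahs-ARK/RFA)](https://github.com/Noahs-ARK/RFA)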
  - **Latent Attention for Linear Time Transformers.**
  - **Linear Attention Sequence Parallelism.**
  - **Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention.**
  - **Attention as an RNN.**
  - **You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet.**
  - **When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models.** [![GitHub Repo stars](https://img.shields.io/github/stars/GATECH-EIC/Linearized-LLM)](https://github.com/GATECH-EIC/Linearized-LLM)
  - **Hierarchical Neural Network Approaches for Long Document Classification.**
  - **Hi-transformer: Hierarchical interactive transformer for efficient and effective long document modeling.** - IJCNLP 2021
  - **FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness.** [![GitHub Repo stars](https://img.shields.io/github/stars/Dao-AILab/flash-attention)](https://github.com/Dao-AILab/flash-attention)
  - **FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning.** [![GitHub Repo stars](https://img.shields.io/github/stars/Dao-AILab/flash-attention)](https://github.com/Dao-AILab/flash-attention)
  - **Efficient Memory Management for Large Language Model Serving with PagedAttention.** [![GitHub Repo stars](https://img.shields.io/github/stars/vllm-project/vllm)](https://github.com/vllm-project/vllm)
  - **GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints.** - Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, Sumit Sanghai._ Arxiv 2023.
  - **More Tokens, Lower Precision: Towards the Optimal Token-Precision Trade-off in KV Cache Compression.**
  - **Boosting Long-Context Information Seeking via Query-Guided Activation Refilling.**
  - **SCOPE: Optimizing Key-Value Cache Compression in Long-context Generation.** [![GitHub Repo stars](https://img.shields.io/github/stars/Linking-ai/SCOPE)](https://github.com/Linking-ai/SCOPE)
  - **DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance.**
  - **KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse.** [![GitHub Repo stars](https://img.shields.io/github/stars/UCSB-NLP-Chang/KVLink)](https://github.com/UCSB-NLP-Chang/KVLink)
  - **FairKV: Balancing Per-Head KV Cache for Fast Multi-GPU Inference.**
  - **SVDq: 1.25-bit and 410x Key Cache Compression for LLM Attention.** - Ling, Yu Xianzhi, Liu Wulong, Yuan Mingxuan._ Arxiv 2025.
  - **Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference.**
  - **Neural Attention Search.**
  - **Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention.**
  - **QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache.**
  - **APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs.**
  - **AdaSplash: Adaptive Sparse Flash Attention.** [![GitHub Repo stars](https://img.shields.io/github/stars/deep-spin/adasplash)](https://github.com/deep-spin/adasplash)
  - **InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU.**
  - **RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models.** - Yiu Yau, Hoi-To Wai, Yang (Katie) Zhao, Dongyeop Kang, Youngsuk Park, Mingyi Hong._ Arxiv 2025.
  - **Exploiting Sparsity for Long Context Inference: Million Token Contexts on Commodity GPUs.** [![GitHub Repo stars](https://img.shields.io/github/stars/ryansynk/topk-decoding)](https://github.com/ryansynk/topk-decoding)
  - **AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference.**
  - **MoM: Linear Sequence Modeling with Mixture-of-Memories.**
  - **CoKV: Optimizing KV Cache Allocation via Cooperative Game.**
  - **MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference.** [![GitHub Repo stars](https://img.shields.io/github/stars/AIoT-MLSys-Lab/MEDA)](https://github.com/AIoT-MLSys-Lab/MEDA)
  - **FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference.**
  - **QuickLLaMA: Query-aware Inference Acceleration for Large Language Models.** [![GitHub Repo stars](https://img.shields.io/github/stars/dvlab-research/Q-LLM)](https://github.com/dvlab-research/Q-LLM)
  - **MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention.** - Yew Lin, Yuqing Yang, Lili Qiu._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/microsoft/MInference)](https://github.com/microsoft/MInference)
  - **MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding.** - Hsu Yen, Beidi Chen._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/Infini-AI-Lab/MagicDec)](https://github.com/Infini-AI-Lab/MagicDec/)
  - **RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval.**
  - **Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference.** [![GitHub Repo stars](https://img.shields.io/github/stars/mit-han-lab/Quest)](https://github.com/mit-han-lab/Quest)
  - **LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation.**
  - **Attention Score is not All You Need for Token Importance Indicator in KV Cache Reduction: Value Also Matters.**
  - **CItruS: Chunked Instruction-aware State Eviction for Long Sequence Modeling.** - Antoine Rondeau, Yang Gao, Jackie Chi Kit Cheung._ Arxiv 2024.
  - **D2O: Dynamic Discriminative Operations for Efficient Generative Inference of Large Language Models.**
  - **LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference.** [![GitHub Repo stars](https://img.shields.io/github/stars/SUSTechBruce/LOOK-M)](https://github.com/SUSTechBruce/LOOK-M)
  - **Training-Free Exponential Extension of Sliding Window Context with Cascading KV Cache.**
  - **InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference.**
  - **CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs.**
  - **Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction.** - Phi Nguyen, Yingyu Liang, Shafiq Joty._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/SalesforceAIResearch/GemFilter)](https://github.com/SalesforceAIResearch/GemFilter)
  - **Inference-Friendly Models With MixAttention.**
  - **KV-Compress: Paged KV-Cache Compression with Variable Compression Rates per Attention Head.** [![GitHub Repo stars](https://img.shields.io/github/stars/IsaacRe/vllm-kvcompress)](https://github.com/IsaacRe/vllm-kvcompress)
  - **Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads.**
  - **InfiniPot: Infinite Context Processing on Memory-Constrained LLMs.**
  - **DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads.** [![GitHub Repo stars](https://img.shields.io/github/stars/mit-han-lab/duo-attention)](https://github.com/mit-han-lab/duo-attention)
  - **SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction.** [![GitHub Repo stars](https://img.shields.io/github/stars/sail-sg/SimLayerKV)](https://github.com/sail-sg/SimLayerKV)
  - **Long Sequence Modeling with Attention Tensorization: From Sequence to Tensor Learning.**
  - **ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference.** - Wen Chang, Wenlei Bao, Size Zheng, Ningxin Zheng, Xin Liu, Harry Dong, Yuejie Chi, Beidi Chen._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/bytedance/ShadowKV)](https://github.com/bytedance/ShadowKV)
  - **BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference.** [![GitHub Repo stars](https://img.shields.io/github/stars/JunqiZhao888/buzz-llm)](https://github.com/JunqiZhao888/buzz-llm)
  - **Recycled Attention: Efficient inference for long-context language models.** [![GitHub Repo stars](https://img.shields.io/github/stars/carriex/recycled-attention)](https://github.com/carriex/recycled-attention)
  - **Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs.** [![GitHub Repo stars](https://img.shields.io/github/stars/JT-Ushio/MHA2MLA)](https://github.com/JT-Ushio/MHA2MLA)
  - **ZeroMerge: Parameter-Free KV Cache Compression for Memory-Efficient Long-Context LLMs.** [![GitHub Repo stars](https://img.shields.io/github/stars/SusCom-Lab/ZeroMerge)](https://github.com/SusCom-Lab/ZeroMerge)
  - **AKVQ-VL: Attention-Aware KV Cache Adaptive 2-Bit Quantization for Vision-Language Models.**
  - **ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference.**
  - **FastKV: KV Cache Compression for Fast Long-Context Processing with Token-Selective Propagation.** - Joon Kim._ Arxiv 2025. [![GitHub Repo stars](https://img.shields.io/github/stars/dongwonjo/FastKV)](https://github.com/dongwonjo/FastKV)
  - **Can LLMs Maintain Fundamental Abilities under KV Cache Compression?**
  - **Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation.**
  - **Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization.** - Young Kim, Jongse Park._ Arxiv 2025.
  - **Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference.**
  - **PromptDistill: Query-based Selective Token Retention in Intermediate Layers for Efficient Large Language Model Inference.**
  - **SQuat: Subspace-orthogonal KV Cache Quantization.**
  - **FlowKV: A Disaggregated Inference Framework with Low-Latency KV Cache Transfer and Load-Aware Scheduling.**
  - **Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving.**
  - **Progressive Sparse Attention: Algorithm and System Co-design for Efficient Attention in LLM Serving.**
  - **OmniKV: Dynamic Context Selection for Efficient Long-Context LLMs.**
  - **XAttention: Block Sparse Attention with Antidiagonal Scoring.** [![GitHub Repo stars](https://img.shields.io/github/stars/mit-han-lab/x-attention)](https://github.com/mit-han-lab/x-attention)
  - **EDiT: Efficient Diffusion Transformers with Linear Compressed Attention.**
  - **Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs.**
  - **BaKlaVa -- Budgeted Allocation of KV cache for Long-context Inference.** - Venkata, Murali Emani, Mahmut Kandemir, Venkatram Vishwanath._ Arxiv 2025.
  - **WeightedKV: Attention Scores Weighted Key-Value Cache Merging for Large Language Models.**
  - **Dialogue Without Limits: Constant-Sized KV Caches for Extended Responses in LLMs.**
  - **EliteKV: Scalable KV Cache Compression via RoPE Frequency Selection and Joint Low-Rank Projection.**
  - **Exploring the Limits of KV Cache Compression in Visual Autoregressive Transformers.**
- 4. State Space Models
  - **SSMLoRA: Enhancing Low-Rank Adaptation with State Space Model.**
  - **S2TX: Cross-Attention Multi-Scale State-Space Transformer for Time Series Forecasting.**
  - **CacheMamba: Popularity Prediction for Mobile Edge Caching Networks via Selective State Spaces.** - Meybodi, Arash Mohammadi._ Arxiv 2025.
  - **Mamba: Linear-Time Sequence Modeling with Selective State Spaces.** [![GitHub Repo stars](https://img.shields.io/github/stars/state-spaces/mamba)](https://github.com/state-spaces/mamba)
  - **MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts.**
  - **MambaByte: Token-free Selective State Space Model.**
  - **LOCOST: State-Space Models for Long Document Abstractive Summarization.**
  - **State Space Models as Foundation Models: A Control Theoretic Overview.**
  - **Jamba: A Hybrid Transformer-Mamba Language Model.** - Shwartz, Omri Abend, Raz Alon, Tomer Asida, Amir Bergman, Roman Glozman, Michael Gokhman, Avashalom Manevich, Nir Ratner, Noam Rozen, Erez Shwartz, Mor Zusman, Yoav Shoham._ Arxiv 2024.
  - **Robustifying State-space Models for Long Sequences via Approximate Diagonalization.**
  - **Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality.** [![GitHub Repo stars](https://img.shields.io/github/stars/state-spaces/mamba)](https://github.com/state-spaces/mamba)
  - **Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling.**
  - **MambaForGCN: Enhancing Long-Range Dependency with State Space Model and Kolmogorov-Arnold Networks for Aspect-Based Sentiment Analysis.**
  - **Discrete Diffusion Language Model for Long Text Summarization.**
  - **ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2.** [![GitHub Repo stars](https://img.shields.io/github/stars/WenjunHuang94/ML-Mamba)](https://github.com/WenjunHuang94/ML-Mamba)
  - **Jamba-1.5: Hybrid Transformer-Mamba Models at Scale.**
  - **SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models.**
  - **ReMamba: Equip Mamba with Effective Long-Sequence Modeling.**
  - **Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling.** [![GitHub Repo stars](https://img.shields.io/github/stars/thunlp/stuffed-mamba)](https://github.com/thunlp/stuffed-mamba)
  - **Taipan: Efficient and Expressive State Space Language Models with Selective Attention.**
  - **Rethinking Token Reduction for State Space Models.**
  - **B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory.**
  - **Attamba: Attending To Multi-Token States.** [![GitHub Repo stars](https://img.shields.io/github/stars/abdelfattah-lab/attamba)](https://github.com/abdelfattah-lab/attamba)
  - **Zamba: A Compact 7B SSM Hybrid Model.**
  - **Gated Delta Networks: Improving Mamba2 with Delta Rule.**
- 5. Length Extrapolation
  - **Scalable-Softmax Is Superior for Attention.**
  - **Rope to Nope and Back Again: A New Hybrid Attention Strategy.**
  - **A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation (GALI).**
  - **LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation.**
  - **Unveiling Simplicities of Attention: Adaptive Long-Context Head Identification.** - Paz, Kartik Ahuja._ Arxiv 2025.
  - **The Rotary Position Embedding May Cause Dimension Inefficiency in Attention Heads for Long-Distance Retrieval.** - Rui Chiang, Dani Yogatama._ Arxiv 2025.
  - **RoFormer: Enhanced Transformer with Rotary Position Embedding.**
  - **Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation.**
  - **KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation.** - Chung Chi, Ting-Han Fan, Peter J. Ramadge, Alexander I. Rudnicky._ Arxiv 2022.
  - **Dissecting Transformer Length Extrapolation via the Lens of Receptive Field Analysis.** - Chung Chi, Ting-Han Fan, Alexander I. Rudnicky, Peter J. Ramadge._ ACL 2023.
  - **The Impact of Positional Encoding on Length Generalization in Transformers.** [![GitHub Repo stars](https://img.shields.io/github/stars/McGill-NLP/length-generalization)](https://github.com/McGill-NLP/length-generalization)
  - **A Length-Extrapolatable Transformer.**
  - **Focused Transformer: Contrastive Training for Context Scaling.**
  - **Exploring Transformer Extrapolation.**
  - **Extending Context Window of Large Language Models via Positional Interpolation.**
  - **LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models.** [![GitHub Repo stars](https://img.shields.io/github/stars/kyegomez/LM-Infinite)](https://github.com/kyegomez/LM-Infinite)
  - **Scaling Laws of RoPE-based Extrapolation.**
  - **YaRN: Efficient Context Window Extension of Large Language Models.**
  - **LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models.** [![GitHub Repo stars](https://img.shields.io/github/stars/dvlab-research/LongLoRA)](https://github.com/dvlab-research/LongLoRA)
  - **Attention Alignment and Flexible Positional Embeddings Improve Transformer Length Extrapolation.** - Chung Chi, Ting-Han Fan, Alexander I. Rudnicky._ Arxiv 2023. [![GitHub Repo stars](https://img.shields.io/github/stars/chijames/Attention-Alignment-Transformer-Length-Extrapolation)](https://github.com/chijames/Attention-Alignment-Transformer-Length-Extrapolation)
  - **CoCA: Fusing position embedding with Collinear Constrained Attention for fine-tuning free context window extending.** [![GitHub Repo stars](https://img.shields.io/github/stars/codefuse-ai/Collinear-Constrained-Attention)](https://github.com/codefuse-ai/Collinear-Constrained-Attention)
  - **PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training.** [![GitHub Repo stars](https://img.shields.io/github/stars/dwzhu-pku/PoSE)](https://github.com/dwzhu-pku/PoSE)
  - **Structured Packing in LLM Training Improves Long Context Utilization.**
  - **E^2-LLM: Efficient and Extreme Length Extension of Large Language Models.**
  - **LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens.**
  - **CLEX: Continuous Length Extrapolation for Large Language Models.** [![GitHub Repo stars](https://img.shields.io/github/stars/DAMO-NLP-SG/CLEX)](https://github.com/DAMO-NLP-SG/CLEX)
  - **Resonance RoPE: Improving Context Length Generalization of Large Language Models.**
  - **Can't Remember Details in Long Documents? You Need Some R&R.** [![GitHub Repo stars](https://img.shields.io/github/stars/casetext/r-and-r)](https://github.com/casetext/r-and-r)
  - **Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding.** [![GitHub Repo stars](https://img.shields.io/github/stars/VITA-Group/Ms-PoE)](https://github.com/VITA-Group/Ms-PoE)
  - **InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory.**
  - **Naive Bayes-based Context Extension for Large Language Models.** [![GitHub Repo stars](https://img.shields.io/github/stars/amurtadha/NBCE-master)](https://github.com/amurtadha/NBCE-master)
  - **In-Context Pretraining: Language Modeling Beyond Document Boundaries.** - tau Yih, Mike Lewis._ ICLR 2024 Spotlight. [![GitHub Repo stars](https://img.shields.io/github/stars/swj0419/in-context-pretraining)](https://github.com/swj0419/in-context-pretraining)
  - **Effective Long-Context Scaling of Foundation Models.**
  - **Fewer Truncations Improve Language Modeling.**
  - **Extending Llama-3's Context Ten-Fold Overnight.**
  - **Long Context Alignment with Short Instructions and Synthesized Positions.**
  - **Length Generalization of Causal Transformers without Position Encoding.**
  - **xLSTM: Extended Long Short-Term Memory.**
  - **3D-RPE: Enhancing Long-Context Modeling Through 3D Rotary Position Encoding.**
  - **Human-like Episodic Memory for Infinite Context LLMs.** - Ammar, Jun Wang._ Arxiv 2024.
  - **ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities.**
  - **LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models.** - Kiong Ng, Zhiwei Jiang, Bryan Hooi._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/zhiyuanhubj/LongRecipe)](https://github.com/zhiyuanhubj/LongRecipe)
  - **E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning.**
  - **Efficient Long-range Language Modeling with Self-supervised Causal Retrieval.**
  - **A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts.**
  - **Differential Transformer.**
  - **Why Does the Effective Context Length of LLMs Fall Short?**
  - **LOGO -- Long cOntext aliGnment via efficient preference Optimization.**
  - **Selecting Influential Samples for Long Context Alignment via Homologous Models' Guidance and Contextual Awareness Measurement.**
  - **Two are better than one: Context window extension with multi-grained self-injection.**
  - **LongReward: Improving Long-context Large Language Models with AI Feedback.**
  - **HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation.**
  - **Large Language Models Can Self-Improve in Long-context Reasoning.**
  - **Circuit Complexity Bounds for RoPE-based Transformer Architecture.**
  - **Transformers Can Do Arithmetic with the Right Embeddings.**
  - **What is Wrong with Perplexity for Long-context Language Modeling?** [![GitHub Repo stars](https://img.shields.io/github/stars/PKU-ML/LongPPL)](https://github.com/PKU-ML/LongPPL)
  - **An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding.** [![GitHub Repo stars](https://img.shields.io/github/stars/bigai-nlco/cream)](https://github.com/bigai-nlco/cream)
  - **Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient Generative Inference.**
  - **Adjoint sharding for very long context training of state space models.**
  - **SWAN-GPT: An Efficient and Scalable Approach for Long-Context Language Modeling.** - Ping Hsieh, Shantanu Acharya, Somshubra Majumdar, Fei Jia, Samuel Kriman, Simeng Sun, Dima Rekesh, Boris Ginsburg._ Arxiv 2025.
  - **DINT Transformer.**
  - **Randomized Positional Encodings Boost Length Generalization of Transformers.** - Moya, Róbert Csordás, Mehdi Bennani, Shane Legg, Joel Veness._ ACL 2023. [![GitHub Repo stars](https://img.shields.io/github/stars/google-deepmind/randomized_positional_encodings)](https://github.com/google-deepmind/randomized_positional_encodings)
  - **LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning.** - Yuan Chang, Huiyuan Chen, Xia Hu._ Arxiv 2024.
  - **Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache.**
  - **Extending LLMs' Context Window with 100 Samples.** [![GitHub Repo stars](https://img.shields.io/github/stars/GAIR-NLP/Entropy-ABF)](https://github.com/GAIR-NLP/Entropy-ABF)
  - **With Greater Text Comes Greater Necessity: Inference-Time Training Helps Long Text Generation.** [![GitHub Repo stars](https://img.shields.io/github/stars/TemporaryLoRA/Temp-LoRA)](https://github.com/TemporaryLoRA/Temp-LoRA)
  - **Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation.**
  - **Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens.** [![GitHub Repo stars](https://img.shields.io/github/stars/liujch1998/infini-gram)](https://github.com/liujch1998/infini-gram)
  - **Data Engineering for Scaling Language Models to 128K Context.** [![GitHub Repo stars](https://img.shields.io/github/stars/FranxYao/Long-Context-Data-Engineering)](https://github.com/FranxYao/Long-Context-Data-Engineering)
  - **Transformers Can Achieve Length Generalization But Not Robustly.**
  - **Long-Context Language Modeling with Parallel Context Encoding.** [![GitHub Repo stars](https://img.shields.io/github/stars/princeton-nlp/CEPE)](https://github.com/princeton-nlp/CEPE)
  - **DAPE: Data-Adaptive Positional Encoding for Length Extrapolation.** [![GitHub Repo stars](https://img.shields.io/github/stars/chuanyang-Zheng/DAPE)](https://github.com/chuanyang-Zheng/DAPE)
  - **Contextual Position Encoding: Learning to Count What's Important.**
  - **Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model.**
  - **Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure.** [![GitHub Repo stars](https://img.shields.io/github/stars/HanseulJo/position-coupling)](https://github.com/HanseulJo/position-coupling)
  - **LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models.**
  - **Explicitly Encoding Structural Symmetry is Key to Length Generalization in Arithmetic Tasks.**
  - **Mixture of In-Context Experts Enhance LLMs' Long Context Awareness.**
  - **Scaling Granite Code Models to 128K Context.** - Hong Dang, Yan Koyfman, Atin Sood, Rogerio Feris, Nirmit Desai, David D. Cox, Ruchir Puri, Rameswar Panda._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/ibm-granite/granite-code-models)](https://github.com/ibm-granite/granite-code-models)
  - **Efficient Solutions For An Intriguing Failure of LLMs: Long Context Window Does Not Mean LLMs Can Analyze Long Sequences Flawlessly.**
  - **FocusLLM: Scaling LLM's Context by Parallel Decoding.**
  - **Untie the Knots: An Efficient Data Augmentation Strategy for Long-Context Pre-Training in Language Models.** [![GitHub Repo stars](https://img.shields.io/github/stars/rgtjf/Untie-the-Knots)](https://github.com/rgtjf/Untie-the-Knots)
  - **Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count.** [![GitHub Repo stars](https://img.shields.io/github/stars/HanseulJo/position-coupling)](https://github.com/HanseulJo/position-coupling)
  - **Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models.**
  - **Attention Entropy is a Key Factor: An Analysis of Parallel Context Encoding with Full-attention-based Pre-trained Language Models.**
  - **DCIS: Efficient Length Extrapolation of LLMs via Divide-and-Conquer Scaling Factor Search.**
  - **PEAR: Position-Embedding-Agnostic Attention Re-weighting Enhances Retrieval-Augmented Generation with Zero Inference Overhead.** [![GitHub Repo stars](https://img.shields.io/github/stars/TTArch/PEAR-RAG)](https://github.com/TTArch/PEAR-RAG)
  - **Extending Context Window of Large Language Models from a Distributional Perspective.**
  - **How to Train Long-Context Language Models (Effectively).** [![GitHub Repo stars](https://img.shields.io/github/stars/princeton-nlp/ProLong)](https://github.com/princeton-nlp/ProLong)
  - **DAPE V2: Process Attention Score as Feature Map for Length Extrapolation.** [![GitHub Repo stars](https://img.shields.io/github/stars/chuanyang-Zheng/DAPE)](https://github.com/chuanyang-Zheng/DAPE)
  - **LongFaith: Enhancing Long-Context Reasoning in LLMs with Faithful Synthetic Data.** [![GitHub Repo stars](https://img.shields.io/github/stars/IDEA-FinAI/LongFaith)](https://github.com/IDEA-FinAI/LongFaith)
  - **LADM: Long-context Training Data Selection with Attention-based Dependency Measurement for LLMs.**
  - **Layer-Specific Scaling of Positional Encodings for Superior Long-Context Modeling.**
  - **Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models.** [![GitHub Repo stars](https://img.shields.io/github/stars/OpenNLPLab/lightning-attention)](https://github.com/OpenNLPLab/lightning-attention)
  - **LeMo: Enabling LEss Token Involvement for MOre Context Fine-tuning.**
  - **NExtLong: Toward Effective Long-Context Training without Long Documents.**
  - **SEAL: Scaling to Emphasize Attention for Long-Context Retrieval.** - gyu Jin, Younghyun Cho, Eunhyeok Park._ Arxiv 2025.
  - **Information Entropy Invariance: Enhancing Length Extrapolation in Attention Mechanisms.** [![GitHub Repo stars](https://img.shields.io/github/stars/HT-NEKO/InfoScale)](https://github.com/HT-NEKO/InfoScale)
  - **Forgetting Transformer: Softmax Attention with a Forget Gate.** [![GitHub Repo stars](https://img.shields.io/github/stars/zhixuan-lin/forgetting-transformer)](https://github.com/zhixuan-lin/forgetting-transformer)
  - **WildLong: Synthesizing Realistic Long-Context Instruction Data at Scale.** - Qing Chen, Wei Lu, Furu Wei._ Arxiv 2025.
  - **Generalizing From Short to Long: Effective Data Synthesis for Long-Context Instruction Tuning.** [![GitHub Repo stars](https://img.shields.io/github/stars/NJUNLP/context-synthesis)](https://github.com/NJUNLP/context-synthesis)
  - **Randomized Positional Encodings Boost Length Generalization of Transformers.** - Moya, Róbert Csordás, Mehdi Bennani, Shane Legg, Joel Veness._ ACL 2023. [![GitHub Repo stars](https://img.shields.io/github/stars/google-deepmind/randomized_positional_encodings)](https://github.com/google-deepmind/randomized_positional_encodings)
  - **LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning.** - Yuan Chang, Huiyuan Chen, Xia Hu._ Arxiv 2024.
  - **Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache.**
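  - **Extending LLMs' Context Window with 100 Samples.** [![GitHub Repo stars](https://img.shields.io/github/stars/GAIR-NLP/Entropy-ABF)](https://github.com/GAIR-NLP/Entropy-ABF)
  - **With Greater Text Comes Greater Necessity: Inference-Time Training Helps Long Text Generation.** [![GitHub Repo stars](https://img.shields.io/github/stars/TemporaryLoRA/Temp-LoRA)](https://github.com/TemporaryLoRA/Temp-LoRA)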
  - **Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation.**
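  - **Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens.** [![GitHub Repo stars](https://img.shields.io/github/stars/liujch1998/infini-gram)](https://github.com/liujch1998/infini-gram)
  - **DAPE: Data-Adaptive Positional Encoding for Length Extrapolation.** [![GitHub Repo stars](https://img.shields.io/github/stars/chuanyang-Zheng/DAPE)](https://github.com/chuanyang-Zheng/DAPE)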
  - **Contextual Position Encoding: Learning to Count What's Important.**
  - **Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model.**
  - **Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure.** [![GitHub Repo stars](https://img.shields.io/github/stars/HanseulJo/position-coupling)](https://github.com/HanseulJo/position-coupling)
  - **LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models.**
  - **Scaling Granite Code Models to 128K Context.** - Hong Dang, Yan Koyfman, Atin Sood, Rogerio Feris, Nirmit Desai, David D. Cox, Ruchir Puri, Rameswar Panda._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/ibm-granite/granite-code-models)](https://github.com/ibm-granite/granite-code-models)
  - **Efficient Solutions For An Intriguing Failure of LLMs: Long Context Window Does Not Mean LLMs Can Analyze Long Sequences Flawlessly.**
  - **FocusLLM: Scaling LLM's Context by Parallel Decoding.**
  - **Untie the Knots: An Efficient Data Augmentation Strategy for Long-Context Pre-Training in Language Models.** [![GitHub Repo stars](https://img.shields.io/github/stars/rgtjf/Untie-the-Knots)](https://github.com/rgtjf/Untie-the-Knots)
  - **PEAR: Position-Embedding-Agnostic Attention Re-weighting Enhances Retrieval-Augmented Generation with Zero Inference Overhead.** [![GitHub Repo stars](https://img.shields.io/github/stars/TTArch/PEAR-RAG)](https://github.com/TTArch/PEAR-RAG)
  - **Extending Context Window of Large Language Models from a Distributional Perspective.**
  - **How to Train Long-Context Language Models (Effectively).** [![GitHub Repo stars](https://img.shields.io/github/stars/princeton-nlp/ProLong)](https://github.com/princeton-nlp/ProLong)
  - **LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization.** [![GitHub Repo stars](https://img.shields.io/github/stars/DAMO-NLP-SG/LongPO)](https://github.com/DAMO-NLP-SG/LongPO)
  - **ParallelComp: Parallel Long-Context Compressor for Length Extrapolation.**
  - **LongAttn: Selecting Long-context Training Data via Token-level Attention.** [![GitHub Repo stars](https://img.shields.io/github/stars/Lyun0912-wu/LongAttn)](https://github.com/Lyun0912-wu/LongAttn)
  - **Sliding Window Attention Training for Efficient Large Language Models.** <https://anonymous.4open.science/r/SWAT-attention/README.md>
  - **LongRoPE2: Near-Lossless LLM Context Window Scaling.**
  - **ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs.**
  - **Pause-Tuning for Long-Context Comprehension: A Lightweight Approach to LLM Attention Recalibration.** - PauseTokens-7357)
  - **Token Weighting for Long-Range Language Modeling.** [![GitHub Repo stars](https://img.shields.io/github/stars/UKPLab/naacl2025-token-weighting)](https://github.com/UKPLab/naacl2025-token-weighting)
  - **From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models.** <https://ultralong.github.io/>
- 6. Long Term Memory
  - **M+: Extending MemoryLLM with Scalable Long-Term Memory.**
  - **LM2: Large Memory Models.** [![GitHub Repo stars](https://img.shields.io/github/stars/convergence-ai/lm2)](https://github.com/convergence-ai/lm2)
  - **Position: Episodic Memory is the Missing Piece for Long-Term LLM Agents.**
  - **MEMORYLLM: Towards Self-Updatable Large Language Models.** [![GitHub Repo stars](https://img.shields.io/github/stars/wangyu-ustc/MemoryLLM)](https://github.com/wangyu-ustc/MemoryLLM)
  - **EpMAN: Episodic Memory AttentioN for Generalizing to Longer Contexts.**
  - **Can Memory-Augmented Language Models Generalize on Reasoning-in-a-Haystack Tasks?.** - Yun Ko, Sihui Dai, Georgios Kollias, Subhajit Chaudhury, Aurelie Lozano._ Arxiv 2025.
  - **Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System.**
  - **MemoryBank: Enhancing Large Language Models with Long-Term Memory.** [![GitHub Repo stars](https://img.shields.io/github/stars/zhongwanjun/MemoryBank-SiliconFriend)](https://github.com/zhongwanjun/MemoryBank-SiliconFriend)
  - **Improve Long-term Memory Learning Through Rescaling the Error Temporally.**
  - **Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models.**
  - **Empowering Working Memory for Large Language Model Agents.**
  - **Evolving Large Language Model Assistant with Long-Term Conditional Memory.**
  - **Commonsense-augmented Memory Construction and Management in Long-term Conversations via Context-aware Persona Refinement.** - iunn Ong, Seoyeon Kim, Dongha Lee, Jinyoung Yeo._ Arxiv 2024.
  - **CreDes: Causal Reasoning Enhancement and Dual-End Searching for Solving Long-Range Reasoning Problems using LLMs.**
  - **Steering Conversational Large Language Models for Long Emotional Support Conversations.**
  - **StreamingDialogue: Prolonged Dialogue Learning via Long Context Compression with Minimal Losses.** - Nan Li, Quan Tu, Cunli Mao, Zhengtao Yu, Ji-Rong Wen, Rui Yan._ Arxiv 2024.
  - **SirLLM: Streaming Infinite Retentive LLM.**
  - **A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts.** - Huei Lee, Xinyun Chen, Hiroki Furuta, John Canny, Ian Fischer._ Arxiv 2024.
  - **Towards LifeSpan Cognitive Systems.**
  - **SPAR: Personalized Content-Based Recommendation via Long Engagement Attention.** - Mageed, Sinong Wang, Rong Jin, Sem Park, Ning Yao, Bo Long._ Arxiv 2024.
  - **Compress to Impress: Unleashing the Potential of Compressive Memory in Real-World Long-Term Conversations.**
  - **Prompts As Programs: A Structure-Aware Approach to Efficient Compile-Time Prompt Optimization.**
  - **HMT: Hierarchical Memory Transformer for Long Context Language Processing.** [![GitHub Repo stars](https://img.shields.io/github/stars/OswaldHe/HMT-pytorch)](https://github.com/OswaldHe/HMT-pytorch)
  - **Toward Conversational Agents with Context and Time Sensitive Long-term Memory.**
  - **Position Debiasing Fine-Tuning for Causal Perception in Long-Term Dialogue.** - Ling Mao, Wenfeng Xie, Dangyang Chen._ Arxiv 2024.
  - **Enhancing Long-Term Memory using Hierarchical Aggregate Tree for Retrieval Augmented Generation.**
  - **HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model.**
  - **CarMem: Enhancing Long-Term Memory in LLM Voice Assistants through Category-Bounding.**
  - **InfiniteICL: Breaking the Limit of Context Window Size via Long Short-term Memory Transformation.**
  - **Cognitive Memory in Large Language Models.**
- 7. RAG and ICL
  - **CacheFocus: Dynamic Cache Re-Positioning for Efficient Retrieval-Augmented Generation.** - Hui Lee, Eunhwan Park, Donghoon Han, Seung-Hoon Na._ Arxiv 2025.
  - **Lost in the Passage: Passage-level In-context Learning Does Not Necessarily Need a "Passage".**
  - **Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning.**
  - **Feature-Adaptive and Data-Scalable In-Context Learning.** [![GitHub Repo stars](https://img.shields.io/github/stars/jiahaozhenbang/FADS-ICL)](https://github.com/jiahaozhenbang/FADS-ICL)
  - **KG-RAG: Bridging the Gap Between Knowledge and Creativity.** [![GitHub Repo stars](https://img.shields.io/github/stars/dsanmart/KG-RAG)](https://github.com/dsanmart/KG-RAG)
  - **HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models.** [![GitHub Repo stars](https://img.shields.io/github/stars/OSU-NLP-Group/HippoRAG)](https://github.com/OSU-NLP-Group/HippoRAG)
  - **Implicit In-context Learning.**
  - **Are Long-LLMs A Necessity For Long-Context Tasks?.**
  - **Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading.**
  - **Attendre: Wait To Attend By Retrieval With Evicted Queries in Memory-Based Transformers for Long Context Processing.**
  - **BGE Landmark Embedding: A Chunking-Free Embedding Method For Retrieval Augmented Long-Context Large Language Models.**
  - **Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity.** [![GitHub Repo stars](https://img.shields.io/github/stars/starsuzi/Adaptive-RAG)](https://github.com/starsuzi/Adaptive-RAG)
  - **RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation.** - Min Chan, Chunpu Xu, Ruibin Yuan, Hongyin Luo, Wei Xue, Yike Guo, Jie Fu._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/chanchimin/RQ-RAG)](https://github.com/chanchimin/RQ-RAG)
  - **Improving Retrieval Augmented Open-Domain Question-Answering with Vectorized Contexts.**
  - **Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation.**
  - **Multi-view Content-aware Indexing for Long Document Retrieval.**
  - **Retrieval Head Mechanistically Explains Long-Context Factuality.**
  - **FlashBack: Efficient Retrieval-Augmented Language Modeling for Long Context Inference.**
  - **MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery.**
  - **You Only Use Reactive Attention Slice For Long Context Retrieval.**
  - **SMART-RAG: Selection using Determinantal Matrices for Augmented Retrieval.**
  - **Lighter And Better: Towards Flexible Context Adaptation For Retrieval Augmented Generation.**
  - **In Defense of RAG in the Era of Long-Context Language Models.**
  - **Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection.** - Chen Gu, Caitlin Sikora, Ho Ko, Yinxiao Liu, Chu-Cheng Lin, Lei Shu, Liangchen Luo, Lei Meng, Bang Liu, Jindong Chen._ Arxiv 2024.
  - **Is In-Context Learning Sufficient for Instruction Following in LLMs?.** [![GitHub Repo stars](https://img.shields.io/github/stars/tml-epfl/icl-alignment)](https://github.com/tml-epfl/icl-alignment)
  - **FragRel: Exploiting Fragment-level Relations in the External Memory of Large Language Models.**
  - **Multi-Head RAG: Solving Multi-Aspect Problems with LLMs.**
  - **Demonstration Notebook: Finding the Most Suited In-Context Learning Example from Interactions.**
  - **Retrieval Meets Reasoning: Dynamic In-Context Editing for Long-Text Understanding.**
  - **FoRAG: Factuality-optimized Retrieval Augmented Generation for Web-enhanced Long-form Question Answering.** <https://huggingface.co/forag>
  - **LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs.** [![GitHub Repo stars](https://img.shields.io/github/stars/TIGER-AI-Lab/LongRAG)](https://github.com/TIGER-AI-Lab/LongRAG)
  - **Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning.**
  - **From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data.**
  - **Memory3: Language Modeling with Explicit Memory.**
  - **Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting.** - Yu Lee, Tomas Pfister._ Arxiv 2024.
  - **Large Language Models Know What Makes Exemplary Contexts.** [![GitHub Repo stars](https://img.shields.io/github/stars/ruyue0001/RL-ICL)](https://github.com/ruyue0001/RL-ICL)
  - **Retrieve, Summarize, Plan: Advancing Multi-hop Question Answering with an Iterative Approach.**
  - **Writing in the Margins: Better Inference Pattern for Long Context Retrieval.** [![GitHub Repo stars](https://img.shields.io/github/stars/writer/writing-in-the-margins)](https://github.com/writer/writing-in-the-margins)
  - **MemLong: Memory-Augmented Retrieval for Long Text Modeling.**
  - **ChuLo: Chunk-Level Key Information Representation for Long Document Processing.**
  - **TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text.**
  - **LLM×MapReduce: Simplified Long-Sequence Processing using Large Language Models.**
  - **Enhancing Long Context Performance in LLMs Through Inner Loop Query Mechanism.**
  - **Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models.**
  - **SEGMENT+: Long Text Processing with Short-Context Language Models.** [![GitHub Repo stars](https://img.shields.io/github/stars/WeiShi-9/segmentplus)](https://github.com/WeiShi-9/segmentplus)
  - **Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs.** [![GitHub Repo stars](https://img.shields.io/github/stars/ulab-uiuc/GoR)](https://github.com/ulab-uiuc/GoR)
  - **LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering.**
  - **Reducing Distraction in Long-Context Language Models by Focused Learning.**
  - **Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models.**
  - **Revisiting In-Context Learning with Long Context Language Models.**
  - **Eliciting In-context Retrieval and Reasoning for Long-context Large Language Models.**
  - **Can Few-shot Work in Long-Context? Recycling the Context to Generate Demonstrations.**
  - **R^2AG: Incorporating Retrieval Information into Retrieval Augmented Generation.**
  - **Making Long-Context Language Models Better Multi-Hop Reasoners.** [![GitHub Repo stars](https://img.shields.io/github/stars/LaVi-Lab/LongContextReasoner)](https://github.com/LaVi-Lab/LongContextReasoner)
  - **RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation.** [![GitHub Repo stars](https://img.shields.io/github/stars/amazon-science/RAGChecker)](https://github.com/amazon-science/RAGChecker)
  - **Bridging Context Gaps: Leveraging Coreference Resolution for Long Contextual Understanding.**
  - **ALR2: A Retrieve-then-Reason Framework for Long-context Question Answering.**
  - **Inference Scaling for Long-Context Retrieval Augmented Generation.**
  - **GARLIC: LLM-Guided Dynamic Progress Control with Hierarchical Weighted Graph for Long Document QA.**
  - **Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG.**
  - **Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation.**
  - **MuDAF: Long-Context Multi-Document Attention Focusing through Contrastive Learning on Attention Heads.**
  - **OkraLong: A Flexible Retrieval-Augmented Framework for Long-Text Query Processing.**
  - **ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation.**
  - **Tuning LLMs by RAG Principles: Towards LLM-native Memory.**
  - **Long Context Modeling with Ranked Memory-Augmented Retrieval.**
  - **Efficient Many-Shot In-Context Learning with Dynamic Block-Sparse Attention.** - Jou Li, Yilin Zhang, Graham Neubig, Amanda Bertsch._ Arxiv 2025. [![GitHub Repo stars](https://img.shields.io/github/stars/millix19/dbsa)](https://github.com/millix19/dbsa)
  - **Focus Directions Make Your Language Models Pay More Attention to Relevant Contexts.**
  - **FReM: A Flexible Reasoning Mechanism for Balancing Quick and Slow Thinking in Long-Context Question Answering.** - Fai Wong._ Arxiv 2025.
- 9. Compress
  - **Mamba-Shedder: Post-Transformer Compression for Efficient Selective Structured State Space Models.** [![GitHub Repo stars](https://img.shields.io/github/stars/IntelLabs/Hardware-Aware-Automated-Machine-Learning)](https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning)
  - **TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models.**
  - **AdaSVD: Adaptive Singular Value Decomposition for Large Language Models.**
  - **Learning to Compress Prompt in Natural Language Formats.** - Neng Chuang, Tianwei Xing, Chia-Yuan Chang, Zirui Liu, Xun Chen, Xia Hu._ Arxiv 2024.
  - **LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression.** - Yew Lin, H. Vicky Zhao, Lili Qiu, Dongmei Zhang._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/microsoft/LLMLingua)](https://github.com/microsoft/LLMLingua)
  - **PCToolkit: A Unified Plug-and-Play Prompt Compression Toolkit of Large Language Models.** [![GitHub Repo stars](https://img.shields.io/github/stars/3DAgentWorld/Toolkit-for-Prompt-Compression)](https://github.com/3DAgentWorld/Toolkit-for-Prompt-Compression)
  - **Adapting Language Models to Compress Contexts.** [![GitHub Repo stars](https://img.shields.io/github/stars/princeton-nlp/AutoCompressors)](https://github.com/princeton-nlp/AutoCompressors)
  - **Compressing Context to Enhance Inference Efficiency of Large Language Models.**
  - **LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models.** - Yew Lin, Yuqing Yang, Lili Qiu._ Arxiv 2023. [![GitHub Repo stars](https://img.shields.io/github/stars/microsoft/LLMLingua)](https://github.com/microsoft/LLMLingua)
  - **LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression.** - Yew Lin, Yuqing Yang, Lili Qiu._ Arxiv 2023. [![GitHub Repo stars](https://img.shields.io/github/stars/microsoft/LLMLingua)](https://github.com/microsoft/LLMLingua)
  - **Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference.**
  - **Compressed Context Memory for Online Language Model Interaction.** - Hyun Kim, Junyoung Yeom, Sangdoo Yun, Hyun Oh Song._ ICLR 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/snu-mllab/context-memory)](https://github.com/snu-mllab/context-memory)
  - **PROMPT-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression.**
  - **Training LLMs over Neurally Compressed Text.** - Dickstein, Noah Constant._ Arxiv 2024.
  - **Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models.**
  - **Adapting LLMs for Efficient Context Processing through Soft Prompt Compression.**
  - **Flexibly Scaling Large Language Models Contexts Through Extensible Tokenization.**
  - **System 2 Attention (is something you might need too).**
  - **DSFormer: Effective Compression of Text-Transformers by Dense-Sparse Weight Factorization.**
  - **Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon.**
  - **Say More with Less: Understanding Prompt Learning Behaviors through Gist Compression.** [![GitHub Repo stars](https://img.shields.io/github/stars/OpenMatch/Gist-COCO)](https://github.com/OpenMatch/Gist-COCO)
  - **Imagination Augmented Generation: Learning to Imagine Richer Context for Question Answering over Large Language Models.**
  - **In-Context Learning State Vector with Inner and Momentum Optimization.** [![GitHub Repo stars](https://img.shields.io/github/stars/HITsz-TMG/ICL-State-Vector)](https://github.com/HITsz-TMG/ICL-State-Vector)
  - **Compressing Long Context for Enhancing RAG with AMR-based Concept Distillation.**
  - **Improving Long Text Understanding with Knowledge Distilled from Summarization Model.**
  - **OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning.** [![GitHub Repo stars](https://img.shields.io/github/stars/OpenNLG/OpenBA-v2)](https://github.com/OpenNLG/OpenBA-v2)
  - **xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token.** - Qing Chen, Furu Wei, Huishuai Zhang, Dongyan Zhao._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/Hannibal046/xRAG)](https://github.com/Hannibal046/xRAG)
  - **SelfCP: Compressing Long Prompt to 1/12 Using the Frozen Large Language Model Itself.**
  - **Compressing Lengthy Context With UltraGist.** [![GitHub Repo stars](https://img.shields.io/github/stars/namespace-Pt/UltraGist)](https://github.com/namespace-Pt/UltraGist)
  - **XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference.**
  - **Recurrent Context Compression: Efficiently Expanding the Context Window of LLM.** [![GitHub Repo stars](https://img.shields.io/github/stars/WUHU-G/RCC_Transformer)](https://github.com/WUHU-G/RCC_Transformer)
  - **Evaluating Zero-Shot Long-Context LLM Compression.**
  - **Your Transformer is Secretly Linear.** [![GitHub Repo stars](https://img.shields.io/github/stars/AIRI-Institute/LLM-Microscope)](https://github.com/AIRI-Institute/LLM-Microscope)
  - **In-Context Former: Lightning-fast Compressing Context for Large Language Model.**
  - **UIO-LLMs: Unbiased Incremental Optimization for Long-Context LLMs.** [![GitHub Repo stars](https://img.shields.io/github/stars/wenhaoli-xmu/UIO-LLMs)](https://github.com/wenhaoli-xmu/UIO-LLMs)
  - **AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration.** - Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, Song Han._ MLSys 2024 Best Paper Award. [![GitHub Repo stars](https://img.shields.io/github/stars/mit-han-lab/llm-awq)](https://github.com/mit-han-lab/llm-awq)
  - **PromptIntern: Saving Inference Costs by Internalizing Recurrent Prompt during Large Language Model Fine-tuning.**
  - **Concise and Precise Context Compression for Tool-Using Language Models.**
  - **Context Embeddings for Efficient Answer Generation in RAG.**
  - **InstructCMP: Length Control in Sentence Compression through Instruction-based Large Language Models.** - Do, Jingun Kwon, Hidetaka Kamigaito, Manabu Okumura._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/JuseonDo/InstructCMP)](https://github.com/JuseonDo/InstructCMP)
  - **Fundamental Limits of Prompt Compression: A Rate-Distortion Framework for Black-Box Language Models.**
  - **QUITO: Accelerating Long-Context Reasoning through Query-Guided Context Compression.**
  - **SentenceVAE: Faster, Longer and More Accurate Inference with Next-sentence Prediction for Large Language Models.**
  - **QUITO-X: An Information Bottleneck-based Compression Algorithm with Cross-Attention.**
  - **AdaComp: Extractive Context Compression with Adaptive Predictor for Retrieval-Augmented Large Language Models.**
  - **Characterizing Prompt Compression Methods for Long Context Inference.**
  - **Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference.**
  - **Familiarity-aware Evidence Compression for Retrieval Augmented Generation.** [![GitHub Repo stars](https://img.shields.io/github/stars/luka-group/FaviComp)](https://github.com/luka-group/FaviComp)
  - **TACO-RL: Task Aware Prompt Compression Optimization with Reinforcement Learning.**
  - **Parse Trees Guided LLM Prompt Compression.**
  - **FineZip: Pushing the Limits of Large Language Models for Practical Lossless Text Compression.**
  - **Perception Compressor: A training-free prompt compression method in long context scenarios.** - Tao Zheng._ Arxiv 2024.
  - **From Reading to Compressing: Exploring the Multi-document Reader for Prompt Compression.**
  - **Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability.** - Yan Yeung._ EMNLP 2024.
  - **Style-Compress: An LLM-Based Prompt Compression Framework Considering Task-Specific Styles.**
  - **Towards Extreme Pruning of LLMs with Plug-and-Play Mixed Sparsity.**
  - **ClusComp: A Simple Paradigm for Model Compression and Efficient Finetuning.**
  - **Limits of KV Cache Compression for Tensor Attention based Autoregressive Transformers.**
  - **Merging Feed-Forward Sublayers for Compressed Transformers.** [![GitHub Repo stars](https://img.shields.io/github/stars/nverma1/merging-ffs-compression)](https://github.com/nverma1/merging-ffs-compression/)
  - **A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression.**
  - **Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers.** [![GitHub Repo stars](https://img.shields.io/github/stars/GATECH-EIC/DiffRatio-MoD)](https://github.com/GATECH-EIC/DiffRatio-MoD)
  - **Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs.**
  - **LLoCO: Learning Long Contexts Offline.**
  - **In-context Autoencoder for Context Compression in a Large Language Model.** - Qing Chen, Furu Wei._ ICLR 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/getao/icae)](https://github.com/getao/icae)
  - **Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs.**
  - **LoCoCo: Dropping In Convolutions for Long Context Compression.** [![GitHub Repo stars](https://img.shields.io/github/stars/VITA-Group/LoCoCo)](https://github.com/VITA-Group/LoCoCo)
  - **Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models.** - Wei Xie, Pandeng Li, Liming Zhao, Longxiang Tang, Yun Zheng, Chuanbin Liu, Hongtao Xie._ CVPR 2025. [![GitHub Repo stars](https://img.shields.io/github/stars/lntzm/HICom)](https://github.com/lntzm/HICom)
  - **Compressing Large Language Models by Streamlining the Unimportant Layer.**
  - **Feature-based Low-Rank Compression of Large Language Models via Bayesian Optimization.**
  - **IDEA Prune: An Integrated Enlarge-and-Prune Pipeline in Generative Language Model Pretraining.**
  - **Position-Aware Depth Decay Decoding (D3): Boosting Large Language Model Inference Efficiency.**
  - **CSR: Achieving 1 Bit Key-Value Cache via Sparse Representation.**
  - **SpeechPrune: Context-aware Token Pruning for Speech Information Retrieval.** <https://speechprune.github.io/>
  - **FTP: A Fine-grained Token-wise Pruner for Large Language Models via Token Routing.**
  - **EXIT: Context-Aware Extractive Compression for Enhancing Retrieval-Augmented Generation.**
  - **You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning.**
  - **Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference.**
  - **PISCO: Pretty Simple Compression for Retrieval-Augmented Generation.**
  - **Provence: efficient and robust context pruning for retrieval-augmented generation.** <https://huggingface.co/naver/provence-reranker-debertav3-v1>
  - **FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing.** - Heng Lin, Shikhar Tuli, Haris Jeelani, Shangqian Gao, Yilin Shen, Hongxia Jin, Yen-Chang Hsu._ NAACL 2025.
  - **TensorLLM: Tensorising Multi-Head Attention for Enhanced Reasoning and Compression in LLMs.**
  - **Activation-Informed Merging of Large Language Models.**
  - **Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training.**
  - **QuEST: Stable Training of LLMs with 1-Bit Weights and Activations.** [![GitHub Repo stars](https://img.shields.io/github/stars/IST-DASLab/QuEST)](https://github.com/IST-DASLab/QuEST)
  - **DarwinLM: Evolutionary Structured Pruning of Large Language Models.**
  - **Hyper Compressed Fine-Tuning of Large Foundation Models with Quantum Inspired Adapters.**
  - **Contextual Compression Encoding for Large Language Models: A Novel Framework for Multi-Layered Parameter Space Pruning.**
  - **Forget the Data and Fine-Tuning! Just Fold the Network to Compress.** [![GitHub Repo stars](https://img.shields.io/github/stars/nanguoyu/model-folding-universal)](https://github.com/nanguoyu/model-folding-universal)
  - **NestQuant: Nested Lattice Quantization for Matrix Products and LLMs.**
  - **Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models.**
  - **When Compression Meets Model Compression: Memory-Efficient Double Compression for Large Language Models.**
  - **Optimizing Singular Spectrum for Large Language Model Compression.** - Hsuan Yang._ Arxiv 2025.
  - **Rotate, Clip, and Partition: Towards W2A4KV4 Quantization by Integrating Rotation and Learnable Non-uniform Quantizer.**
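  - **LLM-Pruner: On the Structural Pruning of Large Language Models.** [![GitHub Repo stars](https://img.shields.io/github/stars/horseee/LLM-Pruner)](https://github.com/horseee/LLM-Pruner)
  - **Knowing When to Stop: Dynamic Context Cutoff for Large Language Models.** [![GitHub Repo stars](https://img.shields.io/github/stars/ruoyuxie/when-to-stop)](https://github.com/ruoyuxie/when-to-stop)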
  - **LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs.**
  - **DAST: Context-Aware Compression in LLMs via Dynamic Allocation of Soft Tokens.** - tao Zheng._ Arxiv 2025.
  - **Token Pruning in Multimodal Large Language Models: Are We Solving the Right Problem?.**
  - **Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity.**
  - **LLoCO: Learning Long Contexts Offline.**
  - **In-context Autoencoder for Context Compression in a Large Language Model.** - Qing Chen, Furu Wei._ ICLR 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/getao/icae)](https://github.com/getao/icae)
  - **LoCoCo: Dropping In Convolutions for Long Context Compression.** [![GitHub Repo stars](https://img.shields.io/github/stars/VITA-Group/LoCoCo)](https://github.com/VITA-Group/LoCoCo)
  - **Compressing Large Language Models by Streamlining the Unimportant Layer.**
  - **Feature-based Low-Rank Compression of Large Language Models via Bayesian Optimization.**
  - **Efficiently Editing Mixture-of-Experts Models with Compressed Experts.** [![GitHub Repo stars](https://img.shields.io/github/stars/yifei-he/Compressed-Experts)](https://github.com/yifei-he/Compressed-Experts)
  - **Large Language Model Compression via the Nested Activation-Aware Decomposition.**
  - **Saliency-driven Dynamic Token Pruning for Large Language Models.**
  - **SVD-LLM V2: Optimizing Singular Value Truncation for Large Language Model Compression.** [![GitHub Repo stars](https://img.shields.io/github/stars/AIoT-MLSys-Lab/SVD-LLM)](https://github.com/AIoT-MLSys-Lab/SVD-LLM)
  - **Compression Laws for Large Language Models.**
  - **FCoT-VL:Advancing Text-oriented Large Vision-Language Models with Efficient Visual Token Compression.**
  - **Delta Decompression for MoE-based LLMs Compression.**
  - **The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?.**
  - **Compressing Language Models for Specialized Domains.** [![GitHub Repo stars](https://img.shields.io/github/stars/mlsw/domain-compression)](https://github.com/mlsw/domain-compression)
  - **Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners.**
  - **Fwd2Bot: LVLM Visual Token Compression with Double Forward Bottleneck.**
  - **Understanding and Improving Information Preservation in Prompt Compression for LLMs.**
  - **Token Dynamics: Towards Efficient and Dynamic Video Token Representation for Video Large Language Models.**
  - **AttentionRAG: Attention-Guided Context Pruning in Retrieval-Augmented Generation.**
  - **EFPC: Towards Efficient and Flexible Prompt Compression.** - Hao Cao, Yangsong Wang, Shuzheng Hao, Zhenxing Li, Chengjun Zhan, Sichao Liu, Yi-Qi Hu._ Arxiv 2025.
  - **DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models.**
  - **DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models.**
  - **Efficient Dynamic Clustering-Based Document Compression for Retrieval-Augmented-Generation.** [![GitHub Repo stars](https://img.shields.io/github/stars/Tsinghua-dhy/EDC-2-RAG)](https://github.com/Tsinghua-dhy/EDC-2-RAG)
  - **When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks.**
  - **Thanos: A Block-wise Pruning Algorithm for Efficient Large Language Model Compression.**
- 11. Benchmark and Evaluation
  - **LR2Bench: Evaluating Long-chain Reflective Reasoning Capabilities of Large Language Models via Constraint Satisfaction Problems.** [![Static Badge](https://img.shields.io/badge/Homepage-blue)](https://huggingface.co/spaces/UltraRonin/LR2Bench)
  - **LOT: A Story-Centric Benchmark for Evaluating Chinese Long Text Understanding and Generation.** [![GitHub Repo stars](https://img.shields.io/github/stars/thu-coai/LOT-LongLM)](https://github.com/thu-coai/LOT-LongLM)
  - **LUQ: Long-text Uncertainty Quantification for LLMs.**
  - **Long-context LLMs Struggle with Long In-context Learning.** [![GitHub Repo stars](https://img.shields.io/github/stars/TIGER-AI-Lab/LongICLBench)](https://github.com/TIGER-AI-Lab/LongICLBench)
  - **CLAPNQ: Cohesive Long-form Answers from Passages in Natural Questions for RAG systems.**
  - **XL2Bench: A Benchmark for Extremely Long Context Understanding with Long-range Dependencies.** [![GitHub Repo stars](https://img.shields.io/github/stars/nuaa-nlp/XL2Bench)](https://github.com/nuaa-nlp/XL2Bench)
  - **SCROLLS: Standardized CompaRison Over Long Language Sequences.** [![GitHub Repo stars](https://img.shields.io/github/stars/tau-nlp/scrolls)](https://github.com/tau-nlp/scrolls)
  - **MuLD: The Multitask Long Document Benchmark.**
  - **Lost in the Middle: How Language Models Use Long Contexts.** [![GitHub Repo stars](https://img.shields.io/github/stars/nelson-liu/lost-in-the-middle)](https://github.com/nelson-liu/lost-in-the-middle)
  - **L-Eval: Instituting Standardized Evaluation for Long Context Language Models.**
  - **LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding.**
  - **Content Reduction, Surprisal and Information Density Estimation for Long Documents.**
  - **The Impact of Reasoning Step Length on Large Language Models.**
  - **LongHealth: A Question Answering Benchmark with Long Clinical Documents.** - Baptiste Excoffier, Matthieu Ortala, Alexander Löser, Hugo JWL. Aerts, Jakob Nikolas Kather, Daniel Truhn, Keno Bressem._ Arxiv 2024.
  - **Long-form evaluation of model editing.**
  - **In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss.**
  - **DocFinQA: A Long-Context Financial Reasoning Dataset.** - Kedziorski, Viet Dac Lai, Chris Tanner._ Arxiv 2024.
  - **LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents.**
  - **PROXYQA: An Alternative Framework for Evaluating Long-Form Text Generation with Large Language Models.**
  - **∞Bench: Extending Long Context Evaluation Beyond 100K Tokens.**
  - **Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models.** [![GitHub Repo stars](https://img.shields.io/github/stars/alonj/Same-Task-More-Tokens)](https://github.com/alonj/Same-Task-More-Tokens)
  - **Evaluating Very Long-Term Conversational Memory of LLM Agents.** - Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, Yuwei Fang._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/snap-research/LoCoMo)](https://github.com/snap-research/LoCoMo)
  - **Needle in a haystack - pressure testing llms.**
  - **Language Models as Science Tutors.** - Jie Zhu, Zhiyong Jason Ren, Sanjeev Arora, Danqi Chen._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/princeton-nlp/LM-Science-Tutor)](https://github.com/princeton-nlp/LM-Science-Tutor)
  - **In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss.**
  - **LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K.**
  - **Counting-Stars: A Simple, Efficient, and Reasonable Strategy for Evaluating Long-Context Large Language Models.** [![GitHub Repo stars](https://img.shields.io/github/stars/nick7nlp/Counting-Stars)](https://github.com/nick7nlp/Counting-Stars)
  - **NovelQA: A Benchmark for Long-Range Novel Question Answering.**
  - **CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models.**
  - **Long-form factuality in large language models.** [![GitHub Repo stars](https://img.shields.io/github/stars/google-deepmind/long-form-factuality)](https://github.com/google-deepmind/long-form-factuality)
  - **LongEmbed: Extending Embedding Models for Long Context Retrieval.** [![GitHub Repo stars](https://img.shields.io/github/stars/dwzhu-pku/LongEmbed)](https://github.com/dwzhu-pku/LongEmbed)
  - **Make Your LLM Fully Utilize the Context.** - Guang Lou._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/microsoft/FILM)](https://github.com/microsoft/FILM)
  - **Many-shot Jailbreaking.**
  - **Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors.**
  - **S3Eval: A Synthetic, Scalable, Systematic Evaluation Suite for Large Language Models.**
  - **In-Context Learning with Long-Context Models: An In-Depth Exploration.** [![GitHub Repo stars](https://img.shields.io/github/stars/abertsch72/long-context-icl)](https://github.com/abertsch72/long-context-icl)
  - **Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks.** [![GitHub Repo stars](https://img.shields.io/github/stars/open-compass/Ada-LEval)](https://github.com/open-compass/Ada-LEval)
  - **RULER: What's the Real Context Size of Your Long-Context Language Models?.** - Ping Hsieh, Simeng Sun, Samuel Kriman, Shantanu Acharya, Dima Rekesh, Fei Jia, Boris Ginsburg._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/hsiehjackson/RULER)](https://github.com/hsiehjackson/RULER)
  - **DOLOMITES: Domain-Specific Long-Form Methodical Tasks.**
  - **Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis.**
  - **FinTextQA: A Dataset for Long-form Financial Question Answering.**
  - **A Multi-Perspective Analysis of Memorization in Large Language Models.**
  - **Language Models Need Inductive Biases to Count Inductively.**
  - **Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding.** - Seng Chua._ Arxiv 2024.
  - **BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack.**
  - **Can Many-Shot In-Context Learning Help Long-Context LLM Judges? See More, Judge Better!.**
  - **What Kinds of Tokens Benefit from Distant Text? An Analysis on Long Context Language Modeling.**
  - **Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective.**
  - **Long Context is Not Long at All: A Prospector of Long-Dependency Data for Large Language Models.**
  - **Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?.** - Wei Chang, Kelvin Guu._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/google-deepmind/loft)](https://github.com/google-deepmind/loft)
  - **LongIns: A Challenging Long-context Instruction-based Exam for LLMs.**
  - **Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA.**
  - **Insights into LLM Long-Context Failures: When Transformers Know but Don't Tell.** [![GitHub Repo stars](https://img.shields.io/github/stars/TaiMingLu/know-dont-tell)](https://github.com/TaiMingLu/know-dont-tell)
  - **USDC: A Dataset of $\underline{U}$ser $\underline{S}$tance and $\underline{D}$ogmatism in Long $\underline{C}$onversations.**
  - **Entity-Level Sentiment: More than the Sum of Its Parts.**
  - **Evaluating Language Model Context Windows: A "Working Memory" Test and Inference-time Correction.**
  - **RAG vs. Long Context: Examining Frontier Large Language Models for Environmental Review Document Comprehension.**
  - **Attribute or Abstain: Large Language Models as Long Document Assistants.** [![GitHub Repo stars](https://img.shields.io/github/stars/UKPLab/arxiv2024-attribute-or-abstain)](https://github.com/UKPLab/arxiv2024-attribute-or-abstain)
  - **How Well Can a Long Sequence Model Model Long Sequences? Comparing Architechtural Inductive Biases on Long-Context Abilities.**
  - **DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems.** [![GitHub Repo stars](https://img.shields.io/github/stars/Anni-Zou/DocBench)](https://github.com/Anni-Zou/DocBench)
  - **KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches.** - Neng Chuang, Songchen Li, Guanchu Wang, Duy Le, Hongye Jin, Vipin Chaudhary, Zhaozhuo Xu, Zirui Liu, Xia Hu._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/henryzhongsc/longctx_bench)](https://github.com/henryzhongsc/longctx_bench)
  - **Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP.**
  - **Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems.** - Sheng Wu._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/salesforce/summary-of-a-haystack)](https://github.com/salesforce/summary-of-a-haystack)
  - **VERISCORE: Evaluating the factuality of verifiable claims in long-form text generation.** [![GitHub Repo stars](https://img.shields.io/github/stars/Yixiao-Song/VeriScore)](https://github.com/Yixiao-Song/VeriScore)
  - **ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models.**
  - **NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?.** [![GitHub Repo stars](https://img.shields.io/github/stars/open-compass/opencompass)](https://github.com/open-compass/opencompass)
  - **LongLaMP: A Benchmark for Personalized Long-form Text Generation.** [![Static Badge](https://img.shields.io/badge/Homepage-blue)](https://longlamp-benchmark.github.io/)
  - **RAG-QA Arena: Evaluating Domain Robustness for Long-form Retrieval Augmented Question Answering.** [![GitHub Repo stars](https://img.shields.io/github/stars/awslabs/rag-qa-arena)](https://github.com/awslabs/rag-qa-arena)
  - **Attention Is All You Need But You Don't Need All Of It For Inference of Large Language Models.** - S Dovonon, Jean Kaddour, Pasquale Minervini._ ICML 2024 TF2M workshop.
  - **Stress-Testing Long-Context Language Models with Lifelong ICL and Task Haystack.** [![GitHub Repo stars](https://img.shields.io/github/stars/INK-USC/Lifelong-ICL)](https://github.com/INK-USC/Lifelong-ICL)
  - **Retrieval Or Holistic Understanding? Dolce: Differentiate Our Long Context Evaluation Tasks.**
  - **A Controlled Study on Long Context Extension and Generalization in LLMs.**
  - **RAD-Bench: Evaluating Large Language Models Capabilities in Retrieval Augmented Dialogues.** - Lin Kuo, Feng-Ting Liao, Mu-Wei Hsieh, Fu-Chieh Chang, Po-Chun Hsu, Da-Shan Shiu._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/mtkresearch/RAD-Bench)](https://github.com/mtkresearch/RAD-Bench)
  - **Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation.** [![Static Badge](https://img.shields.io/badge/Homepage-blue)](https://huggingface.co/datasets/google/frames-benchmark)
  - **Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries.** - Baptiste Lespiau, Nithya Attaluri, Kate Olszewska._ Arxiv 2024.
  - **DetectiveQA: Evaluating Long-Context Reasoning on Detective Novels.**
  - **LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA.**
  - **WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries.**
  - **Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach.**
  - **Evaluating Long Range Dependency Handling in Code Generation Models using Multi-Step Key Retrieval.** [![GitHub Repo stars](https://img.shields.io/github/stars/apple/ml-key-retrieval-code-tasks)](https://github.com/apple/ml-key-retrieval-code-tasks)
  - **Long Input Benchmark for Russian Analysis.**
  - **HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models.** [![GitHub Repo stars](https://img.shields.io/github/stars/Tintri/hello-bench)](https://github.com/Tintri/hello-bench)
  - **Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs.** [![GitHub Repo stars](https://img.shields.io/github/stars/Rachum-thu/LongPiBench)](https://github.com/Rachum-thu/LongPiBench)
  - **Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models.** [![GitHub Repo stars](https://img.shields.io/github/stars/AmeyHengle/multilingual-needle-in-a-haystack)](https://github.com/AmeyHengle/multilingual-needle-in-a-haystack)
  - **LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs.** - Wei Lee._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/mozhu621/LongGenBench)](https://github.com/mozhu621/LongGenBench/)
  - **What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices.**
  - **MileBench: Benchmarking MLLMs in Long Context.**
  - **MovieSum: An Abstractive Summarization Dataset for Movie Screenplays.**
  - **SEED-Story: Multimodal Long Story Generation with Large Language Model.** [![GitHub Repo stars](https://img.shields.io/github/stars/TencentARC/SEED-Story)](https://github.com/TencentARC/SEED-Story)
  - **MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs.** - Peng Lim, Caiming Xiong, Doyen Sahoo._ Arxiv 2024.
  - **Hyper-multi-step: The Truth Behind Difficult Long-context Tasks.**
  - **Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation.**
  - **Many-Shot In-Context Learning in Multimodal Foundation Models.**
  - **MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding.**
  - **RepoQA: Evaluating Long Context Code Understanding.**
  - **Short Film Dataset (SFD): A Benchmark for Story-Level Video Understanding.**
  - **Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models.** [![GitHub Repo stars](https://img.shields.io/github/stars/Wang-ML-Lab/multimodal-needle-in-a-haystack)](https://github.com/Wang-ML-Lab/multimodal-needle-in-a-haystack)
  - **Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts.** [![Static Badge](https://img.shields.io/badge/Homepage-blue)](https://locovqa.github.io/)
  - **InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.** [![GitHub Repo stars](https://img.shields.io/github/stars/InternLM/InternLM-XComposer)](https://github.com/InternLM/InternLM-XComposer)
  - **Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Knowledge.** - Jun Lee, Dokyong Lee, Junyoung Youn, Kyeongjin Oh, Byungsoo Ko, Jonghwan Hyeon, Ho-Jin Choi._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/passing2961/Stark)](https://github.com/passing2961/Stark)
  - **SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers.**
  - **MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations.** - Gang Jiang, Jiaqi Wang, Yixin Cao, Aixin Sun._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/mayubo2333/MMLongBench-Doc)](https://github.com/mayubo2333/MMLongBench-Doc)
  - **LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding.**
  - **mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval.** [![Static Badge](https://img.shields.io/badge/Homepage-blue)](https://huggingface.co/Alibaba-NLP/gte-multilingual-base)
  - **A Benchmark for Long-Form Medical Question Answering.** [![GitHub Repo stars](https://img.shields.io/github/stars/lavita-ai/medical-eval-sphere)](https://github.com/lavita-ai/medical-eval-sphere)
  - **Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows.** [![Static Badge](https://img.shields.io/badge/Homepage-blue)](https://spider2-sql.github.io/)
  - **LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos.**
  - **Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?.** [![GitHub Repo stars](https://img.shields.io/github/stars/jonathan-roberts1/needle-threading)](https://github.com/jonathan-roberts1/needle-threading)
  - **M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework.** [![Static Badge](https://img.shields.io/badge/Homepage-blue)](https://multimodal-documents.github.io/)
  - **DebateBench: A Challenging Long Context Reasoning Benchmark For Large Language Models.**
  - **ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario.**
  - **Long Context vs. RAG for LLMs: An Evaluation and Revisits.**
  - **LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations.** [![GitHub Repo stars](https://img.shields.io/github/stars/google-deepmind/lm_act)](https://github.com/google-deepmind/lm_act)
  - **Neptune: The Long Orbit to Benchmarking Long Video Understanding.** [![GitHub Repo stars](https://img.shields.io/github/stars/google-deepmind/neptune)](https://github.com/google-deepmind/neptune)
  - **CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning.** - Ah Kim, Michael P Brenner, Viren Jain, Sameera Ponda, Subhashini Venugopalan._ Arxiv 2025. [![GitHub Repo stars](https://img.shields.io/github/stars/google/curie)](https://github.com/google/curie)
  - **L2M: Mutual Information Scaling Law for Long-Context Language Modeling.**
  - **DENIAHL: In-Context Features Influence LLM Needle-In-A-Haystack Abilities.**
  - **An Empirical Study of Mamba-based Language Models.** [![GitHub Repo stars](https://img.shields.io/github/stars/NVIDIA/Megatron-LM)](https://github.com/NVIDIA/Megatron-LM/tree/ssm/examples/mamba)
  - **CLOVER: A Test Case Generation Benchmark with Coverage, Long-Context, and Verification.**
  - **MIR-Bench: Benchmarking LLM's Long-Context Intelligence via Many-Shot In-Context Inductive Reasoning.** - Han Fan, Lingfeng Shen, Zhengyin Du, Jiecao Chen._ Arxiv 2025.
  - **LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs - No Silver Bullet for LC or RAG Routing.**
  - **Does RAG Really Perform Bad For Long-Context Processing?.**
  - **Long Range Arena : A Benchmark for Efficient Transformers.** [![GitHub Repo stars](https://img.shields.io/github/stars/google-research/long-range-arena)](https://github.com/google-research/long-range-arena)
  - **LooGLE: Long Context Evaluation for Long-Context Language Models.** [![GitHub Repo stars](https://img.shields.io/github/stars/bigai-nlco/loogle)](https://github.com/bigai-nlco/loogle)
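  - **OLAPH: Improving Factuality in Biomedical Long-form Question Answering.** [![GitHub Repo stars](https://img.shields.io/github/stars/dmis-lab/OLAPH)](https://github.com/dmis-lab/OLAPH)
  - **Can LLMs Solve longer Math Word Problems Better?.** [![GitHub Repo stars](https://img.shields.io/github/stars/XinXU-USTC/CoLeG-Math)](https://github.com/XinXU-USTC/CoLeG-Math)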
  - **Base of RoPE Bounds Context Length.**
  - **Many-shot In-Context Learning.** - Reyes, Eric Chu, Feryal Behbahani, Aleksandra Faust, Hugo Larochelle._ Arxiv 2024.
  - **CRAG -- Comprehensive RAG Benchmark.** - tau Yih, Xin Luna Dong._ Arxiv 2024. [![Static Badge](https://img.shields.io/badge/Homepage-blue)](https://www.aicrowd.com/challenges/meta-comprehensive-rag-benchmark-kdd-cup-2024)
  - **MedOdyssey: A Medical Domain Benchmark for Long Context Evaluation Up to 200K Tokens.** [![GitHub Repo stars](https://img.shields.io/github/stars/JOHNNY-fans/MedOdyssey)](https://github.com/JOHNNY-fans/MedOdyssey)
  - **Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization.** - Yu Hsieh, Yung-Sung Chuang, Chun-Liang Li, Zifeng Wang, Long T. Le, Abhishek Kumar, James Glass, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister._ Arxiv 2024.
  - **One Thousand and One Pairs: A "novel" challenge for long-context language models.**
  - **CoverBench: A Challenging Benchmark for Complex Claim Verification.** - David, Uri Shaham, Amir Feder, Mor Geva, Dror Marcus, Avi Caciularu._ Arxiv 2024. [![Static Badge](https://img.shields.io/badge/Homepage-blue)](https://huggingface.co/datasets/google/coverbench)
  - **Retrieval meets Long Context Large Language Models.**
  - **Multilingual Evaluation of Long Context Retrieval and Reasoning.**
  - **L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding?** [![GitHub Repo stars](https://img.shields.io/github/stars/ZetangForward/L-CITEEVAL)](https://github.com/ZetangForward/L-CITEEVAL)
  - **HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly.** [![GitHub Repo stars](https://img.shields.io/github/stars/princeton-nlp/HELMET)](https://github.com/princeton-nlp/HELMET)
  - **Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data.**
  - **How much do contextualized representations encode long-range context?.** - Ping Hsieh._ Arxiv 2024.
  - **LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory.** - Wei Chang, Dong Yu._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/xiaowu0162/LongMemEval)](https://github.com/xiaowu0162/LongMemEval)
  - **When Attention Sink Emerges in Language Models: An Empirical View.** [![GitHub Repo stars](https://img.shields.io/github/stars/sail-sg/Attention-Sink)](https://github.com/sail-sg/Attention-Sink)
  - **ETHIC: Evaluating Large Language Models on Long-Context Tasks with High Information Coverage.** [![GitHub Repo stars](https://img.shields.io/github/stars/dmis-lab/ETHIC)](https://github.com/dmis-lab/ETHIC)
  - **Long2RAG: Evaluating Long-Context & Long-Form Retrieval-Augmented Generation with Key Point Recall.**
  - **LIFBench: Evaluating the Instruction Following Performance and Stability of Large Language Models in Long-Context Scenarios.**
  - **LCFO: Long Context and Long Form Output Dataset and Benchmarking.** - jussà, Pierre Andrews, Mariano Coria Meglioli, Joy Chen, Joe Chuang, David Dale, Christophe Ropers, Alexandre Mourachko, Eduardo Sánchez, Holger Schwenk, Tuan Tran, Arina Turkatenko, Carleigh Wood._ Arxiv 2024.
  - **SCBench: A KV Cache-Centric Analysis of Long-Context Methods.** [![Static Badge](https://img.shields.io/badge/Homepage-blue)](https://hqjiang.com/scbench.html)
  - **LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks.**
  - **XRAG: eXamining the Core -- Benchmarking Foundational Components in Advanced Retrieval-Augmented Generation.**
  - **RepoTransBench: A Real-World Benchmark for Repository-Level Code Translation.**
  - **MTRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems.** - Suk Lee, Lucian Popa, Vraj Shah, Huaiyu Zhu, Danish Contractor, Marina Danilevsky._ Arxiv 2025. [![GitHub Repo stars](https://img.shields.io/github/stars/ibm/mt-rag-benchmark)](https://github.com/ibm/mt-rag-benchmark)
  - **VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation.**
  - **Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation.**
  - **LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating.** - Zhi Li, Jian Xu, Xiao-Hui Li, Yuan Gao, Jun Song, Bo Zheng, Cheng-Lin Liu._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/google-deepmind/neptune)](https://github.com/google-deepmind/neptune)
  - **HLV-1K: A Large-scale Hour-Long Video Benchmark for Time-Specific Long Video Understanding.**
  - **SQLong: Enhanced NL2SQL for Longer Contexts with LLMs.** - Fang Li, Long Duong._ Arxiv 2025.
  - **Compression Scaling Laws:Unifying Sparsity and Quantization.**
  - **LongSafety: Evaluating Long-Context Safety of Large Language Models.** - [![GitHub Repo stars](https://img.shields.io/github/stars/thu-coai/LongSafety)](https://github.com/thu-coai/LongSafety)
  - **EnigmaEval: A Benchmark of Long Multimodal Reasoning Challenges.** - [![Static Badge](https://img.shields.io/badge/Homepage-blue)](https://scale.com/leaderboard/enigma_eval)
  - **NoLiMa: Long-Context Evaluation Beyond Literal Matching.**
  - **Technical Debt in In-Context Learning: Diminishing Efficiency in Long Context.**
  - **BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation.**
  - **Demystifying Long Chain-of-Thought Reasoning in LLMs.** - [![GitHub Repo stars](https://img.shields.io/github/stars/Infini-AI-Lab/gsm)](https://github.com/Infini-AI-Lab/gsm)
  - **Explaining Context Length Scaling and Bounds for Language Models.** - Neng Hwang, Serge Belongie, Lei Li._ Arxiv 2025. [![GitHub Repo stars](https://img.shields.io/github/stars/JingzheShi/NLPCtlScalingAndBounds)](https://github.com/JingzheShi/NLPCtlScalingAndBounds)
  - **Attention Sinks and Outlier Features: A 'Catch, Tag, and Release' Mechanism for Embeddings.** - [![Static Badge](https://img.shields.io/badge/Homepage-blue)](https://catch-tag-release.github.io/)
  - **LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion.** - Han Fan, Lingfeng Shen, Zhengyin Du, Jiecao Chen._ Arxiv 2025.
  - **RedStar: Does Scaling Long-CoT Data Unlock Better Slow-Reasoning Systems?.** - [![Static Badge](https://img.shields.io/badge/Homepage-blue)](https://huggingface.co/RedStar-Reasoning)
  - **MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents.** - [![Static Badge](https://img.shields.io/badge/Homepage-blue)](https://huggingface.co/MMDocIR)
  - **Long Range Arena : A Benchmark for Efficient Transformers.** - [![GitHub Repo stars](https://img.shields.io/github/stars/google-research/long-range-arena)](https://github.com/google-research/long-range-arena)
  - **Base of RoPE Bounds Context Length.**
  - **Many-shot In-Context Learning.** - Reyes, Eric Chu, Feryal Behbahani, Aleksandra Faust, Hugo Larochelle._ Arxiv 2024.
  - **CRAG -- Comprehensive RAG Benchmark.** - tau Yih, Xin Luna Dong._ Arxiv 2024. [![Static Badge](https://img.shields.io/badge/Homepage-blue)](https://www.aicrowd.com/challenges/meta-comprehensive-rag-benchmark-kdd-cup-2024)
  - **CoverBench: A Challenging Benchmark for Complex Claim Verification.** - David, Uri Shaham, Amir Feder, Mor Geva, Dror Marcus, Avi Caciularu._ Arxiv 2024. [![Static Badge](https://img.shields.io/badge/Homepage-blue)](https://huggingface.co/datasets/google/coverbench)
  - **Multilingual Evaluation of Long Context Retrieval and Reasoning.**
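  - **L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding?** - [![GitHub Repo stars](https://img.shields.io/github/stars/ZetangForward/L-CITEEVAL)](https://github.com/ZetangForward/L-CITEEVAL)
  - **HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly.** - [![GitHub Repo stars](https://img.shields.io/github/stars/princeton-nlp/HELMET)](https://github.com/princeton-nlp/HELMET)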
  - **Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data.**
  - **How much do contextualized representations encode long-range context?.** - Ping Hsieh._ Arxiv 2024.
  - **LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory.** - Wei Chang, Dong Yu._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/xiaowu0162/LongMemEval)](https://github.com/xiaowu0162/LongMemEval)
  - **When Attention Sink Emerges in Language Models: An Empirical View.** - [![GitHub Repo stars](https://img.shields.io/github/stars/sail-sg/Attention-Sink)](https://github.com/sail-sg/Attention-Sink)
  - **ETHIC: Evaluating Large Language Models on Long-Context Tasks with High Information Coverage.** - [![GitHub Repo stars](https://img.shields.io/github/stars/dmis-lab/ETHIC)](https://github.com/dmis-lab/ETHIC)
  - **Long2RAG: Evaluating Long-Context & Long-Form Retrieval-Augmented Generation with Key Point Recall.**
  - **LIFBench: Evaluating the Instruction Following Performance and Stability of Large Language Models in Long-Context Scenarios.**
  - **MomentSeeker: A Comprehensive Benchmark and A Strong Baseline For Moment Retrieval Within Long Videos.** - Rong Wen._ Arxiv 2025.
  - **LVLM-Compress-Bench: Benchmarking the Broader Impact of Large Vision-Language Model Compression.** - [![GitHub Repo stars](https://img.shields.io/github/stars/opengear-project/LVLM-compress-bench)](https://github.com/opengear-project/LVLM-compress-bench)
  - **NeedleInATable: Exploring Long-Context Capability of Large Language Models towards Long-Structured Tables.**
  - **DocPuzzle: A Process-Aware Benchmark for Evaluating Realistic Long-Context Reasoning Capabilities.**
  - **Chain-of-Thought Matters: Improving Long-Context Language Models with Reasoning Path Supervision.** - [![GitHub Repo stars](https://img.shields.io/github/stars/lemon-prog123/LongRePS)](https://github.com/lemon-prog123/LongRePS)
  - **U-NIAH: Unified RAG and LLM Evaluation for Long Context Needle-In-A-Haystack.** - [![GitHub Repo stars](https://img.shields.io/github/stars/Tongji-KGLLM/U-NIAH)](https://github.com/Tongji-KGLLM/U-NIAH)
  - **One ruler to measure them all: Benchmarking multilingual long-context language models.**
- 12. Long Text Generation
  - **Suri: Multi-constraint Instruction Following for Long-form Text Generation.**
  - **Context-Preserving Gradient Modulation for Large Language Models: A Novel Approach to Semantic Consistency in Long-Form Text Generation.**
  - **Integrating Planning into Single-Turn Long-Form Text Generation.**
  - **LoGU: Long-form Generation with Uncertainty Expressions.**
  - **LongGenBench: Long-context Generation Benchmark.**
  - **Large Language Models Still Exhibit Bias in Long Text.**
  - **Language Models can Self-Lengthen to Generate Long Texts.** - [![GitHub Repo stars](https://img.shields.io/github/stars/QwenLM/Self-Lengthen)](https://github.com/QwenLM/Self-Lengthen)
  - **LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs.**
  - **Beyond Factual Accuracy: Evaluating Coverage of Diverse Factual Information in Long-form Text Generation.**
  - **LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation.** - [![GitHub Repo stars](https://img.shields.io/github/stars/princeton-pli/LongProc)](https://github.com/princeton-pli/LongProc)
  - **The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input.** - [![Static Badge](https://img.shields.io/badge/Homepage-blue)](https://www.kaggle.com/facts-leaderboard)
  - **A Cognitive Writing Perspective for Constrained Long-Form Text Generation.**
  - **CLIPPER: Compression enables long-context synthetic data generation.**
  - **LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models.** - Li, Yushi Bai, Jifan Yu, Yuhao Wu, Lei Hou, Huiqin Liu, Zhiyuan Liu, Bin Xu, Juanzi Li._ Arxiv 2025. [![GitHub Repo stars](https://img.shields.io/github/stars/THU-KEG/LongWriter-V)](https://github.com/THU-KEG/LongWriter-V)
  - **Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key.**
  - **LongDPO: Unlock Better Long-form Generation Abilities for LLMs via Critique-augmented Stepwise Information.**
  - **DeFine: A Decomposed and Fine-Grained Annotated Dataset for Long-form Article Generation.** - [![GitHub Repo stars](https://img.shields.io/github/stars/DeFine-LFAG/DeFine_Dataset)](https://github.com/DeFine-LFAG/DeFine_Dataset)
  - **Lost-in-the-Middle in Long-Text Generation: Synthetic Dataset, Evaluation Framework, and Mitigation.** - [![GitHub Repo stars](https://img.shields.io/github/stars/OnlyAR/RAL-Writer)](https://github.com/OnlyAR/RAL-Writer)
  - **Beyond Outlining: Heterogeneous Recursive Planning for Adaptive Long-form Writing with Language Models.** - [![GitHub Repo stars](https://img.shields.io/github/stars/principia-ai/heterogeneous-recursive-planning)](https://github.com/principia-ai/heterogeneous-recursive-planning)
  - **Generating Long-form Story Using Dynamic Hierarchical Outlining with Memory-Enhancement.**
  - **ExPerT: Effective and Explainable Evaluation of Personalized Long-Form Text Generation.**
  - **Learning to Reason for Long-Form Story Generation.**
  - **ThinkPrune: Pruning Long Chain-of-Thought of LLMs via Reinforcement Learning.** - [![GitHub Repo stars](https://img.shields.io/github/stars/UCSB-NLP-Chang/ThinkPrune)](https://github.com/UCSB-NLP-Chang/ThinkPrune)
  - **LongEval: A Comprehensive Analysis of Long-Text Generation Through a Plan-based Paradigm.** - Navarro, Chenghua Lin._ Arxiv 2025. [![GitHub Repo stars](https://img.shields.io/github/stars/Wusiwei0410/LongEval)](https://github.com/Wusiwei0410/LongEval)
  - **From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens.** - [![GitHub Repo stars](https://img.shields.io/github/stars/bigai-nlco/TokenSwift)](https://github.com/bigai-nlco/TokenSwift)
  - **RAPID: Efficient Retrieval-Augmented Long Text Generation with Writing Planning and Information Discovery.**
  - **Think When You Need: Self-Adaptive Chain-of-Thought Learning.**
  - **LLM×MapReduce-V2: Entropy-Driven Convolutional Test-Time Scaling for Generating Long-Form Articles from Extremely Long Resources.**
- 13. Long CoT
  - **LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!.** - [![GitHub Repo stars](https://img.shields.io/github/stars/NovaSky-AI/SkyThought)](https://github.com/NovaSky-AI/SkyThought)
  - **Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning.**
  - **CoT-Valve: Length-Compressible Chain-of-Thought Tuning.** - [![GitHub Repo stars](https://img.shields.io/github/stars/horseee/CoT-Valve)](https://github.com/horseee/CoT-Valve)
  - **Efficient Long-Decoding Inference with Reasoning-Aware Attention Sparsity.**
  - **Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning.**
  - **Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?.**
  - **Towards Widening The Distillation Bottleneck for Reasoning Models.**
  - **What's Behind PPO's Collapse in Long-CoT? Value Optimization Holds the Secret.**
  - **START: Self-taught Reasoner with Tools.**
  - **L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning.**
  - **SKIntern: Internalizing Symbolic Knowledge for Distilling Better CoT Capabilities into Small Language Models.**
  - **Large Reasoning Models in Agent Scenarios: Exploring the Necessity of Reasoning Capabilities.**
  - **Attention Reveals More Than Tokens: Training-Free Long-Context Reasoning with Attention-guided Retrieval.**
  - **PENCIL: Long Thoughts with Short Memory.**
  - **SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities.** - [![Static Badge](https://img.shields.io/badge/Homepage-blue)](https://safe-chain.github.io/)
  - **Leveraging Constrained Monte Carlo Tree Search to Generate Reliable Long Chain-of-Thought for Mathematical Reasoning.**
  - **Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?.**
  - **TokenSkip: Controllable Chain-of-Thought Compression in LLMs.**
  - **Dynamic Chain-of-Thought: Towards Adaptive Deep Reasoning.**
  - **Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs.**
  - **O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?.** - [![GitHub Repo stars](https://img.shields.io/github/stars/GAIR-NLP/O1-Journey)](https://github.com/GAIR-NLP/O1-Journey)
  - **OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning.** - [![GitHub Repo stars](https://img.shields.io/github/stars/ADaM-BJTU/OpenRFT)](https://github.com/ADaM-BJTU/OpenRFT)
  - **When More is Less: Understanding Chain-of-Thought Length in LLMs.**
  - **Monte Carlo Tree Diffusion for System 2 Planning.**
  - **DRT: Deep Reasoning Translation via Long Chain-of-Thought.** - [![GitHub Repo stars](https://img.shields.io/github/stars/krystalan/DRT-o1)](https://github.com/krystalan/DRT-o1)
  - **InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models.**
  - **Long Is More Important Than Difficult for Training Reasoning Models.**
  - **LightThinker: Thinking Step-by-Step Compression.**
  - **SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild.** - [![GitHub Repo stars](https://img.shields.io/github/stars/hkust-nlp/simpleRL-reason)](https://github.com/hkust-nlp/simpleRL-reason)
  - **Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning.** - Long Sun, Zhun Sun, Houwen Peng, Han-Jia Ye._ Arxiv 2025. [![GitHub Repo stars](https://img.shields.io/github/stars/sun-hailong/TVC)](https://github.com/sun-hailong/TVC)
  - **MA-LoT: Multi-Agent Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving.**
  - **"Well, Keep Thinking": Enhancing LLM Reasoning with Adaptive Injection Decoding.**
  - **Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond.** - [![GitHub Repo stars](https://img.shields.io/github/stars/Qihoo360/Light-R1)](https://github.com/Qihoo360/Light-R1)
  - **Unlocking General Long Chain-of-Thought Reasoning Capabilities of Large Language Models via Representation Engineering.**
  - **TwT: Thinking without Tokens by Habitual Reasoning Distillation with Multi-Teachers' Guidance.**
- 3. Recurrent Transformers
  - **Transformer-XL: Attentive language models beyond a fixed-length context.** - [![GitHub Repo stars](https://img.shields.io/github/stars/kimiyoung/transformer-xl)](https://github.com/kimiyoung/transformer-xl)
  - **Memformer: The memory-augmented transformer.**
  - **Compressive Transformers for Long-Range Sequence Modelling.** - [![GitHub Repo stars](https://img.shields.io/github/stars/lucidrains/compressive-transformer-pytorch)](https://github.com/lucidrains/compressive-transformer-pytorch)
  - **ERNIE-Doc: A Retrospective Long-Document Modeling Transformer.** - IJCNLP 2021.
  - **Memorizing Transformers.** - [![GitHub Repo stars](https://img.shields.io/github/stars/lucidrains/memorizing-transformers-pytorch)](https://github.com/lucidrains/memorizing-transformers-pytorch)
  - **Recurrent Attention Networks for Long-text Modeling.**
  - **Segmented Recurrent Transformer: An Efficient Sequence-to-Sequence Model.**
  - **Scaling Transformer to 1M tokens and beyond with RMT.**
  - **Block-Recurrent Transformers.** - [![GitHub Repo stars](https://img.shields.io/github/stars/lucidrains/block-recurrent-transformer-pytorch)](https://github.com/lucidrains/block-recurrent-transformer-pytorch)
  - **TRAMS: Training-free Memory Selection for Long-range Language Modeling.**
  - **Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence.** - Jie Zhu._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/RWKV/RWKV-LM)](https://github.com/RWKV/RWKV-LM)
  - **Extensible Embedding: A Flexible Multipler For LLM's Context Length.**
  - **Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention.**
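  - **Just read twice: closing the recall gap for recurrent language models.** - [![GitHub Repo stars](https://img.shields.io/github/stars/HazyResearch/prefix-linear-attention)](https://github.com/HazyResearch/prefix-linear-attention)
  - **Linearizing Large Language Models.** - [![GitHub Repo stars](https://img.shields.io/github/stars/TRI-ML/linear_open_lm)](https://github.com/TRI-ML/linear_open_lm)
  - **VisualRWKV: Exploring Recurrent Neural Networks for Visual Language Models.** - [![GitHub Repo stars](https://img.shields.io/github/stars/howard-hou/VisualRWKV)](https://github.com/howard-hou/VisualRWKV)
  - **GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression.** - [![GitHub Repo stars](https://img.shields.io/github/stars/recursal/GoldFinch-paper)](https://github.com/recursal/GoldFinch-paper)
  - **xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference.** - [![GitHub Repo stars](https://img.shields.io/github/stars/NX-AI/xlstm)](https://github.com/NX-AI/xlstm)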
  - **RWKV: Reinventing RNNs for the Transformer Era.** - Jie Zhu._ Arxiv 2023. [![GitHub Repo stars](https://img.shields.io/github/stars/BlinkDL/RWKV-LM)](https://github.com/BlinkDL/RWKV-LM)
  - **Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models.** - Muraru, Albert Gu, Ruba Haroun, Leonard Berrada, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, Arnaud Doucet, David Budden, Yee Whye Teh, Razvan Pascanu, Nando De Freitas, Caglar Gulcehre._ Arxiv 2024.
  - **Associative Recurrent Memory Transformer.** - [![GitHub Repo stars](https://img.shields.io/github/stars/RodkinIvan/associative-recurrent-memory-transformer)](https://github.com/RodkinIvan/associative-recurrent-memory-transformer)
  - **Analysis of Argument Structure Constructions in a Deep Recurrent Language Model.**
  - **RecurrentGemma: Moving Past Transformers for Efficient Open Language Models.** - Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Armand Joulin, Noah Fiedel, Evan Senter, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, David Budden, Arnaud Doucet, Sharad Vikram, Adam Paszke, Trevor Gale, Sebastian Borgeaud, Charlie Chen, Andy Brock, Antonia Paterson, Jenny Brennan, Meg Risdal, Raj Gundluru, Nesh Devanathan, Paul Mooney, Nilay Chauhan, Phil Culliton, Luiz GUStavo Martins, Elisa Bandy, David Huntsperger, Glenn Cameron, Arthur Zucker, Tris Warkentin, Ludovic Peran, Minh Giang, Zoubin Ghahramani, Clément Farabet, Koray Kavukcuoglu, Demis Hassabis, Raia Hadsell, Yee Whye Teh, Nando de Frietas._ Arxiv 2024.
- 8. Agent
  - **A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis.**
  - **LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration.**
  - **PEARL: Prompting Large Language Models to Plan and Execute Actions Over Long Documents.**
  - **AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents.** - [![GitHub Repo stars](https://img.shields.io/github/stars/UT-Austin-RPL/amago)](https://github.com/UT-Austin-RPL/amago)
  - **Chain of Agents: Large Language Models Collaborating on Long-Context Tasks.**
  - **GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models.**
  - **Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks.** - [![GitHub Repo stars](https://img.shields.io/github/stars/JiuTian-VL/Optimus-1)](https://github.com/JiuTian-VL/Optimus-1)
  - **Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks.**
- 10. Long Video and Image
  - **LongVILA: Scaling Long-Context Visual Language Models for Long Videos.**
  - **DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework.**
  - **Bridging Episodes and Semantics: A Novel Framework for Long-Form Video Understanding.** - Fong Yeh, Min-Hung Chen, Hung-Ting Su, Winston H. Hsu, Shang-Hong Lai._ ECCV 2024 Workshop. [![GitHub Repo stars](https://img.shields.io/github/stars/joslefaure/HERMES)](https://github.com/joslefaure/HERMES)
  - **EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture.** - [![GitHub Repo stars](https://img.shields.io/github/stars/aigc-apps/EasyAnimate)](https://github.com/aigc-apps/EasyAnimate)
  - **VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos.** - Eskin, Jaehong Yoon, Feng Cheng, Gedas Bertasius, Mohit Bansal._ Arxiv 2024.
  - **PostDoc: Generating Poster from a Long Multimodal Document Using Deep Submodular Optimization.**
  - **Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies.** - Ting Su, Chun-Tong Chao, Ya-Ching Hsu, Xudong Lin, Yulei Niu, Hung-Yi Lee, Winston H. Hsu._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/ander1119/TiM)](https://github.com/ander1119/TiM)
  - **Towards Event-oriented Long Video Understanding.** - Rong Wen._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/RUCAIBox/Event-Bench)](https://github.com/RUCAIBox/Event-Bench)
  - **An End-to-End Speech Summarization Using Large Language Model.**
  - **KeyVideoLLM: Towards Large-scale Video Keyframe Selection.**
  - **OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding.**
  - **MATE: Meet At The Embedding -- Connecting Images with Long Texts.**
  - **Rethinking Visual Dependency in Long-Context Reasoning for Large Vision-Language Models.**
  - **SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation.** - Wei Chang, Lingjie Li, Kevin Lin, Chung-Ching Lin, Jianfeng Wang, Zhengyuan Yang, Yingnian Wu, Lijuan Wang._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/slowfast-vgen/slowfast-vgen)](https://github.com/slowfast-vgen/slowfast-vgen)
  - **LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation.**
  - **mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models.** - [![GitHub Repo stars](https://img.shields.io/github/stars/X-PLUG/mPLUG-Owl)](https://github.com/X-PLUG/mPLUG-Owl)
  - **VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges.** - [![GitHub Repo stars](https://img.shields.io/github/stars/bigai-nlco/VideoLLaMB)](https://github.com/bigai-nlco/VideoLLaMB)
  - **Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation.**
  - **LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture.**
  - **VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models.**
  - **T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs.**
  - **Temporal Preference Optimization for Long-Form Video Understanding.** - Levy._ Arxiv 2025. [![GitHub Repo stars](https://img.shields.io/github/stars/ruili33/TPO)](https://github.com/ruili33/TPO)
  - **Latent Swap Joint Diffusion for Long-Form Audio Generation.** - [![Static Badge](https://img.shields.io/badge/Homepage-blue)](https://swapforward.github.io/)
  - **Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding.**
  - **Owl-1: Omni World Model for Consistent Long Video Generation.** - [![GitHub Repo stars](https://img.shields.io/github/stars/huang-yh/Owl)](https://github.com/huang-yh/Owl)
  - **MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation.**
  - **ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos.**
  - **ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding.** - [![GitHub Repo stars](https://img.shields.io/github/stars/SCZwangxiao/video-ReTaKe)](https://github.com/SCZwangxiao/video-ReTaKe)
  - **LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token.**
  - **VCA: Video Curious Agent for Long Video Understanding.**
  - **Enhancing Multi-Text Long Video Generation Consistency without Tuning: Time-Frequency Analysis, Prompt Alignment, and Theory.**
  - **Adaptive Keyframe Sampling for Long Video Understanding.**
  - **VideoRoPE: What Makes for Good Video Rotary Position Embedding?.**
  - **Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing.**
  - **AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding.** - [![GitHub Repo stars](https://img.shields.io/github/stars/SCZwangxiao/video-FlexReduc)](https://github.com/SCZwangxiao/video-FlexReduc)
  - **Atlas: Multi-Scale Attention Improves Long Context Image Modeling.**
- 16. Blogs
- 1. Survey Papers
  - **Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey.** - [![GitHub Repo stars](https://img.shields.io/github/stars/Strivin0311/long-llms-learning)](https://github.com/Strivin0311/long-llms-learning)
  - **Length Extrapolation of Transformers: A Survey from the Perspective of Position Encoding.**
  - **The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey.**
  - **Efficient Transformers: A Survey.**
  - **A Survey on Efficient Inference for Large Language Models.** - Ping Zhang, Yuhan Dong, Yu Wang._ Arxiv 2024.
  - **State Space Model for New-Generation Network Alternative to Transformers: A Survey.** - [![GitHub Repo stars](https://img.shields.io/github/stars/Event-AHU/Mamba_State_Space_Model_Paper_List)](https://github.com/Event-AHU/Mamba_State_Space_Model_Paper_List)
  - **A Survey on RAG Meets LLMs: Towards Retrieval-Augmented Large Language Models.** - Seng Chua, Qing Li._ Arxiv 2024.
  - **Evaluation of Retrieval-Augmented Generation: A Survey.** - [![GitHub Repo stars](https://img.shields.io/github/stars/YHPeter/Awesome-RAG-Evaluation)](https://github.com/YHPeter/Awesome-RAG-Evaluation)
  - **The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving.**
  - **Keep the Cost Down: A Review on Methods to Optimize LLM's KV-Cache Consumption.** - [![GitHub Repo stars](https://img.shields.io/github/stars/zcli-charlie/Awesome-KV-Cache)](https://github.com/zcli-charlie/Awesome-KV-Cache)
  - **Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely.**
  - **Prompt Compression for Large Language Models: A Survey.**
  - **A Survey on Mamba Architecture for Vision Applications.**
  - **Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models.** - Fai Wong._ Arxiv 2025. [![GitHub Repo stars](https://img.shields.io/github/stars/DevoAllen/Awesome-Reasoning-Economy-Papers)](https://github.com/DevoAllen/Awesome-Reasoning-Economy-Papers)
  - **Efficient Inference for Large Reasoning Models: A Survey.** - [![GitHub Repo stars](https://img.shields.io/github/stars/yueliu1999/Awesome-Efficient-Inference-for-LRMs)](https://github.com/yueliu1999/Awesome-Efficient-Inference-for-LRMs)
  - **Technologies on Effectiveness and Efficiency: A Survey of State Spaces Models.**
  - **Key, Value, Compress: A Systematic Exploration of KV Cache Compression Techniques.**
  - **A Survey on Transformer Context Extension: Approaches and Evaluation.**
  - **A Survey on Knowledge-Oriented Retrieval-Augmented Generation.** - [![GitHub Repo stars](https://img.shields.io/github/stars/USTCAGI/Awesome-Papers-Retrieval-Augmented-Generation)](https://github.com/USTCAGI/Awesome-Papers-Retrieval-Augmented-Generation)
  - **A Survey of RWKV.** - [![GitHub Repo stars](https://img.shields.io/github/stars/MLGroupJLU/RWKV-Survey)](https://github.com/MLGroupJLU/RWKV-Survey)
  - **A Survey on Large Language Model Acceleration based on KV Cache Management.** - [![GitHub Repo stars](https://img.shields.io/github/stars/TreeAI-Lab/Awesome-KV-Cache-Management)](https://github.com/TreeAI-Lab/Awesome-KV-Cache-Management)
  - **A Survey on Long Text Modeling with Transformers.**
  - **Neural Natural Language Processing for Long Texts: A Survey of the State-of-the-Art.**
  - **Contextual Compression in Retrieval-Augmented Generation for Large Language Models: A Survey.** - [![GitHub Repo stars](https://img.shields.io/github/stars/SrGrace/Contextual-Compression)](https://github.com/SrGrace/Contextual-Compression)
  - **Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models.** - Neng Chuang, Guanchu Wang, Jiamu Zhang, Tianyi Zhang, Jiayi Yuan, Hongyi Liu, Andrew Wen, Shaochen Zhong, Hanjie Chen, Xia Hu._ Arxiv 2025. [![GitHub Repo stars](https://img.shields.io/github/stars/Eclipsess/Awesome-Efficient-Reasoning-LLMs)](https://github.com/Eclipsess/Awesome-Efficient-Reasoning-LLMs)
  - **Thus Spake Long-Context Large Language Model.** - [![GitHub Repo stars](https://img.shields.io/github/stars/OpenMOSS/Thus-Spake-Long-Context-LLM)](https://github.com/OpenMOSS/Thus-Spake-Long-Context-LLM)
  - **A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond.** - Sheng Hua, Bowen Zhou, Yu Cheng._ Arxiv 2025. [![GitHub Repo stars](https://img.shields.io/github/stars/XiaoYee/Awesome_Efficient_LRM_Reasoning)](https://github.com/XiaoYee/Awesome_Efficient_LRM_Reasoning)
  - **Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models.** - [![GitHub Repo stars](https://img.shields.io/github/stars/LightChen233/Awesome-Long-Chain-of-Thought-Reasoning)](https://github.com/LightChen233/Awesome-Long-Chain-of-Thought-Reasoning)
  - **A Survey on Structured State Space Sequence (S4) Models.**
- 15. Technical Report
  - **Gemma 3 Technical Report.** - bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Etienne Pot, Ivo Penchev, Gaël Liu, Francesco Visin, Kathleen Kenealy, Lucas Beyer, Xiaohai Zhai, Anton Tsitsulin, Robert Busa-Fekete, Alex Feng, Noveen Sachdeva, Benjamin Coleman, Yi Gao, Basil Mustafa, Iain Barr, Emilio Parisotto, David Tian, Matan Eyal, Colin Cherry, Jan-Thorsten Peter, Danila Sinopalnikov, Surya Bhupatiraju, Rishabh Agarwal, Mehran Kazemi, Dan Malkin, Ravin Kumar, David Vilar, Idan Brusilovsky, Jiaming Luo, Andreas Steiner, Abe Friesen, Abhanshu Sharma, Abheesht Sharma, Adi Mayrav Gilady, Adrian Goedeckemeyer, Alaa Saade, Alex Feng, Alexander Kolesnikov, Alexei Bendebury, Alvin Abdagic, Amit Vadi, András György, André Susano Pinto, Anil Das, Ankur Bapna, Antoine Miech, Antoine Yang, Antonia Paterson, Ashish Shenoy, Ayan Chakrabarti, Bilal Piot, Bo Wu, Bobak Shahriari, Bryce Petrini, Charlie Chen, Charline Le Lan, Christopher A. Choquette-Choo, CJ Carey, Cormac Brick, Daniel Deutsch, Danielle Eisenbud, Dee Cattle, Derek Cheng, Dimitris Paparas, Divyashree Shivakumar Sreepathihalli, Doug Reid, Dustin Tran, Dustin Zelle, Eric Noland, Erwin Huizenga, Eugene Kharitonov, Frederick Liu, Gagik Amirkhanyan, Glenn Cameron, Hadi Hashemi, Hanna Klimczak-Plucińska, Harman Singh, Harsh Mehta, Harshal Tushar Lehri, Hussein Hazimeh, Ian Ballantyne, Idan Szpektor, Ivan Nardini et al.._ Arxiv 2025.
  - **EXAONE Deep: Reasoning Enhanced Language Models.**
  - **MiniMax-01: Scaling Foundation Models with Lightning Attention.** - [![GitHub Repo stars](https://img.shields.io/github/stars/MiniMax-AI/MiniMax-01)](https://github.com/MiniMax-AI/MiniMax-01)
  - **Qwen2.5-1M Technical Report.**
  - **DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model.** - AI._ Arxiv 2024. [![GitHub Repo stars](https://img.shields.io/github/stars/deepseek-ai/DeepSeek-V2)](https://github.com/deepseek-ai/DeepSeek-V2)
  - **Qwen2.5 Technical Report.**
  - **Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs.** - Chun Chen, Yi-ling Chen, Qi Dai, Xiyang Dai, Ruchao Fan, Mei Gao, Min Gao, Amit Garg, Abhishek Goswami, Junheng Hao, Amr Hendy, Yuxuan Hu, Xin Jin, Mahmoud Khademi, Dongwoo Kim, Young Jin Kim, Gina Lee, Jinyu Li, Yunsheng Li, Chen Liang, Xihui Lin, Zeqi Lin, Mengchen Liu, Yang Liu, Gilsinia Lopez, Chong Luo, Piyush Madan, Vadim Mazalov, Ali Mousavi, Anh Nguyen, Jing Pan, Daniel Perez-Becker, Jacob Platin, Thomas Portet, Kai Qiu, Bo Ren, Liliang Ren, Sambuddha Roy, Ning Shang, Yelong Shen, Saksham Singhal, Subhojit Som, Xia Song, Tetyana Sych, Praneetha Vaddamanu, Shuohang Wang, Yiming Wang, Zhenghao Wang, Haibin Wu, Haoran Xu, Weijian Xu, Yifan Yang, Ziyi Yang, Donghan Yu, Ishmam Zabir, Jianwen Zhang, Li Lyna Zhang, Yunan Zhang, Xiren Zhou._ Arxiv 2025.
  - **DeepSeek-V3 Technical Report.** - AI, Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fucong Dai, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Han Bao, Hanwei Xu, Haocheng Wang, Haowei Zhang, Honghui Ding, Huajian Xin, Huazuo Gao, Hui Li, Hui Qu, J.L. Cai, Jian Liang, Jianzhong Guo, Jiaqi Ni, Jiashi Li, Jiawei Wang, Jin Chen, Jingchang Chen, Jingyang Yuan, Junjie Qiu, Junlong Li, Junxiao Song, Kai Dong, Kai Hu, Kaige Gao, Kang Guan, Kexin Huang, Kuai Yu, Lean Wang, Lecong Zhang, Lei Xu, Leyi Xia, Liang Zhao, Litong Wang, Liyue Zhang, Meng Li, Miaojun Wang, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Mingming Li, Ning Tian, Panpan Huang, Peiyi Wang, Peng Zhang, Qiancheng Wang, Qihao Zhu, Qinyu Chen, Qiushi Du, R.J. Chen, R.L. Jin, Ruiqi Ge, Ruisong Zhang, Ruizhe Pan, Runji Wang, Runxin Xu, Ruoyu Zhang, Ruyi Chen, S.S. Li, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shaoqing Wu, Shengfeng Ye, Shengfeng Ye, Shirong Ma, Shiyu Wang, Shuang Zhou, Shuiping Yu, Shunfeng Zhou, Shuting Pan, T. Wang, Tao Yun, Tian Pei, Tianyu Sun, W.L. Xiao, Wangding Zeng et al. (100 additional authors not shown)._ Arxiv 2025. [![GitHub Repo stars](https://img.shields.io/github/stars/deepseek-ai/DeepSeek-V3)](https://github.com/deepseek-ai/DeepSeek-V3)
14. Speculative Decoding
  - **LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification.** [![GitHub Repo stars](https://img.shields.io/github/stars/sail-sg/LongSpec)](https://github.com/sail-sg/LongSpec)
  - **Long-Context Inference with Retrieval-Augmented Speculative Decoding.** [![GitHub Repo stars](https://img.shields.io/github/stars/John-AI-Lab/RAPID)](https://github.com/John-AI-Lab/RAPID)
5. Length Extrapolation
  - **RoFormer: Enhanced Transformer with Rotary Position Embedding.**
  - **Dissecting Transformer Length Extrapolation via the Lens of Receptive Field Analysis.** - Chung Chi, Ting-Han Fan, Alexander I. Rudnicky, Peter J. Ramadge.* ACL 2023.
  - **Focused Transformer: Contrastive Training for Context Scaling.**
  - **Exploring Transformer Extrapolation.**
  - **LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models.**
  - **YaRN: Efficient Context Window Extension of Large Language Models.**
  - **Scaling Laws of RoPE-based Extrapolation.**
  - **Can't Remember Details in Long Documents? You Need Some R&R.**
  - **Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding.**
  - **InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory.**
  - **Effective Long-Context Scaling of Foundation Models.**
  - **Fewer Truncations Improve Language Modeling.**
  - **Long Context Alignment with Short Instructions and Synthesized Positions.**
  - **ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities.**
  - **LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models.** - Kiong Ng, Zhiwei Jiang, Bryan Hooi.* Arxiv 2024.
  - **E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning.**
  - **Efficient Long-range Language Modeling with Self-supervised Causal Retrieval.**
  - **Selecting Influential Samples for Long Context Alignment via Homologous Models' Guidance and Contextual Awareness Measurement.**
  - **Two are better than one: Context window extension with multi-grained self-injection.**
  - **HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation.**
  - **Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models.**
  - **Transformers Can Do Arithmetic with the Right Embeddings.**
  - **Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count.**
7. RAG and ICL
  - **Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading.**
  - **Attendre: Wait To Attend By Retrieval With Evicted Queries in Memory-Based Transformers for Long Context Processing.**
  - **BGE Landmark Embedding: A Chunking-Free Embedding Method For Retrieval Augmented Long-Context Large Language Models.**
  - **Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity.**
  - **RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation.** - Min Chan, Chunpu Xu, Ruibin Yuan, Hongyin Luo, Wei Xue, Yike Guo, Jie Fu.* Arxiv 2024.
  - **Improving Retrieval Augmented Open-Domain Question-Answering with Vectorized Contexts.**
  - **Retrieval Head Mechanistically Explains Long-Context Factuality.**
  - **Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation.**
  - **Multi-view Content-aware Indexing for Long Document Retrieval.**
  - **Feature-Adaptive and Data-Scalable In-Context Learning.**
  - **KG-RAG: Bridging the Gap Between Knowledge and Creativity.**
  - **HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models.**
  - **Implicit In-context Learning.**
  - **Are Long-LLMs A Necessity For Long-Context Tasks?**
  - **Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection.** - Chen Gu, Caitlin Sikora, Ho Ko, Yinxiao Liu, Chu-Cheng Lin, Lei Shu, Liangchen Luo, Lei Meng, Bang Liu, Jindong Chen.* Arxiv 2024.
  - **Is In-Context Learning Sufficient for Instruction Following in LLMs?**
  - **FragRel: Exploiting Fragment-level Relations in the External Memory of Large Language Models.**
  - **Multi-Head RAG: Solving Multi-Aspect Problems with LLMs.**
  - **Demonstration Notebook: Finding the Most Suited In-Context Learning Example from Interactions.**
  - **Retrieval Meets Reasoning: Dynamic In-Context Editing for Long-Text Understanding.**
  - **LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs.**
  - **Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning.**
  - **From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data.**
  - **Memory3: Language Modeling with Explicit Memory.**
  - **Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting.** - Yu Lee, Tomas Pfister.* Arxiv 2024.
  - **In Defense of RAG in the Era of Long-Context Language Models.**
  - **MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery.**
  - **You Only Use Reactive Attention Slice For Long Context Retrieval.**
  - **SMART-RAG: Selection using Determinantal Matrices for Augmented Retrieval.**
  - **Lighter And Better: Towards Flexible Context Adaptation For Retrieval Augmented Generation.**
  - **LLM×MapReduce: Simplified Long-Sequence Processing using Large Language Models.**
11. Benchmark and Evaluation
- 11.2 MLLM
  - **MileBench: Benchmarking MLLMs in Long Context.**
  - **Many-Shot In-Context Learning in Multimodal Foundation Models.**
  - **RepoQA: Evaluating Long Context Code Understanding.**
  - **Short Film Dataset (SFD): A Benchmark for Story-Level Video Understanding.**
  - **Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models.**
  - **InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.**
  - **Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts.**
  - **MovieSum: An Abstractive Summarization Dataset for Movie Screenplays.**
  - **SEED-Story: Multimodal Long Story Generation with Large Language Model.**
  - **LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations.**
- 11.1 LLM
  - **SCROLLS: Standardized CompaRison Over Long Language Sequences.**
  - **∞Bench: Extending Long Context Evaluation Beyond 100K Tokens.**
  - **Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models.**
  - **Evaluating Very Long-Term Conversational Memory of LLM Agents.** - Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, Yuwei Fang.* Arxiv 2024.
  - **Needle in a haystack - pressure testing llms.**
  - **In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss.**
  - **LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K.**
  - **Counting-Stars: A Simple, Efficient, and Reasonable Strategy for Evaluating Long-Context Large Language Models.**
  - **NovelQA: A Benchmark for Long-Range Novel Question Answering.**
  - **LUQ: Long-text Uncertainty Quantification for LLMs.**
  - **Long-context LLMs Struggle with Long In-context Learning.**
  - **CLAPNQ: Cohesive Long-form Answers from Passages in Natural Questions for RAG systems.**
  - **XL2Bench: A Benchmark for Extremely Long Context Understanding with Long-range Dependencies.**
  - **Long-form factuality in large language models.**
  - **LongEmbed: Extending Embedding Models for Long Context Retrieval.**
  - **Make Your LLM Fully Utilize the Context.** - Guang Lou.* Arxiv 2024.
  - **Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors.**
  - **Many-shot Jailbreaking.**
  - **Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding.** - Seng Chua.* Arxiv 2024.
  - **Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective.**
  - **Entity-Level Sentiment: More than the Sum of Its Parts.**
  - **Evaluating Language Model Context Windows: A "Working Memory" Test and Inference-time Correction.**
  - **RAG vs. Long Context: Examining Frontier Large Language Models for Environmental Review Document Comprehension.**
  - **Attribute or Abstain: Large Language Models as Long Document Assistants.**
  - **How Well Can a Long Sequence Model Model Long Sequences? Comparing Architechtural Inductive Biases on Long-Context Abilities.**
  - **DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems.**
  - **Long Input Benchmark for Russian Analysis.**
  - **HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models.**
  - **Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs.**
  - **DENIAHL: In-Context Features Influence LLM Needle-In-A-Haystack Abilities.**
  - **BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models.** - Rong Wen.* Arxiv 2023.
  - **LongHealth: A Question Answering Benchmark with Long Clinical Documents.** - Baptiste Excoffier, Matthieu Ortala, Alexander Löser, Hugo JWL. Aerts, Jakob Nikolas Kather, Daniel Truhn, Keno Bressem.* Arxiv 2024.
12. Long Text Generation
  - **Integrating Planning into Single-Turn Long-Form Text Generation.**
  - **LoGU: Long-form Generation with Uncertainty Expressions.**
  - **LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs.**
  - **Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key.**
16. Blogs
  - **The Secret Sauce behind 100K context window in LLMs: all tricks in one place.**
3. Recurrent Transformers
  - **Transformer-XL: Attentive language models beyond a fixed-length context.**
  - **Memformer: The memory-augmented transformer.**
  - **Recurrent Attention Networks for Long-text Modeling.**
  - **Segmented Recurrent Transformer: An Efficient Sequence-to-Sequence Model.**
  - **Scaling Transformer to 1M tokens and beyond with RMT.**
  - **Block-Recurrent Transformers.**
  - **TRAMS: Training-free Memory Selection for Long-range Language Modeling.**
  - **Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence.** - Jie Zhu.* Arxiv 2024.
  - **Just read twice: closing the recall gap for recurrent language models.**
4. State Space Models
  - **Mamba: Linear-Time Sequence Modeling with Selective State Spaces.**
  - **MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts.**
  - **MambaByte: Token-free Selective State Space Model.**
  - **LOCOST: State-Space Models for Long Document Abstractive Summarization.**
  - **State Space Models as Foundation Models: A Control Theoretic Overview.**
  - **Jamba: A Hybrid Transformer-Mamba Language Model.** - Shwartz, Omri Abend, Raz Alon, Tomer Asida, Amir Bergman, Roman Glozman, Michael Gokhman, Avashalom Manevich, Nir Ratner, Noam Rozen, Erez Shwartz, Mor Zusman, Yoav Shoham.* Arxiv 2024.
  - **Robustifying State-space Models for Long Sequences via Approximate Diagonalization.**
  - **Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality.**
  - **Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling.**
  - **MambaForGCN: Enhancing Long-Range Dependency with State Space Model and Kolmogorov-Arnold Networks for Aspect-Based Sentiment Analysis.**
  - **Discrete Diffusion Language Model for Long Text Summarization.**
  - **ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2.**
  - **SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models.**
  - **Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling.**
  - **Taipan: Efficient and Expressive State Space Language Models with Selective Attention.**
  - **Rethinking Token Reduction for State Space Models.**
6. Long Term Memory
  - **Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System.**
  - **MemoryBank: Enhancing Large Language Models with Long-Term Memory.**
  - **Improve Long-term Memory Learning Through Rescaling the Error Temporally.**
  - **Commonsense-augmented Memory Construction and Management in Long-term Conversations via Context-aware Persona Refinement.** - iunn Ong, Seoyeon Kim, Dongha Lee, Jinyoung Yeo.* Arxiv 2024.
  - **CreDes: Causal Reasoning Enhancement and Dual-End Searching for Solving Long-Range Reasoning Problems using LLMs.**
10. Long Video and Image
  - **EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture.**
  - **VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos.** - Eskin, Jaehong Yoon, Feng Cheng, Gedas Bertasius, Mohit Bansal.* Arxiv 2024.
  - **PostDoc: Generating Poster from a Long Multimodal Document Using Deep Submodular Optimization.**
  - **DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework.**
  - **Bridging Episodes and Semantics: A Novel Framework for Long-Form Video Understanding.** - Fong Yeh, Min-Hung Chen, Hung-Ting Su, Winston H. Hsu, Shang-Hong Lai.* ECCV 2024 Workshop.
  - **LongVILA: Scaling Long-Context Visual Language Models for Long Videos.**
  - **VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges.**
  - **Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation.**
  - **LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture.**
  - **VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models.**
  - **Rethinking Visual Dependency in Long-Context Reasoning for Large Vision-Language Models.**
  - **SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation.** - Wei Chang, Lingjie Li, Kevin Lin, Chung-Ching Lin, Jianfeng Wang, Zhengyuan Yang, Yingnian Wu, Lijuan Wang.* Arxiv 2024.
9. Compress
- 9.1 Prompt
  - **Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs.**
  - **LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models.** - Yew Lin, Yuqing Yang, Lili Qiu.* Arxiv 2023.
  - **Learning to Compress Prompt in Natural Language Formats.** - Neng Chuang, Tianwei Xing, Chia-Yuan Chang, Zirui Liu, Xun Chen, Xia Hu.* Arxiv 2024.
  - **LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression.** - Yew Lin, H. Vicky Zhao, Lili Qiu, Dongmei Zhang.* Arxiv 2024.
  - **PCToolkit: A Unified Plug-and-Play Prompt Compression Toolkit of Large Language Models.**
  - **LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression.** - Yew Lin, Yuqing Yang, Lili Qiu.* Arxiv 2023.
  - **Imagination Augmented Generation: Learning to Imagine Richer Context for Question Answering over Large Language Models.**
  - **xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token.** - Qing Chen, Furu Wei, Huishuai Zhang, Dongyan Zhao.* Arxiv 2024.
  - **SelfCP: Compressing Long Prompt to 1/12 Using the Frozen Large Language Model Itself.**
  - **Compressing Lengthy Context With UltraGist.**
  - **In-Context Former: Lightning-fast Compressing Context for Large Language Model.**
  - **UIO-LLMs: Unbiased Incremental Optimization for Long-Context LLMs.**
  - **Evaluating Zero-Shot Long-Context LLM Compression.**
  - **Context Embeddings for Efficient Answer Generation in RAG.**
  - **QUITO: Accelerating Long-Context Reasoning through Query-Guided Context Compression.**
  - **PromptIntern: Saving Inference Costs by Internalizing Recurrent Prompt during Large Language Model Fine-tuning.**
  - **Concise and Precise Context Compression for Tool-Using Language Models.**
  - **SentenceVAE: Faster, Longer and More Accurate Inference with Next-sentence Prediction for Large Language Models.**
  - **QUITO-X: An Information Bottleneck-based Compression Algorithm with Cross-Attention.**
  - **AdaComp: Extractive Context Compression with Adaptive Predictor for Retrieval-Augmented Large Language Models.**
  - **Familiarity-aware Evidence Compression for Retrieval Augmented Generation.**
  - ![GitHub Repo stars - group/FaviComp)
  - **TACO-RL: Task Aware Prompt Compression Optimization with Reinforcement Learning.**
  - **Parse Trees Guided LLM Prompt Compression.**
  - **FineZip: Pushing the Limits of Large Language Models for Practical Lossless Text Compression.**
  - **Perception Compressor: A training-free prompt compression method in long context scenarios.** - Tao Zheng.* Arxiv 2024.
  - **From Reading to Compressing: Exploring the Multi-document Reader for Prompt Compression.**
  - **Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability.** - Yan Yeung.* EMNLP 2024.
- 9.2 Model
  - ![GitHub Repo stars - Institute/LLM-Microscope)
  - ![GitHub Repo stars - DASLab/QuEST)
  - ![GitHub Repo stars - folding-universal)
  - ![GitHub Repo stars - MLSys-Lab/SVD-LLM)
  - ![GitHub Repo stars - Aware-Automated-Machine-Learning)
  - ![GitHub Repo stars - Pruner)
  - **AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration.** - Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, Song Han.* MLSys 2024 Best Paper Award.
  - ![GitHub Repo stars - han-lab/llm-awq)
  - ![GitHub Repo stars - ffs-compression/)
  - ![GitHub Repo stars - compression)
  - ![GitHub Repo stars - he/Compressed-Experts)
8. Agent
- 2.4 IO-Aware Attention
  - **A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis.**
  - ![GitHub Repo stars - Austin-RPL/amago)
  - ![GitHub Repo stars - VL/Optimus-1)
  - **AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents.**
  - **Chain of Agents: Large Language Models Collaborating on Long-Context Tasks.**
  - **GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models.**
13. Blogs
- 11.2 MLLM
1. Survey Papers
- ![GitHub Repo stars - llms-learning)
- **Efficient Transformers: A Survey.**
- **Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey.**
- ![GitHub Repo stars - AHU/Mamba_State_Space_Model_Paper_List)
- **The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey.**
- ![GitHub Repo stars - RAG-Evaluation)
- **State Space Model for New-Generation Network Alternative to Transformers: A Survey.**
- **A Survey on Efficient Inference for Large Language Models.** - Ping Zhang, Yuhan Dong, Yu Wang.* Arxiv 2024.
- **A Survey on RAG Meets LLMs: Towards Retrieval-Augmented Large Language Models.** - Seng Chua, Qing Li.* Arxiv 2024.
- **Evaluation of Retrieval-Augmented Generation: A Survey.**
- **The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving.**
- **Keep the Cost Down: A Review on Methods to Optimize LLM's KV-Cache Consumption.**
- ![GitHub Repo stars - charlie/Awesome-KV-Cache)
- ![GitHub Repo stars - Compression)
- **Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely.**
- **Prompt Compression for Large Language Models: A Survey.**
- ![GitHub Repo stars - Spake-Long-Context-LLM)
- ![GitHub Repo stars - Long-Chain-of-Thought-Reasoning)
- **A Comprehensive Survey on Long Context Language Modeling.**
- ![GitHub Repo stars - Horizon/A-Comprehensive-Survey-For-Long-Context-Language-Modeling)
- ![GitHub Repo stars - Papers-Retrieval-Augmented-Generation)
- ![GitHub Repo stars - Lab/Awesome-KV-Cache-Management)
- ![GitHub Repo stars - Survey)
Month Papers
- IdentifyMe: A Challenging Long-Context Mention Resolution Benchmark
- Star Attention: Efficient LLM Inference over Long Sequences
- Large Language Models Can Self-Improve in Long-context Reasoning
- Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows
- Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation
- LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation
- M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework
- Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
- What is Wrong with Perplexity for Long-context Language Modeling?
- Language Models can Self-Lengthen to Generate Long Texts
- GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity?
- 1bit-Merging: Dynamic Quantized Merging for Large Language Models
Week Papers
- T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs
- LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos
- Star Attention: Efficient LLM Inference over Long Sequences
- Attamba: Attending To Multi-Token States
- A Benchmark for Long-Form Medical Question Answering
- When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
13. Long CoT
- 13.1 LLM
  - ![GitHub Repo stars - AI/SkyThought)
  - ![GitHub Repo stars - Valve)
  - ![GitHub Repo stars - R1)
  - ![GitHub Repo stars - BJTU/OpenRFT)
  - ![GitHub Repo stars - o1)
  - ![GitHub Repo stars - NLP/O1-Journey)
- 13.2 MLLM
  - ![GitHub Repo stars - hailong/TVC)
15. Blogs
- 11.2 MLLM
  - **The Secret Sauce behind 100K context window in LLMs: all tricks in one place.**
14. Blogs
- 11.2 MLLM
Acknowledgements
- Star History
  - ![Star History Chart - LLM-Long-Context-Modeling/stargazers)
15. Technical Report
- 13.2 MLLM
  - ![GitHub Repo stars - ai/DeepSeek-V2)
  - ![GitHub Repo stars - ai/DeepSeek-V3)
  - ![GitHub Repo stars - AI/MiniMax-01)
📢 News
- Month Papers
  - Long Is More Important Than Difficult for Training Reasoning Models
13. Technical Report
- 11.2 MLLM
  - ![GitHub Repo stars - ai/DeepSeek-V2)
14. Speculative Decoding
- 13.2 MLLM
  - ![GitHub Repo stars - sg/LongSpec)
  - ![GitHub Repo stars - AI-Lab/RAPID)

Programming Languages

Jupyter Notebook 2

Categories

- 📜 Papers: 1,250
- 2. Efficient Attention: 223
- 11. Benchmark and Evaluation: 215
- 5. Length Extrapolation: 117
- 9. Compress: 87
- 7. RAG and ICL: 74
- 10. Long Video and Image: 41
- 3. Recurrent Transformers: 38
- 4. State Space Models: 30
- 1. Survey Papers: 27
- 6. Long Term Memory: 23
- 12. Long Text Generation: 21
- 14. Blogs: 19
- 13. Long CoT: 13
- Month Papers: 13
- 13. Blogs: 12
- 8. Agent: 10
- 16. Blogs: 6
- Week Papers: 6
- 15. Technical Report: 4
- 15. Blogs: 2
- 14. Speculative Decoding: 2
- Acknowledgements: 2
- 13. Technical Report: 1
- 📢 News: 1

Sub Categories

- 2.4 IO-Aware Attention: 467
- 2. Efficient Attention: 316
- 11. Benchmark and Evaluation: 252
- 11.1 LLM: 175
- 5. Length Extrapolation: 155
- 9. Compress: 145
- 11.2 MLLM: 99
- 7. RAG and ICL: 87
- 2.1 Sparse Attention: 75
- 6. Long Term Memory: 44
- 10. Long Video and Image: 43
- 2.2 Linear Attention: 39
- 13. Long CoT: 35
- 3. Recurrent Transformers: 34
- 1. Survey Papers: 33
- 9.2 Model: 32
- 12. Long Text Generation: 30
- 16. Blogs: 28
- 9.1 Prompt: 26
- 4. State Space Models: 26
- 13.1 LLM: 12
- 8. Agent: 11
- 13.2 MLLM: 9
- 15. Technical Report: 9
- 2.3 Hierarchical Attention: 4
- 14. Speculative Decoding: 2
- Star History: 2
- Month Papers: 1