Awesome-Long-Context-Language-Modeling
Papers on Long Context Language Modeling
https://github.com/davendw49/Awesome-Long-Context-Language-Modeling
Introduction (Draft by ChatGPT😄)
- ChatGPT-3.5 has a maximum context window of **4,096** tokens. This limitation poses challenges for longer pieces of text, since any relevant information beyond the context window is simply cut off; a short sketch after this list illustrates the truncation.
- Figure taken from Longformer
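To make the limitation concrete, here is a minimal, illustrative sketch of context-window truncation. It approximates tokens by whitespace splitting purely for illustration (real models use subword tokenizers, and the 4,096 budget is just the figure quoted above):

```python
def truncate_to_context_window(text: str, max_tokens: int = 4096) -> str:
    """Crude illustration of context-window truncation.

    Whitespace "tokens" stand in for real subword tokens; anything past
    the budget is silently dropped, which is how relevant information
    at the end of a long document gets lost.
    """
    tokens = text.split()
    return " ".join(tokens[:max_tokens])

long_report = "section " * 10_000           # ~10k pseudo-tokens
prompt = truncate_to_context_window(long_report)
print(len(prompt.split()))                   # 4096: the tail never reaches the model
```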
PaperList
Memory/Cache-Augmented Models
- Dual Cache for Long Document Neural Coreference Resolution
- Augmenting Language Models with Long-Term Memory
- In-context Autoencoder for Context Compression in a Large Language Model
- LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
- Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading
- Context Compression for Auto-regressive Transformers with Sentinel Tokens
- Learned Token Pruning for Transformers
- Block-Recurrent Transformers
- Recurrent Memory Transformer
- Memorizing Transformers
- Compressive Transformers for Long-Range Sequence Modelling
- Focused Transformer: Contrastive Training for Context Scaling
- Compressing Context to Enhance Inference Efficiency of Large Language Models
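Several of the papers above (e.g., Memorizing Transformers, Recurrent Memory Transformer, Block-Recurrent Transformers) extend the effective context by caching keys/values or compressed states from earlier segments. Below is a minimal single-head sketch of segment-level KV caching in the spirit of that family; the function name, shapes, and `mem_len` are assumptions for illustration, not any paper's reference implementation.

```python
import torch
import torch.nn.functional as F

def attention_with_memory(q, k, v, mem_k=None, mem_v=None, mem_len=512):
    """Single-head attention over the current segment plus cached memory.

    q, k, v: (T, d) tensors for the current segment.
    mem_k, mem_v: keys/values cached from earlier segments (or None).
    Causal masking is omitted to keep the sketch short.
    """
    if mem_k is not None:
        k = torch.cat([mem_k, k], dim=0)       # (M + T, d)
        v = torch.cat([mem_v, v], dim=0)
    scores = q @ k.T / k.shape[-1] ** 0.5       # (T, M + T)
    out = F.softmax(scores, dim=-1) @ v         # (T, d)
    # Keep only the most recent `mem_len` entries as the next segment's
    # memory, detached so gradients do not flow across segments.
    return out, k[-mem_len:].detach(), v[-mem_len:].detach()

# Process a long sequence as a stream of fixed-size segments.
d, mem_k, mem_v = 64, None, None
for segment in torch.randn(4, 128, d):          # 4 segments of 128 tokens
    q = k = v = segment
    out, mem_k, mem_v = attention_with_memory(q, k, v, mem_k, mem_v)
```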
Transformer Variants (redesign the attention/KV structure or positional embeddings of the Transformer)
- Adapting Language Models to Compress Contexts
- LONGNET: Scaling Transformers to 1,000,000,000 Tokens
- Blockwise Parallel Transformer for Long Context Large Models
- ETC: Encoding Long and Structured Inputs in Transformers
- Improving Long Context Document-Level Machine Translation
- Extending Context Window of Large Language Models via Positional Interpolation
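One lightweight recipe in this category, from "Extending Context Window of Large Language Models via Positional Interpolation", rescales position indices so that a longer input is squeezed back into the positional range seen during pre-training instead of extrapolating beyond it. A rough sketch of the idea on RoPE-style angles follows; `train_len`, `dim`, and the base are illustrative assumptions, not values from any particular model.

```python
import torch

def rope_angles(positions, dim, base=10000.0):
    """Rotary-position-embedding angles for (possibly fractional) positions."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    return torch.outer(positions, inv_freq)          # (T, dim // 2)

def interpolated_positions(seq_len, train_len=2048):
    """Positional interpolation: compress positions of a long sequence into
    [0, train_len) so the model never sees out-of-range positions."""
    scale = min(1.0, train_len / seq_len)
    return torch.arange(seq_len, dtype=torch.float32) * scale

# An 8k-token input reuses the same angle range as a 2k-token training input.
angles = rope_angles(interpolated_positions(8192), dim=128)
print(angles.shape)                                   # torch.Size([8192, 64])
```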
Window-Based/On-the-fly Methods
- Efficient Long-Text Understanding with Short-Text Models
- LongCoder: A Long-Range Pre-trained Language Model for Code Completion
- Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
- Train short, test long: Attention with linear biases enables input length extrapolation
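A representative idea from this group, "Train Short, Test Long" (ALiBi), replaces learned position embeddings with a per-head linear penalty on attention scores that grows with the query-key distance, which lets recent tokens dominate and helps extrapolate past the training length. The sketch below builds such a bias; the slope schedule follows the geometric form described in the paper, while shapes and variable names are illustrative.

```python
import torch

def alibi_bias(seq_len, num_heads):
    """ALiBi-style additive bias: head h penalizes score(i, j) by
    slope_h * (j - i), i.e. linearly in how far back token j is."""
    # Geometric slope schedule 2^(-8/H), 2^(-16/H), ... for H heads.
    slopes = 2.0 ** (-8.0 * torch.arange(1, num_heads + 1) / num_heads)
    # dist[i, j] = j - i  (negative for past tokens, 0 on the diagonal).
    pos = torch.arange(seq_len)
    dist = (pos[None, :] - pos[:, None]).clamp(max=0)
    # A separate causal mask still has to hide future tokens (j > i).
    return slopes[:, None, None] * dist               # (H, T, T), added to scores

bias = alibi_bias(seq_len=16, num_heads=8)
print(bias.shape, bias[0, -1, 0].item())              # oldest token gets the largest penalty
```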
Analysis
- Lost in the Middle: How Language Models Use Long Contexts
- Do Long-Range Language Models Actually Use Long-Range Context?
- Understanding the Role of Input Token Characters in Language Models: How Does Information Loss Affect Performance?
Reinforcement Learning
Benchmark
CV-Inspired
Contact Me