Projects in Awesome Lists tagged with long-context

https://github.com/internlm/internlm

Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).

chatbot chinese fine-tuning-llm flash-attention gpt large-language-model llm long-context pretrained-models rlhf

Last synced: 14 May 2025

https://github.com/InternLM/InternLM

Official release of InternLM2 7B and 20B base and chat models, with 200K context support.

chatbot chinese fine-tuning-llm flash-attention gpt large-language-model llm long-context pretrained-models rlhf

Last synced: 16 Mar 2025

https://github.com/dvlab-research/longlora

Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)

fine-tuning-llm large-language-models llm long-context lora

Last synced: 15 May 2025

https://github.com/dvlab-research/LongLoRA

Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)

fine-tuning-llm large-language-models llm long-context lora

Last synced: 16 Mar 2025

https://github.com/thudm/longwriter

[ICLR 2025] LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs

fine-tuning llm long-context long-text

Last synced: 14 May 2025

https://github.com/THUDM/LongWriter

LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs

fine-tuning llm long-context long-text

Last synced: 08 Aug 2025

https://github.com/thudm/longbench

LongBench v2 and LongBench (ACL 2024)

benchmark llm long-context longtext

Last synced: 15 May 2025

https://github.com/THUDM/LongBench

LongBench v2 and LongBench (ACL 2024)

benchmark llm long-context longtext

Last synced: 16 Oct 2025

https://github.com/haoliuhl/ringattention

Large Context Attention

large-language-models long-context memory-efficient transformers

Last synced: 12 Jan 2026

https://github.com/lucidrains/MEGABYTE-pytorch

Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in PyTorch

artificial-intelligence attention-mechanisms deep-learning learned-tokenization long-context transformers

Last synced: 09 May 2025

https://github.com/lucidrains/megabyte-pytorch

Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in PyTorch

artificial-intelligence attention-mechanisms deep-learning learned-tokenization long-context transformers

Last synced: 14 May 2025

https://github.com/lucidrains/ring-attention-pytorch

Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch

attention-mechanism distributed-attention efficient-attention long-context

Last synced: 15 May 2025

https://github.com/thudm/longcite

LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA

benchmark citation-generation fine-tuning llm long-context

Last synced: 08 Apr 2025

https://github.com/THUDM/LongCite

LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA

benchmark citation-generation fine-tuning llm long-context

Last synced: 16 Oct 2025

https://github.com/nvidia/kvpress

LLM KV cache compression made easy

inference kv-cache kv-cache-compression large-language-models llm long-context python pytorch transformers

Last synced: 09 Apr 2026

https://github.com/lucidrains/recurrent-memory-transformer-pytorch

Implementation of Recurrent Memory Transformer, NeurIPS 2022 paper, in PyTorch

artificial-intelligence attention-mechanisms deep-learning long-context memory recurrence transformers

Last synced: 15 May 2025

https://github.com/thunlp/infllm

The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory"

large-language-models llm long-context training-free

Last synced: 06 Apr 2025

https://github.com/thunlp/InfLLM

The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory"

large-language-models llm long-context training-free

Last synced: 05 Apr 2025

https://github.com/openbmb/infinitebench

Code for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718

benchmark large-language-models long-context

Last synced: 05 Jul 2025

https://github.com/VITA-MLLM/Long-VITA

✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

long-context mllm vision-language-model

Last synced: 31 Mar 2025

https://github.com/thudm/longalign

[EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs

alignment llm long-context longtext

Last synced: 12 Apr 2025

https://github.com/OpenBMB/InfiniteBench

Code for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718

benchmark large-language-models long-context

Last synced: 17 Apr 2025

https://github.com/THUDM/LongAlign

[EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs

alignment llm long-context longtext

Last synced: 16 Oct 2025

https://github.com/Infini-AI-Lab/TriForce

[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

acceleration efficiency inference llm llm-inference long-context speculative-decoding

Last synced: 16 May 2025

https://bigai-nlco.github.io/LooGLE/

ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models

acl2024 large-language-models llm long-context

Last synced: 29 Mar 2025

https://github.com/yangjianxin1/longqlora

LongQLoRA: Extend Context Length of LLMs Efficiently

llm long-context longlora lora qlora

Last synced: 12 Sep 2025

https://github.com/yangjianxin1/LongQLoRA

LongQLoRA: Extend Context Length of LLMs Efficiently

llm long-context longlora lora qlora

Last synced: 16 Oct 2025

https://github.com/bigai-nlco/LooGLE

ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models

acl2024 large-language-models llm long-context

Last synced: 09 May 2025

https://github.com/bytedance/shadowkv

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

cpu-offload high-throughput llm-inference long-context low-rank research sparse-attention

Last synced: 04 Apr 2025

https://github.com/nightdessert/Retrieval_Head

Open-source code for the paper "Retrieval Head Mechanistically Explains Long-Context Factuality"

large-language-models long-context

Last synced: 08 May 2025

https://github.com/x-plug/writingbench

WritingBench: A Comprehensive Benchmark for Generative Writing

ai benchmark evaluation-framework huggingface llm long-context long-text nlp text-generation writing

Last synced: 01 Sep 2025

https://github.com/Glaciohound/LM-Infinite

Implementation of paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"

language-model long-context model-diagnostics

Last synced: 16 May 2025

https://github.com/OpenGVLab/MM-NIAH

[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents.

benchmark long-context multimodal-large-language-models vision-language-model

Last synced: 17 Apr 2025

https://github.com/QingFei1/LongRAG

[EMNLP 2024] LongRAG: A Dual-perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering

llm long-context rag

Last synced: 07 May 2025

https://github.com/lucidrains/perceiver-ar-pytorch

Implementation of Perceiver AR, DeepMind's new long-context attention network based on the Perceiver architecture, in PyTorch

artficial-intelligence attention-mechanism deep-learning long-context transformer

Last synced: 15 Jul 2025

https://github.com/nick7nlp/Counting-Stars

Counting-Stars (★)

evaluation-metrics large-language-model long-context

Last synced: 07 May 2025

https://github.com/open-compass/ada-leval

The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks"

gpt4 llm long-context

Last synced: 14 Aug 2025

https://github.com/lucidrains/flash-genomics-model

My own attempt at a long context genomics model, leveraging recent advances in long context attention modeling (Flash Attention + other hierarchical methods)

artificial-intelligence attention-mechanisms deep-learning genomics long-context transformers

Last synced: 30 Apr 2025

https://github.com/dvlab-research/q-llm

This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"

fast-inference inference-acceleration kv-cache-compression large-language-models long-context

Last synced: 03 Jul 2025

https://github.com/vita-group/ms-poe

"Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding" Zhenyu Zhang, Runjin Chen, Shiwei Liu, Zhewei Yao, Olatunji Ruwase, Beidi Chen, Xiaoxia Wu, Zhangyang Wang.

large-language-models long-context lost-in-the-middle positional-encoding

Last synced: 19 Apr 2025

https://github.com/vectifyai/condb

ConDB: The KV-Cache Native Context Database

agents ai context-database kv-cache llm long-context rag reasoning retrieval tree-search

Last synced: 04 Jun 2026

https://github.com/VITA-Group/Ms-PoE

"Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding" Zhenyu Zhang, Runjin Chen, Shiwei Liu, Zhewei Yao, Olatunji Ruwase, Beidi Chen, Xiaoxia Wu, Zhangyang Wang.

large-language-models long-context lost-in-the-middle positional-encoding

Last synced: 16 May 2025

https://github.com/4ai/ran

RAN: Recurrent Attention Networks for Long-text Modeling | Findings of ACL23

acl acl2023 long-context long-context-attention long-context-transformers long-document-modeling recurrent-attention-networks recurrent-networks

Last synced: 23 Apr 2025

https://github.com/openmoss/longllada

LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs

diffusion diffusion-language-models large-language-models length-extrapolation long-context

Last synced: 23 Jul 2025

https://github.com/asigalov61/Heptabit-Music-Transformer

[DEPRECATED] Very fast, large music transformer with 8k sequence length, efficient heptabit MIDI note encoding, true full MIDI instrument range, chord counters and outro tokens

artificial-intelligence heptabit heptagon heptagram long-context midi music-ai music-transformer sota-model

Last synced: 14 Jul 2025

https://github.com/dmis-lab/ethic

[NAACL 2025] ETHIC: Evaluating Large Language Models on Long-Context Tasks with High Information Coverage

benchmark evaluation long-context

Last synced: 12 Oct 2025

https://github.com/openmoss/reattention

[ICLR 2025] ReAttention, a training-free approach to break the maximum context length in length extrapolation

large-language-model length-extrapolation long-context triton

Last synced: 09 Oct 2025

https://github.com/jagmarques/nexusquant

Training-free KV cache compression for LLMs. 10-33x compression via E8 lattice quantization + attention-aware token eviction. One line of code.

attention compression e8-lattice inference kv-cache llama llm long-context memory-efficient mistral pytorch quantization token-eviction transformers vector-quantization

Last synced: 01 May 2026

https://github.com/reddec/dreaming-bard

LLM assistant to create long books/stories/documents

llm long-context novel-writing

Last synced: 10 Aug 2025

https://github.com/rgtjf/untie-the-knots

Untie-the-Knots: An Efficient Data Augmentation Strategy for Long-Context Pre-Training in Language Models

language-model long-context untie-the-knots

Last synced: 29 Jan 2026

https://github.com/tjamescouch/gro

Provider-agnostic LLM CLI wrapper (claude/openai/gemini)

agent-framework agent-runtime ai-agents ai-infrastructure ai-runtime autonomous-agents context-management llm llm-agents llm-runtime long-context mcp model-context-protocol multi-agent

Last synced: 01 Mar 2026

https://github.com/melvinebenezer/liah-lie_in_a_haystack

Needle in a haystack for LLMs

llm llm-inference llms-benchmarking long-context needle-in-haystack

Last synced: 09 Jul 2025

https://github.com/manishklach/intent-attention-kernel

Intent-aware attention research prototype that treats long-context inference as structured semantic blocks instead of a flat token stream, proving CPU-first correctness and analytical KV/FLOP savings before GPU kernel implementation.

agentic-ai ai-infrastructure attention block-attention cost-model cuda gpu-kernels inference kernel-research kv-cache llm-inference long-context python pytorch research semantic-attention sparse-attention systems transformers triton

Last synced: 28 May 2026

https://github.com/zircote/rlm-rs-plugin

Claude Code plugin for processing documents 100x larger than context limits using the Recursive Language Model pattern. Rust-powered chunking, hybrid semantic + BM25 search, and sub-LLM orchestration.

ai-agents bm25 chunking claude-code claude-code-plugin document-processing hybrid-search llm long-context recursive-language-model rlm rust semantic-search sqlite

Last synced: 08 Apr 2026

https://github.com/neosun100/kimi-linear-vllm-docker-serve

Dockerized vLLM serving for Kimi-Linear-48B-A3B (AWQ-4bit), from 128K to 1M context.

awq docker kimi-linear llm-serving long-context vllm

Last synced: 31 Jan 2026

https://github.com/graph-com/haystackcraft

Haystack Engineering: Context Engineering for Heterogeneous and Agentic Long-Context Evaluation

agent benchmark context-engineering deep-research llm long-context rag retrieval

Last synced: 15 Apr 2026

https://github.com/stanford-oval/sliders

Repository for paper: Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets

agents document-ai long-context

Last synced: 03 Jun 2026

https://github.com/denial-web/hard-needle

Semantically hard multi-needle long-context data generator. Stop testing LLMs with random-password needles.

benchmark llm llm-evaluation long-context needle-in-a-haystack python rag synthetic-data

Last synced: 08 May 2026

https://github.com/harvey-fin/absence-bench

Code implementation for paper AbsenceBench: Language Models Can't Tell What's Missing

benchmark long-context natural-language-processing

Last synced: 29 Nov 2025

https://github.com/framsouza/slack-gemini-summarizer

A solution to fetch and analyze Slack channel conversations, leveraging the Gemini 1.5 Pro API for summarization.

ai gemini-pro genai long-context slack

Last synced: 18 Apr 2026

https://github.com/leagames0221-sys/longctx-bench-honest

Honest measurement of 1M-token long-context benchmarks (RULER + LongBench v2 + NIAH) on Qwen2.5-7B-1M local vs GitHub Models cloud. All zero credit card, drift-checked, reproducible.

benchmark bitsandbytes consumer-laptop github-models llm long-context niah portfolio qwen qwen2-5 transformers vllm

Last synced: 02 Jun 2026