
https://github.com/Meirtz/Awesome-Context-Engineering

🔥 Comprehensive survey on Context Engineering: from prompt engineering to production-grade AI systems. Hundreds of papers, frameworks, and implementation guides for LLMs and AI agents.

List: awesome-context-engineering

agent agentic-ai agi awesome-list cognitive-science context-engineering llm rag


# Awesome Context Engineering



## 💬 Join Our Community


WeChat Group

Join our WeChat group for discussions and updates!


Join our Discord server


[![Awesome](https://awesome.re/badge.svg)](https://awesome.re)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square)](http://makeapullrequest.com)
[![Paper](https://img.shields.io/badge/Paper-Published-green.svg)](https://arxiv.org/abs/2507.13334)

> 📄 **Our comprehensive survey paper on Context Engineering is now published!** Check out our latest academic insights and theoretical foundations.

A comprehensive survey and collection of resources on **Context Engineering** - the evolution from static prompting to dynamic, context-aware AI systems.

## 📧 Contact

For questions, suggestions, or collaboration opportunities, please feel free to reach out:

**Lingrui Mei**
📧 Email: [meilingrui25b@ict.ac.cn](mailto:meilingrui25b@ict.ac.cn) or [meilingrui22@mails.ucas.ac.cn](mailto:meilingrui22@mails.ucas.ac.cn)

**Note:** The first version of the paper listed the wrong email address; please use the addresses above. You can also open an issue in this repository for general discussions and suggestions.

---

## 📰 News

- **[2025.07.17]** 🔥🔥 Our paper is now published! Check out ["A Survey of Context Engineering for Large Language Models"](https://arxiv.org/abs/2507.13334) on [arXiv](https://arxiv.org/abs/2507.13334) and [Hugging Face Papers](https://huggingface.co/papers/2507.13334)
- **[2025.07.03]** Repository initialized with comprehensive outline
- **[2025.07.03]** Survey structure established following modern context engineering paradigms

---

## 🎯 Introduction

In the era of Large Language Models (LLMs), the limitations of static prompting have become increasingly apparent. **Context Engineering** represents the natural evolution to address LLM uncertainty and achieve production-grade AI deployment. Unlike traditional prompt engineering, context engineering encompasses the complete information payload provided to LLMs at inference time, including all structured informational components necessary for plausible task completion.

This repository serves as a comprehensive survey of context engineering techniques, methodologies, and applications.

---

## 📚 Table of Contents

- [Awesome Context Engineering](#awesome-context-engineering)
  - [💬 Join Our Community](#-join-our-community)
  - [📧 Contact](#-contact)
  - [📰 News](#-news)
  - [🎯 Introduction](#-introduction)
  - [📚 Table of Contents](#-table-of-contents)
  - [🔗 Related Survey](#-related-survey)
  - [🏗️ Definition of Context Engineering](#️-definition-of-context-engineering)
    - [LLM Generation](#llm-generation)
    - [Definition of Context](#definition-of-context)
    - [Definition of Context Engineering](#definition-of-context-engineering)
    - [Dynamic Context Orchestration](#dynamic-context-orchestration)
    - [Mathematical Principles](#mathematical-principles)
    - [Theoretical Framework: Bayesian Context Inference](#theoretical-framework-bayesian-context-inference)
    - [Comparison](#comparison)
  - [🌐 Related Blogs](#-related-blogs)
    - [Social Media \& Talks](#social-media--talks)
  - [🤔 Why Context Engineering?](#-why-context-engineering)
    - [The Paradigm Shift: From Tactical to Strategic](#the-paradigm-shift-from-tactical-to-strategic)
    - [1. Fundamental Challenges with Current Approaches](#1-fundamental-challenges-with-current-approaches)
      - [Human Intent Communication Challenges](#human-intent-communication-challenges)
      - [Complex Knowledge Requirements](#complex-knowledge-requirements)
      - [Reliability and Trustworthiness Issues](#reliability-and-trustworthiness-issues)
    - [2. Limitations of Static Prompting](#2-limitations-of-static-prompting)
      - [From Strings to Systems](#from-strings-to-systems)
      - [The "Movie Production" Analogy](#the-movie-production-analogy)
    - [3. Enterprise and Production Requirements](#3-enterprise-and-production-requirements)
      - [Context Failures Are the New Bottleneck](#context-failures-are-the-new-bottleneck)
      - [Scalability Beyond Simple Tasks](#scalability-beyond-simple-tasks)
      - [Reliability and Consistency](#reliability-and-consistency)
      - [Economic and Operational Efficiency](#economic-and-operational-efficiency)
    - [4. Cognitive and Information Science Foundations](#4-cognitive-and-information-science-foundations)
      - [Artificial Embodiment](#artificial-embodiment)
      - [Information Retrieval at Scale](#information-retrieval-at-scale)
    - [5. The Future of AI System Architecture](#5-the-future-of-ai-system-architecture)
  - [🔧 Components, Techniques and Architectures](#-components-techniques-and-architectures)
    - [Context Scaling](#context-scaling)
    - [Structured Data Integration](#structured-data-integration)
    - [Self-Generated Context](#self-generated-context)
  - [🛠️ Implementation and Challenges](#️-implementation-and-challenges)
    - [1. Retrieval-Augmented Generation (RAG)](#1-retrieval-augmented-generation-rag)
    - [2. Memory Systems](#2-memory-systems)
    - [3. Agent Communication](#3-agent-communication)
    - [4. Tool Use and Function Calling](#4-tool-use-and-function-calling)
  - [📊 Evaluation Paradigms for Context-Driven Systems](#-evaluation-paradigms-for-context-driven-systems)
    - [Context Quality Assessment](#context-quality-assessment)
    - [Benchmarking Context Engineering](#benchmarking-context-engineering)
  - [🚀 Applications and Systems](#-applications-and-systems)
    - [Complex Research Systems](#complex-research-systems)
    - [Production Systems](#production-systems)
  - [🔮 Limitations and Future Directions](#-limitations-and-future-directions)
    - [Current Limitations](#current-limitations)
    - [Future Research Directions](#future-research-directions)
  - [🤝 Contributing](#-contributing)
    - [Paper Formatting Guidelines](#paper-formatting-guidelines)
    - [Badge Colors](#badge-colors)
  - [📄 License](#-license)
  - [📑 Citation](#-citation)
  - [⚠️ Disclaimer](#️-disclaimer)
  - [📧 Contact](#-contact-1)
  - [🙏 Acknowledgments](#-acknowledgments)
  - [Star History](#star-history)
  - [📖 Our Paper](#-our-paper)

---

## 🔗 Related Survey

**General AI Survey Papers**

- A Survey of Large Language Models, Zhao et al. (arXiv)
- The Prompt Report: A Systematic Survey of Prompt Engineering Techniques, Schulhoff et al. (arXiv)
- A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications, Sahoo et al. (arXiv)
- A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models, Gao et al. (arXiv)


**Context and Reasoning**

- A Survey on In-context Learning, Dong et al. (EMNLP)
- The Mystery of In-Context Learning: A Comprehensive Survey on Interpretation and Analysis, Zhou et al. (arXiv)
- A Comprehensive Survey of Retrieval-Augmented Generation (RAG): Evolution, Current Landscape and Future Directions, Gupta et al. (arXiv)
- Retrieval-Augmented Generation for Large Language Models: A Survey, Gao et al. (arXiv)
- A Survey on Knowledge-Oriented Retrieval-Augmented Generation, Cheng et al. (arXiv)


**Memory Systems and Context Persistence**

**Survey**

- A Survey on the Memory Mechanism of Large Language Model based Agents, Zhang et al. (arXiv)
- Survey on Memory-Augmented Neural Networks: Cognitive Insights to AI Applications, Khosla et al. (arXiv)
- From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs, Wu et al. (arXiv)
- Survey on Evaluation of LLM-based Agents, Anonymous et al. (arXiv)
- A Survey of Personalized Large Language Models: Progress and Future Directions, Anonymous et al. (arXiv)
- Agentic Retrieval-Augmented Generation: A Survey, Anonymous et al. (arXiv)
- Retrieval-Augmented Generation with Graphs (GraphRAG), Anonymous et al. (arXiv)


**Benchmarks**

- Evaluating Very Long-Term Conversational Memory of LLM Agents (LOCOMO), Anonymous et al. (ACL)
- Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions, Hu et al. (arXiv)
- Episodic Memories Generation and Evaluation Benchmark for Large Language Models, Anonymous et al. (arXiv)
- On the Structural Memory of LLM Agents, Anonymous et al. (arXiv)
- HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering, Yang et al. (EMNLP)



**Neural Memory Architectures**

- Neural Turing Machines, Graves et al. (arXiv)
- Differentiable Neural Computers, Graves et al. (arXiv)
- A Brain-inspired Memory Transformation based Differentiable Neural Computer, Anonymous et al. (arXiv)
- Differentiable Neural Computers with Memory Demon, Anonymous et al. (arXiv)


**Memory-Augmented Transformers**

- Memorizing Transformers, Wu et al. (arXiv)
- Recurrent Memory Transformer, Bulatov et al. (NeurIPS)
- Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention, Munkhdalai et al. (arXiv)
- Memformer: A Memory-Augmented Transformer for Sequence Modeling, Wu et al. (arXiv)
- Token Turing Machines, Ryoo et al. (arXiv)
- TransformerFAM: Feedback Attention is Working Memory, Irie et al. (arXiv)

**Production Memory Systems**

- MemGPT: Towards LLMs as Operating Systems, Packer et al. (arXiv)
- MemoryBank: Enhancing Large Language Models with Long-Term Memory, Zhong et al. (arXiv)
- MEM0: Building Production-Ready AI Agents with Scalable Long-Term Memory, Taranjeet et al. (arXiv)
- MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents, Anonymous et al. (arXiv)
- A-MEM: Agentic Memory for LLM Agents, Anonymous et al. (arXiv)
- MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent, Anonymous et al. (arXiv)
- Memory OS of AI Agent, Kang et al. (arXiv)



**Graph-based Memory Systems**

- AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents, Anonymous et al. (arXiv)
- Zep: A Temporal Knowledge Graph Architecture for Agent Memory, Anonymous et al. (arXiv)
- KG-Agent: An Efficient Autonomous Agent Framework for Complex Reasoning over Knowledge Graph, Anonymous et al. (arXiv)
- GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models, Anonymous et al. (arXiv)
- From Local to Global: A GraphRAG Approach to Query-Focused Summarization, Edge et al. (arXiv)
- Knowledge Graph-Guided Retrieval Augmented Generation, Zhu et al. (arXiv)


**Episodic and Working Memory**

- Larimar: Large Language Models with Episodic Memory Control, Goyal et al. (ICML)
- EM-LLM: Human-like Episodic Memory for Infinite Context LLMs, Anonymous et al. (ICLR)
- Large Language Models with Controllable Working Memory, Goyal et al. (arXiv)
- Empowering Working Memory for Large Language Model Agents, Anonymous et al. (arXiv)


**Conversational Memory**

- MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation, Anonymous et al. (arXiv)
- Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory, Anonymous et al. (arXiv)
- Generative Agents: Interactive Simulacra of Human Behavior, Park et al. (arXiv)
- Self-Controlled Memory Framework for Large Language Models, Anonymous et al. (arXiv)


**Foundational Survey Papers from Major Venues**

- AUTOPROMPT: Eliciting Knowledge from Language Models with Automatically Generated Prompts, Shin et al. (EMNLP)
- The Power of Scale for Parameter-Efficient Prompt Tuning, Lester et al. (EMNLP)
- Prefix-Tuning: Optimizing Continuous Prompts for Generation, Li et al. (ACL)
- An Explanation of In-context Learning as Implicit Bayesian Inference, Xie et al. (ICLR)
- Rethinking the Role of Demonstrations: What Makes In-context Learning Work?, Min et al. (EMNLP)


**Additional RAG and Retrieval Surveys**

- Retrieval-Augmented Generation for AI-Generated Content: A Survey, Various (arXiv)
- Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely, Various (arXiv)
- Large language models (LLMs): survey, technical frameworks, and future challenges, Various (AIR)

---

## 🏗️ Definition of Context Engineering

> **Context is not just the single prompt users send to an LLM. Context is the complete information payload provided to an LLM at inference time, encompassing all structured informational components that the model needs to plausibly accomplish a given task.**

### LLM Generation

To formally define Context Engineering, we must first mathematically characterize the LLM generation process. Let us model an LLM as a probabilistic function:

$$P(\text{output} | \text{context}) = \prod_{t=1}^T P(\text{token}_t | \text{previous tokens}, \text{context})$$

Where:
- $\text{context}$ represents the complete input information provided to the LLM
- $\text{output}$ represents the generated response sequence
- $P(\text{token}_t | \text{previous tokens}, \text{context})$ is the probability of generating each token given the context
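
The factorization above can be sketched numerically. The snippet below is a minimal illustration, not a real language model: `TOY_MODEL`, `token_prob`, and `sequence_log_prob` are hypothetical names, and the toy model conditions only on the previous token, whereas an actual LLM conditions on the full context and all previous tokens.

```python
import math

# Hypothetical toy model: next-token probabilities keyed by the previous token.
# "<ctx>" stands in for the context when no token has been generated yet.
TOY_MODEL = {
    "<ctx>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.3, "dog": 0.7},
    "cat": {"<eos>": 1.0},
    "dog": {"<eos>": 1.0},
}


def token_prob(token, previous_tokens, context):
    """P(token_t | previous tokens, context) under the toy model."""
    last = previous_tokens[-1] if previous_tokens else context
    return TOY_MODEL.get(last, {}).get(token, 0.0)


def sequence_log_prob(output_tokens, context):
    """log P(output | context) as the sum of per-token log-probabilities."""
    total = 0.0
    for t, token in enumerate(output_tokens):
        p = token_prob(token, output_tokens[:t], context)
        if p == 0.0:
            return float("-inf")  # the sequence is impossible under the model
        total += math.log(p)
    return total


log_p = sequence_log_prob(["the", "cat", "<eos>"], "<ctx>")
print(math.exp(log_p))  # product of the three per-token probabilities
```

Summing log-probabilities instead of multiplying raw probabilities is the usual way to keep long products numerically stable.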

### Definition of Context

In traditional prompt engineering, the context is treated as a simple string:
$$\text{context} = \text{prompt}$$

However, in Context Engineering, we decompose the context into multiple structured components:

$$\text{context} = \text{Assemble}(\text{instructions}, \text{knowledge}, \text{tools}, \text{memory}, \text{state}, \text{query})$$

Where $\text{Assemble}$ is a context assembly function that orchestrates:
- $\text{instructions}$: System prompts and rules
- $\text{knowledge}$: Retrieved relevant information
- $\text{tools}$: Available function definitions
- $\text{memory}$: Conversation history and learned facts
- $\text{state}$: Current world/user state
- $\text{query}$: User's immediate request
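
As a rough sketch of what an $\text{Assemble}$ function might look like in code, the example below groups the six components into one structure and renders them into a single payload. The class name `ContextComponents`, the `[section]` labels, and the ordering are assumptions made for illustration; the source does not prescribe a concrete format.

```python
from dataclasses import dataclass, field


@dataclass
class ContextComponents:
    """The six components named above; field types are illustrative assumptions."""
    instructions: str
    query: str
    knowledge: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)
    memory: list[str] = field(default_factory=list)
    state: dict[str, str] = field(default_factory=dict)


def assemble(c: ContextComponents) -> str:
    """Render all non-empty components into one labelled text payload."""
    sections = [
        ("instructions", c.instructions),
        ("knowledge", "\n".join(c.knowledge)),
        ("tools", "\n".join(c.tools)),
        ("memory", "\n".join(c.memory)),
        ("state", "\n".join(f"{k}: {v}" for k, v in sorted(c.state.items()))),
        ("query", c.query),
    ]
    # Skip empty components so the payload carries no blank sections.
    return "\n\n".join(f"[{name}]\n{body}" for name, body in sections if body)


context = assemble(
    ContextComponents(
        instructions="Answer briefly.",
        query="What is RAG?",
        knowledge=["RAG retrieves documents before generation."],
    )
)
print(context)
```

With prompt engineering alone, only the `instructions` and `query` fields would be populated; the remaining fields are what context engineering adds.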

### Definition of Context Engineering

**Context Engineering** is formally defined as the optimization problem:

$$\text{Assemble}^* = \arg\max_{\text{Assemble}} \mathbb{E} [\text{Reward}(\text{LLM}(\text{context}), \text{target})]$$

Subject to constraints:
- $|\text{context}| \leq \text{MaxTokens}$ (context window limitation)
- $\text{knowledge} = \text{Retrieve}(\text{query}, \text{database})$
- $\text{memory} = \text{Select}(\text{history}, \text{query})$
- $\text{state} = \text{Extract}(\text{world})$

Where:
- $\text{Reward}$ measures the quality of generated responses
- $\text{Retrieve}$, $\text{Select}$, $\text{Extract}$ are functions for information gathering
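
To make the optimization concrete, here is a minimal sketch that picks the best of three candidate assembly functions by average reward on a tiny evaluation set and rejects any candidate that violates the token limit. Everything in it is a stand-in chosen for illustration: `toy_llm` is not a real model, `DATABASE` and `EVAL_SET` are made-up data, tokens are counted as whitespace-separated words, and exhaustive search over three candidates replaces whatever search procedure a real system would use.

```python
from statistics import mean

MAX_TOKENS = 12  # hypothetical context window, counted in whitespace-separated words

DATABASE = {
    "capital of France": "Paris is the capital of France",
    "capital of Japan": "Tokyo is the capital of Japan",
}


def count_tokens(text):
    return len(text.split())


def assemble_query_only(query):
    return query


def assemble_with_retrieval(query):
    # Retrieve(query, database): an exact-key lookup stands in for a real retriever.
    knowledge = DATABASE.get(query, "")
    return f"{knowledge} {query}".strip()


def assemble_everything(query):
    # Puts the whole database into the context, which exceeds MAX_TOKENS here.
    return " ".join(DATABASE.values()) + " " + query


def toy_llm(context):
    # Stand-in for LLM(context): it can only answer from facts present in the context.
    for city in ("Paris", "Tokyo"):
        if city in context:
            return city
    return "unknown"


def reward(output, target):
    return 1.0 if output == target else 0.0


def expected_reward(assemble, eval_set):
    """Average reward over the evaluation set, or -inf if the token limit is violated."""
    scores = []
    for query, target in eval_set:
        context = assemble(query)
        if count_tokens(context) > MAX_TOKENS:
            return float("-inf")  # violates |context| <= MaxTokens
        scores.append(reward(toy_llm(context), target))
    return mean(scores)


EVAL_SET = [("capital of France", "Paris"), ("capital of Japan", "Tokyo")]
CANDIDATES = [assemble_query_only, assemble_with_retrieval, assemble_everything]
best = max(CANDIDATES, key=lambda f: expected_reward(f, EVAL_SET))
print(best.__name__)
```

The query-only candidate is feasible but earns no reward, and the candidate that includes everything is infeasible, so the retrieval-based assembly is selected.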

### Dynamic Context Orchestration

The context assembly can be decomposed as:

$$\text{context} = \text{Concat}(\text{Format}(\text{instructions}), \text{Format}(\text{knowledge}), \text{Format}(\text{tools}), \text{Format}(\text{memory}), \text{Format}(\text{state}), \text{Format}(\text{query}))$$

Where $\text{Format}$ represents component-specific structuring, and $\text{Concat}$ assembles them respecting token limits and optimal positioning.

**Context Engineering** is therefore the discipline of designing and optimizing these assembly and formatting functions to maximize task performance.
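
One possible reading of $\text{Format}$ and $\text{Concat}$ is sketched below. The Markdown-style block layout, the whitespace-based `count_tokens`, and the `drop_order` policy (remove whole components, lowest priority first, until the budget fits) are assumptions for illustration; the source only says that $\text{Concat}$ respects token limits and positioning.

```python
def format_component(name, items):
    """Format: render one component as a labelled block (empty string if no items)."""
    if not items:
        return ""
    body = "\n".join(f"- {item}" for item in items)
    return f"## {name}\n{body}"


def count_tokens(text):
    # Assumption: a whitespace word count stands in for a real tokenizer.
    return len(text.split())


def concat(blocks, max_tokens, drop_order):
    """Concat: join blocks in their given order, dropping components listed in
    drop_order one at a time while the total exceeds max_tokens."""
    kept = {name: text for name, text in blocks if text}
    order = [name for name, _ in blocks]
    for victim in drop_order:
        if sum(count_tokens(t) for t in kept.values()) <= max_tokens:
            break
        kept.pop(victim, None)
    return "\n\n".join(kept[name] for name in order if name in kept)


blocks = [
    ("instructions", format_component("instructions", ["Answer concisely."])),
    ("knowledge", format_component("knowledge", ["Doc A says X.", "Doc B says Y."])),
    ("tools", format_component("tools", ["search(query)"])),
    ("memory", format_component("memory", ["User prefers short answers."])),
    ("query", format_component("query", ["What does Doc A say?"])),
]
context = concat(blocks, max_tokens=25, drop_order=["memory", "tools", "knowledge"])
print(context)
```

In this sketch the instructions and the query are never dropped, so the result can still exceed the budget if those two blocks alone are too long; a production system would need an explicit policy for that case.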

### Mathematical Principles

From this formalization, we derive four fundamental principles:

1. **System-Level Optimization**: Context generation is a multi-objective optimization problem over assembly functions, not simple string manipulation.

2. **Dynamic Adaptation**: The context assembly function adapts to each $\text{query}$ and $\text{state}$ at inference time: $\text{Assemble}(\cdot | \text{query}, \text{state})$.

3. **Information-Theoretic Optimality**: The retrieval function maximizes relevant information: $\text{Retrieve} = \arg\max \text{Relevance}(\text{knowledge}, \text{query})$.

4. **Structural Sensitivity**: The formatting functions encode structure that aligns with LLM processing capabilities.
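
The third principle, retrieval as an argmax over relevance, can be illustrated with a small top-k retriever. Cosine similarity over word counts is used here purely as a simple stand-in for $\text{Relevance}$; the function names and the sample `database` are hypothetical, and real systems typically rely on learned embeddings or rankers such as BM25.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(document, query):
    """Cosine similarity between word-count vectors (a stand-in for Relevance)."""
    d, q = bag_of_words(document), bag_of_words(query)
    dot = sum(d[w] * q[w] for w in q)
    norm = math.sqrt(sum(v * v for v in d.values())) * math.sqrt(
        sum(v * v for v in q.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, database, k=1):
    """Return the k documents with the highest relevance to the query."""
    return sorted(database, key=lambda doc: relevance(doc, query), reverse=True)[:k]


database = [
    "Position interpolation extends the context window of LLMs.",
    "Knowledge graphs store entities and relations.",
    "Memory systems persist conversation history for agents.",
]
top = retrieve("How do agents persist conversation history?", database)
print(top)
```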

### Theoretical Framework: Bayesian Context Inference

Context Engineering can be formalized within a Bayesian framework where the optimal context is inferred:

$$P(\text{context} | \text{query}, \text{history}, \text{world}) \propto P(\text{query} | \text{context}) \cdot P(\text{context} | \text{history}, \text{world})$$

Where:
- $P(\text{query} | \text{context})$ models query-context compatibility
- $P(\text{context} | \text{history}, \text{world})$ represents prior context probability

The optimal context assembly becomes:

$$\text{context}^* = \arg\max_{\text{context}} P(\text{answer} | \text{query}, \text{context}) \cdot P(\text{context} | \text{query}, \text{history}, \text{world})$$

This Bayesian formulation enables:
- **Uncertainty Quantification**: Modeling confidence in context relevance
- **Adaptive Retrieval**: Updating context beliefs based on feedback
- **Multi-step Reasoning**: Maintaining context distributions across interactions
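
A small numeric sketch of the posterior above, under the assumption that the candidate contexts form a finite set. The candidate names and all probabilities are invented for illustration. The second call reuses the first posterior as the new prior, which is one simple way to read "updating context beliefs based on feedback".

```python
def posterior(likelihood, prior):
    """Normalize likelihood * prior over a finite set of candidate contexts."""
    unnormalized = {c: likelihood[c] * prior[c] for c in prior}
    z = sum(unnormalized.values())
    return {c: v / z for c, v in unnormalized.items()}


# Hypothetical values of P(context | history, world) for three candidates.
prior = {"billing_docs": 0.5, "api_docs": 0.3, "release_notes": 0.2}
# Hypothetical values of P(query | context): how well each candidate fits the query.
likelihood = {"billing_docs": 0.1, "api_docs": 0.8, "release_notes": 0.3}

post = posterior(likelihood, prior)
best = max(post, key=post.get)
print(best, round(post[best], 3))

# Feedback arrives (for example, the user says the answer was about invoices):
# treat it as a new likelihood and use the previous posterior as the prior.
feedback = {"billing_docs": 0.9, "api_docs": 0.1, "release_notes": 0.1}
updated = posterior(feedback, post)
print(max(updated, key=updated.get))
```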

### Comparison

| Dimension | Prompt Engineering | Context Engineering |
|-----------|-------------------|-------------------|
| **Mathematical Model** | $\text{context} = \text{prompt}$ (static) | $\text{context} = \text{Assemble}(...)$ (dynamic) |
| **Optimization Target** | $\arg\max_{\text{prompt}} P(\text{answer} \mid \text{query}, \text{prompt})$ | $\arg\max_{\text{Assemble}} \mathbb{E}[\text{Reward}(...)]$ |
| **Complexity** | $O(1)$ context assembly | $O(n)$ multi-component optimization |
| **Information Theory** | Fixed information content | Adaptive information maximization |
| **State Management** | Stateless function | Stateful with $\text{memory}(\text{history}, \text{query})$ |
| **Scalability** | Linear in prompt length | Sublinear through compression/filtering |
| **Error Analysis** | Manual prompt inspection | Systematic evaluation of assembly components |

---

## 🌐 Related Blogs

- [The rise of "context engineering"](https://blog.langchain.com/the-rise-of-context-engineering/)
- [The New Skill in AI is Not Prompting, It's Context Engineering](https://www.philschmid.de/context-engineering)
- [davidkimai/Context-Engineering: "Context engineering is the delicate art and science of filling the context window with just the right information for the next step."](https://github.com/davidkimai/Context-Engineering)
- [Context Engineering is Runtime of AI Agents | by Bijit Ghosh | Jun, 2025 | Medium](https://medium.com/@bijit211987/context-engineering-is-runtime-of-ai-agents-411c9b2ef1cb)
- [Context Engineering](https://blog.langchain.com/context-engineering-for-agents/)
- [Context Engineering for Agents](https://rlancemartin.github.io/2025/06/23/context_engineering/)
- [Cognition | Don't Build Multi-Agents](https://cognition.ai/blog/dont-build-multi-agents)
- [From Prompt Engineering to Context Engineering - 53AI (AI knowledge base, large-model knowledge base, large-model training, agent development)](https://www.53ai.com/news/tishicikuangjia/2025062727685.html)

### Social Media & Talks

- [Mastering Claude Code in 30 minutes](https://www.youtube.com/watch?v=6eBSHbLKuN0)
- [Context Engineering for Agents](https://www.youtube.com/watch?v=4GiqzUHD5AA)
- [Andrej Karpathy on X: "+1 for 'context engineering' over 'prompt engineering'"](https://x.com/karpathy/status/1937902205765607626?ref=blog.langchain.com)
- [Xipeng Qiu (Fudan University / Shanghai Innovation Institute): Context Scaling, the Next Act on the Road to AGI](https://mp.weixin.qq.com/s/Knej0qbyr5j5KX_BO7FGew)

---

## 🤔 Why Context Engineering?

### The Paradigm Shift: From Tactical to Strategic

The evolution from prompt engineering to context engineering represents a fundamental maturation in AI system design. As influential figures like Andrej Karpathy, Tobi Lutke, and Simon Willison have argued, the term "prompt engineering" has been diluted to mean simply "typing things into a chatbot," failing to capture the complexity required for industrial-strength LLM applications.

### 1. Fundamental Challenges with Current Approaches

#### Human Intent Communication Challenges
- **Unclear Human Intent Expression**: Human intentions are often unclear, incomplete, or ambiguous when expressed in natural language
- **AI's Incomplete Understanding of Human Intent**: AI systems struggle to fully comprehend complex human intentions, especially those involving implicit context or cultural nuances
- **Overly Literal AI Interpretation**: AI systems often interpret human instructions too literally, missing the underlying intent or contextual meaning

#### Complex Knowledge Requirements
Single models alone cannot solve complex problems that require:
- **(1) Large-scale External Knowledge**: Vast amounts of external knowledge that exceed model capacity
- **(2) Accurate External Knowledge**: Precise, up-to-date information that models may not possess
- **(3) Novel External Knowledge**: Emerging knowledge that appears after model training

**Static Knowledge Limitations:**
- **Static Knowledge Problem**: Pre-trained models contain static knowledge that becomes outdated
- **Knowledge Cutoff**: Models cannot access information beyond their training data
- **Domain-Specific Gaps**: Models lack specialized knowledge for specific industries or applications

#### Reliability and Trustworthiness Issues
- **AI Hallucination**: LLMs generate plausible but factually incorrect information when lacking proper context
- **Lack of Provenance**: Absence of clear source attribution for generated information
- **Confidence Calibration**: Models often appear confident even when generating false information
- **Transparency Gaps**: Inability to trace how conclusions were reached
- **Accountability Issues**: Difficulty in verifying the reliability of AI-generated content

### 2. Limitations of Static Prompting

#### From Strings to Systems
Traditional prompting treats context as a static string, but enterprise applications require:
- **Dynamic Information Assembly**: Context created on-the-fly, tailored to specific users and queries
- **Multi-Source Integration**: Combining databases, APIs, documents, and real-time data
- **State Management**: Maintaining conversation history, user preferences, and workflow status
- **Tool Orchestration**: Coordinating external function calls and API interactions

#### The "Movie Production" Analogy
If prompt engineering is writing a single line of dialogue for an actor, context engineering is the entire process of building the set, designing lighting, providing detailed backstory, and directing the scene. The dialogue only achieves its intended impact because of the rich, carefully constructed environment surrounding it.

### 3. Enterprise and Production Requirements

#### Context Failures Are the New Bottleneck
Most failures in modern agentic systems are no longer attributable to core model reasoning capabilities but are instead **"context failures"**. The true engineering challenge lies not in what question to ask, but in ensuring the model has all necessary background, data, tools, and memory to answer meaningfully and reliably.

#### Scalability Beyond Simple Tasks
While prompt engineering suffices for simple, self-contained tasks, it breaks down when scaled to:
- **Complex, multi-step applications**
- **Data-rich enterprise environments**
- **Stateful, long-running workflows**
- **Multi-user, multi-tenant systems**

#### Reliability and Consistency
Enterprise applications demand:
- **Deterministic Behavior**: Predictable outputs across different contexts and users
- **Error Handling**: Graceful degradation when information is incomplete or contradictory
- **Audit Trails**: Transparency in how context influences model decisions
- **Compliance**: Meeting regulatory requirements for data handling and decision making

#### Economic and Operational Efficiency
Context Engineering enables:
- **Cost Optimization**: Strategic choice between RAG and long-context approaches
- **Latency Management**: Efficient information retrieval and context assembly
- **Resource Utilization**: Optimal use of finite context windows and computational resources
- **Maintenance Scalability**: Systematic approaches to updating and managing knowledge bases

Context Engineering provides the architectural foundation for managing state, integrating diverse data sources, and maintaining coherence across these demanding scenarios.

### 4. Cognitive and Information Science Foundations

#### Artificial Embodiment
LLMs are essentially "brains in a vat" - powerful reasoning engines lacking connection to specific environments. Context Engineering provides:
- **Synthetic Sensory Systems**: Retrieval mechanisms as artificial perception
- **Proxy Embodiment**: Tool use as artificial action capabilities
- **Artificial Memory**: Structured information storage and retrieval

#### Information Retrieval at Scale
Context Engineering addresses the fundamental challenge of information retrieval where the "user" is not human but an AI agent. This requires:
- **Semantic Understanding**: Bridging the gap between intent and expression
- **Relevance Optimization**: Ranking and filtering vast knowledge bases
- **Query Transformation**: Converting ambiguous requests into precise retrieval operations

### 5. The Future of AI System Architecture

Context Engineering elevates AI development from a collection of "prompting tricks" to a rigorous discipline of systems architecture. It applies decades of knowledge in operating system design, memory management, and distributed systems to the unique challenges of LLM-based applications.

This discipline is foundational for unlocking the full potential of LLMs in production systems, enabling the transition from one-off text generation to autonomous agents and sophisticated AI copilots that can reliably operate in complex, dynamic environments.

---

## 🔧 Components, Techniques and Architectures

### Context Scaling

**Position Interpolation and Extension Techniques**

- Extending Context Window of Large Language Models via Position Interpolation, Chen et al. (arXiv)
- YaRN: Efficient Context Window Extension of Large Language Models, Peng et al. (ICLR)
- LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens, Ding et al. (ICML)
- LongRoPE2: Near-Lossless LLM Context Window Scaling, Shang et al. (ICML)


**Memory-Efficient Attention Mechanisms**

- Fast Multipole Attention: A Divide-and-Conquer Attention Mechanism for Long Sequences, Kang et al. (ICLR)
- Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention, Munkhdalai et al. (arXiv)
- DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads, Xiao et al. (ICLR)
- Star Attention: Efficient LLM Inference over Long Sequences, Acharya et al. (arXiv)


**Ultra-Long Sequence Processing (100K+ Tokens)**

- TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation, Wu et al. (ICML)
- LongHeads: Multi-Head Attention is Secretly a Long Context Processor, Lu et al. (EMNLP)
- ∞Bench: Extending Long Context Evaluation Beyond 100K Tokens, Bai et al. (ACL)


**Comprehensive Extension Surveys and Methods**

- Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models, Various (arXiv)
- A Controlled Study on Long Context Extension and Generalization in LLMs, Various (arXiv)
- Selective Attention: Enhancing Transformer through Principled Context Control, Various (NeurIPS)



**Vision-Language Models with Sophisticated Context Understanding**

- Towards LLM-Centric Multimodal Fusion: A Survey on Integration Strategies and Techniques, An et al. (arXiv)
- Browse and Concentrate: Comprehending Multimodal Content via Prior-LLM Context Fusion, Wang et al. (ACL)
- V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding, Dai et al. (arXiv)
- Flamingo: a Visual Language Model for Few-Shot Learning, Alayrac et al. (NeurIPS)


**Audio-Visual Context Integration and Processing**

- Aligned Better, Listen Better for Audio-Visual Large Language Models, Guo et al. (ICLR)
- AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue, Chen et al. (arXiv)
- SonicVisionLM: Playing Sound with Vision Language Models, Xie et al. (CVPR)
- SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context, Li et al. (arXiv)


**Multi-Modal Prompt Engineering and Context Design**

- CaMML: Context-Aware Multimodal Learner for Large Models, Chen et al. (ACL)
- Visual In-Context Learning for Large Vision-Language Models, Zhou et al. (ACL)
- CAMA: Enhancing Multimodal In-Context Learning with Context-Aware Modulated Attention, Li et al. (arXiv)

**CVPR 2024 Vision-Language Advances**

- CogAgent: A Visual Language Model for GUI Agents, Various (CVPR)
- LISA: Reasoning Segmentation via Large Language Model, Various (CVPR)
- Reproducible scaling laws for contrastive language-image learning, Various (CVPR)


**Video and Temporal Understanding**

- Video Understanding with Large Language Models: A Survey, Various (arXiv)


### Structured Data Integration

**Knowledge Graph-Enhanced Language Models**

- Learn Together: Joint Multitask Finetuning of Pretrained KG-enhanced LLM for Downstream Tasks, Martynova et al. (ICCL)
- Knowledge Graph Tuning: Real-time Large Language Model Personalization based on Human Feedback, Sun et al. (ICLR)
- Knowledge Graph-Guided Retrieval Augmented Generation, Zhu et al. (arXiv)
- KGLA: Knowledge Graph Enhanced Language Agents for Customer Service, Anonymous et al. (arXiv)

**Graph Neural Networks Combined with Language Models**

- Are Large Language Models In-Context Graph Learners?, Li et al. (arXiv)
- Let's Ask GNN: Empowering Large Language Model for Graph In-Context Learning, Hu et al. (EMNLP)
- GL-Fusion: Rethinking the Combination of Graph Neural Network and Large Language model, Yang et al. (ICLR)
- NT-LLM: A Novel Node Tokenizer for Integrating Graph Structure into Large Language Models, Ji et al. (arXiv)

Structured Data Integration



  • CoddLLM: Empowering Large Language Models for Data Analytics, Authors et al., arXiv Badge


  • Structure-Guided Large Language Models for Text-to-SQL Generation, Authors et al., arXiv Badge


  • StructuredRAG: JSON Response Formatting with Large Language Models, Authors et al., arXiv Badge



Foundational KG-LLM Integration Methods



  • Unifying Large Language Models and Knowledge Graphs: A Roadmap, Various, arXiv Badge




  • Combining Knowledge Graphs and Large Language Models, Various, arXiv Badge


  • All Against Some: Efficient Integration of Large Language Models for Message Passing in Graph Neural Networks, Various, arXiv Badge


  • Large Language Models for Graph Learning, Various, WWW Badge

### Self-Generated Context

Self-Supervised Context Generation and Augmentation



  • SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models, Chuang et al., arXiv Badge




  • Self-Supervised Prompt Optimization, Xiang et al., CoRR Badge




  • SCOPE: A Self-supervised Framework for Improving Faithfulness in Conditional Text Generation, Duong et al., ICLR Badge



Reasoning Models That Generate Their Own Context



  • Self-Consistency Improves Chain of Thought Reasoning in Language Models, Wang et al., ICLR Badge


  • Tree of Thoughts: Deliberate Problem Solving with Large Language Models, Yao et al., arXiv Badge




  • Rethinking Chain-of-Thought from the Perspective of Self-Training, Wu et al., arXiv Badge




  • Autonomous Tree-search Ability of Large Language Models, Authors et al., arXiv Badge



Iterative Context Refinement and Self-Improvement



  • Self-Refine: Iterative Refinement with Self-Feedback, Madaan et al., arXiv Badge




  • Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning, Authors et al., arXiv Badge


  • Large Language Models Can Self-Improve in Long-context Reasoning, Li et al., arXiv Badge




  • Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering, Ridnik et al., arXiv Badge


  • Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models, Zhou et al., arXiv Badge

Meta-Learning and Autonomous Context Evolution



  • Meta-in-Context Learning in Large Language Models, Coda-Forno et al., NeurIPS Badge


  • EvoPrompt: Connecting LLMs with Evolutionary Algorithms Yields Powerful Prompt Optimizers, Guo et al., ICLR Badge




  • AutoPDL: Automatic Prompt Optimization for LLM Agents, Spiess et al., AutoML Badge


  • Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization, Zhang et al., arXiv Badge

Foundational Chain-of-Thought Research



  • Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Wei et al., NeurIPS Badge

---

## 🛠️ Implementation and Challenges

### 1. Retrieval-Augmented Generation (RAG)

Survey



  • Retrieval-Augmented Generation for Large Language Models: A Survey, Yunfan Gao et al., arXiv Badge




  • A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models, Qinggang Zhang et al., arXiv Badge




  • Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely, Siyun Zhao et al., arXiv Badge


  • Evaluation of Retrieval-Augmented Generation: A Survey, Hao Yu et al., arXiv Badge




  • Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, Lewis et al., arXiv Badge




  • A Survey on Knowledge-Oriented Retrieval-Augmented Generation, Cheng et al., arXiv Badge




  • A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models, Fan et al., arXiv Badge

Naive RAG



  • Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models, Xindi Wang et al., arXiv Badge


  • In-context Examples Selection for Machine Translation, Sweta Agrawal et al., arXiv Badge


  • In Defense of RAG in the Era of Long-Context Language Models, Tan Yu et al., arXiv Badge


  • Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, Patrick Lewis et al., arXiv Badge


  • LightRAG: Simple and Fast Retrieval-Augmented Generation, Zirui Guo et al., arXiv Badge




  • Generate rather than Retrieve: Large Language Models are Strong Context Generators, Wenhao Yu et al., arXiv Badge




  • Large Language Models Can Be Easily Distracted by Irrelevant Context, Freda Shi et al., arXiv Badge




  • Old IR Methods Meet RAG, Oz Huly et al.


  • Dense Passage Retrieval for Open-Domain Question Answering, Vladimir Karpukhin et al., arXiv Badge



Advanced RAG



  • Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity, Soyeong Jeong et al., arXiv Badge




  • Improving Language Models by Retrieving from Trillions of Tokens, Sebastian Borgeaud et al., arXiv Badge


  • FoRAG: Factuality-optimized Retrieval Augmented Generation for Web-enhanced Long-form Question Answering, Tianchi Cai et al.


  • IM-RAG: Multi-Round Retrieval-Augmented Generation Through Learning Inner Monologues, Diji Yang et al., arXiv Badge


  • RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation, Chao Jin et al., arXiv Badge


  • Corrective Retrieval Augmented Generation, Shi-Qi Yan et al., arXiv Badge




  • RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs, Yue Yu et al., arXiv Badge


  • Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models, Fei Wang et al., arXiv Badge


  • Learning to Filter Context for Retrieval-Augmented Generation, Zhiruo Wang et al., arXiv Badge




  • Query Rewriting in Retrieval-Augmented Large Language Models, Xinbei Ma et al., arXiv Badge




  • UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation, Daixuan Cheng et al., arXiv Badge




  • LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression, Huiqiang Jiang et al., arXiv Badge




  • Document-Level Event Argument Extraction by Conditional Generation, Sha Li et al., arXiv Badge




  • Multi-sentence Argument Linking, Seth Ebner et al., arXiv Badge




  • Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs, Oded Ovadia et al., arXiv Badge


  • IAG: Induction-Augmented Generation Framework for Answering Reasoning Questions, Zhebin Zhang et al., arXiv Badge


  • Retrieval Meets Long Context Large Language Models, Peng Xu et al., arXiv Badge


  • Dense X Retrieval: What Retrieval Granularity Should We Use?, Tong Chen et al., arXiv Badge




  • Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation, Ruiyang Ren et al., arXiv Badge




  • The Power of Noise: Redefining Retrieval for RAG Systems, Florin Cuconasu et al., arXiv Badge




  • Recitation-Augmented Language Models, Zhiqing Sun et al., arXiv Badge




  • Robust Retrieval Augmented Generation for Zero-shot Slot Filling, Michael Glass et al., arXiv Badge




  • In-Context Retrieval-Augmented Language Models, Ori Ram et al., arXiv Badge




  • Learning to Retrieve In-Context Examples for Large Language Models, Liang Wang et al., arXiv Badge



Modular RAG



  • FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research, Jiajie Jin et al., arXiv Badge




  • Multi-Head RAG: Solving Multi-Aspect Problems with LLMs, Maciej Besta et al., arXiv Badge




  • StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization, Zhuoqun Li et al., arXiv Badge




  • RAFT: Adapting Language Model to Domain Specific RAG, Tianjun Zhang et al., arXiv Badge




  • Retrieval-Generation Alignment for End-to-End Task-Oriented Dialogue System, Weizhou Shen et al., arXiv Badge




  • UniMS-RAG: A Unified Multi-source Retrieval-Augmented Generation for Personalized Dialogue Systems, Hongru Wang et al., arXiv Badge


  • Retrieve-and-Sample: Document-level Event Argument Extraction via Hybrid Retrieval Augmentation, Yubing Ren et al.


  • RA-DIT: Retrieval-Augmented Dual Instruction Tuning, Xi Victoria Lin et al., arXiv Badge




  • Self-Knowledge Guided Retrieval Augmentation for Large Language Models, Yile Wang et al., arXiv Badge




  • Prompt-Guided Retrieval Augmentation for Non-Knowledge-Intensive Tasks, Zhicheng Guo et al., arXiv Badge




  • REPLUG: Retrieval-Augmented Black-Box Language Models, Weijia Shi et al., arXiv Badge


  • Query Rewriting for Retrieval-Augmented Large Language Models, Xinbei Ma et al., DOI Badge




  • Lift Yourself Up: Retrieval-augmented Text Generation with Self-Memory, Xin Cheng et al., arXiv Badge




  • Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering, Shamane Siriwardhana et al., arXiv Badge

Graph-Based RAG



  • Don't Forget to Connect! Improving RAG with Graph-based Reranking, Jialin Dong et al., arXiv Badge


  • From Local to Global: A Graph RAG Approach to Query-Focused Summarization, Darren Edge et al., arXiv Badge


  • GRAG: Graph Retrieval-Augmented Generation, Yuntong Hu et al., arXiv Badge




  • ISEEQ: Information Seeking Question Generation Using Dynamic Meta-Information Retrieval and Knowledge Graphs, Manas Gaur et al., arXiv Badge




  • G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering, Xiaoxin He et al., arXiv Badge




  • Knowledge Graph Prompting for Multi-Document Question Answering, Yu Wang et al., arXiv Badge




  • GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning, Costas Mavromatis et al., arXiv Badge




  • LightPROF: A Lightweight Reasoning Framework for Large Language Model on Knowledge Graph




  • Simple Is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation




  • Knowledge Graph-Guided Retrieval Augmented Generation




  • MedRAG: Enhancing Retrieval-augmented Generation with Knowledge Graph-Elicited Reasoning for Healthcare Copilot




  • Mitigating Large Language Model Hallucinations via Autonomous Knowledge Graph-based Retrofitting, Xinyan Guan et al., arXiv Badge




  • In-depth Analysis of Graph-based RAG in a Unified Framework, arXiv Badge




  • RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval, Parth Sarthi et al., arXiv Badge




  • TableRAG: Million-Token Table Understanding with Language Models, Si-An Chen et al., arXiv Badge




  • KAG: Boosting LLMs in Professional Domains via Knowledge Augmented Generation, Lei Liang et al., arXiv Badge




  • GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation, Luo et al., arXiv Badge




  • HybridRAG: A Hybrid Retrieval System for RAG Combining Vector and Graph Search, Sarabesh, GitHub Badge



Agentic RAG



  • From RAG to Memory: Non-Parametric Continual Learning for Large Language Models, Bernal Jiménez Gutiérrez et al., arXiv Badge




  • HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models, Bernal Jiménez Gutiérrez et al., arXiv Badge




  • GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models, Shilong Li et al., arXiv Badge


  • PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers, Myeonghwa Lee et al., arXiv Badge




  • Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection, Akari Asai et al., arXiv Badge




  • DeepRAG: Thinking to Retrieve Step by Step for Large Language Models, Xinyan Guan et al., arXiv Badge


  • PaperQA: Retrieval-Augmented Generative Agent for Scientific Research, Jakub Lála et al., arXiv Badge


  • Large Language Models as Source Planner for Personalized Knowledge-grounded Dialogues, Hongru Wang et al., arXiv Badge




  • PRCA: Fitting Black-Box Large Language Models for Retrieval Question Answering via Pluggable Reward-Driven Contextual Adapter, Haoyan Yang et al., arXiv Badge







  • RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation, Zihao Wang et al., arXiv Badge




  • Chain-of-Verification Reduces Hallucination in Large Language Models, Shehzaad Dhuliawala et al., arXiv Badge


  • HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation, Liu et al., arXiv Badge




  • MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries, Tang & Yang, arXiv Badge




  • MMOA-RAG: Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning, Chen et al., arXiv Badge




  • Search-in-the-Chain: Towards Accurate, Credible, and Up-to-Date Large Language Models, Xu et al., arXiv Badge

Real-Time and Streaming RAG



  • StreamingRAG: Real-time Contextual Retrieval and Generation Framework, Sankaradas et al., arXiv Badge




  • Multi-task Retriever Fine-tuning for Domain-Specific and Efficient RAG, Authors, arXiv Badge

### 2. Memory Systems

Persistent Memory Architecture



  • MemGPT: Towards LLMs as Operating Systems, Packer et al., arXiv Badge