<!-- omit in toc -->
# LangChain AI Agents Using Contextual Engineering

Context engineering means creating the right setup for an AI before giving it a task. This setup includes:

*   **Instructions** on how the AI should act, like being a helpful budget travel guide.
*   Access to **useful info** from databases, documents, or live sources.
*   Remembering **past conversations** to avoid repeats or forgetting.
*   **Tools** the AI can use, such as calculators or search features.
*   Important details about you, like your **preferences** or location.

![Context Engineering](https://cdn-images-1.medium.com/max/1500/1*sCTOzjG6KP7slQuxLZUtNg.png)
*Context Engineering (From [LangChain](https://blog.langchain.com/context-engineering-for-agents/) and [12Factor](https://github.com/humanlayer/12-factor-agents/tree/main))*

[AI engineers are now shifting](https://diamantai.substack.com/p/why-ai-experts-are-moving-from-prompt) from prompt engineering to context engineering because…

> context engineering focuses on providing AI with the right background and tools, making its answers smarter and more useful.

In this blog, we will explore how **LangChain** and **LangGraph**, two powerful tools for building AI agents, RAG apps, and LLM apps, can be used to implement **contextual engineering** effectively and improve our AI agents.

This guide builds on the [langchain-ai](https://github.com/FareedKhan-dev/contextual-engineering-guide) guide.

---

<!-- omit in toc -->
### Table of Contents
- [What is Context Engineering?](#what-is-context-engineering)
- [Scratchpad with LangGraph](#scratchpad-with-langgraph)
- [Creating StateGraph](#creating-stategraph)
- [Memory Writing in LangGraph](#memory-writing-in-langgraph)
- [Scratchpad Selection Approach](#scratchpad-selection-approach)
- [Memory Selection Ability](#memory-selection-ability)
- [Advantage of LangGraph BigTool Calling](#advantage-of-langgraph-bigtool-calling)
- [RAG with Contextual
Engineering](#rag-with-contextual-engineering)
- [Compression Strategy with knowledgeable Agents](#compression-strategy-with-knowledgeable-agents)
- [Isolating Context using Sub-Agents Architecture](#isolating-context-using-sub-agents-architecture)
- [Isolation using Sandboxed Environments](#isolation-using-sandboxed-environments)
- [State Isolation in LangGraph](#state-isolation-in-langgraph)
- [Summarizing Everything](#summarizing-everything)

### What is Context Engineering?
LLMs work like a new type of operating system. The LLM acts like the CPU, and its context window works like RAM, serving as its short-term memory. But, like RAM, the context window has limited space for different information.

> Just as an operating system decides what goes into RAM, “context engineering” is about choosing what the LLM should keep in its context.

![Different Context Types](https://cdn-images-1.medium.com/max/1000/1*kMEQSslFkhLiuJS8-WEMIg.png)

When building LLM applications, we need to manage different types of context. Context engineering covers these main types:

*   Instructions: prompts, examples, memories, and tool descriptions
*   Knowledge: facts, stored information, and memories
*   Tools: feedback and results from tool calls

This year, more people are interested in agents because LLMs are better at thinking and using tools. Agents work on long tasks by using LLMs and tools together, choosing the next step based on the tool’s feedback.

![Agent Workflow](https://cdn-images-1.medium.com/max/1500/1*Do44CZkpPYyIJefuNQ69GA.png)

But long tasks and collecting too much feedback from tools use a lot of tokens.
This can create problems: the context window can overflow, costs and delays can increase, and the agent might work worse.

Drew Breunig explained how too much context can hurt performance, including:

*   Context Poisoning: [when a mistake or hallucination gets added to the context](https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html?ref=blog.langchain.com#context-poisoning)
*   Context Distraction: [when too much context confuses the model](https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html?ref=blog.langchain.com#context-distraction)
*   Context Confusion: [when extra, unnecessary details affect the answer](https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html?ref=blog.langchain.com#context-confusion)
*   Context Clash: [when parts of the context give conflicting information](https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html?ref=blog.langchain.com#context-clash)

![Multiple turns in Agent](https://cdn-images-1.medium.com/max/1500/1*ZJeZJPKI5jC_1BMCoghZxA.png)

Anthropic [in their research](https://www.anthropic.com/engineering/built-multi-agent-research-system?ref=blog.langchain.com) stressed the need for careful context management:

> Agents often have conversations with hundreds of turns, so managing context carefully is crucial.

So, how are people solving this problem today?
Common strategies for agent context engineering can be grouped into four main types:

*   Write: creating clear and useful context
*   Select: picking only the most relevant information
*   Compress: shortening context to save space
*   Isolate: keeping different types of context separate

![Categories of Context Engineering](https://cdn-images-1.medium.com/max/2600/1*CacnXVAI6wR4eSIWgnZ9sg.png)
*Categories of Context Engineering (From [LangChain docs](https://blog.langchain.com/context-engineering-for-agents/))*

[LangGraph](https://www.langchain.com/langgraph) is built to support all these strategies. We will go through each of these components one by one in [LangGraph](https://www.langchain.com/langgraph) and see how they help make our AI agents work better.

### Scratchpad with LangGraph
Just like humans take notes to remember things for later tasks, agents can do the same using a [scratchpad](https://www.anthropic.com/engineering/claude-think-tool). It stores information outside the context window so the agent can access it whenever needed.

![First Component of CE](https://cdn-images-1.medium.com/max/1000/1*aXpKxYt03iZPcrGkxsFvrQ.png)
*First Component of CE (From [LangChain docs](https://blog.langchain.com/context-engineering-for-agents/))*

A good example is Anthropic’s [multi-agent researcher](https://www.anthropic.com/engineering/built-multi-agent-research-system):

> *The LeadResearcher plans its approach and saves it to memory, because if the context window goes beyond 200,000 tokens, it gets cut off, so saving the plan ensures it isn’t lost.*

Scratchpads can be implemented in different ways:

*   As a [tool call](https://www.anthropic.com/engineering/claude-think-tool) that [writes to a file](https://github.com/modelcontextprotocol/servers/tree/main/src/filesystem).
*   As a field in a runtime [state object](https://langchain-ai.github.io/langgraph/concepts/low_level/#state) that persists during the session.

In short,
scratchpads help agents keep important notes during a session to complete tasks effectively.

LangGraph supports both [short-term](https://langchain-ai.github.io/langgraph/concepts/memory/#short-term-memory) (thread-scoped) and [long-term memory](https://langchain-ai.github.io/langgraph/concepts/memory/#long-term-memory).

*   Short-term memory uses [checkpointing](https://langchain-ai.github.io/langgraph/concepts/persistence/) to save the [agent state](https://langchain-ai.github.io/langgraph/concepts/low_level/#state) during a session. It works like a scratchpad, letting you store information while the agent runs and retrieve it later.

The state object is the main structure passed between graph nodes. You can define its format (usually as a Python dictionary). It acts as a shared scratchpad, where each node can read and update specific fields.

> We will only import modules when we need them, so we can learn step by step in a clear way.

For better and cleaner output, we will use the Python `pprint` module for pretty printing and the `Console` class from the `rich` library.
Let’s import and initialize them first:

```python
# Import necessary libraries
from typing import TypedDict  # For defining the state schema with type hints

from rich.console import Console  # For pretty-printing output
from rich.pretty import pprint  # For pretty-printing Python objects

# Initialize a console for rich, formatted output in the notebook.
console = Console()
```

Next, we will create a `TypedDict` for the state object.

```python
# Define the schema for the graph's state using TypedDict.
# This class acts as a data structure that will be passed between nodes in the graph.
# It ensures that the state has a consistent shape and provides type hints.
class State(TypedDict):
    """
    Defines the structure of the state for our joke generator workflow.

    Attributes:
        topic: The input topic for which a joke will be generated.
        joke: The output field where the generated joke will be stored.
    """

    topic: str
    joke: str
```

This state object will store the topic and the joke that we ask our agent to generate based on the given topic.

### Creating StateGraph
Once we define a state object, we can write context to it using a [StateGraph](https://langchain-ai.github.io/langgraph/concepts/low_level/#stategraph).

A StateGraph is LangGraph’s main tool for building stateful [agents or workflows](https://langchain-ai.github.io/langgraph/concepts/workflows/). Think of it as a directed graph:

*   Nodes are steps in the workflow. Each node takes the current state as input, updates it, and returns the changes.
*   Edges connect nodes, defining how execution flows; this can be linear, conditional, or even cyclical.

Next, we will create a [chat model](https://python.langchain.com/api_reference/langchain/chat_models/langchain.chat_models.base.init_chat_model.html) by choosing from the [Anthropic models](https://docs.anthropic.com/en/docs/about-claude/models/overview), and then
use it in a LangGraph workflow.

```python
# Import necessary libraries for environment management, display, and LangGraph
import os

from dotenv import load_dotenv
from IPython.display import Image, display
from langchain.chat_models import init_chat_model
from langgraph.graph import END, START, StateGraph

# --- Environment and Model Setup ---
# Load the Anthropic API key from a .env file to authenticate requests
load_dotenv()
api_key = os.getenv("ANTHROPIC_API_KEY")
if not api_key:
    raise ValueError("Missing ANTHROPIC_API_KEY in environment")

# Initialize the chat model to be used in the workflow
# We use a specific Claude model with temperature=0 for deterministic outputs
llm = init_chat_model("anthropic:claude-sonnet-4-20250514", temperature=0)
```
We’ve initialized our Sonnet model. LangChain supports many open-source and closed-source models through their APIs, so you can use any of them.

Now, we need to create a function that generates a response using this Sonnet model.
```python
# --- Define Workflow Node ---
def generate_joke(state: State) -> dict[str, str]:
    """
    A node function that generates a joke based on the topic in the current state.

    This function reads the 'topic' from the state, uses the LLM to generate a joke,
    and returns a dictionary to update the 'joke' field in the state.

    Args:
        state: The current state of the graph, which must contain a 'topic'.

    Returns:
        A dictionary with the 'joke' key to update the state.
    """
    # Read the topic from the state
    topic = state["topic"]
    print(f"Generating a joke about: {topic}")

    # Invoke the language model to generate a joke
    msg = llm.invoke(f"Write a short joke about {topic}")

    # Return the generated joke to be written back to the state
    return {"joke": msg.content}
```
This function simply returns a dictionary containing the generated response (the joke).

Now, using the
StateGraph, we can easily build and compile the graph. Let’s do that next.
```python
# --- Build and Compile the Graph ---
# Initialize a new StateGraph with the predefined State schema
workflow = StateGraph(State)

# Add the 'generate_joke' function as a node in the graph
workflow.add_node("generate_joke", generate_joke)

# Define the workflow's execution path:
# The graph starts at the START entrypoint and flows to our 'generate_joke' node.
workflow.add_edge(START, "generate_joke")
# After 'generate_joke' completes, the graph execution ends.
workflow.add_edge("generate_joke", END)

# Compile the workflow into an executable chain
chain = workflow.compile()

# --- Visualize the Graph ---
# Display a visual representation of the compiled workflow graph
display(Image(chain.get_graph().draw_mermaid_png()))
```
![Our Generated Graph](https://cdn-images-1.medium.com/max/1000/1*SxWwYN-oO_rG9xUFgeuB-A.png)

Now we can execute this workflow.
```python
# --- Execute the Workflow ---
# Invoke the compiled graph with an initial state containing the topic.
# The `invoke` method runs the graph from the START node to the END node.
joke_generator_state = chain.invoke({"topic": "cats"})

# --- Display the Final State ---
# Print the final state of the graph after execution.
# This will show both the input 'topic' and the output 'joke' that was written to the state.
console.print("\n[bold blue]Joke Generator State:[/bold blue]")
pprint(joke_generator_state)

#### OUTPUT ####
{
  'topic': 'cats',
  'joke': 'Why did the cat join a band?\n\nBecause it wanted to be the purr-cussionist!'
}
```
It returns a dictionary holding the final state of our joke-generator agent.
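Stripped of the framework, the mechanism is simple: each node reads a shared dict and returns a partial update that gets merged back in. The following stdlib-only sketch (our own illustration, no LangGraph required; `run_linear_graph` and `fake_generate_joke` are hypothetical names) mimics how a linear StateGraph threads context between nodes:

```python
from typing import Callable

SimpleState = dict  # plain-dict stand-in for the TypedDict schema


def run_linear_graph(nodes: list[Callable[[SimpleState], dict]], initial: SimpleState) -> SimpleState:
    """Run nodes in order, merging each node's partial update into the shared
    state, roughly what a linear StateGraph does with default reducers."""
    state = dict(initial)
    for node in nodes:
        state.update(node(state))  # each node returns only the keys it changed
    return state


def fake_generate_joke(state: SimpleState) -> dict:
    # Stand-in for the LLM call so the sketch runs offline
    return {"joke": f"A placeholder joke about {state['topic']}"}


final_state = run_linear_graph([fake_generate_joke], {"topic": "cats"})
print(final_state)  # → {'topic': 'cats', 'joke': 'A placeholder joke about cats'}
```

The key design point carried over from LangGraph is that nodes return *updates*, not whole states, which keeps each step focused on the context it owns.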
This simple example shows how we can write context to state.

> You can learn more about [Checkpointing](https://langchain-ai.github.io/langgraph/concepts/persistence/) for saving and resuming graph states, and [Human-in-the-loop](https://langchain-ai.github.io/langgraph/concepts/human_in_the_loop/) for pausing workflows to get human input before continuing.

### Memory Writing in LangGraph
Scratchpads help agents work within a single session, but sometimes agents need to remember things across multiple sessions.

*   [Reflexion](https://arxiv.org/abs/2303.11366) introduced the idea of agents reflecting after each turn and reusing self-generated hints.
*   [Generative Agents](https://ar5iv.labs.arxiv.org/html/2304.03442) created long-term memories by summarizing past agent feedback.

![Memory Writing](https://cdn-images-1.medium.com/max/1000/1*VaMVevdSVxDITLK1j0LfRQ.png)
*Memory Writing (From [LangChain docs](https://blog.langchain.com/context-engineering-for-agents/))*

These ideas are now used in products like [ChatGPT](https://help.openai.com/en/articles/8590148-memory-faq), [Cursor](https://forum.cursor.com/t/0-51-memories-feature/98509), and [Windsurf](https://docs.windsurf.com/windsurf/cascade/memories), which automatically create long-term memories from user interactions.

*   Checkpointing saves the graph’s state at each step in a [thread](https://langchain-ai.github.io/langgraph/concepts/persistence/). A thread has a unique ID and usually represents one interaction, like a single chat in ChatGPT.
*   Long-term memory lets you keep specific context across threads. You can save [individual files](https://langchain-ai.github.io/langgraph/concepts/memory/#profile) (e.g., a user profile) or [collections](https://langchain-ai.github.io/langgraph/concepts/memory/#collection) of memories.
*   It uses the [BaseStore](https://langchain-ai.github.io/langgraph/reference/store/) interface, a key-value store.
You can use it in memory (as shown here) or with [LangGraph Platform deployments](https://langchain-ai.github.io/langgraph/concepts/persistence/#langgraph-platform).

Let’s now create an `InMemoryStore` to use across multiple sessions in this notebook.

```python
from langgraph.store.memory import InMemoryStore

# --- Initialize Long-Term Memory Store ---
# Create an instance of InMemoryStore, which provides a simple, non-persistent,
# key-value storage system for use within the current session.
store = InMemoryStore()

# --- Define a Namespace for Organization ---
# A namespace is used to logically group related data within the store.
# Here, we use a tuple to represent a hierarchical namespace,
# which could correspond to a user ID and an application context.
namespace = ("rlm", "joke_generator")

# --- Write Data to the Memory Store ---
# Use the `put` method to save a key-value pair into the specified namespace.
# This operation persists the joke generated in the previous step, making it
# available for retrieval across different sessions or threads.
store.put(
    namespace,  # The namespace to write to
    "last_joke",  # The key for the data entry
    {"joke": joke_generator_state["joke"]},  # The value to be stored
)
```
We’ll discuss how to select context from a namespace in the upcoming section.
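To build intuition for why namespaces matter, here is a tiny stdlib-only stand-in for a namespaced key-value store (our own illustration; `TinyStore` is a hypothetical class, not the real `BaseStore` API). Each namespace tuple gets its own isolated branch of the store, so one user's memories can never leak into another's:

```python
class TinyStore:
    """Minimal dict-based stand-in for a namespaced key-value memory store."""

    def __init__(self):
        self._data = {}  # maps namespace tuple -> {key: value}

    def put(self, namespace: tuple, key: str, value: dict) -> None:
        self._data.setdefault(namespace, {})[key] = value

    def get(self, namespace: tuple, key: str):
        return self._data.get(namespace, {}).get(key)

    def search(self, namespace: tuple) -> list:
        return list(self._data.get(namespace, {}).items())


toy_store = TinyStore()
# Memories written under one user's namespace are invisible to another's
toy_store.put(("user_1", "joke_generator"), "last_joke", {"joke": "cat pun"})
toy_store.put(("user_2", "joke_generator"), "last_joke", {"joke": "dog pun"})
print(toy_store.get(("user_1", "joke_generator"), "last_joke"))  # → {'joke': 'cat pun'}
```

The tuple-as-namespace convention is the same one the real store uses above (`("rlm", "joke_generator")`), just reduced to plain dicts.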
For now, we can use the [search](https://langchain-ai.github.io/langgraph/reference/store/#langgraph.store.base.BaseStore.search) method to view items within a namespace and confirm that we successfully wrote to it.
```python
# Search the namespace to view all stored items
stored_items = list(store.search(namespace))

# Display the stored items with rich formatting
console.print("\n[bold green]Stored Items in Memory:[/bold green]")
pprint(stored_items)

#### OUTPUT ####
[
  Item(namespace=['rlm', 'joke_generator'], key='last_joke',
  value={'joke': 'Why did the cat join a band?\n\nBecause it wanted to be the purr-cussionist!'},
  created_at='2025-07-24T02:12:25.936238+00:00',
  updated_at='2025-07-24T02:12:25.936238+00:00', score=None)
]
```
Now, let’s embed everything we did into a LangGraph workflow.

We will compile the workflow with two arguments:

*   `checkpointer` saves the graph state at each step in a thread.
*   `store` keeps context across different threads.

```python
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.store.base import BaseStore
from langgraph.store.memory import InMemoryStore

# Initialize storage components
checkpointer = InMemorySaver()  # For thread-level state persistence
memory_store = InMemoryStore()  # For cross-thread memory storage


def generate_joke(state: State, store: BaseStore) -> dict[str, str]:
    """Generate a joke with memory awareness.

    This enhanced version checks for existing jokes in memory
    before generating new ones.

    Args:
        state: Current state containing the topic
        store: Memory store for persistent context

    Returns:
        Dictionary with the generated joke
    """
    # Check if there's an existing joke in memory
    # (`namespace` is the module-level tuple defined earlier)
    existing_jokes = list(store.search(namespace))
    if existing_jokes:
        existing_joke = existing_jokes[0].value
        print(f"Existing joke: {existing_joke}")
    else:
        print("Existing joke: No existing joke")

    # Generate a new joke based on the topic
    msg = llm.invoke(f"Write a short joke about {state['topic']}")

    # Store the new joke in long-term memory
    store.put(namespace, "last_joke", {"joke": msg.content})

    # Return the joke to be added to state
    return {"joke": msg.content}


# Build the workflow with memory capabilities
workflow = StateGraph(State)

# Add the memory-aware joke generation node
workflow.add_node("generate_joke", generate_joke)

# Connect the workflow components
workflow.add_edge(START, "generate_joke")
workflow.add_edge("generate_joke", END)

# Compile with both checkpointing and memory store
chain = workflow.compile(checkpointer=checkpointer, store=memory_store)
```
Great! Now we can simply execute the updated workflow and test how it works with the memory feature enabled.
```python
# Execute the workflow with thread-based configuration
config = {"configurable": {"thread_id": "1"}}
joke_generator_state = chain.invoke({"topic": "cats"}, config)

# Display the workflow result with rich formatting
console.print("\n[bold cyan]Workflow Result (Thread 1):[/bold cyan]")
pprint(joke_generator_state)

#### OUTPUT ####
Existing joke: No existing joke

Workflow Result (Thread 1):
{
  'topic': 'cats',
  'joke': 'Why did the cat join a band?\n\nBecause it wanted to be the purr-cussionist!'
}
```
Since this is thread 1, there is no existing joke stored in our AI agent’s memory, which is exactly what we’d expect for a fresh thread.

Because we compiled the workflow with a checkpointer, we can now view the [latest state](https://langchain-ai.github.io/langgraph/concepts/persistence/#get-state) of the graph.
```python
# --- Retrieve and Inspect the Graph State ---
# Use the `get_state` method to retrieve the latest state snapshot for the
# thread specified in the `config` (in this
# case, thread "1"). This is possible because we compiled the graph with a
# checkpointer.
latest_state = chain.get_state(config)

# --- Display the State Snapshot ---
# Print the retrieved state to the console. The StateSnapshot includes not only
# the data ('topic', 'joke') but also execution metadata.
console.print("\n[bold magenta]Latest Graph State (Thread 1):[/bold magenta]")
pprint(latest_state)
```
Take a look at the output:
```
### OUTPUT OF OUR LATEST STATE ###
Latest Graph State:

StateSnapshot(
    values={
        'topic': 'cats',
        'joke': 'Why did the cat join a band?\n\nBecause it wanted to be the purr-cussionist!'
    },
    next=(),
    config={
        'configurable': {
            'thread_id': '1',
            'checkpoint_ns': '',
            'checkpoint_id': '1f06833a-53a7-65a8-8001-548e412001c4'
        }
    },
    metadata={'source': 'loop', 'step': 1, 'parents': {}},
    created_at='2025-07-24T02:12:27.317802+00:00',
    parent_config={
        'configurable': {
            'thread_id': '1',
            'checkpoint_ns': '',
            'checkpoint_id': '1f06833a-4a50-6108-8000-245cde0c2411'
        }
    },
    tasks=(),
    interrupts=()
)
```
You can see that our state now shows the last conversation we had with the agent; in this case, we asked it to tell a joke about cats.

Let’s rerun the workflow with a different thread ID.
```python
# Execute the workflow with a different thread ID
config = {"configurable": {"thread_id": "2"}}
joke_generator_state = chain.invoke({"topic": "cats"}, config)

# Display the result showing memory persistence across threads
console.print("\n[bold yellow]Workflow Result (Thread 2):[/bold yellow]")
pprint(joke_generator_state)

#### OUTPUT ####
Existing joke: {'joke': 'Why did the cat join a band?\n\nBecause it wanted to be the purr-cussionist!'}
Workflow Result (Thread 2):
{'topic': 'cats', 'joke': 'Why did the cat join a
band?\n\nBecause it wanted to be the purr-cussionist!'}
```
We can see that the joke saved during the first thread was successfully retrieved from memory in the second thread.

> You can learn more about [LangMem](https://langchain-ai.github.io/langmem/) for memory abstractions and the [Ambient Agents Course](https://github.com/langchain-ai/agents-from-scratch/blob/main/notebooks/memory.ipynb) for an overview of memory in LangGraph agents.

### Scratchpad Selection Approach
How you select context from a scratchpad depends on its implementation:

*   If it’s a [tool](https://www.anthropic.com/engineering/claude-think-tool), the agent can read it directly by making a tool call.
*   If it’s part of the agent’s runtime state, you (the developer) decide which parts of the state to share with the agent at each step. This gives you fine-grained control over what context is exposed.

![Second Component of CE](https://cdn-images-1.medium.com/max/1000/1*VZiHtQ_8AlNdV3HIMrbBZA.png)
*Second Component of CE (From [LangChain docs](https://blog.langchain.com/context-engineering-for-agents/))*

In the previous step, we learned how to write to the LangGraph state object.
Now, we’ll learn how to select context from the state and pass it to an LLM call in a downstream node.

This selective approach lets you control exactly what context the LLM sees during execution.
```python
def generate_joke(state: State) -> dict[str, str]:
    """Generate an initial joke about the topic.

    Args:
        state: Current state containing the topic

    Returns:
        Dictionary with the generated joke
    """
    msg = llm.invoke(f"Write a short joke about {state['topic']}")
    return {"joke": msg.content}


def improve_joke(state: State) -> dict[str, str]:
    """Improve an existing joke by adding wordplay.

    This demonstrates selecting context from state - we read the existing
    joke from state and use it to generate an improved version.

    Args:
        state: Current state containing the original joke

    Returns:
        Dictionary with the improved joke
    """
    print(f"Initial joke: {state['joke']}")

    # Select the joke from state to present it to the LLM
    msg = llm.invoke(f"Make this joke funnier by adding wordplay: {state['joke']}")
    return {"improved_joke": msg.content}
```
To make things a bit more complex, we’re now adding two steps to our agent:

1.  Generate Joke: the same as before.
2.  Improve Joke: takes the generated joke and makes it better.

This setup will help us understand how scratchpad selection works in LangGraph.
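One detail to watch: the `State` schema we defined earlier declares only `topic` and `joke`, while `improve_joke` returns an `improved_joke` key. LangGraph merges node updates by schema field, so the schema should declare every key a node can return. A minimal sketch of the extended schema (the `total=False` choice is ours, marking fields that only exist after certain nodes run):

```python
from typing import TypedDict


class State(TypedDict, total=False):
    """Extended workflow state. `total=False` marks every field optional,
    since `improved_joke` only exists after the second node has run."""

    topic: str
    joke: str
    improved_joke: str


# The two-node workflow would then be built as StateGraph(State), as before.
print(sorted(State.__annotations__))  # → ['improved_joke', 'joke', 'topic']
```

With this schema in place, both nodes' updates land in the shared scratchpad.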
Let’s now compile this workflow the same way we did earlier and check how our graph looks.
```python
# Build the workflow with two sequential nodes
workflow = StateGraph(State)

# Add both joke generation nodes
workflow.add_node("generate_joke", generate_joke)
workflow.add_node("improve_joke", improve_joke)

# Connect nodes in sequence
workflow.add_edge(START, "generate_joke")
workflow.add_edge("generate_joke", "improve_joke")
workflow.add_edge("improve_joke", END)

# Compile the workflow
chain = workflow.compile()

# Display the workflow visualization
display(Image(chain.get_graph().draw_mermaid_png()))
```
![Our Generated Graph](https://cdn-images-1.medium.com/max/1000/1*XU_CMOwwboMYcK6lw3HjrA.png)

When we execute this workflow, this is what we get.
```python
# Execute the workflow to see context selection in action
joke_generator_state = chain.invoke({"topic": "cats"})

# Display the final state with rich formatting
console.print("\n[bold blue]Final Workflow State:[/bold blue]")
pprint(joke_generator_state)

#### OUTPUT ####
Initial joke: Why did the cat join a band?

Because it wanted to be the purr-cussionist!
Final Workflow State:
{
  'topic': 'cats',
  'joke': 'Why did the cat join a band?\n\nBecause it wanted to be the purr-cussionist!'
}
```
Now that we have executed our workflow, we can move on to using it in our memory selection step.

### Memory Selection Ability
If agents can save memories, they also need to select relevant memories for the task at hand.
This is useful for:

*   [Episodic memories](https://langchain-ai.github.io/langgraph/concepts/memory/#memory-types): few-shot examples showing desired behavior.
*   [Procedural memories](https://langchain-ai.github.io/langgraph/concepts/memory/#memory-types): instructions to guide behavior.
*   [Semantic memories](https://langchain-ai.github.io/langgraph/concepts/memory/#memory-types): facts or relationships that provide task-relevant context.

Some agents use narrow, predefined files to store memories:

*   Claude Code uses [`CLAUDE.md`](http://claude.md/).
*   [Cursor](https://docs.cursor.com/context/rules) and [Windsurf](https://windsurf.com/editor/directory) use “rules” files for instructions or examples.

But when storing a large [collection](https://langchain-ai.github.io/langgraph/concepts/memory/#collection) of facts (semantic memories), selection gets harder.

*   [ChatGPT](https://help.openai.com/en/articles/8590148-memory-faq) sometimes retrieves irrelevant memories, as shown by [Simon Willison](https://simonwillison.net/2025/Jun/6/six-months-in-llms/) when ChatGPT wrongly fetched his location and injected it into an image, making the context feel like it “no longer belonged to him”.
*   To improve selection, embeddings or [knowledge graphs](https://neo4j.com/blog/developer/graphiti-knowledge-graph-memory/#:~:text=changes%20since%20updates%20can%20trigger,and%20holistic%20memory%20for%20agentic) are used for indexing.

In our previous section, we wrote to the `InMemoryStore` in graph nodes.
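The embedding-based selection mentioned above can be illustrated with a deliberately crude, stdlib-only sketch: a toy letter-frequency "embedding" plus cosine similarity ranks stored memories against a query. All names here (`toy_embed`, `cosine`, the sample memories) are our own illustration; a real system would call a proper embedding model instead:

```python
import math
import string


def toy_embed(text: str) -> list[float]:
    """Toy 'embedding': a letter-frequency vector. Purely illustrative,
    a real system would use an embedding model."""
    return [float(text.lower().count(c)) for c in string.ascii_lowercase]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


memories = [
    "user prefers short puns",
    "user is allergic to peanuts",
    "user owns two cats named Mochi and Bean",
]
query = "write a joke about the user's cats"

# Rank memories by similarity to the query; only the top few would be
# injected into the prompt, keeping irrelevant facts out of the context.
ranked = sorted(memories, key=lambda m: cosine(toy_embed(m), toy_embed(query)), reverse=True)
```

The point is the selection step itself: scoring every stored memory against the current task and exposing only the best matches, rather than dumping the whole collection into the context window.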
Now, we can select context from it using the [get](https://langchain-ai.github.io/langgraph/concepts/memory/#memory-storage) method to pull relevant state into our workflow.\n\n```python\nfrom langgraph.store.memory import InMemoryStore\n\n# Initialize the memory store\nstore = InMemoryStore()\n\n# Define namespace for organizing memories\nnamespace = (\"rlm\", \"joke_generator\")\n\n# Store the generated joke in memory\nstore.put(\n    namespace,                             # namespace for organization\n    \"last_joke\",                          # key identifier\n    {\"joke\": joke_generator_state[\"joke\"]} # value to store\n)\n\n# Select (retrieve) the joke from memory\nretrieved_joke = store.get(namespace, \"last_joke\").value\n\n# Display the retrieved context\nconsole.print(\"\\n[bold green]Retrieved Context from Memory:[/bold green]\")\npprint(retrieved_joke)\n\n#### OUTPUT ####\nRetrieved Context from Memory:\n{'joke': 'Why did the cat join a band?\\n\\nBecause it wanted to be the purr-cussionist!'}\n```\nIt successfully retrieves the correct joke from memory.\n\nNow, we need to write a proper `generate_joke` function that can:\n\n1.  Take the current state (for the scratchpad context).\n2.  
Use memory (to fetch past jokes if we’re performing a joke improvement task).\n\nLet’s code that next.\n```python\nfrom langgraph.checkpoint.memory import InMemorySaver\nfrom langgraph.store.base import BaseStore\n\n# Initialize storage components\ncheckpointer = InMemorySaver()\nmemory_store = InMemoryStore()\n\ndef generate_joke(state: State, store: BaseStore) -\u003e dict[str, str]:\n    \"\"\"Generate a joke with memory-aware context selection.\n    \n    This function demonstrates selecting context from memory before\n    generating new content, ensuring consistency and avoiding duplication.\n    \n    Args:\n        state: Current state containing the topic\n        store: Memory store for persistent context\n        \n    Returns:\n        Dictionary with the generated joke\n    \"\"\"\n    # Select prior joke from memory if it exists\n    prior_joke = store.get(namespace, \"last_joke\")\n    if prior_joke:\n        prior_joke_text = prior_joke.value[\"joke\"]\n        print(f\"Prior joke: {prior_joke_text}\")\n    else:\n        print(\"Prior joke: None!\")\n\n    # Generate a new joke that differs from the prior one\n    prompt = (\n        f\"Write a short joke about {state['topic']}, \"\n        f\"but make it different from any prior joke you've written: {prior_joke_text if prior_joke else 'None'}\"\n    )\n    msg = llm.invoke(prompt)\n\n    # Store the new joke in memory for future context selection\n    store.put(namespace, \"last_joke\", {\"joke\": msg.content})\n\n    return {\"joke\": msg.content}\n```\nWe can now simply execute this memory-aware workflow the same way we did earlier.\n```python\n# Build the memory-aware workflow\nworkflow = StateGraph(State)\nworkflow.add_node(\"generate_joke\", generate_joke)\n\n# Connect the workflow\nworkflow.add_edge(START, \"generate_joke\")\nworkflow.add_edge(\"generate_joke\", END)\n\n# Compile with both checkpointing and memory store\nchain = workflow.compile(checkpointer=checkpointer, store=memory_store)\n\n# Execute the workflow with the first thread\nconfig = {\"configurable\": {\"thread_id\": 
\"1\"}}\njoke_generator_state = chain.invoke({\"topic\": \"cats\"}, config)\n\n#### OUTPUT ####\nPrior joke: None!\n```\nNo prior joke is detected. We can now print the latest state structure.\n```python\n# Get the latest state of the graph\nlatest_state = chain.get_state(config)\n\nconsole.print(\"\\n[bold magenta]Latest Graph State:[/bold magenta]\")\npprint(latest_state)\n```\nOur output:\n```\n#### OUTPUT OF LATEST STATE ####\nStateSnapshot(\n    values={\n        'topic': 'cats',\n        'joke': \"Here's a new one:\\n\\nWhy did the cat join a band?\\n\\nBecause it wanted to be the purr-cussionist!\"\n    },\n    next=(),\n    config={\n        'configurable': {\n            'thread_id': '1',\n            'checkpoint_ns': '',\n            'checkpoint_id': '1f068357-cc8d-68cb-8001-31f64daf7bb6'\n        }\n    },\n    metadata={'source': 'loop', 'step': 1, 'parents': {}},\n    created_at='2025-07-24T02:25:38.457825+00:00',\n    parent_config={\n        'configurable': {\n            'thread_id': '1',\n            'checkpoint_ns': '',\n            'checkpoint_id': '1f068357-c459-6deb-8000-16ce383a5b6b'\n        }\n    },\n    tasks=(),\n    interrupts=()\n)\n```\nNext, we run the workflow on a second thread. The agent fetches the previous joke from memory and passes it to the LLM as context for generating a different one.\n```python\n# Execute the workflow with a second thread to demonstrate memory persistence\nconfig = {\"configurable\": {\"thread_id\": \"2\"}}\njoke_generator_state = chain.invoke({\"topic\": \"cats\"}, config)\n\n\n#### OUTPUT ####\nPrior joke: Here is a new one:\nWhy did the cat join a band?\nBecause it wanted to be the purr-cussionist!\n```\nIt has successfully **fetched the correct joke from memory** and used it as context to **generate a new, different joke**, as expected.\n\n### Advantage of LangGraph BigTool Calling\nAgents use tools, but giving them too many tools can cause confusion, especially when tool descriptions overlap. 
This makes it harder for the model to choose the right tool.\n\nA solution is to use RAG (Retrieval-Augmented Generation) on tool descriptions to fetch only the most relevant tools based on semantic similarity, a method Drew Breunig calls [tool loadout](https://www.dbreunig.com/2025/06/26/how-to-fix-your-context.html).\n\n\u003e According to [recent research](https://arxiv.org/abs/2505.03275), this improves tool selection accuracy by up to 3x.\n\nFor tool selection, the [LangGraph Bigtool](https://github.com/langchain-ai/langgraph-bigtool) library is ideal. It applies semantic similarity search over tool descriptions to select the most relevant ones for the task. It uses LangGraph’s long-term memory store, allowing agents to search and retrieve the right tools for a given problem.\n\nLet’s understand `langgraph-bigtool` by giving an agent every function from Python’s built-in `math` library.\n```python\nimport math\nimport types\nimport uuid\n\nfrom langchain.embeddings import init_embeddings\nfrom langgraph_bigtool import create_agent\nfrom langgraph_bigtool.utils import convert_positional_only_function_to_tool\n\n# Collect functions from `math` built-in\nall_tools = []\nfor function_name in dir(math):\n    function = getattr(math, function_name)\n    if not isinstance(\n        function, types.BuiltinFunctionType\n    ):\n        continue\n    # This is an idiosyncrasy of the `math` library\n    if tool := convert_positional_only_function_to_tool(\n        function\n    ):\n        all_tools.append(tool)\n```\nWe first append all functions from Python’s `math` module into a list. Next, we need to convert these tool descriptions into vector embeddings so the agent can perform semantic similarity searches.\n\nFor this, we will use an embedding model, in our case the OpenAI `text-embedding-3-small` model.\n```python\n# Create registry of tools. This is a dict mapping\n# identifiers to tool instances.\ntool_registry = {\n    str(uuid.uuid4()): tool\n    for tool in all_tools\n}\n\n# Index tool names and descriptions in the LangGraph\n# Store. 
Here we use a simple in-memory store.\nembeddings = init_embeddings(\"openai:text-embedding-3-small\")\n\nstore = InMemoryStore(\n    index={\n        \"embed\": embeddings,\n        \"dims\": 1536,\n        \"fields\": [\"description\"],\n    }\n)\nfor tool_id, tool in tool_registry.items():\n    store.put(\n        (\"tools\",),\n        tool_id,\n        {\n            \"description\": f\"{tool.name}: {tool.description}\",\n        },\n    )\n```\nEach function is assigned a unique ID, and we structure these functions into a proper standardized format. This structured format ensures that the functions can be easily converted into embeddings for semantic search.\n\nLet’s now visualize the agent to see how it looks with all the math functions embedded and ready for semantic search!\n```python\n# Initialize agent\nbuilder = create_agent(llm, tool_registry)\nagent = builder.compile(store=store)\nagent\n```\n![Our Tool Agent](https://cdn-images-1.medium.com/max/1000/1*7uXCS9bgbNCwxB-6t6ZXOw.png)\n\nWe can now invoke our agent with a simple query and observe how our tool-calling agent selects and uses the most relevant math functions to answer the question.\n```python\n# Import a utility function to format and display messages\nfrom utils import format_messages\n\n# Define the query for the agent.\n# This query asks the agent to use one of its math tools to find the arc cosine.\nquery = \"Use available tools to calculate arc cosine of 0.5.\"\n\n# Invoke the agent with the query. The agent will search its tools,\n# select the 'acos' tool based on the query's semantics, and execute it.\nresult = agent.invoke({\"messages\": query})\n\n# Format and display the final messages from the agent's execution.\nformat_messages(result['messages'])\n```\n```\n┌────────────── Human   ───────────────┐\n│ Use available tools to calculate     │\n│ arc cosine of 0.5.                   
│\n└──────────────────────────────────────┘\n\n┌────────────── 📝 AI ─────────────────┐\n│ I will search for a tool to calculate│\n│ the arc cosine of 0.5.               │\n│                                      │\n│ 🔧 Tool Call: retrieve_tools         │\n│ Args: {                              │\n│   \"query\": \"arc cosine arccos        │\n│            inverse cosine trig\"      │\n│ }                                    │\n└──────────────────────────────────────┘\n\n┌────────────── 🔧 Tool Output ────────┐\n│ Available tools: ['acos', 'acosh']   │\n└──────────────────────────────────────┘\n\n┌────────────── 📝 AI ─────────────────┐\n│ Perfect! I found the `acos` function │\n│ which calculates the arc cosine.     │\n│ Now I will use it to calculate the   │\n│ arc                                  │\n│ cosine of 0.5.                       │\n│                                      │\n│ 🔧 Tool Call: acos                   │\n│ Args: { \"x\": 0.5 }                   │\n└──────────────────────────────────────┘\n\n┌────────────── 🔧 Tool Output ────────┐\n│ 1.0471975511965976                   │\n└──────────────────────────────────────┘\n\n┌────────────── 📝 AI ─────────────────┐\n│ The arc cosine of 0.5 is ≈**1.047**  │\n│ radians.                             │\n│                                      │\n│ ✔ Check: cos(π/3)=0.5, π/3≈1.047 rad │\n│ (60°).                               │\n└──────────────────────────────────────┘\n```\nYou can see how efficiently our AI agent calls the correct tool. 
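Stripped of the framework, the trace above is a two-phase loop: retrieve a small candidate set of tools, then call one. A minimal sketch of that pattern in plain Python, where keyword overlap stands in for the embedding search that `langgraph-bigtool` performs (the registry below is invented for illustration):

```python
import math

# Hypothetical registry: tool name -> (description, callable)
tool_registry = {
    "acos":  ("arc cosine inverse cosine of x in radians", math.acos),
    "acosh": ("inverse hyperbolic cosine of x", math.acosh),
    "sqrt":  ("square root of x", math.sqrt),
    "log":   ("natural logarithm of x", math.log),
}

def retrieve_tools(query: str, k: int = 2) -> list[str]:
    """Phase 1: score tools by word overlap with the query
    (a crude stand-in for semantic search over descriptions)."""
    q = set(query.lower().split())
    scores = {
        name: len(q & set(desc.lower().split()))
        for name, (desc, _) in tool_registry.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Phase 2: the agent picks one of the retrieved tools and calls it
candidates = retrieve_tools("arc cosine inverse cosine")
result = tool_registry[candidates[0]][1](0.5)
print(candidates, round(result, 4))  # ['acos', 'acosh'] 1.0472
```

The point is the shape of the loop: the model only ever sees the few retrieved tools, never the full registry, which keeps its context small and its choice easy.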
You can learn more about:\n\n*   [**Toolshed**](https://arxiv.org/abs/2410.14594) introduces Toolshed Knowledge Bases and Advanced RAG-Tool Fusion for better tool selection in AI agents.\n*   [**Graph RAG-Tool Fusion**](https://arxiv.org/abs/2502.07223) combines vector retrieval with graph traversal to capture tool dependencies.\n*   [**LLM-Tool-Survey**](https://github.com/quchangle1/LLM-Tool-Survey): a comprehensive survey of tool learning with LLMs.\n*   [**ToolRet**](https://arxiv.org/abs/2503.01763): a benchmark for evaluating and improving tool retrieval in LLMs.\n\n### RAG with Contextual Engineering\n[RAG (Retrieval-Augmented Generation)](https://github.com/langchain-ai/rag-from-scratch) is a vast topic, and code agents are some of the best examples of agentic RAG in production.\n\nIn practice, RAG is often the central challenge of context engineering. As [Varun from Windsurf](https://x.com/_mohansolo/status/1899630246862966837) points out:\n\u003e Indexing ≠ context retrieval. Embedding search with AST-based chunking works, but fails as codebases grow. We need hybrid retrieval: grep/file search, knowledge-graph linking, and relevance-based re-ranking.\n\nLangGraph provides [tutorials and videos](https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_agentic_rag/) to help integrate RAG into agents. 
Typically, you build a retrieval tool that can use any combination of the RAG techniques mentioned above.\n\nTo demonstrate, we’ll fetch documents for our RAG system using four recent posts from Lilian Weng’s excellent blog.\n\nWe will start by pulling page content with the `WebBaseLoader` utility.\n```python\n# Import the WebBaseLoader to fetch documents from URLs\nfrom langchain_community.document_loaders import WebBaseLoader\n\n# Define the list of URLs for Lilian Weng's blog posts\nurls = [\n    \"https://lilianweng.github.io/posts/2025-05-01-thinking/\",\n    \"https://lilianweng.github.io/posts/2024-11-28-reward-hacking/\",\n    \"https://lilianweng.github.io/posts/2024-07-07-hallucination/\",\n    \"https://lilianweng.github.io/posts/2024-04-12-diffusion-video/\",\n]\n\n# Load the documents from the specified URLs using a list comprehension.\n# This creates a WebBaseLoader for each URL and calls its load() method.\ndocs = [WebBaseLoader(url).load() for url in urls]\n```\nThere are different ways to chunk data for RAG, and proper chunking is crucial for effective retrieval.\n\nHere, we’ll split the fetched documents into smaller chunks before indexing them into our vectorstore. We’ll use a simple, direct approach such as recursive chunking with overlapping segments to preserve context across chunks while keeping them manageable for embedding and retrieval.\n```python\n# Import the text splitter for chunking documents\nfrom langchain_text_splitters import RecursiveCharacterTextSplitter\n\n# Flatten the list of documents. WebBaseLoader returns a list of documents for each URL,\n# so we have a list of lists. This comprehension combines them into a single list.\ndocs_list = [item for sublist in docs for item in sublist]\n\n# Initialize the text splitter. 
This will split the documents into smaller chunks\n# of a specified size, with some overlap between chunks to maintain context.\ntext_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(\n    chunk_size=2000, chunk_overlap=50\n)\n\n# Split the documents into chunks.\ndoc_splits = text_splitter.split_documents(docs_list)\n```\nNow that we have our split documents, we can index them into a vector store that we’ll use for semantic search.\n```python\n# Import the necessary class for creating an in-memory vector store\nfrom langchain_core.vectorstores import InMemoryVectorStore\n\n# Create an in-memory vector store from the document splits.\n# This uses the 'doc_splits' created in the previous cell and the 'embeddings' model\n# initialized earlier to create vector representations of the text chunks.\nvectorstore = InMemoryVectorStore.from_documents(\n    documents=doc_splits, embedding=embeddings\n)\n\n# Create a retriever from the vector store.\n# The retriever provides an interface to search for relevant documents\n# based on a query.\nretriever = vectorstore.as_retriever()\n```\nWe have to create a retriever tool that we can use in our agent.\n```python\n# Import the function to create a retriever tool\nfrom langchain.tools.retriever import create_retriever_tool\n\n# Create a retriever tool from the vector store retriever.\n# This tool allows the agent to search for and retrieve relevant\n# documents from the blog posts based on a query.\nretriever_tool = create_retriever_tool(\n    retriever,\n    \"retrieve_blog_posts\",\n    \"Search and return information about Lilian Weng blog posts.\",\n)\n\n# The following line is an example of how to invoke the tool directly.\n# It's commented out as it's not needed for the agent execution flow but can be useful for testing.\n# retriever_tool.invoke({\"query\": \"types of reward hacking\"})\n```\nNow, we can implement an agent that can select context from the tool.\n```python\n# Augment the LLM with tools\ntools 
= [retriever_tool]\ntools_by_name = {tool.name: tool for tool in tools}\nllm_with_tools = llm.bind_tools(tools)\n```\nFor RAG-based solutions, we need to create a clear system prompt to guide our agent’s behavior. This prompt acts as its core instruction set.\n```python\nfrom langgraph.graph import MessagesState\nfrom langchain_core.messages import SystemMessage, ToolMessage\nfrom typing_extensions import Literal\n\nrag_prompt = \"\"\"You are a helpful assistant tasked with retrieving information from a series of technical blog posts by Lilian Weng. \nClarify the scope of research with the user before using your retrieval tool to gather context. Reflect on any context you fetch, and\nproceed until you have sufficient context to answer the user's research request.\"\"\"\n```\nNext, we define the nodes of our graph. We’ll need two main nodes:\n\n1.  `llm_call`: the brain of our agent. It takes the current conversation history (user query plus previous tool outputs) and decides the next step: call a tool or generate a final answer.\n2.  `tool_node`: the action part of our agent. It executes the tool call requested by `llm_call`. 
It returns the tool’s result back to the agent.\n\n```python\n# --- Define Agent Nodes ---\n\ndef llm_call(state: MessagesState):\n    \"\"\"LLM decides whether to call a tool or generate a final answer.\"\"\"\n    # Add the system prompt to the current message state\n    messages_with_prompt = [SystemMessage(content=rag_prompt)] + state[\"messages\"]\n    \n    # Invoke the LLM with the augmented message list\n    response = llm_with_tools.invoke(messages_with_prompt)\n    \n    # Return the LLM's response to be added to the state\n    return {\"messages\": [response]}\n    \ndef tool_node(state: dict):\n    \"\"\"Performs the tool call and returns the observation.\"\"\"\n    # Get the last message, which should contain the tool calls\n    last_message = state[\"messages\"][-1]\n    \n    # Execute each tool call and collect the results\n    result = []\n    for tool_call in last_message.tool_calls:\n        tool = tools_by_name[tool_call[\"name\"]]\n        observation = tool.invoke(tool_call[\"args\"])\n        result.append(ToolMessage(content=str(observation), tool_call_id=tool_call[\"id\"]))\n        \n    # Return the tool's output as a message\n    return {\"messages\": result}\n```\nWe need a way to control the agent’s flow, deciding whether it should call a tool or finish.\n\nTo handle this, we will create a conditional edge function called `should_continue`.\n\n*   This function checks if the last message from the LLM contains a tool call.\n*   If it does, the graph routes to the `tool_node`.\n*   If not, the execution ends.\n\n```python\n# --- Define Conditional Edge ---\n\ndef should_continue(state: MessagesState) -\u003e Literal[\"Action\", END]:\n    \"\"\"Decides the next step based on whether the LLM made a tool call.\"\"\"\n    last_message = state[\"messages\"][-1]\n    \n    # If the LLM made a tool call, route to the tool_node\n    if last_message.tool_calls:\n        return \"Action\"\n    # Otherwise, end the workflow\n    return 
END\n```\nWe can now simply build the workflow and compile the graph.\n```python\n# Build workflow\nagent_builder = StateGraph(MessagesState)\n\n# Add nodes\nagent_builder.add_node(\"llm_call\", llm_call)\nagent_builder.add_node(\"environment\", tool_node)\n\n# Add edges to connect nodes\nagent_builder.add_edge(START, \"llm_call\")\nagent_builder.add_conditional_edges(\n    \"llm_call\",\n    should_continue,\n    {\n        # Name returned by should_continue : Name of next node to visit\n        \"Action\": \"environment\",\n        END: END,\n    },\n)\nagent_builder.add_edge(\"environment\", \"llm_call\")\n\n# Compile the agent\nagent = agent_builder.compile()\n\n# Show the agent\ndisplay(Image(agent.get_graph(xray=True).draw_mermaid_png()))\n```\n![RAG Based Agent](https://cdn-images-1.medium.com/max/1000/1*0QxVbzakDabkoMfgURIx2w.png)\n\nThe graph shows a clear cycle:\n\n1.  The agent starts and calls the LLM.\n2.  Based on the LLM’s decision, it either performs an action (calls our retriever tool) and loops back, or it finishes and provides the answer.\n\nLet’s test our RAG agent. We’ll ask it a specific question about **“reward hacking”** that can only be answered by retrieving information from the blog posts we indexed.\n```python\n# Define the user's query\nquery = \"What are the types of reward hacking discussed in the blogs?\"\n\n# Invoke the agent with the query\nresult = agent.invoke({\"messages\": [(\"user\", query)]})\n\n# --- Display the Final Messages ---\n# Format and print the conversation flow\nformat_messages(result['messages'])\n```\n```\n┌──────────────  Human  ───────────────┐\n│ Clarify scope: I want types of       │\n│ reward hacking from Lilian Weng’s    │\n│ blog on RL.                          │\n└──────────────────────────────────────┘\n\n┌────────────── 📝 AI ─────────────────┐\n│ Fetching context from her posts...   
│\n└──────────────────────────────────────┘\n\n┌────────────── 🔧 Tool Output ────────┐\n│ She lists 3 main types of reward     │\n│ hacking in RL:                       │\n└──────────────────────────────────────┘\n\n┌────────────── 📝 AI ─────────────────┐\n│ 1. **Spec gaming** – Exploit reward  │\n│    loopholes, not real goal.         │\n│                                      │\n│ 2. **Reward tampering** – Change or  │\n│    hack reward signals.              │\n│                                      │\n│ 3. **Wireheading** – Self-stimulate  │\n│    reward instead of task.           │\n└──────────────────────────────────────┘\n\n┌────────────── 📝 AI ─────────────────┐\n│ These can cause harmful, unintended  │\n│ behaviors in RL agents.              │\n└──────────────────────────────────────┘\n```\nAs you can see, the agent correctly identified that it needed to use its retrieval tool. It then successfully retrieved the relevant context from the blog posts and used that information to provide a detailed and accurate answer.\n\n\u003e This is a perfect example of how contextual engineering through RAG can create powerful, knowledgeable agents.\n\n### Compression Strategy with Knowledgeable Agents\nAgent interactions can span [hundreds of turns](https://www.anthropic.com/engineering/built-multi-agent-research-system) and involve token-heavy tool calls. 
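The growth is easy to picture with a quick simulation. Assuming the common rough rule of thumb of about four characters per token (an approximation, not a real tokenizer), a history that keeps every verbose tool observation balloons fast:

```python
# Simulate an agent whose every turn appends a verbose tool observation
# to the shared message history. Token counts use a rough ~4 chars/token
# estimate (an assumption for illustration, not a real tokenizer).
def estimated_tokens(messages: list[str]) -> int:
    return sum(len(m) for m in messages) // 4

messages: list[str] = []
for turn in range(1, 101):
    messages.append("user/assistant exchange " + "x" * 176)  # ~50 tokens
    messages.append("tool output " + "y" * 7988)             # ~2,000 tokens
    if turn % 25 == 0:
        print(f"turn {turn:3d}: ~{estimated_tokens(messages):,} tokens")
```

At a couple of thousand tokens per tool call, a few dozen turns is all it takes to crowd a context window.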
Summarization is a common way to manage this.\n\n![Third Component of CE](https://cdn-images-1.medium.com/max/1000/1*Xu76qgF1u2G3JipeIgHo5Q.png)\n*Third Component of CE (From [LangChain docs](https://blog.langchain.com/context-engineering-for-agents/))*\n\nFor example:\n\n*   Claude Code uses “[auto-compact](https://docs.anthropic.com/en/docs/claude-code/costs)” when the context window exceeds 95%, summarizing the entire user-agent interaction history.\n*   Summarization can compress an [agent trajectory](https://langchain-ai.github.io/langgraph/concepts/memory/#manage-short-term-memory) using strategies like [recursive](https://arxiv.org/pdf/2308.15022#:~:text=the%20retrieved%20utterances%20capture%20the,based%203) or [hierarchical](https://alignment.anthropic.com/2025/summarization-for-monitoring/#:~:text=We%20addressed%20these%20issues%20by,of%20our%20computer%20use%20capability) summarization.\n\nYou can also add summarization at specific points:\n\n*   After token-heavy tool calls (e.g., search tools); see an [example here](https://github.com/langchain-ai/open_deep_research/blob/e5a5160a398a3699857d00d8569cb7fd0ac48a4f/src/open_deep_research/utils.py#L1407).\n*   At agent-agent boundaries for knowledge transfer; [Cognition](https://cognition.ai/blog/dont-build-multi-agents#a-theory-of-building-long-running-agents) does this in Devin using a fine-tuned model.\n\n![Summarization approach langgraph](https://cdn-images-1.medium.com/max/1500/1*y5AhaYoM_XDDrvlAnnFhcQ.png)\n*Summarization approach in LangGraph (From [LangChain docs](https://blog.langchain.com/context-engineering-for-agents/))*\n\nLangGraph is a [low-level orchestration framework](https://blog.langchain.com/how-to-think-about-agent-frameworks/), giving you full control over:\n\n*   Designing your agent as a set of [nodes](https://www.youtube.com/watch?v=aHCDrAbH_go).\n*   Explicitly defining logic within each node.\n*   Passing a shared state object between nodes.\n\nThis makes it easy to compress context in 
different ways. For instance, you can:\n\n*   Use a message list as the agent state.\n*   Summarize it with [built-in utilities](https://langchain-ai.github.io/langgraph/how-tos/memory/add-memory/#manage-short-term-memory).\n\nWe will be using the same RAG-based tool-calling agent we coded earlier and adding summarization of its conversation history.\n\nFirst, we need to extend our graph’s state to include a field for the final summary.\n```python\n# Define extended state with a summary field\nclass State(MessagesState):\n    \"\"\"Extended state that includes a summary field for context compression.\"\"\"\n    summary: str\n```\nNext, we’ll define a dedicated prompt for summarization and keep our RAG prompt from before.\n```python\n# Define the summarization prompt\nsummarization_prompt = \"\"\"Summarize the full chat history and all tool feedback to \ngive an overview of what the user asked about and what the agent did.\"\"\"\n```\nNow, we’ll create a `summary_node`.\n\n*   This node will be triggered at the end of the agent’s work to generate a concise summary of the entire interaction.\n*   The `llm_call` and `tool_node` remain unchanged.\n```python\ndef summary_node(state: MessagesState) -\u003e dict:\n    \"\"\"\n    Generate a summary of the conversation and tool interactions.\n\n    Args:\n        state: The current state of the graph, containing the message history.\n\n    Returns:\n        A dictionary with the key \"summary\" and the generated summary string\n        as the value, which updates the state.\n    \"\"\"\n    # Prepend the summarization system prompt to the message history\n    messages = [SystemMessage(content=summarization_prompt)] + state[\"messages\"]\n    \n    # Invoke the language model to generate the summary\n    result = llm.invoke(messages)\n    \n    # Return the summary to be stored in the 'summary' field of the state\n    return {\"summary\": result.content}\n```\nOur conditional edge `should_continue` now needs to decide whether to 
call a tool or move forward to the new summary_node.\n```python\ndef should_continue(state: MessagesState) -\u003e Literal[\"Action\", \"summary_node\"]:\n    \"\"\"Determine next step based on whether LLM made tool calls.\"\"\"\n    last_message = state[\"messages\"][-1]\n    \n    # If LLM made tool calls, execute them\n    if last_message.tool_calls:\n        return \"Action\"\n    # Otherwise, proceed to summarization\n    return \"summary_node\"\n```\nLet’s build the graph with this new summarization step at the end.\n```python\n# Build the RAG agent workflow\nagent_builder = StateGraph(State)\n\n# Add nodes to the workflow\nagent_builder.add_node(\"llm_call\", llm_call)\nagent_builder.add_node(\"Action\", tool_node)\nagent_builder.add_node(\"summary_node\", summary_node)\n\n# Define the workflow edges\nagent_builder.add_edge(START, \"llm_call\")\nagent_builder.add_conditional_edges(\n    \"llm_call\",\n    should_continue,\n    {\n        \"Action\": \"Action\",\n        \"summary_node\": \"summary_node\",\n    },\n)\nagent_builder.add_edge(\"Action\", \"llm_call\")\nagent_builder.add_edge(\"summary_node\", END)\n\n# Compile the agent\nagent = agent_builder.compile()\n\n# Display the agent workflow\ndisplay(Image(agent.get_graph(xray=True).draw_mermaid_png()))\n```\n![Our Created Agent](https://cdn-images-1.medium.com/max/1000/1*UTtZj95DQ9_0hXb-h2UetQ.png)\n\nNow, let’s run it with a query that will require fetching a lot of context.\n```python\nfrom rich.markdown import Markdown\n\nquery = \"Why does RL improve LLM reasoning according to the blogs?\"\nresult = agent.invoke({\"messages\": [(\"user\", query)]})\n\n# Print the final message to the user\nformat_message(result['messages'][-1])\n\n# Print the generated summary\nMarkdown(result[\"summary\"])\n\n\n#### OUTPUT ####\nThe user asked about why reinforcement learning (RL) improves LLM re...\n```\nNice, but it uses **115k tokens**! 
You can see the full trace [here](https://smith.langchain.com/public/50d70503-1a8e-46c1-bbba-a1efb8626b05/r). This is a common challenge with agents that have token-heavy tool calls.\n\nA more efficient approach is to compress the context *before* it enters the agent’s main scratchpad. Let’s update the RAG agent to summarize the tool call output on the fly.\n\nFirst, a new prompt for this specific task:\n```python\ntool_summarization_prompt = \"\"\"You will be provided a doc from a RAG system.\nSummarize the docs, ensuring to retain all relevant / essential information.\nYour goal is simply to reduce the size of the doc (tokens) to a more manageable size.\"\"\"\n```\nNext, we’ll modify our **tool_node** to include this summarization step.\n```python\ndef tool_node_with_summarization(state: dict):\n    \"\"\"Performs the tool call and then summarizes the output.\"\"\"\n    result = []\n    for tool_call in state[\"messages\"][-1].tool_calls:\n        tool = tools_by_name[tool_call[\"name\"]]\n        observation = tool.invoke(tool_call[\"args\"])\n        \n        # Summarize the doc\n        summary_msg = llm.invoke([\n            SystemMessage(content=tool_summarization_prompt),\n            (\"user\", str(observation))\n        ])\n        \n        result.append(ToolMessage(content=summary_msg.content, tool_call_id=tool_call[\"id\"]))\n    return {\"messages\": result}\n```\nNow, our `should_continue` edge can be simplified since we don’t need the final `summary_node` anymore.\n```python\ndef should_continue(state: MessagesState) -\u003e Literal[\"Action\", END]:\n    \"\"\"Decide if we should continue the loop or stop.\"\"\"\n    if state[\"messages\"][-1].tool_calls:\n        return \"Action\"\n    return END\n```\nLet’s build and compile this more efficient agent.\n```python\n# Build workflow\nagent_builder = StateGraph(MessagesState)\n\n# Add nodes\nagent_builder.add_node(\"llm_call\", llm_call)\nagent_builder.add_node(\"Action\", 
tool_node_with_summarization)\n\n# Add edges to connect nodes\nagent_builder.add_edge(START, \"llm_call\")\nagent_builder.add_conditional_edges(\n    \"llm_call\",\n    should_continue,\n    {\n        \"Action\": \"Action\",\n        END: END,\n    },\n)\nagent_builder.add_edge(\"Action\", \"llm_call\")\n\n# Compile the agent\nagent = agent_builder.compile()\n\n# Show the agent\ndisplay(Image(agent.get_graph(xray=True).draw_mermaid_png()))\n```\n![Our Updated Agent](https://cdn-images-1.medium.com/max/1000/1*FCRrXQxZveaQxyLHf6AROQ.png)\n\nLet’s run the same query and see the difference.\n```python\nquery = \"Why does RL improve LLM reasoning according to the blogs?\"\nresult = agent.invoke({\"messages\": [(\"user\", query)]})\nformat_messages(result['messages'])\n```\n```\n┌────────────── user ───────────────┐\n│ Why does RL improve LLM reasoning?│\n│ According to the blogs?            │\n└───────────────────────────────────┘\n\n┌────────────── 📝 AI ──────────────┐\n│ Searching Lilian Weng’s blog for  │\n│ how RL improves LLM reasoning...  │\n│                                   │\n│ 🔧 Tool Call: retrieve_blog_posts │\n│ Args: {                           │\n│ \"query\": \"Reinforcement Learning  │\n│ for LLM reasoning\"                │\n│ }                                │\n└───────────────────────────────────┘\n\n┌────────────── 🔧 Tool Output ─────┐\n│ Lilian Weng explains RL helps LLM │\n│ reasoning by training on rewards  │\n│ for each reasoning step (Process- │\n│ based Reward Models). This guides │\n│ the model to think step-by-step,  │\n│ improving coherence and logic.    │\n└───────────────────────────────────┘\n\n┌────────────── 📝 AI ──────────────┐\n│ RL improves LLM reasoning by       │\n│ rewarding stepwise thinking via    │\n│ PRMs, encouraging coherent,        │\n│ logical argumentation over final   │\n│ answers. It helps the model self-  │\n│ correct and explore better paths.  
│\n└───────────────────────────────────┘\n```\n\u003e This time, the agent only used **60k tokens**. See the trace [here](https://smith.langchain.com/public/994cdf93-e837-4708-9628-c83b397dd4b5/r).\n\nThis simple change cut our token usage nearly in half, making the agent far more efficient and cost-effective.\n\nYou can learn more about:\n\n*   [**Heuristic Compression and Message Trimming**](https://langchain-ai.github.io/langgraph/how-tos/memory/add-memory/#trim-messages): managing token limits by trimming messages to prevent context overflow.\n*   [**SummarizationNode as Pre-Model Hook**](https://langchain-ai.github.io/langgraph/how-tos/create-react-agent-manage-message-history/): summarizing conversation history to control token usage in ReAct agents.\n*   [**LangMem Summarization**](https://langchain-ai.github.io/langmem/guides/summarization/): strategies for long context management with message summarization and running summaries.\n\n### Isolating Context using Sub-Agents Architecture\nA common way to isolate context is by splitting it across sub-agents. 
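The core of the idea fits in a few lines of plain Python: each sub-agent owns its own message list, and a small supervisor only routes tasks (the routing keywords and agent names below are invented for illustration):

```python
# Each sub-agent owns an isolated context window (its own message list);
# the supervisor only routes tasks and collects final answers.
class SubAgent:
    def __init__(self, name: str):
        self.name = name
        self.messages: list[str] = []  # isolated context window

    def run(self, task: str) -> str:
        self.messages.append(f"task: {task}")
        answer = f"[{self.name}] handled: {task}"
        self.messages.append(f"answer: {answer}")
        return answer

math_expert = SubAgent("math_expert")
research_expert = SubAgent("research_expert")

def supervisor(task: str) -> str:
    """Route by a naive keyword check (a real supervisor would use an LLM)."""
    is_math = any(w in task.lower() for w in ("add", "multiply", "sum"))
    agent = math_expert if is_math else research_expert
    return agent.run(task)

supervisor("add 2 and 3")
supervisor("find FAANG headcounts")

# Each context stays narrow: one agent never sees the other's messages
print(len(math_expert.messages), len(research_expert.messages))  # 2 2
```

Because each history stays separate, a token-heavy research task never inflates the math agent's context, which is exactly the isolation the frameworks below formalize.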
OpenAI's [Swarm](https://github.com/openai/swarm) library was designed around this "[separation of concerns](https://openai.github.io/openai-agents-python/ref/agent/)", where each agent manages a specific sub-task with its own tools, instructions, and context window.

![Fourth Component of CE](https://cdn-images-1.medium.com/max/1000/1*-b9BLPkLHkYsy2iLQIdxUg.png)
*Fourth Component of CE (From [LangChain docs](https://blog.langchain.com/context-engineering-for-agents/))*

Anthropic's [multi-agent researcher](https://www.anthropic.com/engineering/built-multi-agent-research-system) showed that multiple agents with isolated contexts outperformed a single agent by 90.2%, as each sub-agent focuses on a narrower sub-task.

> *Subagents operate in parallel with their own context windows, exploring different aspects of the question simultaneously.*

However, multi-agent systems have challenges:

*   Much higher token use (sometimes 15× more tokens than single-agent chat).
*   Careful [prompt engineering](https://www.anthropic.com/engineering/built-multi-agent-research-system) is required to plan the sub-agents' work.
*   Coordinating sub-agents can be complex.

![Multi Agent Parallelization](https://cdn-images-1.medium.com/max/1000/1*N_BT9M5OyYB7UJfDkpcL-g.png)
*Multi Agent Parallelization (From [LangChain docs](https://blog.langchain.com/context-engineering-for-agents/))*

LangGraph supports multi-agent setups. A common approach is the [supervisor](https://github.com/langchain-ai/langgraph-supervisor-py) architecture, also used in Anthropic's multi-agent researcher.
The supervisor delegates tasks to sub-agents, each running in its own context window.

Let's build a simple supervisor that manages two agents:

*   `math_expert` handles mathematical calculations.
*   `research_expert` searches and provides researched information.

The supervisor will decide which expert to call based on the query and coordinate their responses within the LangGraph workflow.
```python
from langgraph.prebuilt import create_react_agent
from langgraph_supervisor import create_supervisor

# --- Define Tools for Each Agent ---
def add(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b

def multiply(a: float, b: float) -> float:
    """Multiply two numbers."""
    return a * b

def web_search(query: str) -> str:
    """Mock web search function that returns FAANG company headcounts."""
    return (
        "Here are the headcounts for each of the FAANG companies in 2024:\n"
        "1. **Facebook (Meta)**: 67,317 employees.\n"
        "2. **Apple**: 164,000 employees.\n"
        "3. **Amazon**: 1,551,000 employees.\n"
        "4. **Netflix**: 14,000 employees.\n"
        "5. **Google (Alphabet)**: 181,269 employees."
    )
```
Now we can create our specialized agents and the supervisor to manage them.
```python
# --- Create Specialized Agents with Isolated Contexts ---
math_agent = create_react_agent(
    model=llm,
    tools=[add, multiply],
    name="math_expert",
    prompt="You are a math expert. Always use one tool at a time."
)

research_agent = create_react_agent(
    model=llm,
    tools=[web_search],
    name="research_expert",
    prompt="You are a world class researcher with access to web search. Do not do any math."
)

# --- Create Supervisor Workflow for Coordinating Agents ---
workflow = create_supervisor(
    [research_agent, math_agent],
    model=llm,
    prompt=(
        "You are a team supervisor managing a research expert and a math expert. "
        "Delegate tasks to the appropriate agent to answer the user's query. "
        "For current events or facts, use research_agent. "
        "For math problems, use math_agent."
    )
)

# Compile the multi-agent application
app = workflow.compile()
```
Let's execute the workflow and see how the supervisor delegates tasks.
```python
# --- Execute the Multi-Agent Workflow ---
result = app.invoke({
    "messages": [
        {
            "role": "user",
            "content": "what's the combined headcount of the FAANG companies in 2024?"
        }
    ]
})

# Format and display the results
format_messages(result['messages'])
```
```
┌────────────── user ───────────────┐
│ what's the combined headcount of  │
│ the FAANG companies in 2024?      │
└───────────────────────────────────┘

┌────────────── 📝 AI ──────────────┐
│ Delegating to research_expert to  │
│ look up the 2024 headcounts...    │
└───────────────────────────────────┘

┌────────────── 🔧 Tool Output ─────┐
│ Facebook (Meta): 67,317           │
│ Apple: 164,000                    │
│ Amazon: 1,551,000                 │
│ Netflix: 14,000                   │
│ Google (Alphabet): 181,269        │
└───────────────────────────────────┘

┌────────────── 📝 AI ──────────────┐
│ Delegating to math_expert to sum  │
│ the five headcounts...            │
└───────────────────────────────────┘

┌────────────── 🔧 Tool Output ─────┐
│ 1,977,586                         │
└───────────────────────────────────┘

┌────────────── 📝 AI ──────────────┐
│ The combined headcount of the     │
│ FAANG companies in 2024 is        │
│ 1,977,586 employees.              │
└───────────────────────────────────┘
```
Here, the supervisor correctly isolates the context for each task, sending the research query to the researcher and the math problem to the mathematician, demonstrating effective context isolation.

You can learn more about:

*   [**LangGraph Swarm**](https://github.com/langchain-ai/langgraph-swarm-py): a Python library for building multi-agent systems with dynamic handoffs, memory, and human-in-the-loop support.
*   [**Videos on multi-agent systems**](https://www.youtube.com/watch?v=4nZl32FwU-o): additional insights into building collaborative AI agents ([video 2](https://www.youtube.com/watch?v=JeyDrn1dSUQ), [video 3](https://www.youtube.com/watch?v=B_0TNuYi56w)).

### Isolation using Sandboxed Environments
HuggingFace's [deep researcher](https://huggingface.co/blog/open-deep-research#:~:text=From%20building%20,it%20can%20still%20use%20it) shows an interesting way to isolate context. Most agents use [tool calling APIs](https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/overview) that return JSON arguments to run tools like search APIs and get the results back.

HuggingFace instead uses a [CodeAgent](https://huggingface.co/papers/2402.01030) that writes code to call tools. This code runs in a secure [sandbox](https://e2b.dev/), and only the results of running the code are sent back to the LLM.

This keeps heavy data (like images or audio) outside the LLM's token limit.
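A minimal sketch of that idea in plain Python (illustrative only, not HuggingFace's actual CodeAgent API): heavy objects live as variables inside the execution environment, and only short printed summaries are handed back to the model.

```python
import io
import contextlib

# Illustrative sketch: the sandbox keeps heavy objects as persistent
# variables; the LLM only ever sees the short strings printed by the code.
sandbox_state: dict = {}

def run_in_sandbox(code: str) -> str:
    """Execute code against persistent state; return only printed output."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, sandbox_state)  # variables persist across calls
    return buffer.getvalue().strip()

# Turn 1: a "tool call" fetches a huge blob but prints only its size.
summary = run_in_sandbox(
    "doc = 'x' * 1_000_000  # pretend this is a fetched image/transcript\n"
    "print(f'fetched doc with {len(doc)} chars')"
)
print(summary)  # fetched doc with 1000000 chars -- the blob never leaves

# Turn 2: later code can still reference `doc` without re-sending it.
print(run_in_sandbox("print(doc[:5])"))  # xxxxx
```

A production sandbox adds security isolation around this same persistence pattern.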
HuggingFace explains:

> *[Code Agents allow for] better handling of state … Need to store this image/audio/other for later? Just save it as a variable in your state and use it later.*

Using sandboxes with LangGraph is straightforward. The [LangChain Sandbox](https://github.com/langchain-ai/langchain-sandbox) runs untrusted Python code securely using Pyodide (Python compiled to WebAssembly). You can add it as a tool to any LangGraph agent.

**Note:** Deno is required. Install it here: https://docs.deno.com/runtime/getting_started/installation/
```python
from langchain_sandbox import PyodideSandboxTool
from langgraph.prebuilt import create_react_agent

# Create a sandbox tool with network access for package installation
tool = PyodideSandboxTool(allow_net=True)

# Create a ReAct agent with the sandbox tool
agent = create_react_agent(llm, tools=[tool])

# Execute a mathematical query using the sandbox
result = await agent.ainvoke(
    {"messages": [{"role": "user", "content": "what's 5 + 7?"}]},
)

# Format and display the results
format_messages(result['messages'])
```
```
┌────────────── user ───────────────┐
│ what's 5 + 7?                     │
└───────────────────────────────────┘

┌────────────── 📝 AI ──────────────┐
│ I can solve this by executing     │
│ Python code in the sandbox.       │
│                                   │
│ 🔧 Tool Call: pyodide_sandbox     │
│ Args: {                           │
│   "code": "print(5 + 7)"          │
│ }                                 │
└───────────────────────────────────┘

┌────────────── 🔧 Tool Output ─────┐
│ 12                                │
└───────────────────────────────────┘

┌────────────── 📝 AI ──────────────┐
│ The answer is 12.                 │
└───────────────────────────────────┘
```
### State Isolation in LangGraph
An agent's **runtime state object** is another effective way to isolate context, similar to sandboxing.
You can design this state with a schema (such as a Pydantic model) whose fields store different kinds of context.

For example, one field (like `messages`) is shown to the LLM on each turn, while other fields keep information isolated until it is needed.

LangGraph is built around a [**state**](https://langchain-ai.github.io/langgraph/concepts/low_level/#state) object, letting you define a custom state schema and access its fields throughout the agent's workflow.

For instance, you can store tool call results in specific fields, keeping them hidden from the LLM until necessary. You've seen many examples of this in these notebooks.

### Summarizing Everything
Let's summarize what we have done so far:

*   We used LangGraph's `StateGraph` to create a **"scratchpad"** for short-term memory and an `InMemoryStore` for long-term memory, allowing our agent to store and recall information.
*   We demonstrated how to selectively pull relevant information from the agent's state and long-term memory.
This included using Retrieval-Augmented Generation (`RAG`) to find specific knowledge and `langgraph-bigtool` to select the right tool from many options.
*   To manage long conversations and token-heavy tool outputs, we implemented summarization.
*   We showed how to compress `RAG` results on the fly to make the agent more efficient and reduce token usage.
*   We explored keeping contexts separate to avoid confusion, both by building a multi-agent system with a supervisor that delegates tasks to specialized sub-agents and by using sandboxed environments to run code.

All these techniques fall under **"Contextual Engineering"**: a strategy for improving AI agents by carefully managing their working memory (`context`) to make them more efficient, accurate, and capable of handling complex, long-running tasks.