{"id":29095032,"url":"https://github.com/fareedkhan-dev/multi-agent-ai-system","last_synced_at":"2026-05-07T13:45:37.741Z","repository":{"id":296512100,"uuid":"993635199","full_name":"FareedKhan-dev/Multi-Agent-AI-System","owner":"FareedKhan-dev","description":"Building a Multi-Agent AI System with LangGraph and LangSmith","archived":false,"fork":false,"pushed_at":"2025-05-31T07:40:54.000Z","size":1656,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-05-31T19:12:21.403Z","etag":null,"topics":["ai-agents","langchain","langgraph","langsmith","multi-agent-systems","openai"],"latest_commit_sha":null,"homepage":"https://medium.com/@fareedkhandev/6cb70487cd81","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/FareedKhan-dev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-31T07:17:23.000Z","updated_at":"2025-05-31T16:23:00.000Z","dependencies_parsed_at":"2025-05-31T19:12:31.289Z","dependency_job_id":"4f1187f4-e425-46b0-93bb-ecfb38d4e2a4","html_url":"https://github.com/FareedKhan-dev/Multi-Agent-AI-System","commit_stats":null,"previous_names":["fareedkhan-dev/multi-agent-ai-system"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/FareedKhan-dev/Multi-Agent-AI-System","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FareedKhan-dev%2FMulti-Agent-AI-System","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FareedKhan-dev%2FMulti-Agent-AI-S
ystem/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FareedKhan-dev%2FMulti-Agent-AI-System/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FareedKhan-dev%2FMulti-Agent-AI-System/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/FareedKhan-dev","download_url":"https://codeload.github.com/FareedKhan-dev/Multi-Agent-AI-System/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FareedKhan-dev%2FMulti-Agent-AI-System/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262412211,"owners_count":23306875,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","langchain","langgraph","langsmith","multi-agent-systems","openai"],"created_at":"2025-06-28T10:06:39.742Z","updated_at":"2026-05-07T13:45:37.688Z","avatar_url":"https://github.com/FareedKhan-dev.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!-- omit in toc --\u003e\n# Multi-Agent AI System\n\nThis project is built following on top of the comprehensive guide from [LangChain](https://github.com/langchain-ai) official notebook documentation.\n\n\n[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/release/python-3100/) [![LangGraph](https://img.shields.io/badge/LangGraph-Multi--Agent-orange)](https://langchain-ai.github.io/langgraph/) 
[![LangSmith](https://img.shields.io/badge/LangSmith-Tracing-purple)](https://www.langchain.com/langsmith) [![Together AI](https://img.shields.io/badge/Together%20AI-API-green)](https://www.together.ai/) [![OpenAI](https://img.shields.io/badge/OpenAI-API-lightgrey)](https://openai.com/) [![SQLite](https://img.shields.io/badge/SQLite-Database-blue)](https://www.sqlite.org/) [![Medium](https://img.shields.io/badge/Medium-Blog-black?logo=medium)](https://medium.com/@fareedkhandev/building-a-multi-agent-ai-system-with-langgraph-and-langsmith-6cb70487cd81)\n\nIt is now becoming a trend that a powerful AI agent gets created by combining several smaller subagents. But this also brings challenges like reducing hallucinations, managing the conversation flow, keeping an eye on how the agent works during testing, allowing human in the loop, and evaluating its performance. You need to do a lot of trial and error.\n\nIn this blog, we will start by creating two simple subagents, then build a multi-agent system using a supervisor approach. Along the way, we will cover the basics, the challenges you might face when creating complex AI agentic architecture, and how to evaluate and improve them.\n\nWe will use tools like `LangGraph` and `LangSmith` to help us with this process.\n\n\u003c!-- omit in toc --\u003e\n## Getting Started\n\nThe repository tree looks like this:\n\n```\nMulti-Agent-AI-System/\n├── .env                # Environment variables for API keys\n├── README.md           # Project documentation\n├── requirements.txt    # Python dependencies\n├── multi_agent.ipynb   # Jupyter notebook for the multi-agent AI system\n├── utils.py            # Utility functions for the project\n└── LICENSE             # Project license information (MIT License)\n```\n\nMake sure you have `Python 3.10+` installed on your system, as this project requires it. 
You can install the required dependencies using pip:\n\n```bash\n# Clone the Multi-Agent AI System repository from GitHub\ngit clone https://github.com/FareedKhan-dev/Multi-Agent-AI-System.git\n\n# Navigate into the project directory\ncd Multi-Agent-AI-System\n\n# Install all required Python dependencies from requirements.txt\npip install -r requirements.txt\n```\n\n---\n\n\u003c!-- omit in toc --\u003e\n## Table of Contents\n\n- [Setting up the Environment](#setting-up-the-environment)\n- [Purpose of LangSmith](#purpose-of-langsmith)\n- [Choosing our Dataset](#choosing-our-dataset)\n- [Short-Term and Long-Term Memory](#short-term-and-long-term-memory)\n- [Our Multi-Agent Architecture](#our-multi-agent-architecture)\n- [Catalog Information Sub-agent](#catalog-information-sub-agent)\n  - [Defining State, Tools and Nodes](#defining-state-tools-and-nodes)\n  - [Testing First Sub-agent](#testing-first-sub-agent)\n- [Invoice Information Sub-agent Using Pre-built](#invoice-information-sub-agent-using-pre-built)\n  - [Testing Second Sub-agent](#testing-second-sub-agent)\n- [Creating Multi-Agent Using Supervisor](#creating-multi-agent-using-supervisor)\n  - [Testing our Multi-agent Architecture](#testing-our-multi-agent-architecture)\n- [Adding Human-in-the-Loop](#adding-human-in-the-loop)\n- [Adding Long-Term Memory](#adding-long-term-memory)\n  - [Testing our Long-term Memory Multi-agent](#testing-our-long-term-memory-multi-agent)\n- [Evaluating our Multi-AI Agent](#evaluating-our-multi-ai-agent)\n- [Swarm vs Supervisor](#swarm-vs-supervisor)\n\n---\n\n## Setting up the Environment\n\nSo, LangChain, LangGraph, and all their related modules together form an entire architecture. 
If I import all the libraries at once, it will definitely create confusion.\n\nSo we will only import modules when they are needed, as it will help us learn in a proper way.\n\nThe very first step is to create environment variables that will hold our sensitive info like API keys and other such things.\n\n```python\nimport os\n\n# Set environment variables for API integrations\nos.environ[\"OPENAI_API_KEY\"] = \"your-openai-api-key\"\nos.environ[\"LANGSMITH_API_KEY\"] = \"your-langsmith-api-key\"\nos.environ[\"LANGSMITH_TRACING\"] = \"true\"  # Enables LangSmith tracing\nos.environ[\"LANGSMITH_PROJECT\"] = \"intelligent-rag-system\"  # Project name for organizing LangSmith traces\n```\n\nWe will be using OpenAI models for both text generation and embeddings in this project. While the original notebook may reference other providers like Nebius AI or Together AI, LangChain offers extensive support for various model providers. You can explore the full range of available embedding and text generation models in their [documentation](https://python.langchain.com/docs/integrations/text_embedding/).\n\nLangSmith might be a new term for you. In case you don't know what it is, in the next section we will discuss its purpose. If you already know, you can skip to the following section.\n\nTo get the LangSmith API key, you can go to their [website](https://www.langchain.com/langsmith) and create an account. 
After that, under settings, you will find your API key.\n\n```python\nfrom langsmith import utils\n\n# Check and print whether LangSmith tracing is currently enabled\nprint(f\"LangSmith tracing is enabled: {utils.tracing_is_enabled()}\")\n```\n\n```text\n### output ###\nLangSmith tracing is enabled: True\n```\n\nWe just imported the `utils` from LangSmith that we will be using later, and tracing is set to true because previously we set the environment variable `LANGSMITH_TRACING = TRUE`, which helps us record and visualize the execution of our AI Agent application.\n\n## Purpose of LangSmith\n\nWhen we build AI agentic apps with LLMs, **LangSmith helps you understand and improve them**. It works like a **dashboard** that shows what is happening inside your app and lets you:\n\n![LangSmith Simple Workflow](https://miro.medium.com/v2/resize:fit:1100/format:webp/1*BX5lOdJVWoW4CfS6F6YEAA.png)\n\n\n*   **Debug** when things go wrong\n*   **Test** your prompts and logic\n*   **Evaluate** how good the answers are\n*   **Monitor** your app in real time\n*   **Track** usage, speed, and cost\n\nLangSmith makes all of this easy to use, even if you are not a developer.\n\nSo, now that we understand the high-level purpose of LangSmith, and since we will be coding within it from time to time, let's import it.\n\n## Choosing our Dataset\n\nWe are going to use the **[Chinook Database](https://github.com/lerocha/chinook-database)**, which is a popular sample database used for learning and testing SQL. 
It simulates a digital music store’s data and operations, such as customer information, purchase history, and music catalog.\n\nIt comes in multiple formats like MySQL, PostgreSQL, and others, but we are going to use the SQLite version of the data, as it also helps us learn how an AI agent interacts with a database, especially useful for someone who is new to this AI agent guide.\n\nSo, let's define a function that will set up the SQLite database for us.\n\n```python\nimport sqlite3\nimport requests\nfrom langchain_community.utilities.sql_database import SQLDatabase\nfrom sqlalchemy import create_engine\nfrom sqlalchemy.pool import StaticPool\n\ndef get_engine_for_chinook_db():\n    \"\"\"\n    Pull SQL file, populate in-memory database, and create engine.\n    \n    Downloads the Chinook database SQL script from GitHub and creates an in-memory \n    SQLite database populated with the sample data.\n    \n    Returns:\n        sqlalchemy.engine.Engine: SQLAlchemy engine connected to the in-memory database\n    \"\"\"\n    # Download the Chinook database SQL script from the official repository\n    url = \"https://raw.githubusercontent.com/lerocha/chinook-database/master/ChinookDatabase/DataSources/Chinook_Sqlite.sql\"\n    response = requests.get(url)\n    sql_script = response.text\n\n    # Create an in-memory SQLite database connection\n    # check_same_thread=False allows the connection to be used across threads\n    connection = sqlite3.connect(\":memory:\", check_same_thread=False)\n    \n    # Execute the SQL script to populate the database with sample data\n    connection.executescript(sql_script)\n    \n    # Create and return a SQLAlchemy engine that uses the populated connection\n    return create_engine(\n        \"sqlite://\",  # SQLite URL scheme\n        creator=lambda: connection,  # Function that returns the database connection\n        poolclass=StaticPool,  # Use StaticPool to maintain single connection\n        connect_args={\"check_same_thread\": 
False},  # Allow cross-thread usage\n    )\n```\n\nSo we just defined our first function, `get_engine_for_chinook_db()`, which sets up a temporary in-memory SQLite database using the Chinook sample dataset.\n\nIt downloads the SQL script from GitHub, creates the database in memory, runs the script to populate it with tables and data, and then returns a SQLAlchemy engine connected to this database.\n\nNow we need to call this function so that the SQLite database gets created.\n\n```python\n# Initialize the database engine with the Chinook sample data\nengine = get_engine_for_chinook_db()\n\n# Create a LangChain SQLDatabase wrapper around the engine\n# This provides convenient methods for database operations and query execution\ndb = SQLDatabase(engine)\n```\n\nWe just called the function and initialized the engine, so the AI agent can run queries against that database later on.\n\n## Short-Term and Long-Term Memory\n\nNow that we have initialized our database, we are going to look at the first advantage of our combo (LangGraph + LangSmith), which is the availability of two different types of memory. But first, let's understand what memory is.\n\nIn any intelligent agent, memory plays an important role. Just like humans, an AI agent needs to remember past interactions to maintain context and provide personalized responses.\n\nIn LangGraph, we differentiate between **short-term memory** and **long-term memory**. Here is a quick summary of the difference between them:\n\n*   Short-term memory helps an agent keep track of the current conversation. In LangGraph, this is handled by a **MemorySaver**, which saves and resumes the state of the conversation.\n*   Long-term memory lets the agent remember information across different conversations, like user preferences. 
For example, we can use an **InMemoryStore** for quick storage, but in real apps, you’d use a more permanent database.\n\nLet’s initialize them both.\n\n```python\nfrom langgraph.checkpoint.memory import MemorySaver\nfrom langgraph.store.memory import InMemoryStore\n\n# Initialize long-term memory store for persistent data between conversations\nin_memory_store = InMemoryStore()\n\n# Initialize checkpointer for short-term memory within a single thread/conversation\ncheckpointer = MemorySaver()\n```\n\nWe are using `in_memory_store` as long-term memory, which will let us save user preferences even after a conversation ends.\n\nMeanwhile, the `MemorySaver` (checkpointer) keeps the current conversation’s context intact, enabling smooth multi-turn interactions.\n\n## Our Multi-Agent Architecture\n\nSo, our goal is to build a realistic customer support agent, not as a single agent but as a multi-agent workflow in LangGraph.\n\nWe will start from a simple ReAct agent and add additional steps into the workflow, simulating a realistic customer support example, showcasing human-in-the-loop, long-term memory, and the LangGraph pre-built library.\n\n![Multi-Agent Workflow Architecture](https://miro.medium.com/v2/resize:fit:1100/format:webp/1*xQ0IskvmWCxBIcvi4FsbLg.png)\n\nWe will build each component of this multi-agent workflow step by step. It contains two specialized ReAct (Reasoning and Acting) sub-agents, which are then combined with additional steps to form the complete multi-agent workflow.\n\nOur workflow starts with:\n1.  **human_input**, where the user provides account information.\n2.  Then, in **verify_info**, the system checks the account and clarifies the user’s intent if needed.\n3.  Next, **load_memory** retrieves the user’s music preferences.\n4.  The **supervisor** coordinates two sub-agents: **music_catalog** (for music data) and **invoice_info** (for billing).\n5.  
Finally, **create_memory** updates the user’s memory with new info from the interaction.\n\nNow that we understand the basics, let’s start building our first sub-agent.\n\n## Catalog Information Sub-agent\n\nOur first sub-agent will be a **music catalog information agent**. Its primary role will be to assist customers with inquiries related to our digital music catalog, such as searching for artists, albums, or songs.\n\n![Catalog Information Sub-agent](https://miro.medium.com/v2/resize:fit:1100/format:webp/1*QCPbwiD4Cn1sKUk1HDIOKA.png)\n\nHow will our agent remember information, decide what to do, and carry out actions? This brings us to three fundamental LangGraph concepts: **State**, **Tools**, and **Nodes**.\n\n### Defining State, Tools and Nodes\n\nIn LangGraph, the **State** holds the current data snapshot flowing through the graph, basically the agent’s memory.\n\nFor our customer support agent, the State includes:\n\n*   **customer_id:** Identifies the customer for personalized responses and data retrieval.\n*   **messages:** A list of all messages exchanged in the conversation, giving context to the agent.\n*   **loaded_memory:** Long-term user-specific info (like preferences) loaded into the conversation.\n*   **remaining_steps:** Counts how many steps are left to prevent infinite loops.\n\nEach node updates this State as the conversation progresses. 
Let’s define our State using `TypedDict` for type hinting, together with `Annotated` (from `typing`) and the `add_messages` reducer from LangGraph's message module for easy message appending.\n\n```python\nfrom typing_extensions import TypedDict\nfrom typing import Annotated\nfrom langgraph.graph.message import AnyMessage, add_messages\nfrom langgraph.managed.is_last_step import RemainingSteps\n\nclass State(TypedDict):\n    \"\"\"\n    State schema for the multi-agent customer support workflow.\n    \n    This defines the shared data structure that flows between nodes in the graph,\n    representing the current snapshot of the conversation and agent state.\n    \"\"\"\n    # Customer identifier retrieved from account verification\n    customer_id: str\n    \n    # Conversation history with automatic message aggregation\n    messages: Annotated[list[AnyMessage], add_messages]\n    \n    # User preferences and context loaded from long-term memory store\n    loaded_memory: str\n    \n    # Counter to prevent infinite recursion in agent workflow\n    remaining_steps: RemainingSteps \n```\n\nThis State class will serve as the blueprint for how information is managed and passed between different parts of our multi-agent system.\n\nNext, we’ll extend our agent’s abilities using **Tools**. 
Tools are functions that let the LLM do things it can’t do on its own, like calling APIs or accessing databases.\n\nFor our agent, tools will connect to the **Chinook database** to fetch music-related info.\n\nWe’ll define Python functions and mark them with `@tool` from `langchain_core.tools`, so the LLM can find and use them when needed.\n\n```python\nfrom langchain_core.tools import tool\nimport ast\n\n@tool\ndef get_albums_by_artist(artist: str):\n    \"\"\"\n    Get albums by an artist from the music database.\n    \n    Args:\n        artist (str): The name of the artist to search for albums.\n    \n    Returns:\n        str: Database query results containing album titles and artist names.\n    \"\"\"\n    return db.run(\n        f\"\"\"\n        SELECT Album.Title, Artist.Name \n        FROM Album \n        JOIN Artist ON Album.ArtistId = Artist.ArtistId \n        WHERE Artist.Name LIKE '%{artist}%';\n        \"\"\",\n        include_columns=True\n    )\n\n@tool\ndef get_tracks_by_artist(artist: str):\n    \"\"\"\n    Get songs/tracks by an artist (or similar artists) from the music database.\n    \n    Args:\n        artist (str): The name of the artist to search for tracks.\n    \n    Returns:\n        str: Database query results containing song names and artist names.\n    \"\"\"\n    return db.run(\n        f\"\"\"\n        SELECT Track.Name as SongName, Artist.Name as ArtistName \n        FROM Album \n        LEFT JOIN Artist ON Album.ArtistId = Artist.ArtistId \n        LEFT JOIN Track ON Track.AlbumId = Album.AlbumId \n        WHERE Artist.Name LIKE '%{artist}%';\n        \"\"\",\n        include_columns=True\n    )\n\n@tool\ndef get_songs_by_genre(genre: str):\n    \"\"\"\n    Fetch songs from the database that match a specific genre.\n    \n    This function first looks up the genre ID(s) for the given genre name,\n    then retrieves songs that belong to those genre(s), limiting results\n    to 8 songs grouped by artist.\n    \n    Args:\n        
genre (str): The genre of the songs to fetch.\n    \n    Returns:\n        list[dict] or str: A list of songs with artist information that match \n                          the specified genre, or an error message if no songs found.\n    \"\"\"\n    # First, get the genre ID(s) for the specified genre\n    genre_id_query = f\"SELECT GenreId FROM Genre WHERE Name LIKE '%{genre}%'\"\n    genre_ids = db.run(genre_id_query)\n    \n    # Check if any genres were found\n    if not genre_ids:\n        return f\"No songs found for the genre: {genre}\"\n    \n    # Parse the genre IDs and format them for the SQL query\n    genre_ids = ast.literal_eval(genre_ids)\n    genre_id_list = \", \".join(str(gid[0]) for gid in genre_ids)\n\n    # Query for songs in the specified genre(s)\n    songs_query = f\"\"\"\n        SELECT Track.Name as SongName, Artist.Name as ArtistName\n        FROM Track\n        LEFT JOIN Album ON Track.AlbumId = Album.AlbumId\n        LEFT JOIN Artist ON Album.ArtistId = Artist.ArtistId\n        WHERE Track.GenreId IN ({genre_id_list})\n        GROUP BY Artist.Name\n        LIMIT 8;\n    \"\"\"\n    songs = db.run(songs_query, include_columns=True)\n    \n    # Check if any songs were found\n    if not songs:\n        return f\"No songs found for the genre: {genre}\"\n    \n    # Format the results into a structured list of dictionaries\n    formatted_songs = ast.literal_eval(songs)\n    return [\n        {\"Song\": song[\"SongName\"], \"Artist\": song[\"ArtistName\"]}\n        for song in formatted_songs\n    ]\n\n@tool\ndef check_for_songs(song_title):\n    \"\"\"\n    Check if a song exists in the database by its name.\n    \n    Args:\n        song_title (str): The title of the song to search for.\n    \n    Returns:\n        str: Database query results containing all track information \n             for songs matching the given title.\n    \"\"\"\n    return db.run(\n        f\"\"\"\n        SELECT * FROM Track WHERE Name LIKE '%{song_title}%';\n    
    \"\"\",\n        include_columns=True\n    )\n```\n\nIn this block, we have defined four specific tools:\n\n*   `get_albums_by_artist`: To find albums by a given artist\n*   `get_tracks_by_artist`: To find individual songs by an artist\n*   `get_songs_by_genre`: To retrieve songs belonging to a specific genre\n*   `check_for_songs`: To verify if a particular song exists in the catalog\n\nEach of these tools interacts with our `db` (the SQLDatabase wrapper we initialized earlier) by executing a SQL query. The results are then returned in a structured format.\n\n```python\n# Create a list of all music-related tools for the agent\nmusic_tools = [get_albums_by_artist, get_tracks_by_artist, get_songs_by_genre, check_for_songs]\n\n# Bind the music tools to the language model for use in the ReAct agent\n# Note: `llm` is assumed to be a chat model instance created earlier (it is not defined in this README)\nllm_with_music_tools = llm.bind_tools(music_tools)\n```\n\nFinally, we bind these `music_tools` to our `llm` using `llm.bind_tools()`.\n\nThis crucial step allows the LLM to understand when and how to call these functions based on the user's query.\n\nNow that our **State** is defined and our **Tools** are ready, we can define the **Nodes** of our graph.\n\nNodes are the core processing units in a LangGraph application: they take the graph's current State as input, perform some logic, and return an updated State.\n\nFor our ReAct agent, we will define two key types of nodes:\n\n*   **music_assistant** is the LLM reasoning node. It uses the current conversation history and memory to decide the next action, either calling a tool or generating a response, and updates the State.\n*   **music_tool_node** runs the tool selected by music_assistant. 
LangGraph's `ToolNode` manages the tool call and updates the State with the result.\n\nBy combining these nodes, we enable dynamic reasoning and action within our multi-agent workflow.\n\nLet’s first create the `ToolNode` for our `music_tools`:\n\n```python\nfrom langgraph.prebuilt import ToolNode\n\n# Create a tool node that executes the music-related tools\n# ToolNode is a pre-built LangGraph component that handles tool execution\nmusic_tool_node = ToolNode(music_tools)\n```\n\nNow, we’ll define the `music_assistant` node. This node will use our LLM (with the `music_tools` bound to it) to determine the next action.\n\nIt also incorporates any `loaded_memory` into its prompt, allowing for personalized responses.\n\n```python\nfrom langchain_core.messages import ToolMessage, SystemMessage, HumanMessage\nfrom langchain_core.runnables import RunnableConfig\n\ndef generate_music_assistant_prompt(memory: str = \"None\") -\u003e str:\n    \"\"\"\n    Generate a system prompt for the music assistant agent.\n    \n    Args:\n        memory (str): User preferences and context from long-term memory store\n        \n    Returns:\n        str: Formatted system prompt for the music assistant\n    \"\"\"\n    return f\"\"\"\n    You are a member of the assistant team, and your role is specifically focused on helping customers discover and learn about music in our digital catalog. \n    If you are unable to find playlists, songs, or albums associated with an artist, it is okay. \n    Just inform the customer that the catalog does not have any playlists, songs, or albums associated with that artist.\n    You also have context on any saved user preferences, helping you to tailor your response. 
\n    \n    CORE RESPONSIBILITIES:\n    - Search and provide accurate information about songs, albums, artists, and playlists\n    - Offer relevant recommendations based on customer interests\n    - Handle music-related queries with attention to detail\n    - Help customers discover new music they might enjoy\n    - You are routed only when there are questions related to music catalog; ignore other questions. \n    \n    SEARCH GUIDELINES:\n    1. Always perform thorough searches before concluding something is unavailable\n    2. If exact matches aren't found, try:\n       - Checking for alternative spellings\n       - Looking for similar artist names\n       - Searching by partial matches\n       - Checking different versions/remixes\n    3. When providing song lists:\n       - Include the artist name with each song\n       - Mention the album when relevant\n       - Note if it's part of any playlists\n       - Indicate if there are multiple versions\n    \n    Additional context is provided below: \n\n    Prior saved user preferences: {memory}\n    \n    Message history is also attached.  
 \n    \"\"\"\n```\n\nWe also need the `music_assistant` node function itself, so let's create it.\n\n```python\ndef music_assistant(state: State, config: RunnableConfig):\n    \"\"\"\n    Music assistant node that handles music catalog queries and recommendations.\n    \n    This node processes customer requests related to music discovery, album searches,\n    artist information, and personalized recommendations based on stored preferences.\n    \n    Args:\n        state (State): Current state containing customer_id, messages, loaded_memory, etc.\n        config (RunnableConfig): Configuration for the runnable execution\n        \n    Returns:\n        dict: Updated state with the assistant's response message\n    \"\"\"\n    # Retrieve long-term memory preferences if available\n    memory = \"None\" \n    if \"loaded_memory\" in state: \n        memory = state[\"loaded_memory\"]\n\n    # Generate instructions for the music assistant agent\n    music_assistant_prompt = generate_music_assistant_prompt(memory)\n\n    # Invoke the language model with tools and system prompt\n    # The model can decide whether to use tools or respond directly\n    response = llm_with_music_tools.invoke([SystemMessage(music_assistant_prompt)] + state[\"messages\"])\n    \n    # Return updated state with the assistant's response\n    return {\"messages\": [response]}\n```\n\nThe `music_assistant` node constructs a detailed system prompt for the LLM, including general instructions and the `loaded_memory` for personalization.\n\nIt then invokes the `llm_with_music_tools` with this system message and the current conversation messages. 
Based on its reasoning, the LLM might output a final answer or a tool call.\n\nThe node simply returns this LLM response, which `add_messages` (from our State definition) will automatically append to the `messages` list in the State.\n\nWith our State and Nodes in place, the next step is to connect them using Edges, which define the execution flow in the graph.\n\nNormal Edges are straightforward: they always route from one specific node to another.\n\nConditional Edges are dynamic. These are Python functions that examine the current State and decide which node to visit next.\n\nFor our ReAct agent, we need a conditional edge that checks whether the `music_assistant` should:\n\n*   **Invoke tools:** If the LLM decides to call a tool, we route to `music_tool_node` to execute it.\n*   **End the process:** If the LLM provides a final response without tool calls, we conclude the sub-agent’s execution.\n\nTo handle this logic, we define the `should_continue` function.\n\n```python\ndef should_continue(state: State, config: RunnableConfig):\n    \"\"\"\n    Conditional edge function that determines the next step in the ReAct agent workflow.\n    \n    This function examines the last message in the conversation to decide whether the agent\n    should continue with tool execution or end the conversation.\n    \n    Args:\n        state (State): Current state containing messages and other workflow data\n        config (RunnableConfig): Configuration for the runnable execution\n        \n    Returns:\n        str: Either \"continue\" to execute tools or \"end\" to finish the workflow\n    \"\"\"\n    # Get all messages from the current state\n    messages = state[\"messages\"]\n    \n    # Examine the most recent message to check for tool calls\n    last_message = messages[-1]\n    \n    # If the last message doesn't contain any tool calls, the agent is done\n    if not last_message.tool_calls:\n        return \"end\"\n    # If there are tool calls present, continue to execute them\n 
   else:\n        return \"continue\"\n```\n\nThe `should_continue` function checks the last message in the State. If it includes `tool_calls`, it means the LLM wants to use a tool, so the function returns `\"continue\"`.\n\nOtherwise, it returns `\"end\"`, indicating the LLM has provided a direct response and the sub-agent’s task is complete.\n\nNow that we have all the pieces (State, Nodes, and Edges), let’s assemble them to construct our complete ReAct agent using `StateGraph`.\n\n```python\nfrom langgraph.graph import StateGraph, START, END\nfrom utils import show_graph # Assuming utils.py has this function\n\n# Create a new StateGraph instance for the music workflow\nmusic_workflow = StateGraph(State)\n\n# Add nodes to the graph\n# music_assistant: The reasoning node that decides which tools to invoke or responds directly\nmusic_workflow.add_node(\"music_assistant\", music_assistant)\n# music_tool_node: The execution node that handles all music-related tool calls\nmusic_workflow.add_node(\"music_tool_node\", music_tool_node)\n\n# Add edges to define the flow of the graph\n# Set the entry point - all queries start with the music assistant\nmusic_workflow.add_edge(START, \"music_assistant\")\n\n# Add conditional edge from music_assistant based on whether tools need to be called\nmusic_workflow.add_conditional_edges(\n    \"music_assistant\",\n    # Conditional function that determines the next step\n    should_continue,\n    {\n        # If tools need to be executed, route to tool node\n        \"continue\": \"music_tool_node\",\n        # If no tools needed, end the workflow\n        \"end\": END,\n    },\n)\n\n# After tool execution, always return to the music assistant for further processing\nmusic_workflow.add_edge(\"music_tool_node\", \"music_assistant\")\n\n# Compile the graph with checkpointer for short-term memory and store for long-term memory\nmusic_catalog_subagent = music_workflow.compile(\n    name=\"music_catalog_subagent\", \n    
checkpointer=checkpointer, \n    store=in_memory_store\n)\n\n# Display the compiled graph structure\nshow_graph(music_catalog_subagent)\n```\n\nIn this final step, we create a `StateGraph` using our defined State. We add nodes for both `music_assistant` and `music_tool_node`.\n\nThe graph starts at `START`, which leads to `music_assistant`. The core ReAct loop is set up with conditional edges from `music_assistant` that route to `music_tool_node` if a tool call is detected, or to `END` if the response is final.\n\nAfter `music_tool_node` runs, an edge brings the flow back to `music_assistant`, allowing the LLM to process the tool’s output and continue reasoning.\n\nLet’s take a look at our graph:\n\n![Graph of our Agent](https://miro.medium.com/v2/resize:fit:1100/format:webp/1*TL9b3DPuVn1xVKzPyel5qg.png)\n\n### Testing First Sub-agent\n\nNow it’s time to test our first sub-agent:\n\n```python\nimport uuid\nfrom langchain_core.messages import HumanMessage  # Message type for user input\n\n# Generate a unique thread ID for this conversation session\nthread_id = uuid.uuid4()\n\n# Define the user's question about music recommendations\nquestion = \"I like the Rolling Stones. What songs do you recommend by them or by other artists that I might like?\"\n\n# Set up configuration with the thread ID for maintaining conversation context\nconfig = {\"configurable\": {\"thread_id\": thread_id}}\n\n# Invoke the music catalog subagent with the user's question\n# The agent will use its tools to search for Rolling Stones music and provide recommendations\nresult = music_catalog_subagent.invoke({\"messages\": [HumanMessage(content=question)]}, config=config)\n\n# Display all messages from the conversation in a formatted way\nfor message in result[\"messages\"]:\n    message.pretty_print()  # pretty_print() is built into LangChain message classes\n```\n\n```text\n======= Human Message ======\n\nI like the Rolling Stones. 
What songs do you recommend by them or by\nother artists that I might like?\n\n======= Ai Message ======\n\nTool Calls:\n  get_tracks_by_artist (chatcmpl-tool-012bac57d6af46ddaad8e8971cca2bf7)\n Call ID: chatcmpl-tool-012bac57d6af46ddaad8e8971cca2bf7\n  Args:\n    artist: The Rolling Stones\n```\n\nGiven the human message (our query), the agent responds with the correct tool call, `get_tracks_by_artist`, which is responsible for finding recommendations based on the artist specified in our query.\n\nNow that we have created our first sub-agent, let’s create our second.\n\n## Invoice Information Sub-agent Using Pre-built\n\nWhile building a ReAct agent from scratch is great for understanding the fundamentals, LangGraph also offers **pre-built libraries** for common architectures.\n\nThese let us quickly set up standard patterns like ReAct without manually defining all nodes and edges. You can find a full list of these pre-built libraries in the [LangGraph documentation](https://langchain-ai.github.io/langgraph/how-tos/prebuilt/).\n\n![Invoice Information Sub-agent](https://miro.medium.com/v2/resize:fit:1100/format:webp/1*fQ1T168pUtQBedYTJHAGiw.png)\n\nJust like before, we start by defining the specific tools and the prompt for our `invoice_information_subagent`. 
These tools will interact with the Chinook database to retrieve invoice details.\n\n```python\nfrom langchain_core.tools import tool\n\n@tool \ndef get_invoices_by_customer_sorted_by_date(customer_id: str) -\u003e list[dict]:\n    \"\"\"\n    Look up all invoices for a customer using their ID.\n    The invoices are sorted in descending order by invoice date, which helps when the customer wants to view their most recent/oldest invoice, or if \n    they want to view invoices within a specific date range.\n    \n    Args:\n        customer_id (str): customer_id, which serves as the identifier.\n    \n    Returns:\n        list[dict]: A list of invoices for the customer.\n    \"\"\"\n    return db.run(f\"SELECT * FROM Invoice WHERE CustomerId = {customer_id} ORDER BY InvoiceDate DESC;\")\n\n@tool \ndef get_invoices_sorted_by_unit_price(customer_id: str) -\u003e list[dict]:\n    \"\"\"\n    Use this tool when the customer wants to know the details of one of their invoices based on the unit price/cost of the invoice.\n    This tool looks up all invoices for a customer, and sorts the unit price from highest to lowest. 
In order to find the invoice associated with the customer, \n    we need to know the customer ID.\n    \n    Args:\n        customer_id (str): customer_id, which serves as the identifier.\n    \n    Returns:\n        list[dict]: A list of invoices sorted by unit price.\n    \"\"\"\n    query = f\"\"\"\n        SELECT Invoice.*, InvoiceLine.UnitPrice\n        FROM Invoice\n        JOIN InvoiceLine ON Invoice.InvoiceId = InvoiceLine.InvoiceId\n        WHERE Invoice.CustomerId = {customer_id}\n        ORDER BY InvoiceLine.UnitPrice DESC;\n    \"\"\"\n    return db.run(query)\n\n@tool\ndef get_employee_by_invoice_and_customer(invoice_id: str, customer_id: str) -\u003e dict:\n    \"\"\"\n    This tool will take in an invoice ID and a customer ID and return the employee information associated with the invoice.\n\n    Args:\n        invoice_id (int): The ID of the specific invoice.\n        customer_id (str): customer_id, which serves as the identifier.\n\n    Returns:\n        dict: Information about the employee associated with the invoice.\n    \"\"\"\n\n    query = f\"\"\"\n        SELECT Employee.FirstName, Employee.Title, Employee.Email\n        FROM Employee\n        JOIN Customer ON Customer.SupportRepId = Employee.EmployeeId\n        JOIN Invoice ON Invoice.CustomerId = Customer.CustomerId\n        WHERE Invoice.InvoiceId = ({invoice_id}) AND Invoice.CustomerId = ({customer_id});\n    \"\"\"\n    \n    employee_info = db.run(query, include_columns=True)\n    \n    if not employee_info:\n        return f\"No employee found for invoice ID {invoice_id} and customer identifier {customer_id}.\"\n    return employee_info\n```\n\nWe have defined three specialized tools for invoice handling:\n\n*   `get_invoices_by_customer_sorted_by_date`: Retrieves all invoices for a customer, sorted by date\n*   `get_invoices_sorted_by_unit_price`: Retrieves invoices sorted by the unit price of items within them\n*   `get_employee_by_invoice_and_customer`: Finds the support employee 
associated with a specific invoice\n\nJust like before, we collect these tools into a list.\n\n```python\n# Create a list of all invoice-related tools for the agent\ninvoice_tools = [get_invoices_by_customer_sorted_by_date, get_invoices_sorted_by_unit_price, get_employee_by_invoice_and_customer]\n```\n\nNow, let’s define the prompt that will guide our invoice sub-agent’s behavior:\n\n```python\ninvoice_subagent_prompt = \"\"\"\n    You are a subagent among a team of assistants. You are specialized for retrieving and processing invoice information. You are routed for invoice-related portion of the questions, so only respond to them. \n\n    You have access to three tools. These tools enable you to retrieve and process invoice information from the database. Here are the tools:\n    - get_invoices_by_customer_sorted_by_date: This tool retrieves all invoices for a customer, sorted by invoice date.\n    - get_invoices_sorted_by_unit_price: This tool retrieves all invoices for a customer, sorted by unit price.\n    - get_employee_by_invoice_and_customer: This tool retrieves the employee information associated with an invoice and a customer.\n    \n    If you are unable to retrieve the invoice information, inform the customer you are unable to retrieve the information, and ask if they would like to search for something else.\n    \n    CORE RESPONSIBILITIES:\n    - Retrieve and process invoice information from the database\n    - Provide detailed information about invoices, including customer details, invoice dates, total amounts, employees associated with the invoice, etc. when the customer asks for it.\n    - Always maintain a professional, friendly, and patient demeanor\n    \n    You may have additional context that you should use to help answer the customer's query. 
It will be provided to you below:\n    \"\"\"\n```\n\nThis prompt outlines the sub-agent’s role, its available tools, core responsibilities, and guidelines for handling cases where information isn’t found.\n\nThis targeted instruction helps the LLM act effectively within its specialized domain.\n\nNow, instead of manually creating nodes and conditional edges for the ReAct pattern as we did for our previous sub-agent, we will use LangGraph’s pre-built `create_react_agent` function.\n\n```python\nfrom langgraph.prebuilt import create_react_agent\n\n# Create the invoice information subagent using LangGraph's pre-built ReAct agent\n# This agent specializes in handling customer invoice queries and billing information\ninvoice_information_subagent = create_react_agent(\n    llm,                           # Language model for reasoning and responses\n    tools=invoice_tools,           # Invoice-specific tools for database queries\n    name=\"invoice_information_subagent\",  # Unique identifier for the agent\n    prompt=invoice_subagent_prompt,       # System instructions for invoice handling\n    state_schema=State,            # State schema for data flow between nodes\n    checkpointer=checkpointer,     # Short-term memory for conversation context\n    store=in_memory_store         # Long-term memory store for persistent data\n)\n```\n\nThe `create_react_agent` function takes our `llm`, the `invoice_tools`, a name for the agent (important for multi-agent routing), the prompt we just defined, our custom `State` schema, and hooks up the checkpointer and store for memory.\n\nWith just a few lines, we have a fully functional ReAct agent. This is the advantage of using LangGraph’s pre-built components.\n\n### Testing Second Sub-agent\n\nLet’s test our new `invoice_information_subagent` to ensure it works as expected. 
We'll provide a query that requires fetching invoice and employee information.\n\n```python\n# Generate a unique thread ID for this conversation session\nthread_id = uuid.uuid4()\n\n# Define the user's question about their recent invoice and employee assistance\nquestion = \"My customer id is 1. What was my most recent invoice, and who was the employee that helped me with it?\"\n\n# Set up configuration with the thread ID for maintaining conversation context\nconfig = {\"configurable\": {\"thread_id\": thread_id}}\n\n# Invoke the invoice information subagent with the user's question\n# The agent will use its tools to search for invoice information and employee details\nresult = invoice_information_subagent.invoke({\"messages\": [HumanMessage(content=question)]}, config=config)\n\n# Display all messages from the conversation in a formatted way\nfor message in result[\"messages\"]:\n    message.pretty_print()\n```\n\n```text\n======= Human Message ======\n\nMy customer id is 1. What was my most recent invoice, and who\nwas the employee that helped me with it?\n\n======= Ai Message ======\n\nName: invoice_information_subagent\nTool Calls:\n  get_invoices_by_customer_sorted_by_date (chatcmpl-tool-8f3cc6f6ef41454099eaae576409bfe2)\n Call ID: chatcmpl-tool-8f3cc6f6ef41454099eaae576409bfe2\n  Args:\n    customer_id: 1\n```\n\nIt prints the correct tool based on our query, and the output is pretty much the same as we saw earlier with our first sub-agent that we manually created, with all the correct arguments fetched from the query.\n\nSo, we have created two sub-agents, now we can move on to creating the Multi-Agent architecture. Let’s do that.\n\n## Creating Multi-Agent Using Supervisor\n\nWe have two sub-agents: one for music questions and one for invoices. A natural question arises:\n\n\u003e **How do we ensure customer tasks are appropriately routed to the correct sub-agent?**\n\nThis is where the concept of a **Supervisor Agent** comes into play. 
It routes customer requests to the right sub-agent based on the query. After a sub-agent finishes, control goes back to the supervisor or can be passed to another sub-agent.\n\nA supervisor-based multi-agent architecture brings key benefits:\n\n![Supervisor](https://miro.medium.com/v2/resize:fit:1100/format:webp/1*qQyV2aH2hDJGIC4Z5HyCVg.png)\n\n*   Each sub-agent focuses on a specific domain, improving accuracy and making it easy to add new agents.\n*   Agents can be added, removed, or updated without impacting the whole system, supporting scalability.\n*   Limiting LLMs to specific tasks lowers the chance of wrong or irrelevant outputs.\n\nWe will use LangGraph’s pre-built supervisor library to quickly build this multi-agent setup.\n\nFirst, we will create a set of instructions for our supervisor. This prompt will define its role, inform it about the available sub-agents and their capabilities, and guide its decision-making process for routing.\n\n```python\nsupervisor_prompt = \"\"\"You are an expert customer support assistant for a digital music store. \nYou are dedicated to providing exceptional service and ensuring customer queries are answered thoroughly. \nYou have a team of subagents that you can use to help answer queries from customers. \nYour primary role is to serve as a supervisor/planner for this multi-agent team that helps answer queries from customers. \n\nYour team is composed of two subagents that you can use to help answer the customer's request:\n1. music_catalog_information_subagent: this subagent has access to user's saved music preferences. It can also retrieve information about the digital music store's music \ncatalog (albums, tracks, songs, etc.) from the database. \n2. invoice_information_subagent: this subagent is able to retrieve information about a customer's past purchases or invoices \nfrom the database. 
\n\nBased on the existing steps that have been taken in the messages, your role is to generate the next subagent that needs to be called. \nThis could be one step in an inquiry that needs multiple sub-agent calls. \"\"\"\n```\n\nThis supervisor prompt defines its role as a router and planner, understanding what the `music_catalog_information_subagent` and `invoice_information_subagent` can do, and deciding which one to call next.\n\nNow, let’s put our supervisor to work using the `create_supervisor` function from LangGraph’s pre-built supervisor library.\n\n```python\nfrom langgraph_supervisor import create_supervisor  # Provided by the langgraph-supervisor package\n\n# Create supervisor workflow using LangGraph's pre-built supervisor\n# The supervisor coordinates between multiple subagents based on the incoming queries\nsupervisor_prebuilt_workflow = create_supervisor(\n    agents=[invoice_information_subagent, music_catalog_subagent],  # List of subagents to supervise\n    output_mode=\"last_message\",  # Return only the final response (alternative: \"full_history\")\n    model=llm,  # Language model for supervisor reasoning and routing decisions\n    prompt=(supervisor_prompt),  # System instructions for the supervisor agent\n    state_schema=State  # State schema defining data flow structure\n)\n\n# Compile the supervisor workflow with memory components\n# - checkpointer: Enables short-term memory within conversation threads\n# - store: Provides long-term memory storage across conversations\nsupervisor_prebuilt = supervisor_prebuilt_workflow.compile(\n    name=\"music_catalog_subagent\", \n    checkpointer=checkpointer, \n    store=in_memory_store\n)\n\n# Display the compiled supervisor graph structure\nshow_graph(supervisor_prebuilt)\n```\n\nWe provide it with our list of sub-agents, set the `output_mode` to return only the last message from the active sub-agent, specify our LLM model, supply the supervisor prompt, and connect our State schema.\n\nLet’s see what our supervisor 
architecture looks like:\n\n![Supervisor Architecture](https://miro.medium.com/v2/resize:fit:1100/format:webp/1*VLi1cXxCdxNtxBcRX4RVlQ.png)\n\nAs mentioned earlier, the supervisor coordinates the two sub-agents we defined, routing between them according to the supervisor prompt.\n\n### Testing our Multi-agent Architecture\n\nLet’s test our supervisor-based multi-agent architecture and see how it goes.\n\n```python\n# Generate a unique thread ID for this conversation session\nthread_id = uuid.uuid4()\n\n# Define a question that tests both invoice and music catalog capabilities\nquestion = \"My customer ID is 1. How much was my most recent purchase? What albums do you have by U2?\"\n\n# Set up configuration with the thread ID for maintaining conversation context\nconfig = {\"configurable\": {\"thread_id\": thread_id}}\n\n# Invoke the supervisor workflow with the multi-part question\n# The supervisor will route to appropriate subagents for invoice and music queries\nresult = supervisor_prebuilt.invoke({\"messages\": [HumanMessage(content=question)]}, config=config)\n\n# Display all messages from the conversation in a formatted way\nfor message in result[\"messages\"]:\n    message.pretty_print()\n```\n\n```text\n================================ Human Message =================================\n\nMy customer ID is 1. How much was my most recent purchase? 
What albums do you have by U2?\n================================== Ai Message ==================================\nName: supervisor\nTool Calls:\n  transfer_to_invoice_information_subagent (chatcmpl-tool-bece02300e1845dea927ce0e505e1f7f)\n Call ID: chatcmpl-tool-bece02300e1845dea927ce0e505e1f7f\n  Args:\n================================= Tool Message =================================\nName: transfer_to_invoice_information_subagent\n\nSuccessfully transferred to invoice_information_subagent\n================================== Ai Message ==================================\nName: invoice_information_subagent\n\nYour most recent purchase was on '2025-08-07 00:00:00' and the total amount was $8.91. Unfortunately, I am unable to provide information about U2 albums as it is not related to invoice information. Would you like to search for something else?\n================================== Ai Message ==================================\nName: invoice_information_subagent\n\nTransferring back to supervisor\nTool Calls:\n  transfer_back_to_supervisor (9f3d9fce-0f11-43c0-88c4-adcd459a30a0)\n Call ID: chatcmpl-tool-9f3d9fce-0f11-43c0-88c4-adcd459a30a0\n  Args:\n================================= Tool Message =================================\nName: transfer_back_to_supervisor\n\nSuccessfully transferred back to supervisor\n================================== Ai Message ==================================\nName: supervisor\nTool Calls:\n  transfer_to_music_catalog_information_subagent (chatcmpl-tool-72475cf0c17f404583145912fca0b718)\n Call ID: chatcmpl-tool-72475cf0c17f404583145912fca0b718\n  Args:\n================================= Tool Message =================================\nName: transfer_to_music_catalog_information_subagent\n\nError: transfer_to_music_catalog_information_subagent is not a valid tool, try one of [transfer_to_music_catalog_subagent, 
transfer_to_invoice_information_subagent].\n================================== Ai Message ==================================\nName: supervisor\nTool Calls:\n  transfer_to_music_catalog_subagent (chatcmpl-tool-71cc764428ff4efeb0ba7bf24b64a6ec)\n Call ID: chatcmpl-tool-71cc764428ff4efeb0ba7bf24b64a6ec\n  Args:\n================================= Tool Message =================================\nName: transfer_to_music_catalog_subagent\n\nSuccessfully transferred to music_catalog_subagent\n================================== Ai Message ==================================\n\nU2 has the following albums in our catalog: \n1. Achtung Baby\n2. All That You Can't Leave Behind\n3. B-Sides 1980-1990\n4. How To Dismantle An Atomic Bomb\n5. Pop\n6. Rattle And Hum\n7. The Best Of 1980-1990\n8. War\n9. Zooropa\n10. Instant Karma: The Amnesty International Campaign to Save Darfur\n\nWould you like to explore more music or is there something else I can help you with?\n================================== Ai Message ==================================\nName: music_catalog_subagent\n\nTransferring back to supervisor\nTool Calls:\n  transfer_back_to_supervisor (4739ce04-dd11-47c8-b35a-9e4fca21b0c1)\n Call ID: chatcmpl-tool-4739ce04-dd11-47c8-b35a-9e4fca21b0c1\n  Args:\n================================= Tool Message =================================\nName: transfer_back_to_supervisor\n\nSuccessfully transferred back to supervisor\n================================== Ai Message ==================================\nName: supervisor\n\nI hope this information helps you with your inquiry. Is there anything else I can help you with?\n```\n\nThere is a lot happening here: our multi-agent system carries on a detailed, multi-step conversation to answer the user. Let’s walk through it.\n\nIn this example, the user asks a question involving both invoice details and music catalog data. Here’s what happens:\n1.  The supervisor receives the query.\n2.  
It detects the invoice-related part (“most recent purchase”) and sends it to the `invoice_information_subagent`.\n3.  The invoice sub-agent processes that part, fetches the invoice, but can’t answer the U2 albums question, so it hands control back to the supervisor.\n4.  The supervisor then routes the remaining music query to the `music_catalog_subagent`. Its first attempt calls `transfer_to_music_catalog_information_subagent`, the name used in the supervisor prompt, which fails as an invalid tool; it then retries with the registered name, `transfer_to_music_catalog_subagent`.\n5.  The music sub-agent retrieves the U2 albums info and returns control to the supervisor.\n6.  The supervisor wraps up, having coordinated both sub-agents to fully answer the user’s multi-part question.\n\n## Adding Human-in-the-Loop\n\nSo far we have built a multi-agent system that routes customer queries to specialized sub-agents. However, in a real-world customer support scenario, we don’t always have the `customer_id` readily available.\n\nBefore allowing an agent to access sensitive information like invoice history, we typically need to **verify the customer’s identity**.\n\n![Human in the loop](https://miro.medium.com/v2/resize:fit:1100/format:webp/1*hDKse0pMkf95Rf84TyQAAQ.png)\n\nIn this step, we will enhance our workflow by adding a customer verification layer. This will involve a **human-in-the-loop** component, where the system might pause and prompt the customer to provide their account information if it’s missing or unverified.\n\nTo implement this, we introduce two new nodes:\n\n1.  **verify_info node** attempts to extract and verify customer identification (ID, email, or phone) from the user input using our database.\n2.  **human_input node** is triggered if verification fails. It pauses the graph and prompts the user for the missing information. 
This is easily handled using LangGraph `interrupt()` feature.\n\nFirst, let’s define a Pydantic schema for parsing user input and a system prompt for an LLM to extract this information reliably.\n\n```python\nfrom pydantic import BaseModel, Field\n\nclass UserInput(BaseModel):\n    \"\"\"Schema for parsing user-provided account information.\"\"\"\n    identifier: str = Field(description=\"Identifier, which can be a customer ID, email, or phone number.\")\n\n# Create a structured LLM that outputs responses conforming to the UserInput schema\nstructured_llm = llm.with_structured_output(schema=UserInput)\n\n# System prompt for extracting customer identifier information\nstructured_system_prompt = \"\"\"You are a customer service representative responsible for extracting customer identifier.\nOnly extract the customer's account information from the message history. \nIf they haven't provided the information yet, return an empty string for the identifier.\"\"\"\n```\n\nThe `UserInput` Pydantic model defines the expected data as a single identifier.\n\nWe use `with_structured_output()` to make the LLM return JSON in this format. A system prompt helps the LLM focus only on extracting the identifier.\n\nNext, we need a helper function to take the extracted identifier (which could be a customer ID, phone number, or email) and look it up in our Chinook database to retrieve the actual `customer_id`.\n\n```python\nfrom typing import Optional \n\n# Helper function for customer identification\ndef get_customer_id_from_identifier(identifier: str) -\u003e Optional[int]:\n    \"\"\"\n    Retrieve Customer ID using an identifier, which can be a customer ID, email, or phone number.\n    \n    This function supports three types of identifiers:\n    1. Direct customer ID (numeric string)\n    2. Phone number (starts with '+')\n    3. 
Email address (contains '@')\n    \n    Args:\n        identifier (str): The identifier can be customer ID, email, or phone number.\n    \n    Returns:\n        Optional[int]: The CustomerId if found, otherwise None.\n    \"\"\"\n    # Check if identifier is a direct customer ID (numeric)\n    if identifier.isdigit():\n        return int(identifier)\n    \n    # Check if identifier is a phone number (starts with '+')\n    # startswith() is safe on an empty string, unlike identifier[0]\n    elif identifier.startswith(\"+\"):\n        query = f\"SELECT CustomerId FROM Customer WHERE Phone = '{identifier}';\"\n        result = db.run(query)\n        formatted_result = ast.literal_eval(result)\n        if formatted_result:\n            return formatted_result[0][0]\n    \n    # Check if identifier is an email address (contains '@')\n    elif \"@\" in identifier:\n        query = f\"SELECT CustomerId FROM Customer WHERE Email = '{identifier}';\"\n        result = db.run(query)\n        formatted_result = ast.literal_eval(result)\n        if formatted_result:\n            return formatted_result[0][0]\n    \n    # Return None if no match found\n    return None \n```\n\nThis utility function tries to interpret the provided identifier as a customer ID, phone number, or email, then queries the database to find the corresponding numeric `CustomerId`.\n\nNow, we define our `verify_info` node. 
This node orchestrates the identifier extraction and verification process.\n\n```python\ndef verify_info(state: State, config: RunnableConfig):\n    \"\"\"\n    Verify the customer's account by parsing their input and matching it with the database.\n    \n    This node handles customer identity verification as the first step in the support process.\n    It extracts customer identifiers (ID, email, or phone) from user messages and validates\n    them against the database.\n    \n    Args:\n        state (State): Current state containing messages and potentially customer_id\n        config (RunnableConfig): Configuration for the runnable execution\n        \n    Returns:\n        dict: Updated state with customer_id if verified, or request for more info\n    \"\"\"\n    # Only verify if customer_id is not already set\n    if state.get(\"customer_id\") is None: \n        # System instructions for prompting customer verification\n        system_instructions = \"\"\"You are a music store agent, where you are trying to verify the customer identity \n        as the first step of the customer support process. \n        Only after their account is verified, you would be able to support them on resolving the issue. 
\n        In order to verify their identity, one of their customer ID, email, or phone number needs to be provided.\n        If the customer has not provided the information yet, please ask them for it.\n        If they have provided the identifier but cannot be found, please ask them to revise it.\"\"\"\n\n        # Get the most recent user message\n        user_input = state[\"messages\"][-1] \n    \n        # Use structured LLM to parse customer identifier from the message\n        parsed_info = structured_llm.invoke([SystemMessage(content=structured_system_prompt)] + [user_input])\n    \n        # Extract the identifier from parsed response\n        identifier = parsed_info.identifier\n    \n        # Initialize customer_id as unverified\n        customer_id = None\n        \n        # Attempt to find the customer ID using the provided identifier\n        if identifier:\n            customer_id = get_customer_id_from_identifier(identifier)\n    \n        # If customer found (lookup returns None on failure), confirm verification and set customer_id in state\n        if customer_id is not None:\n            intent_message = SystemMessage(\n                content= f\"Thank you for providing your information! I was able to verify your account with customer id {customer_id}.\"\n            )\n            return {\n                  \"customer_id\": customer_id,\n                  \"messages\" : [intent_message]\n                  }\n        else:\n            # If customer not found, ask for correct information\n            response = llm.invoke([SystemMessage(content=system_instructions)]+state['messages'])\n            return {\"messages\": [response]}\n\n    else: \n        # Customer already verified, no action needed\n        pass\n```\n\nSo this `verify_info` node first checks if `customer_id` is already in the State. 
If not, it uses the `structured_llm` to extract an identifier from `user_input` and validates it with `get_customer_id_from_identifier`.\n\nIf valid, it updates the State and confirms with a message. If not, it uses the main LLM and system instructions to politely ask the user for their info.\n\nNow, let’s create our `human_input` node. This node acts as a placeholder that triggers `interrupt()` in the graph, pausing execution to wait for user input. This is important for human-in-the-loop interactions, allowing the agent to directly request missing information.\n\n```python\nfrom langgraph.types import interrupt\n\ndef human_input(state: State, config: RunnableConfig):\n    \"\"\"\n    Human-in-the-loop node that interrupts the workflow to request user input.\n    \n    This node creates an interruption point in the workflow, allowing the system\n    to pause and wait for human input before continuing. It's typically used\n    for customer verification or when additional information is needed.\n    \n    Args:\n        state (State): Current state containing messages and workflow data\n        config (RunnableConfig): Configuration for the runnable execution\n        \n    Returns:\n        dict: Updated state with the user's input message\n    \"\"\"\n    # Interrupt the workflow and prompt for user input\n    user_input = interrupt(\"Please provide input.\")\n    \n    # Return the user input as a new message in the state\n    return {\"messages\": [user_input]}\n```\n\nThe `interrupt()` function is a powerful LangGraph feature. When executed, it pauses the graph's execution and signals that human intervention is required.\n\nThe `run_graph` function (which we will update later for evaluation) will need to handle this interrupt by providing new input to resume the graph.\n\nNow, we just need to put this together. 
We define a new conditional edge (`should_interrupt`) that routes to the `human_input` node if the `customer_id` is not yet verified.\n\nOtherwise, it allows the flow to continue to the main supervisor agent.\n\n```python\n# Conditional edge: should_interrupt\ndef should_interrupt(state: State, config: RunnableConfig):\n    \"\"\"\n    Determines whether the workflow should interrupt and ask for human input.\n    \n    If the customer_id is present in the state (meaning verification is complete),\n    the workflow continues. Otherwise, it interrupts to get human input for verification.\n    \"\"\"\n    if state.get(\"customer_id\") is not None:\n        return \"continue\" # Customer ID is verified, continue to the next step (supervisor)\n    else:\n        return \"interrupt\" # Customer ID is not verified, interrupt for human input\n```\n\nNow, let’s integrate these new nodes and edges into our overall graph:\n\n```python\n# Create a new StateGraph instance for the multi-agent workflow with verification\nmulti_agent_verify = StateGraph(State)\n\n# Add new nodes for customer verification and human interaction\nmulti_agent_verify.add_node(\"verify_info\", verify_info)\nmulti_agent_verify.add_node(\"human_input\", human_input)\n# Add the existing supervisor agent as a node\nmulti_agent_verify.add_node(\"supervisor\", supervisor_prebuilt)\n\n# Define the graph's entry point: always start with information verification\nmulti_agent_verify.add_edge(START, \"verify_info\")\n\n# Add a conditional edge from verify_info to decide whether to continue or interrupt\nmulti_agent_verify.add_conditional_edges(\n    \"verify_info\",\n    should_interrupt, # The function that checks if customer_id is verified\n    {\n        \"continue\": \"supervisor\", # If verified, proceed to the supervisor\n        \"interrupt\": \"human_input\", # If not verified, interrupt for human input\n    },\n)\n# After human input, always loop back to verify_info to re-attempt 
verification\nmulti_agent_verify.add_edge(\"human_input\", \"verify_info\")\n# After the supervisor completes its task, the workflow ends\nmulti_agent_verify.add_edge(\"supervisor\", END)\n\n# Compile the complete graph with checkpointer and long-term memory store\nmulti_agent_verify_graph = multi_agent_verify.compile(\n    name=\"multi_agent_verify\", \n    checkpointer=checkpointer, \n    store=in_memory_store\n)\n\n# Display the updated graph structure\nshow_graph(multi_agent_verify_graph)\n```\n\n![Added Human in the loop](https://miro.medium.com/v2/resize:fit:1100/format:webp/1*GB6HY39kFzZNPlnwP92C-g.png)\n\nThe new graph starts at `verify_info`. If verification succeeds, it moves to the `supervisor`. If not, it routes to `human_input`, which interrupts the flow and waits for user input.\n\nOnce input is provided, it loops back to `verify_info` to try again. The `supervisor` is the final processing step before reaching `END`. The `show_graph` function will visually display this verification loop.\n\nLet’s test it out! First, we’ll ask a question *without* providing any identification.\n\n```python\nthread_id = uuid.uuid4()\nquestion = \"How much was my most recent purchase?\"\nconfig = {\"configurable\": {\"thread_id\": thread_id}}\n\nresult = multi_agent_verify_graph.invoke({\"messages\": [HumanMessage(content=question)]}, config=config)\nfor message in result[\"messages\"]:\n    message.pretty_print()\n```\n\n```text\n### OUTPUT ###\n======== Human Message =======\n\nHow much was my most recent purchase?\n\n======== Ai Message ==========\n\nBefore I can look up your most recent purchase,\nI need to verify your identity. 
Could you please provide your\ncustomer ID, email, or phone number associated with your account?\nThis will help me to access your information and assist you\nwith your query.\n```\n\nAs expected, the agent interrupts and asks for your customer ID, email, or phone number because the `customer_id` is initially `None` in the state.\n\nNow, let's resume the conversation and provide the requested information. LangGraph's `invoke` method accepts a `Command(resume=...)` to pick up from an interrupt.\n\n```python\nfrom langgraph.types import Command\n\n# Resume from the interrupt, providing the phone number for verification\nquestion = \"My phone number is +55 (12) 3923-5555.\"\nresult = multi_agent_verify_graph.invoke(Command(resume=question), config=config)\nfor message in result[\"messages\"]:\n    message.pretty_print()\n```\n\n```text\n### OUTPUT ###\n======= Human Message =========\n\nHow much was my most recent purchase?\n\n=========== Ai Message =======\nBefore I can look up your most recent purchase, I need to verify your identity. Could you please provide your customer ID, email, or phone number associated with your account? This will help me to access your information and assist you with your query.\n\n========== Human Message ===========\n\nMy phone number is +55 (12) 3923-5555.\n\n============ System Message =======\n\nThank you for providing your information! 
I was able to verify your account with customer id 1.\n\n========== Ai Message ==========\nName: supervisor\n\n{\"type\": \"function\", \"function\": {\"name\": \"transfer_to_invoice_information_subagent\", \"parameters\": {}}}\n```\n\nAfter the user provides their phone number, the `verify_info` node successfully identifies the `customer_id` (which is `1` for this number in the Chinook database).\n\nIt confirms the verification and, as defined in our graph, passes control to the `supervisor`, which then routes the original query.\n\n\u003e This confirms that our human-in-the-loop verification mechanism works as intended!\n\nA key advantage of LangGraph's state management is that, because the graph is compiled with a checkpointer, once `customer_id` is verified and saved in the State, it persists for the rest of the thread.\n\nThis means the agent won’t ask for verification again in follow-up questions within the same thread.\n\nLet’s test this persistence by asking a follow-up question without re-providing the ID:\n\n```python\nquestion = \"What albums do you have by the Rolling Stones?\"\nresult = multi_agent_verify_graph.invoke({\"messages\": [HumanMessage(content=question)]}, config=config)\nfor message in result[\"messages\"]:\n    message.pretty_print()\n```\n\n```text\n### OUTPUT ###\n=== Human Message ===\nHow much was my most recent purchase?\n\n=== Ai Message ===\nBefore I can look up your most recent purchase, I need to verify your identity. Could you please provide your customer ID, email, or phone number associated with your account?\n\n=== Human Message ===\nMy phone number is +55 (12) 3923-5555.\n\n=== System Message ===\nThank you for providing your information! 
I was able to verify your account with customer id 1.\n\n=== Ai Message ===\nName: supervisor\n{\"type\": \"function\", \"function\": {\"name\": \"transfer_to_invoice_information_subagent\", \"parameters\": {}}}\n\n=== Human Message ===\nWhat albums do you have by the Rolling Stones?\n\n=== Ai Message ===\nName: supervisor\n{\"type\": \"function\", \"function\": {\"name\": \"transfer_to_music_catalog_subagent\", \"parameters\": {}}}\n```\n\nNotice that the `verify_info` node doesn't re-prompt for identification. Since `state.get(\"customer_id\")` is already set to `1`, it immediately moves to the `supervisor`, which routes the query to the `music_catalog_subagent`.\n\nThis shows how State maintains context and avoids repeating steps, improving the user experience.\n\n## Adding Long-Term Memory\n\nWe’ve already initialized our InMemoryStore for **long-term memory** in the **“Setting up Short-Term and Long-Term Memory”** section.\n\n![Long term memory](https://miro.medium.com/v2/resize:fit:1100/format:webp/1*dFMx7x12urcFslCKBquWWQ.png)\n\nNow, it’s time to fully integrate it into our multi-agent workflow. 
Long-term memory lets the agent recall and use information from past conversations, leading to more personalized and context-aware interactions over time.\n\nIn this step, we add two new nodes to handle long-term memory:\n\n*   **load_memory** retrieves the user’s existing preferences from the `in_memory_store` at the start of the conversation (after verification).\n*   **create_memory** saves any new music interests shared by the user during the conversation to the `in_memory_store` for future use.\n\nFirst, we define a helper function that formats the user’s stored music preferences into a readable string that can be easily injected into an LLM’s prompt, followed by the `load_memory` node itself.\n\n```python\nfrom langgraph.store.base import BaseStore\n\n# Helper function to format user memory data for LLM prompts\ndef format_user_memory(user_data):\n    \"\"\"Formats music preferences from users, if available.\"\"\"\n    # Access the 'memory' key which holds the UserProfile object\n    profile = user_data['memory'] \n    result = \"\"\n    # Check if music_preferences attribute exists and is not empty\n    if hasattr(profile, 'music_preferences') and profile.music_preferences:\n        result += f\"Music Preferences: {', '.join(profile.music_preferences)}\"\n    return result.strip()\n\n# Node: load_memory\ndef load_memory(state: State, config: RunnableConfig, store: BaseStore):\n    \"\"\"\n    Loads music preferences from the long-term memory store for a given user.\n    \n    This node fetches previously saved user preferences to provide context\n    for the current conversation, enabling personalized responses.\n    \"\"\"\n    # Get the user_id from the configurable part of the config\n    # In our evaluation setup, we might pass user_id via config\n    # Use customer_id if user_id not in config; cast to str so the namespace\n    # matches the one create_memory writes to\n    user_id = str(config[\"configurable\"].get(\"user_id\", state[\"customer_id\"]))\n    \n    # Define the namespace and key for accessing memory in the store\n    namespace = 
(\"memory_profile\", user_id)\n    key = \"user_memory\"\n    \n    # Retrieve existing memory for the user\n    existing_memory = store.get(namespace, key)\n    formatted_memory = \"\"\n    \n    # Format the retrieved memory if it exists and has content\n    if existing_memory and existing_memory.value:\n        formatted_memory = format_user_memory(existing_memory.value)\n\n    # Update the state with the loaded and formatted memory\n    return {\"loaded_memory\": formatted_memory}\n```\n\nThe `load_memory` node uses the `user_id` (from config or state) to build a namespace key and fetch existing `user_memory` from the `in_memory_store`.\n\nIt formats this memory and updates the `loaded_memory` field in the State. This memory is then included in the `music_assistant` prompt, as set up in `generate_music_assistant_prompt`.\n\nNext, we need a Pydantic schema to structure the user’s profile for saving to memory.\n\n```python\n# Pydantic model to define the structure of the user profile for memory storage\nfrom pydantic import BaseModel, Field\nfrom typing import List\n\nclass UserProfile(BaseModel):\n    customer_id: str = Field(\n        description=\"The customer ID of the customer\"\n    )\n    music_preferences: List[str] = Field(\n        description=\"The music preferences of the customer\"\n    )\n```\n\nNow, we define the `create_memory` node. This node will use an LLM-as-a-judge pattern to analyze the conversation history and existing memory, then update the `UserProfile` with any newly identified music interests.\n\n```python\n# Prompt for the create_memory agent, guiding it to update user memory\ncreate_memory_prompt = \"\"\"You are an expert analyst that is observing a conversation that has taken place between a customer and a customer support assistant. The customer support assistant works for a digital music store, and has utilized a multi-agent team to answer the customer's request. 
\nYou are tasked with analyzing the conversation that has taken place between the customer and the customer support assistant, and updating the memory profile associated with the customer. The memory profile may be empty. If it's empty, you should create a new memory profile for the customer.\n\nYou specifically care about saving any music interest the customer has shared about themselves, particularly their music preferences to their memory profile.\n\nTo help you with this task, I have attached the conversation that has taken place between the customer and the customer support assistant below, as well as the existing memory profile associated with the customer that you should either update or create. \n\nThe customer's memory profile should have the following fields:\n- customer_id: the customer ID of the customer\n- music_preferences: the music preferences of the customer\n\nThese are the fields you should keep track of and update in the memory profile. If there has been no new information shared by the customer, you should not update the memory profile. It is completely okay if you do not have new information to update the memory profile with. In that case, just leave the values as they are.\n\n*IMPORTANT INFORMATION BELOW*\n\nThe conversation between the customer and the customer support assistant that you should analyze is as follows:\n{conversation}\n\nThe existing memory profile associated with the customer that you should either update or create based on the conversation is as follows:\n{memory_profile}\n\nEnsure your response is an object that has the following fields:\n- customer_id: the customer ID of the customer\n- music_preferences: the music preferences of the customer\n\nFor each key in the object, if there is no new information, do not update the value, just keep the value that is already there. If there is new information, update the value. 
\n\nTake a deep breath and think carefully before responding.\n\"\"\"\n```\n\nNow that we have defined the memory prompt, let’s create the memory node function.\n\n```python\n# Node: create_memory\nfrom langchain_core.messages import SystemMessage\n\ndef create_memory(state: State, config: RunnableConfig, store: BaseStore):\n    \"\"\"\n    Analyzes conversation history and updates the user's long-term memory profile.\n    \n    This node extracts new music preferences shared by the customer during the\n    conversation and persists them in the InMemoryStore for future interactions.\n    \"\"\"\n    # Get the user_id from the configurable part of the config or from the state\n    user_id = str(config[\"configurable\"].get(\"user_id\", state[\"customer_id\"]))\n    \n    # Define the namespace and key for the memory profile\n    namespace = (\"memory_profile\", user_id)\n    key = \"user_memory\"\n    \n    # Retrieve the existing memory profile for the user\n    existing_memory = store.get(namespace, key)\n    \n    # Format the existing memory for the LLM prompt\n    formatted_memory = \"\"\n    if existing_memory and existing_memory.value:\n        existing_memory_dict = existing_memory.value\n        # The UserProfile object is stored under the 'memory' key (see store.put below)\n        profile = existing_memory_dict.get('memory')\n        # Treat 'music_preferences' as a list, even if it is missing or None\n        music_prefs = getattr(profile, 'music_preferences', None) or []\n        if music_prefs:\n            formatted_memory = f\"Music Preferences: {', '.join(music_prefs)}\"\n    \n    # Prepare the system message for the LLM to update memory\n    formatted_system_message = SystemMessage(content=create_memory_prompt.format(\n        conversation=state[\"messages\"], \n        memory_profile=formatted_memory\n    ))\n    \n    # Invoke the LLM with the UserProfile schema to get structured updated memory\n    updated_memory = llm.with_structured_output(UserProfile).invoke([formatted_system_message])\n    \n    # Store the updated memory profile\n    store.put(namespace, key, 
{\"memory\": updated_memory})\n```\n\nThe `create_memory` node retrieves the current user memory from the store, formats it, and sends it along with the full conversation (`state[\"messages\"]`) to the LLM.\n\nThe LLM extracts new music preferences into a `UserProfile` object, merging them with existing data. The updated memory is then saved back to the `in_memory_store` using `store.put()`.\n\nLet’s integrate the memory nodes into our graph:\n\n*   The `load_memory` node runs right after verification to load user preferences.\n*   The `create_memory` node runs just before the graph ends, saving any updates.\n\nThis ensures that memory is loaded at the start and saved at the end of each interaction.\n\n```python\nmulti_agent_final = StateGraph(State)\n\n# Add all existing and new nodes to the graph\nmulti_agent_final.add_node(\"verify_info\", verify_info)\nmulti_agent_final.add_node(\"human_input\", human_input)\nmulti_agent_final.add_node(\"load_memory\", load_memory)\nmulti_agent_final.add_node(\"supervisor\", supervisor_prebuilt) # Our supervisor agent\nmulti_agent_final.add_node(\"create_memory\", create_memory)\n\n# Define the graph's entry point: always start with information verification\nmulti_agent_final.add_edge(START, \"verify_info\")\n\n# Conditional routing after verification: interrupt if needed, else load memory\nmulti_agent_final.add_conditional_edges(\n    \"verify_info\",\n    should_interrupt, # Checks if customer_id is verified\n    {\n        \"continue\": \"load_memory\", # If verified, proceed to load long-term memory\n        \"interrupt\": \"human_input\", # If not verified, interrupt for human input\n    },\n)\n# After human input, loop back to verify_info\nmulti_agent_final.add_edge(\"human_input\", \"verify_info\")\n# After loading memory, pass control to the supervisor\nmulti_agent_final.add_edge(\"load_memory\", \"supervisor\")\n# After supervisor completes, save any new memory\nmulti_agent_final.add_edge(\"supervisor\", 
\"create_memory\")\n# After creating/updating memory, the workflow ends\nmulti_agent_final.add_edge(\"create_memory\", END)\n\n# Compile the final graph with all components\nmulti_agent_final_graph = multi_agent_final.compile(\n    name=\"multi_agent_verify\", \n    checkpointer=checkpointer, \n    store=in_memory_store\n)\n\n# Display the complete graph structure\nshow_graph(multi_agent_final_graph)\n```\n\nHere is the visualization of our agent with long-term memory integrated:\n\n![Long Memory Multi Agent Flow](https://miro.medium.com/v2/resize:fit:1100/format:webp/1*fe_qkWp5Yl53Di_B1Ukbpw.png)\n\nThe `show_graph` output now shows the complete workflow: START -\u003e `verify_info` (with a loop to `human_input` if needed) -\u003e `load_memory` -\u003e `supervisor` (which internally orchestrates sub-agents) -\u003e `create_memory` -\u003e END.\n\nThis architecture combines verification, multi-agent routing, and long-term personalization.\n\n### Testing our Long-term Memory Multi-agent\n\nLet’s test this fully integrated graph! We will give it a complex query, including an identifier for verification and a music preference to be saved.\n\n```python\nthread_id = uuid.uuid4()\n\nquestion = \"My phone number is +55 (12) 3923-5555. How much was my most recent purchase? What albums do you have by the Rolling Stones?\"\nconfig = {\"configurable\": {\"thread_id\": thread_id}}\n\nresult = multi_agent_final_graph.invoke({\"messages\": [HumanMessage(content=question)]}, config=config)\nfor message in result[\"messages\"]:\n    message.pretty_print()\n```\n\n```text\n=== Human Message ===\n\nMy phone number is +55 (12) 3923-5555. How much was my most recent purchase? What albums do you have by the Rolling Stones?\n\n=== System Message ===\n\nThank you for providing your information! 
I was able to verify your account with customer id 1.\n\n=== Ai Message ===\n\nName: supervisor\nTool Calls:\ntransfer_to_invoice_information_subagent\n\n=== Tool Message ===\n\nName: transfer_to_invoice_information_subagent\n\nSuccessfully transferred to invoice_information_subagent\n\n=== Ai Message ===\n\nName: invoice_information_subagent\n\nYour most recent purchase was on August 7, 2025, and the total amount was $8.91. I am unable to provide information about albums by the Rolling Stones. Would you like to search for something else?\n\n=== Ai Message ===\n\nName: invoice_information_subagent\nTool Calls:\ntransfer_back_to_supervisor\n\n=== Tool Message ===\n\nName: transfer_back_to_supervisor\n\nSuccessfully transferred back to supervisor\n\n=== Ai Message ===\n\nName: supervisor\nTool Calls:\ntransfer_to_music_catalog_subagent\n\n=== Tool Message ===\n\nName: transfer_to_music_catalog_subagent\n\nSuccessfully transferred to music_catalog_subagent\n\n=== Ai Message ===\n\nThe Rolling Stones have several albums available, including \"Hot Rocks, 1964-1971 (Disc 1)\", \"No Security\", and \"Voodoo Lounge\". Would you like to explore more music or purchase one of these albums?\n\n=== Ai Message ===\n\nName: music_catalog_subagent\nTool Calls:\ntransfer_back_to_supervisor\n\n=== Tool Message ===\n\nName: transfer_back_to_supervisor\n\nSuccessfully transferred back to supervisor\n\n=== Ai Message ===\n\nName: supervisor\n\nIs there anything else I can help you with?\n```\n\nThis interaction shows the full flow:\n\n*   **Verification:** `verify_info` extracts the phone number, gets `customer_id = 1`, and updates the state.\n*   **Load Memory:** `load_memory` runs next. 
Since it's likely the first session, there is no stored profile yet, so the loaded memory is empty.\n*   **Supervisor Routing:** The supervisor routes the query to `invoice_information_subagent` and `music_catalog_subagent` as needed.\n*   **Create Memory:** After the response about “The Rolling Stones”, `create_memory` analyzes the conversation, identifies the artist as a new preference, and saves it to the `in_memory_store` for `customer_id = 1`.\n\nThis walkthrough shows how our agent handles long-term memory, but let’s also take a look at the stored memory itself.\n\nWe can directly access our `in_memory_store` to check if the music preference was saved.\n\n```python\nuser_id = \"1\" # Assuming customer ID 1 was used in the previous interaction\nnamespace = (\"memory_profile\", user_id)\nmemory = in_memory_store.get(namespace, \"user_memory\")\n\n# Access the UserProfile object stored under the \"memory\" key\nsaved_music_preferences = memory.value.get(\"memory\").music_preferences\n\nprint(saved_music_preferences)\n```\n\n```text\n### OUTPUT ###\n['Rolling Stones']\n```\n\nThe output `['Rolling Stones']` confirms that our `create_memory` node successfully extracted and saved the user's music preference to long-term memory.\n\nIn future interactions, this information can be loaded by `load_memory` to provide even more personalized responses.\n\n## Evaluating our Multi-AI Agent\n\n**Evaluations** help you measure how well your agents perform, which is critical because LLM behavior can vary with even small prompt or model changes. Evaluations give you a structured way to catch failures, compare versions, and improve reliability.\n\n![Evaluation of AI Agent](https://miro.medium.com/v2/resize:fit:1100/format:webp/1*MfcI7iD-qsZfVh8ivsvvXg.png)\n\nEvaluations consist of 3 parts:\n\n1.  **Dataset:** A set of test inputs and expected outputs.\n2.  **Target function:** The app or agent you’re testing; it takes inputs and returns outputs.\n3.  
**Evaluators:** Tools that score the agent’s outputs.\n\nSome common agent evaluation types:\n\n1.  **Final Response:** Check if the agent gave the correct final answer.\n2.  **Single Step:** Evaluate one step (e.g. was the right tool chosen?).\n3.  **Trajectory:** Evaluate the full reasoning path the agent took to reach the answer.\n\nOne of the most straightforward ways to evaluate an agent is to assess its overall performance on a task.\n\nThis treats the agent as a **“black box”** and simply evaluates whether its final response successfully addresses the user’s query and meets the expected criteria.\n\n*   **Input**: The user’s initial query.\n*   **Output**: The agent’s final generated response.\n\nFirst, we need a dataset of questions and their corresponding expected (ground truth) final responses. This dataset will serve as the benchmark for our evaluation. We’ll use the `langsmith.Client` to create and upload this dataset.\n\n```python\nfrom langsmith import Client\n\nclient = Client()\n\n# Define example questions and their expected final responses for evaluation\nexamples = [\n    {\n        \"question\": \"My name is Aaron Mitchell. My number associated with my account is +1 (204) 452-6452. I am trying to find the invoice number for my most recent song purchase. Could you help me with it?\",\n        \"response\": \"The Invoice ID of your most recent purchase was 342.\",\n    },\n    {\n        \"question\": \"I'd like a refund.\",\n        \"response\": \"I need additional information to help you with the refund. 
Could you please provide your customer identifier so that we can fetch your purchase history?\",\n    },\n    {\n        \"question\": \"Who recorded Wish You Were Here again?\",\n        \"response\": \"Wish You Were Here is an album by Pink Floyd\", # Note: The model might return more details, but this is the core expected fact.\n    },\n    { \n        \"question\": \"What albums do you have by Coldplay?\",\n        \"response\": \"There are no Coldplay albums available in our catalog at the moment.\",\n    },\n]\n\ndataset_name = \"LangGraph 101 Multi-Agent: Final Response\"\n\n# Check if the dataset already exists to avoid recreation errors\nif not client.has_dataset(dataset_name=dataset_name):\n    dataset = client.create_dataset(dataset_name=dataset_name)\n    client.create_examples(\n        inputs=[{\"question\": ex[\"question\"]} for ex in examples],\n        outputs=[{\"response\": ex[\"response\"]} for ex in examples],\n        dataset_id=dataset.id\n    )\n```\n\nThis code defines four example scenarios, each with a question (the input to our agent) and an expected response (what we consider a correct final output).\n\nIt then creates a dataset in LangSmith and populates it with these examples.\n\nNext, we define a target function that encapsulates how our agent (`multi_agent_final_graph`) should be run for evaluation.\n\nThis function will take the question from our dataset as input and return the agent’s final generated response.\n\n```python\nimport uuid\nfrom langgraph.types import Command\n\ngraph = multi_agent_final_graph\n\nasync def run_graph(inputs: dict):\n    \"\"\"\n    Run the multi-agent graph workflow and return the final response.\n    \n    This function handles the complete workflow including:\n    1. Initial invocation with user question\n    2. Handling human-in-the-loop interruption for customer verification\n    3. 
Resuming with customer ID to complete the request\n    \n    Args:\n        inputs (dict): Dictionary containing the user's question\n        \n    Returns:\n        dict: Dictionary containing the final response from the agent\n    \"\"\"\n    # Create a unique thread ID for this conversation session\n    thread_id = uuid.uuid4()\n    configuration = {\"configurable\": {\"thread_id\": thread_id, \"user_id\": \"10\"}}\n\n    # Initial invocation of the graph with the user's question\n    # This will trigger the verification process and likely hit the interrupt\n    result = await graph.ainvoke({\n        \"messages\": [{\"role\": \"user\", \"content\": inputs['question']}]\n    }, config=configuration)\n    \n    # Resume from the human-in-the-loop interrupt by providing customer ID\n    # This allows the workflow to continue past the verification step\n    result = await graph.ainvoke(\n        Command(resume=\"My customer ID is 10\"), \n        config=configuration\n    )\n    \n    # Return the final response content from the last message\n    return {\"response\": result['messages'][-1].content}\n```\n\nNote that `run_graph` continues past the `interrupt()` 
by supplying a `Command(resume=...)` to the graph.\n\nNext, we define the evaluators. OpenEvals provides a pre-built LLM-as-judge correctness evaluator:\n\n```python\nfrom openevals.llm import create_llm_as_judge\nfrom openevals.prompts import CORRECTNESS_PROMPT\n\n# Use the pre-built OpenEvals correctness evaluator\ncorrectness_evaluator = create_llm_as_judge(\n    prompt=CORRECTNESS_PROMPT,\n    feedback_key=\"correctness\",\n    judge=llm\n)\n```\n\nWe can also define our own evaluator, like this.\n\n```python\nfrom typing import TypedDict, Annotated\nfrom langchain_core.messages import SystemMessage, HumanMessage\n\n# Custom definition of LLM-as-judge instructions\ngrader_instructions = \"\"\"You are a teacher grading a quiz.\n\nYou will be given a QUESTION, the GROUND TRUTH (correct) RESPONSE, and the STUDENT RESPONSE.\n\nHere is the grade criteria to follow:\n(1) Grade the student responses based ONLY on their factual accuracy relative to the ground truth answer.\n(2) Ensure that the student response does not contain any conflicting statements.\n(3) It is OK if the student response contains more information than the ground truth response, as long as it is factually accurate relative to the ground truth response.\n\nCorrectness:\nTrue means that the student's response meets all of the criteria.\nFalse means that the student's response does not meet all of the criteria.\n\nExplain your reasoning in a step-by-step manner to ensure your reasoning and conclusion are correct.\"\"\"\n\n# LLM-as-judge output schema\nclass Grade(TypedDict):\n    \"\"\"Compare the expected and actual answers and grade the actual answer.\"\"\"\n    reasoning: Annotated[str, ..., \"Explain your reasoning for whether the actual response is correct or not.\"]\n    is_correct: Annotated[bool, ..., \"True if the student response is mostly or exactly correct, otherwise False.\"]\n\n# Judge LLM\ngrader_llm = llm.with_structured_output(Grade, method=\"json_schema\", strict=True)\n\n# Evaluator function\nasync def final_answer_correct(inputs: dict, outputs: dict, 
reference_outputs: dict) -\u003e bool:\n    \"\"\"Evaluate if the final response is equivalent to reference response.\"\"\"\n    # Note that we assume the outputs dictionary has a 'response' key. We'll need to make sure\n    # that the target function we define includes this key.\n    user = f\"\"\"QUESTION: {inputs['question']}\n    GROUND TRUTH RESPONSE: {reference_outputs['response']}\n    STUDENT RESPONSE: {outputs['response']}\"\"\"\n\n    grade = await grader_llm.ainvoke([SystemMessage(content=grader_instructions), HumanMessage(content=user)])\n    return grade[\"is_correct\"]\n```\n\nHere the LLM acts as a judge, comparing our ground-truth response with the agent’s response. Now that everything is in place, let’s run the evaluation.\n\n```python\n# Run the evaluation experiment\n# This will test our multi-agent graph against the dataset using both evaluators\nexperiment_results = await client.aevaluate(\n    run_graph,                                    # The application function to evaluate\n    data=dataset_name,                           # Dataset containing test questions and expected responses\n    evaluators=[final_answer_correct, correctness_evaluator],  # List of evaluators to assess performance\n    experiment_prefix=\"agent-result\",       # Prefix for organizing experiment results in LangSmith\n    num_repetitions=1,                           # Number of times to run each test case\n    max_concurrency=5,                           # Maximum number of concurrent evaluations\n)\n```\n\nWhen you run this command and the evaluation completes, it will output a link to the LangSmith dashboard page containing our results. 
Let’s check that out.\n\n![Langsmith Dashboard Result](https://miro.medium.com/v2/resize:fit:1100/format:webp/1*DIEOIpyiFedQoSKB9Iw59w.png)\n\nOur LangSmith dashboard contains the results of our evaluation, showing parameters such as correctness, final results, their comparison, and more.\n\nOther evaluation techniques can also be used; the notebook covers them in more detail, so make sure to check them out!\n\n## Swarm vs Supervisor\n\nSo far, we’ve built a multi-agent system using the **Supervisor** approach, where a central agent manages the flow and delegates tasks to sub-agents.\n\nAn alternative is the **Swarm Architecture**, as described in the [LangGraph docs](https://langchain-ai.github.io/langgraph/how-tos/multi-agent/swarm/). In a swarm, agents collaborate and pass tasks directly among themselves, without a central coordinator.\n\nThe GitHub notebook also includes a swarm implementation. The table below compares the swarm and supervisor approaches.\n\n| Feature         | Supervisor                                        | Swarm                                                         |\n| :-------------- | :------------------------------------------------ | :------------------------------------------------------------ |\n| **Control Flow** | Centralized; a primary agent directs traffic.     | Decentralized; agents collaborate and hand off tasks directly. |\n| **Hierarchy**   | Hierarchical; a \"boss\" agent delegates to sub-agents. | Peer-to-peer; no central authority.                           |\n| **Predictability**| More predictable and structured path.             | Adaptive and dynamic collaboration.                           |\n| **Resilience**   | Control typically returns to the supervisor.      | Potentially more resilient due to direct agent collaboration. 
|","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffareedkhan-dev%2Fmulti-agent-ai-system","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffareedkhan-dev%2Fmulti-agent-ai-system","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffareedkhan-dev%2Fmulti-agent-ai-system/lists"}