{"id":50768063,"url":"https://github.com/dynstat/aiagent-rag","last_synced_at":"2026-06-11T15:30:44.141Z","repository":{"id":358418416,"uuid":"1241323690","full_name":"dynstat/aiagent-rag","owner":"dynstat","description":"Project demonstrating Retrieval-Augmented Generation (RAG) with a multi-tool AI agent using LangChain, LangGraph, and LangSmith.","archived":false,"fork":false,"pushed_at":"2026-05-24T13:04:44.000Z","size":4248,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-24T13:24:10.020Z","etag":null,"topics":["agents","ai","langchain","langgraph","langsmith","rag"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dynstat.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-17T08:34:09.000Z","updated_at":"2026-05-24T13:04:47.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/dynstat/aiagent-rag","commit_stats":null,"previous_names":["dynstat/aiagent-rag"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dynstat/aiagent-rag","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dynstat%2Faiagent-rag","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dynstat%2Faiagent-rag/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dynstat%2Faiagent-rag/releases","
manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dynstat%2Faiagent-rag/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dynstat","download_url":"https://codeload.github.com/dynstat/aiagent-rag/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dynstat%2Faiagent-rag/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34206487,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-11T02:00:06.485Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agents","ai","langchain","langgraph","langsmith","rag"],"created_at":"2026-06-11T15:30:41.694Z","updated_at":"2026-06-11T15:30:44.103Z","avatar_url":"https://github.com/dynstat.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AI Agent RAG — Technical Knowledge Assistant\n\nAn educational project demonstrating **Retrieval-Augmented Generation (RAG)** with a **multi-tool AI agent** designed to help you explore and understand any technical documentation you provide.\n\n## What This Project Teaches\n\n| Concept | Where It's Implemented |\n|---|---|\n| RAG pipeline (embed → store → retrieve) | `rag/` module + `data/ingest.py` |\n| LangGraph agent (nodes + edges + state) | `agent/graph.py` |\n| Tool calling (multiple tools, 
multiple turns) | `tools/` module |\n| PDF/Markdown/Text ingestion | `data/ingest.py` (via `PyPDFLoader`) |\n| Short-term memory (sliding window) | `memory/conversation_memory.py` |\n| Long-term memory (checkpointer) | `agent/graph.py` → `MemorySaver` |\n| Provider abstraction (Gemini / OpenAI / Groq) | `llm_factory.py` |\n\n## Project Structure\n\n```\naiagent-rag/\n├── main.py                      # Entry point — interactive REPL\n├── config.py                    # Centralized config from .env\n├── llm_factory.py               # Creates Gemini, OpenAI, or Groq LLM\n│\n├── agent/\n│   ├── graph.py                 # LangGraph StateGraph definition\n│   └── runner.py                # High-level AgentRunner class\n│\n├── rag/\n│   ├── embeddings.py            # SentenceTransformers embedding model\n│   └── vector_store.py          # ChromaDB vector store + ingestion\n│\n├── memory/\n│   └── conversation_memory.py   # Sliding-window short-term memory\n│\n├── tools/\n│   ├── rag_tool.py              # Tool: search the vector store\n│   └── utility_tools.py         # Tools: date and time utility\n│\n└── data/\n    ├── ingest.py                # Populates ChromaDB (re-run after changing knowledge_base/)\n    └── knowledge_base/          # Your PDF/MD/TXT files go here\n```\n\n## Setup\n\nThis project uses [uv](https://github.com/astral-sh/uv) for fast Python package and project management.\n\n### 1. Install dependencies\n```bash\nuv sync\n```\nThis automatically creates a virtual environment and installs all required packages from `pyproject.toml`.\n\n### 2. Configure your API keys\n\n**Windows (PowerShell):**\n```powershell\nCopy-Item .env.example .env\n```\n\n**macOS / Linux:**\n```bash\ncp .env.example .env\n```\n\n**Then edit `.env` and fill in your keys.**\n\nGet your keys:\n- **Google Gemini**: https://aistudio.google.com/app/apikey\n- **Groq**: https://console.groq.com/keys\n\n### 3. 
Ingest your knowledge base (one-time setup)\n```bash\n# Add your technical documents (.pdf, .md, .txt) to data/knowledge_base/\nuv run data/ingest.py\n```\n\n### 4. Run the agent\n```bash\nuv run main.py\n```\n\n## Project Management\n\n### Adding new packages\nIf you need to add new tools or libraries:\n```bash\nuv add \u003cpackage_name\u003e\n```\n\n### Updating the knowledge base\nWhenever you add or remove files in `data/knowledge_base/`, re-run the ingestion:\n```bash\nuv run data/ingest.py\n```\n\n## ☁️ Running in Google Colab\n\nThis project also runs in Google Colab. Because a Colab runtime is temporary, repeat the clone, install, and ingestion steps in each new session:\n\n### 1. Set up API Keys (Secrets)\nInstead of a `.env` file, use Colab's built-in **Secrets** (the key icon 🔑 in the left sidebar):\n1.  Add a secret named `LLM_PROVIDER` (value: `gemini`, `openai`, or `groq`).\n2.  Add the API key for your chosen provider (e.g., `GOOGLE_API_KEY` or `GROQ_API_KEY`).\n3.  **IMPORTANT**: Toggle the blue **\"Notebook access\"** switch to **ON** for all keys.\n\n### 2. Run in a Cell\nCopy and paste this into a Colab cell to initialize and start the agent:\n\n```python\n# 1. Install uv and clone\n!pip install uv\n!git clone https://github.com/dynstat/aiagent-rag.git\n%cd aiagent-rag\n\n# 2. Fast install (using --system for Colab)\n!uv pip install . --system\n\n# 3. 
Inject Secrets into Environment\nfrom google.colab import userdata\nimport os\n\ntry:\n    os.environ[\"LLM_PROVIDER\"] = userdata.get('LLM_PROVIDER')\n    if os.environ[\"LLM_PROVIDER\"] == \"gemini\":\n        os.environ[\"GOOGLE_API_KEY\"] = userdata.get('GOOGLE_API_KEY')\n    elif os.environ[\"LLM_PROVIDER\"] == \"groq\":\n        os.environ[\"GROQ_API_KEY\"] = userdata.get('GROQ_API_KEY')\n    print(f\"✅ Environment Configured: {os.environ['LLM_PROVIDER']}\")\nexcept Exception as e:\n    print(f\"❌ Setup Error: {e}\")\n    print(\"Ensure you added LLM_PROVIDER and your API Key to the 'Secrets' sidebar and enabled 'Notebook access'.\")\n\n# 4. Ingest Data \u0026 Run\n!python data/ingest.py\n!python main.py\n```\n\n## Example Queries (using sample data)\n\n- `What is the core architecture described in the documents?`\n- `How do I implement X based on the provided guides?`\n- `Explain the difference between concepts A and B.`\n- `Summarize the best practices from the technical documentation.`\n\n## How to Customize for Your Own Use Case\n\nThis project is built to be a template. You can connect it to **any** technical documentation.\n\n### 1. Adding Your Documentation (RAG)\n1. Place your `.pdf`, `.md`, or `.txt` files in the `data/knowledge_base/` folder.\n2. Run `uv run data/ingest.py` to embed and store them in the local ChromaDB.\n3. The agent searches this database through its `rag_search` tool when the LLM decides a question relates to your content.\n\n### 2. Update the Agent's Persona\nTo change how the agent behaves or its default expertise, edit the `SYSTEM_PROMPT` inside `agent/graph.py`. Give it rules and guidelines specific to your technical domain!\n\n\u003e **Note for Groq Users**: Llama 3 models on Groq are highly sensitive to tool-calling instructions in the system prompt. For best results, keep the `SYSTEM_PROMPT` free of tool-calling instructions and avoid mentioning tool names directly; let the LLM use the structured tool-calling API automatically. 
We recommend using `llama-3.3-70b-versatile` for the best balance of speed and reasoning.\n\n## Architecture: How the Agent Thinks\n\n```\nUser Question\n     │\n     ▼\nLLM Node (Gemini/OpenAI/Groq)\n  → Reads system prompt + conversation history\n  → Decides: call a tool OR answer directly\n     │\n     ├─── Tool Call? ──→ ToolNode\n     │                     → rag_search (vector DB lookup)\n     │                     → get_current_date_and_time (Utility)\n     │                     │\n     │                     └──→ back to LLM Node (loop!)\n     │\n     └─── Final Answer? ──→ Return to user\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdynstat%2Faiagent-rag","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdynstat%2Faiagent-rag","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdynstat%2Faiagent-rag/lists"}