awesome-agentic-knowledge-base

How 40+ agentic repos actually build their knowledge bases
https://github.com/irresi/awesome-agentic-knowledge-base

  • Adoption — Observability / Eval (n=46)

  • Enterprise & closed-source landscape

    • Agent memory products

      • Mem0 Cloud - fact extraction, SaaS-managed.
      • Zep Cloud - temporal KG memory as a service.
      • Membase - built from connected apps (Gmail / Slack / Notion / GitHub / Drive) and delivered to agents over MCP. Connector-driven, prosumer-flavored counterpart to mem0 / Zep.
    • Document AI / entity extraction

      • Glean - connector enterprise search + agent platform; the closed-source shape that the cohort's [`onyx-dot-app/onyx`](surveys/onyx-dot-app__onyx.md) most directly competes with.
      • Microsoft 365 Copilot + Microsoft Graph - Microsoft sources.
      • Notion AI - grounded assistant; closest closed analogue to the `wiki-compiler` entries.
    • Foundation-model-vendor RAG & memory

    • Hyperscaler managed RAG / KB

      • Amazon Bedrock Knowledge Bases - extracted at ingest; structured-data NL→SQL retrieval against data lakes/warehouses; multimodal parsing for tables/figures/charts. Closest closed analogue to the `graphrag` + `kb-app` cohort entries combined.
      • Azure AI Search - vector + hybrid + semantic ranker; **skillsets** ingestion pipeline (built-in + custom skills incl. entity extraction); one-click RAG wiring on top of an existing index.
      • Google Vertex AI Search & Agent Builder - grounded agents, Gemini grounding with citations, layered on the Vertex AI RAG Engine.
  • Open-source repos

    • infiniflow/ragflow - app | Production RAG with deep document understanding; swappable doc engine + per-format chunkers + in-memory NetworkX GraphRAG ([survey](surveys/infiniflow__ragflow.md))
    • OpenHands/OpenHands - agent | Multi-tenant coding-agent orchestrator; sandboxed runtime + microagent skill loader ([survey](surveys/OpenHands__OpenHands.md))
    • thedotmack/claude-mem - agent | Claude Code memory plugin; lifecycle hooks → SQLite + ChromaDB-via-stdio-MCP ([survey](surveys/thedotmack__claude-mem.md))
    • bytedance/deer-flow - agent | ByteDance super agent harness; LangGraph-native v2 rewrite + 21 public skills ([survey](surveys/bytedance__deer-flow.md))
    • cline/cline - agent | VSCode/JetBrains/CLI coding agent; no DB, knowledge in `.clinerules/*.md` + `@file` mentions ([survey](surveys/cline__cline.md))
    • Mintplex-Labs/anything-llm - app | Workspace-scoped multi-LLM kb-app; 37 LLMs + 14 embedders + 10 vector backends in-tree ([survey](surveys/Mintplex-Labs__anything-llm.md))
    • mem0ai/mem0 - framework | Universal memory layer; LLM auto-extracts atomic facts from chat with 24-vector-backend matrix ([survey](surveys/mem0ai__mem0.md))
    • run-llama/llama_index - framework | Foundational Python RAG/agent framework; 571 separately versioned integration packages ([survey](surveys/run-llama__llama_index.md))
    • Aider-AI/aider - agent | Terminal pair-programmer; PageRank-weighted tree-sitter "repo-map" KB, no LLM extraction ([survey](surveys/Aider-AI__aider.md))
    • safishamsi/graphify - compiler | Code/docs/papers/images → graph; Python lib distributed as Claude Code skill + 10 sibling-IDE bundles ([survey](surveys/safishamsi__graphify.md))
    • mindsdb/mindsdb - layer | Federated SQL query engine; agents query unified data via single SQL surface, 34 in-tree handlers ([survey](surveys/mindsdb__mindsdb.md))
    • HKUDS/LightRAG - storage abstraction × 13 backends + 6 retrieval modes ([survey](surveys/HKUDS__LightRAG.md))
    • khoj-ai/khoj - app | Self-hostable personal "second-brain"; single-Postgres KB stack via pgvector + Muninn memory agent ([survey](surveys/khoj-ai__khoj.md))
    • abhigyanpatwari/GitNexus - compiler | "Zero-Server Code Intelligence Engine"; CLI+MCP + browser zero-server from one repo ([survey](surveys/abhigyanpatwari__GitNexus.md))
    • microsoft/graphrag
    • AstrBotDevs/AstrBot - app | Multi-platform IM chatbot framework; SQLite + Faiss hybrid retrieval + 8 IM platform adapters ([survey](surveys/AstrBotDevs__AstrBot.md))
    • onyx-dot-app/onyx - app | Most enterprise-shaped repo; 49 SaaS connectors + federated retrieval on Vespa/OpenSearch + ACP "Build" sandbox ([survey](surveys/onyx-dot-app__onyx.md))
    • simstudioai/sim - app | Bun + Next.js workflow platform; 35 connectors + 220 tools + persisted-workflow-as-MCP server ([survey](surveys/simstudioai__sim.md))
    • ComposioHQ/composio - app | Toolkit-routing-as-service; 1000+ third-party-tool integrations + per-user isolated MCP sessions ([survey](surveys/ComposioHQ__composio.md))
    • labring/FastGPT - app | TypeScript-first kb + visual workflow platform; pgvector/Milvus/OceanBase + MongoDB metadata ([survey](surveys/labring__FastGPT.md))
    • getzep/graphiti - framework | Bi-temporal KG library; every edge carries 4 temporal fields, Neo4j/FalkorDB/Kuzu/Neptune backends ([survey](surveys/getzep__graphiti.md))
    • deepset-ai/haystack - framework | Component-pipeline RAG framework; 24 component categories + 50+ vector-backend sibling packages ([survey](surveys/deepset-ai__haystack.md))
    • volcengine/OpenViking - framework | ByteDance Volcengine "Context Database for AI Agents"; filesystem-paradigm context with 7 backend plugins ([survey](surveys/volcengine__OpenViking.md))
    • HKUDS/DeepTutor - app | Agent-Native Personalized Tutoring; versioned KB indexes + scheduled TutorBot subsystem ([survey](surveys/HKUDS__DeepTutor.md))
    • letta-ai/letta - framework | The original MemGPT; agent-self-managed memory blocks + 50 explicitly normalized ORM tables ([survey](surveys/letta-ai__letta.md))
    • 1Panel-dev/MaxKB - app | "Max Knowledge Brain" enterprise agent platform from FIT2CLOUD; single-Postgres + pgvector ([survey](surveys/1Panel-dev__MaxKB.md))
    • arc53/DocsGPT - app | Private AI platform for agents + assistants + enterprise search; 4-agent-type taxonomy + RAG-as-LLM-tool ([survey](surveys/arc53__DocsGPT.md))
    • topoteretes/cognee - framework | ECL (Extract / Cognify / Load) memory platform; rdflib/OWL ontologies + named "memify" pipelines ([survey](surveys/topoteretes__cognee.md))
    • AsyncFuncAI/deepwiki-open - compiler | DeepWiki clone; turns GitHub/GitLab/BitBucket repo into wiki + Mermaid diagrams + Ask + DeepResearch ([survey](surveys/AsyncFuncAI__deepwiki-open.md))
    • memvid/memvid - framework | First Rust-native repo; single `.mv2` file packs WAL + Tantivy + HNSW + Logic-Mesh graph + signed/encrypted capsules ([survey](surveys/memvid__memvid.md))
    • tirth8205/code-review-graph - compiler | Token-efficient codebase KG; tree-sitter (32 languages) + MCP, auto-installs into 11 AI coding tools ([survey](surveys/tirth8205__code-review-graph.md))
    • Tencent/WeKnora - app | Tencent's RAG + Agent + Auto-Wiki platform; 7 vector backends + 7 IM platforms + step-graph chat pipeline ([survey](surveys/Tencent__WeKnora.md))
    • MODSetter/SurfSense - app | Privacy-focused NotebookLM alternative; 22 connector indexers + 9 ETL parsers + 4-process distribution ([survey](surveys/MODSetter__SurfSense.md))
    • NevaMind-AI/memU - framework | "24/7 Always-On Proactive Memory" framework; Python with Rust core via PyO3 ([survey](surveys/NevaMind-AI__memU.md))
    • mksglu/context-mode - app | Context-engineering MCP server; tool-output sandboxing + "Think in Code" + 98% context reduction ([survey](surveys/mksglu__context-mode.md))
    • vectorize-io/hindsight - framework | Vectorize's open-source agent memory; biomimetic 3-tier (World facts / Experience facts / Mental models) ([survey](surveys/vectorize-io__hindsight.md))
    • Lum1104/Understand-Anything - compiler | First wiki-compiler in cohort; Claude Code plugin → KG + React/React-Flow dashboard, no DB ([survey](surveys/Lum1104__Understand-Anything.md))
    • MemTensor/MemOS - framework | Research-grade memory framework; three-tier cross-modality (KV-cache / LoRA / textual) + MemCube abstraction ([survey](surveys/MemTensor__MemOS.md))
    • xerrors/Yuxi - app | CN-language Agent Harness explicitly built on LightRAG + Vue + FastAPI + LangGraph v1 ([survey](surveys/xerrors__Yuxi.md))
    • campfirein/byterover-cli - framework | Memory-router-as-product; `brv` CLI + Ink REPL + Vite Web UI over 7 memory backends ([survey](surveys/campfirein__byterover-cli.md))
    • FalkorDB/FalkorDB - layer | Graph-database engine loaded as Redis module; sparse-matrix adjacency via GraphBLAS + OpenCypher + Bolt ([survey](surveys/FalkorDB__FalkorDB.md))
    • memgraph/memgraph - layer | Cypher-compatible in-memory graph DB; single-query atomic retrieval (text + vector + graph) ([survey](surveys/memgraph__memgraph.md))
    • AgriciDaniel/claude-obsidian - compiler | Claude Code plugin + Obsidian vault implementing Andrej Karpathy's "LLM Wiki" pattern ([survey](surveys/AgriciDaniel__claude-obsidian.md))
    • circlemind-ai/fast-graphrag - only GraphRAG; Personalized PageRank as primary retrieval primitive + pickle-only persistence ([survey](surveys/circlemind-ai__fast-graphrag.md))
    • plastic-labs/honcho - framework | Plastic Labs's memory library; peer paradigm + scheduled "memory consolidation agent" (Dreamer) ([survey](surveys/plastic-labs__honcho.md))
    • basicmachines-co/basic-memory - framework | Local-first Zettelkasten + KG over markdown files; rule-based grammar (no LLM extraction) ([survey](surveys/basicmachines-co__basic-memory.md))
    • tinyhumansai/openhuman - app | Rust-core Tauri desktop Personal AI; 4-phase Memory-Tree (bucket-seal L0=50k → L1+ fanout=10) writes an Obsidian-readable vault + cohort-first MeetAgent + TokenJuice ([survey](surveys/tinyhumansai__openhuman.md))
  • Patterns observed

    • Cohort meta-patterns