awesome-agentic-knowledge-base
How 40+ agentic repos actually build their knowledge bases
https://github.com/irresi/awesome-agentic-knowledge-base
Adoption — Observability / Eval (n=46)
Per-repo connector / harness highlights
Enterprise & closed-source landscape
Agent memory products
- Mem0 Cloud - fact extraction, SaaS-managed.
- Zep Cloud - temporal KG memory as a service.
- Membase - built from connected apps (Gmail / Slack / Notion / GitHub / Drive) and delivered to agents over MCP. Connector-driven, prosumer-flavored counterpart to mem0 / Zep.
Document AI / entity extraction
- AWS Comprehend - entity recognition (built-in + custom entity recognizers), key-phrase extraction, PII detection, syntax analysis. Frequently sits upstream of a Bedrock KB or a custom ingestion pipeline.
- Azure AI Document Intelligence - model training.
- Google Document AI
Enterprise search & Copilot-style assistants
- Glean - connector-driven enterprise search + agent platform; the closed-source shape that the cohort's [`onyx-dot-app/onyx`](surveys/onyx-dot-app__onyx.md) most directly competes with.
- Microsoft 365 Copilot + Microsoft Graph - Microsoft sources.
- Notion AI - grounded assistant; closest closed analogue to the `wiki-compiler` entries.
Foundation-model-vendor RAG & memory
- OpenAI Assistants: File Search & Vector Stores - `kb-app` pattern.
- Anthropic Files API + memory tool - managed `memory` tool for long-term context; closest analogue to `memory-framework` entries like letta / honcho.
- Cohere Compass - aspect indexing (JSON-aware chunks) pitched at agentic RAG.
Hyperscaler managed RAG / KB
- Amazon Bedrock Knowledge Bases - extracted at ingest); structured-data NL→SQL retrieval against data lakes/warehouses; multimodal parsing for tables/figures/charts. Closest closed analogue to the `graphrag` + `kb-app` cohort entries combined.
- Azure AI Search - vector + hybrid + semantic ranker; **skillsets** ingestion pipeline (built-in + custom skills incl. entity extraction); one-click RAG wiring on top of an existing index.
- Google Vertex AI Search & Agent Builder - grounded agents, Gemini grounding with citations, layered on the Vertex AI RAG Engine.
Open-source repos
- infiniflow/ragflow - app | Production RAG with deep document understanding; swappable doc engine + per-format chunkers + in-memory NetworkX GraphRAG ([survey](surveys/infiniflow__ragflow.md))
- OpenHands/OpenHands - agent | Multi-tenant coding-agent orchestrator; sandboxed runtime + microagent skill loader ([survey](surveys/OpenHands__OpenHands.md))
- thedotmack/claude-mem - agent | Claude Code memory plugin; lifecycle hooks → SQLite + ChromaDB-via-stdio-MCP ([survey](surveys/thedotmack__claude-mem.md))
- bytedance/deer-flow - agent | ByteDance super agent harness; LangGraph-native v2 rewrite + 21 public skills ([survey](surveys/bytedance__deer-flow.md))
- cline/cline - agent | VSCode/JetBrains/CLI coding agent; no DB, knowledge lives in `.clinerules/*.md` + `@file` mentions ([survey](surveys/cline__cline.md))
- Mintplex-Labs/anything-llm - app | Workspace-scoped multi-LLM kb-app; 37 LLMs + 14 embedders + 10 vector backends in-tree ([survey](surveys/Mintplex-Labs__anything-llm.md))
- mem0ai/mem0 - framework | Universal memory layer; LLM auto-extracts atomic facts from chat with 24-vector-backend matrix ([survey](surveys/mem0ai__mem0.md))
- run-llama/llama_index - framework | Foundational Python RAG/agent framework; 571 separately versioned integration packages ([survey](surveys/run-llama__llama_index.md))
- Aider-AI/aider - agent | Terminal pair-programmer; PageRank-weighted tree-sitter "repo-map" KB, no LLM extraction ([survey](surveys/Aider-AI__aider.md))
- safishamsi/graphify - compiler | Code/docs/papers/images → graph; Python lib distributed as Claude Code skill + 10 sibling-IDE bundles ([survey](surveys/safishamsi__graphify.md))
- mindsdb/mindsdb - layer | Federated SQL query engine; agents query unified data via single SQL surface, 34 in-tree handlers ([survey](surveys/mindsdb__mindsdb.md))
- HKUDS/LightRAG - storage abstraction × 13 backends + 6 retrieval modes ([survey](surveys/HKUDS__LightRAG.md))
- khoj-ai/khoj - app | Self-hostable personal "second-brain"; single-Postgres KB stack via pgvector + Muninn memory agent ([survey](surveys/khoj-ai__khoj.md))
- abhigyanpatwari/GitNexus - compiler | "Zero-Server Code Intelligence Engine"; CLI+MCP + browser zero-server from one repo ([survey](surveys/abhigyanpatwari__GitNexus.md))
- microsoft/graphrag
- AstrBotDevs/AstrBot - app | Multi-platform IM chatbot framework; SQLite + Faiss hybrid retrieval + 8 IM platform adapters ([survey](surveys/AstrBotDevs__AstrBot.md))
- onyx-dot-app/onyx - app | Most enterprise-shaped repo; 49 SaaS connectors + federated retrieval on Vespa/OpenSearch + ACP "Build" sandbox ([survey](surveys/onyx-dot-app__onyx.md))
- simstudioai/sim - app | Bun + Next.js workflow platform; 35 connectors + 220 tools + persisted-workflow-as-MCP server ([survey](surveys/simstudioai__sim.md))
- ComposioHQ/composio - app | Toolkit-routing-as-service; 1000+ third-party-tool integrations + per-user isolated MCP sessions ([survey](surveys/ComposioHQ__composio.md))
- labring/FastGPT - app | TypeScript-first kb + visual workflow platform; pgvector/Milvus/OceanBase + MongoDB metadata ([survey](surveys/labring__FastGPT.md))
- getzep/graphiti - framework | Bi-temporal KG library; every edge carries 4 temporal fields, Neo4j/FalkorDB/Kuzu/Neptune backends ([survey](surveys/getzep__graphiti.md))
- deepset-ai/haystack - framework | Component-pipeline RAG framework; 24 component categories + 50+ vector-backend sibling packages ([survey](surveys/deepset-ai__haystack.md))
- volcengine/OpenViking - framework | ByteDance Volcengine "Context Database for AI Agents"; filesystem-paradigm context with 7 backend plugins ([survey](surveys/volcengine__OpenViking.md))
- HKUDS/DeepTutor - app | Agent-Native Personalized Tutoring; versioned KB indexes + scheduled TutorBot subsystem ([survey](surveys/HKUDS__DeepTutor.md))
- letta-ai/letta - framework | The original MemGPT; agent-self-managed memory blocks + 50 explicitly normalized ORM tables ([survey](surveys/letta-ai__letta.md))
- 1Panel-dev/MaxKB - app | "Max Knowledge Brain" enterprise agent platform from FIT2CLOUD; single-Postgres + pgvector ([survey](surveys/1Panel-dev__MaxKB.md))
- arc53/DocsGPT - app | Private AI platform for agents + assistants + enterprise search; 4-agent-type taxonomy + RAG-as-LLM-tool ([survey](surveys/arc53__DocsGPT.md))
- topoteretes/cognee - framework | ECL (Extract / Cognify / Load) memory platform; rdflib/OWL ontologies + named "memify" pipelines ([survey](surveys/topoteretes__cognee.md))
- AsyncFuncAI/deepwiki-open - compiler | DeepWiki clone; turns GitHub/GitLab/BitBucket repo into wiki + Mermaid diagrams + Ask + DeepResearch ([survey](surveys/AsyncFuncAI__deepwiki-open.md))
- memvid/memvid - framework | First Rust-native repo; single `.mv2` file packs WAL + Tantivy + HNSW + Logic-Mesh graph + signed/encrypted capsules ([survey](surveys/memvid__memvid.md))
- tirth8205/code-review-graph - compiler | Token-efficient codebase KG; tree-sitter (32 languages) + MCP, auto-installs into 11 AI coding tools ([survey](surveys/tirth8205__code-review-graph.md))
- Tencent/WeKnora - app | Tencent's RAG + Agent + Auto-Wiki platform; 7 vector backends + 7 IM platforms + step-graph chat pipeline ([survey](surveys/Tencent__WeKnora.md))
- MODSetter/SurfSense - app | Privacy-focused NotebookLM alternative; 22 connector indexers + 9 ETL parsers + 4-process distribution ([survey](surveys/MODSetter__SurfSense.md))
- NevaMind-AI/memU - framework | "24/7 Always-On Proactive Memory" framework; Python with Rust core via PyO3 ([survey](surveys/NevaMind-AI__memU.md))
- mksglu/context-mode - app | Context-engineering MCP server; tool-output sandboxing + "Think in Code" + 98% context reduction ([survey](surveys/mksglu__context-mode.md))
- vectorize-io/hindsight - framework | Vectorize's open-source agent memory; biomimetic 3-tier (World facts / Experience facts / Mental models) ([survey](surveys/vectorize-io__hindsight.md))
- Lum1104/Understand-Anything - compiler | First wiki-compiler in cohort; Claude Code plugin → KG + React/React-Flow dashboard, no DB ([survey](surveys/Lum1104__Understand-Anything.md))
- MemTensor/MemOS - framework | Research-grade memory framework; three-tier cross-modality (KV-cache / LoRA / textual) + MemCube abstraction ([survey](surveys/MemTensor__MemOS.md))
- xerrors/Yuxi - app | CN-language Agent Harness explicitly built on LightRAG + Vue + FastAPI + LangGraph v1 ([survey](surveys/xerrors__Yuxi.md))
- campfirein/byterover-cli - framework | Memory-router-as-product; `brv` CLI + Ink REPL + Vite Web UI over 7 memory backends ([survey](surveys/campfirein__byterover-cli.md))
- FalkorDB/FalkorDB - layer | Graph-database engine loaded as Redis module; sparse-matrix adjacency via GraphBLAS + OpenCypher + Bolt ([survey](surveys/FalkorDB__FalkorDB.md))
- memgraph/memgraph - layer | Cypher-compatible in-memory graph DB; single-query atomic retrieval (text + vector + graph) ([survey](surveys/memgraph__memgraph.md))
- AgriciDaniel/claude-obsidian - compiler | Claude Code plugin + Obsidian vault implementing Andrej Karpathy's "LLM Wiki" pattern ([survey](surveys/AgriciDaniel__claude-obsidian.md))
- circlemind-ai/fast-graphrag - only GraphRAG; Personalized PageRank as primary retrieval primitive + pickle-only persistence ([survey](surveys/circlemind-ai__fast-graphrag.md))
- plastic-labs/honcho - framework | Plastic Labs's memory library; peer paradigm + scheduled "memory consolidation agent" (Dreamer) ([survey](surveys/plastic-labs__honcho.md))
- basicmachines-co/basic-memory - framework | Local-first Zettelkasten + KG over markdown files; rule-based grammar (no LLM extraction) ([survey](surveys/basicmachines-co__basic-memory.md))
- tinyhumansai/openhuman - app | Rust-core Tauri desktop Personal AI; 4-phase Memory-Tree (bucket-seal L0=50k → L1+ fanout=10) writes an Obsidian-readable vault + cohort-first MeetAgent + TokenJuice ([survey](surveys/tinyhumansai__openhuman.md))
Patterns observed
Cohort meta-patterns
- LightRAG #580 - bug fix for another cohort entry.
Keywords
- rag: 22
- llm: 18
- ai: 17
- python: 9
- agents: 9
- ai-agents: 8
- knowledge-graph: 7
- chatgpt: 7
- graphrag: 6
- claude-code: 5
- gpt-4: 5
- llms: 4
- openai: 4
- memory: 4
- nextjs: 4
- typescript: 4
- retrieval-augmented-generation: 4
- semantic-search: 4
- vector-database: 4
- langchain: 4
- large-language-models: 3
- gemini: 3
- claude: 3
- anthropic: 3
- open-source: 3
- graph-database: 3
- obsidian: 3
- gpt: 3
- ollama: 3
- nlp: 3
- genai: 3
- deepseek: 3
- codex: 3
- chatbot: 3
- information-retrieval: 3
- agent: 3
- machine-learning: 3
- react: 3
- context: 2
- llama3: 2
- artificial-intelligence: 2
- neo4j: 2
- pytorch: 2
- graph: 2
- nodejs: 2
- agentic-workflow: 2
- mcp: 2
- language-model: 2
- aiagents: 2
- developer-tools: 2