{"id":48931175,"url":"https://github.com/SamurAIGPT/llm-wiki-agent","last_synced_at":"2026-05-03T11:01:20.391Z","repository":{"id":154192321,"uuid":"630824392","full_name":"SamurAIGPT/llm-wiki-agent","owner":"SamurAIGPT","description":"A personal knowledge base that builds and maintains itself. Drop in sources — Claude (or Codex/Gemini) reads them, extracts knowledge, and maintains a persistent interlinked wiki. Works with Claude Code, Codex, OpenCode, Gemini CLI. No API key needed.","archived":false,"fork":false,"pushed_at":"2026-04-23T05:49:50.000Z","size":289,"stargazers_count":2231,"open_issues_count":2,"forks_count":255,"subscribers_count":30,"default_branch":"main","last_synced_at":"2026-04-23T07:25:20.736Z","etag":null,"topics":["claude-code","codex","gemini","knowledge-base","knowledge-graph","llm","markdown","obsidian","personal-knowledge-management","rag","wiki"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SamurAIGPT.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2023-04-21T08:31:00.000Z","updated_at":"2026-04-23T06:59:27.000Z","dependencies_parsed_at":"2024-01-06T20:18:22.306Z","dependency_job_id":"53a775b0-b9ba-481b-b3e1-e37cbd8410fd","html_url":"https://github.com/SamurAIGPT/llm-wiki-agent","commit_stats":null,"previous_names":["samuraigpt/gpt-agent","samuraigpt/camel-autogpt","samuraigpt/llm-wiki-agent"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/SamurAIGPT/llm-wiki-agent","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SamurAIGPT%2Fllm-wiki-agent","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SamurAIGPT%2Fllm-wiki-agent/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SamurAIGPT%2Fllm-wiki-agent/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SamurAIGPT%2Fllm-wiki-agent/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SamurAIGPT","download_url":"https://codeload.github.com/SamurAIGPT/llm-wiki-agent/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SamurAIGPT%2Fllm-wiki-agent/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32566444,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-03T06:36:36.687Z","status":"ssl_error","status_checked_at":"2026-05-03T06:36:09.306Z","response_time":103,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["claude-code","codex","gemini","knowledge-base","knowledge-graph","llm","markdown","obsidian","personal-knowledge-management","rag","wiki"],"created_at":"2026-04-17T09:00:40.945Z","updated_at":"2026-05-03T11:01:20.369Z","avatar_url":"https://github.com/SamurAIGPT.png","language":"Python","funding_links":[],"categories":["Python","Productivity Tools"],"sub_categories":["Memory \u0026 Context Management"],"readme":"# LLM Wiki Agent\n\n[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)\n\n**A coding agent skill.** Drop source documents into `raw/` and tell the agent to ingest them — it reads them, extracts knowledge, and builds a persistent interlinked wiki. Every new source makes the wiki richer. You never write it.\n\n\u003e Most knowledge tools make you search your own notes. This one reads everything you've collected and writes a structured wiki that compounds over time — cross-references already built, contradictions already flagged, synthesis already done.\n\n```\ningest raw/papers/attention-is-all-you-need.md\n```\n\n```\nwiki/\n├── index.md          catalog of all pages — updated on every ingest\n├── log.md            append-only record of every operation\n├── overview.md       living synthesis across all sources\n├── sources/          one summary page per source document\n├── entities/         people, companies, projects — auto-created\n├── concepts/         ideas, frameworks, methods — auto-created\n└── syntheses/        query answers filed back as wiki pages\ngraph/\n├── graph.json        persistent node/edge data (SHA256-cached)\n└── graph.html        interactive vis.js visualization — open in any browser\n```\n\n## Install\n\n**Requires:** [Claude Code](https://claude.ai/code), [Codex](https://openai.com/codex), [Gemini CLI](https://github.com/google-gemini/gemini-cli), or any agent that reads a config file.\n\n```bash\ngit clone https://github.com/SamurAIGPT/llm-wiki-agent.git\ncd llm-wiki-agent\n```\n\nOpen in your agent — no API key or Python setup needed:\n\n```bash\nclaude      # reads CLAUDE.md + .claude/commands/ (slash commands available)\ncodex       # reads AGENTS.md\nopencode    # reads AGENTS.md\ngemini      # reads GEMINI.md\n```\n\n## Usage\n\nAll agents understand natural language and shorthand triggers:\n\n```\ningest raw/papers/my-paper.md              # ingest a markdown source\ningest report.pdf                          # auto-converts to .md, then ingests\ningest slides.pptx notes.docx              # batch, mixed formats\nquery: what are the main themes?           # synthesize answer from wiki pages\nlint                                       # find orphans, contradictions, gaps\nbuild graph                                # build graph.html from all wikilinks\n```\n\nPlain English works too:\n```\n\"Ingest this paper: raw/papers/llama2.md\"\n\"What does the wiki say about attention mechanisms?\"\n\"Check for contradictions across sources\"\n\"Build the knowledge graph and tell me the most connected nodes\"\n```\n\n**Claude Code** also provides `/wiki-ingest`, `/wiki-query`, `/wiki-lint`, `/wiki-graph` as slash commands (via `.claude/commands/`). These are Claude Code-specific — other agents use the natural language triggers above, which work identically.\n\nWorks with markdown, PDF, DOCX, PPTX, XLSX, HTML, TXT, CSV, JSON, XML, RST, EPUB, and more. Non-markdown files are auto-converted via [markitdown](https://github.com/microsoft/markitdown) at ingest time — no separate step needed.\n\n## What You Get\n\n**Persistent wiki** — structured markdown pages that accumulate across sessions. Unlike chat, nothing is lost.\n\n**Entity pages** — auto-created for every person, company, or project mentioned across sources. Updated each time a new source references them.\n\n**Concept pages** — auto-created for every key idea or framework. Cross-referenced to every source that discusses them.\n\n**Living overview** — `wiki/overview.md` is revised on every ingest to reflect the current synthesis across everything you've read.\n\n**Contradiction flags** — when a new source contradicts an existing claim, it's flagged at ingest time, not buried until query time.\n\n**Knowledge graph** — `graph.html` shows every wiki page as a node, every `[[wikilink]]` as an edge, and Claude-inferred implicit relationships as dotted edges. Community detection clusters related topics.\n\n**Lint reports** — orphan pages, broken links, missing entity pages, data gaps with suggested sources to fill them.\n\n## Use Cases\n\n### Research\n\nGoing deep on a topic over weeks — reading papers, articles, reports.\n\n```\n/wiki-ingest raw/papers/attention-is-all-you-need.md\n/wiki-ingest raw/papers/llama2.md\n/wiki-ingest raw/papers/rag-survey.md\n\n# Wiki builds entity pages (Meta AI, Google Brain) and\n# concept pages (Attention, RLHF, Context Window) automatically.\n\n/wiki-query \"What are the main approaches to reducing hallucination?\"\n/wiki-query \"How has context window size evolved across models?\"\n\n/wiki-lint\n# → \"No sources on mixture-of-experts — consider the Mixtral paper\"\n```\n\nBy the end you have a structured, interlinked reference — not a folder of PDFs you'll never reopen.\n\n---\n\n### Reading a Book\n\nFile each chapter as you go. Build out pages for characters, themes, arguments.\n\n```\n/wiki-ingest raw/book/chapter-01.md\n/wiki-ingest raw/book/chapter-02.md\n\n# Wiki creates entity and theme pages automatically.\n\n/wiki-query \"How has the protagonist's motivation evolved?\"\n/wiki-query \"What contradictions exist in the author's argument so far?\"\n\n/wiki-graph   # → graph.html shows every character/theme and how they connect\n```\n\nThink fan wikis like Tolkien Gateway — built as you read, with the agent doing all the cross-referencing.\n\n---\n\n### Personal Knowledge Base\n\nTrack goals, health, habits, self-improvement — file journal entries, articles, podcast notes.\n\n```\n/wiki-ingest raw/journal/2026-01-week1.md\n/wiki-ingest raw/articles/huberman-sleep-protocol.md\n/wiki-ingest raw/articles/atomic-habits-summary.md\n\n/wiki-query \"What patterns show up in my journal entries about energy?\"\n/wiki-query \"What habits have I tried and what was the outcome?\"\n```\n\nThe wiki builds a structured picture over time. Concepts like \"Sleep\", \"Exercise\", \"Deep Work\" accumulate evidence from every source filed.\n\n---\n\n### Business / Team Intelligence\n\nFeed in meeting transcripts, project docs, customer calls.\n\n```\n/wiki-ingest raw/meetings/q1-planning-transcript.md\n/wiki-ingest raw/docs/product-roadmap-2026.md\n/wiki-ingest raw/calls/customer-interview-acme.md\n\n/wiki-query \"What feature requests have come up most across customer calls?\"\n/wiki-query \"What decisions were made in Q1 and what was the rationale?\"\n\n/wiki-lint\n# → \"Project X mentioned in 5 pages but no dedicated page\"\n# → \"Roadmap contradicts customer interview on priority of feature Y\"\n```\n\nThe wiki stays current because the agent does the maintenance no one wants to do.\n\n---\n\n### Competitive Analysis\n\nTrack a company, market, or technology over time.\n\n```\n/wiki-ingest raw/competitors/openai-announcements.md\n/wiki-ingest raw/market/ai-funding-report-q1.md\n\n/wiki-query \"How do OpenAI and Anthropic differ on safety approach?\"\n/wiki-query \"Which companies announced multimodal models in the last 6 months?\"\n/wiki-query \"Competitive landscape summary as of today\"\n# → agent shows the answer, then asks if you want to save it as a synthesis page\n```\n\n## The Graph\n\nTwo-pass build:\n\n1. **Deterministic** — parses all `[[wikilinks]]` across wiki pages → edges tagged `EXTRACTED`\n2. **Semantic** — agent infers implicit relationships not captured by wikilinks → edges tagged `INFERRED` (with confidence score) or `AMBIGUOUS`\n\nLouvain community detection clusters nodes by topic. SHA256 cache means only changed pages are reprocessed. Output is a self-contained `graph.html` — no server, opens in any browser.\n\n## CLAUDE.md / AGENTS.md\n\nThe schema file tells the agent how to maintain the wiki — page formats, ingest/query/lint/graph workflows, naming conventions. This is the key config file. Edit it to customize behavior for your domain.\n\n| Agent | Schema file |\n|---|---|\n| Claude Code | `CLAUDE.md` |\n| Codex / OpenCode | `AGENTS.md` |\n| Gemini CLI | `GEMINI.md` |\n\n## What Makes This Different from RAG\n\n| RAG | LLM Wiki Agent |\n|---|---|\n| Re-derives knowledge every query | Compiles once, keeps current |\n| Raw chunks as retrieval unit | Structured wiki pages |\n| No cross-references | Cross-references pre-built |\n| Contradictions surface at query time (maybe) | Flagged at ingest time |\n| No accumulation | Every source makes the wiki richer |\n\n## Obsidian Integration\n\nThe wiki is designed to be browsed seamlessly in [Obsidian](https://obsidian.md). Since the agent maintains consistent `[[wikilinks]]`, you get a naturally growing knowledge graph in your vault.\n\n### Vault Symlink Pattern\nIf you want to keep the LLM Wiki Agent repository separate from your main personal vault, use symlinks:\n1. Keep your working agent repository at e.g., `~/llm-wiki-agent`\n2. Create a symlink from your main Obsidian vault:\n   ```bash\n   ln -sfn ~/llm-wiki-agent/wiki ~/your-obsidian-vault/wiki\n   ```\n3. Use the [Obsidian Web Clipper](https://obsidian.md/clipper) or write directly to `raw/` in the agent repo to queue items for ingestion.\n\n\u003e **Note:** If you ever move your local repo directory, remember to update the symlink, otherwise the `wiki/` directory will appear missing in Obsidian.\n\n### Recommended .obsidian Config\n- **Graph View:** Filter out `index.md` and `log.md` (e.g. `-file:index.md -file:log.md`) to avoid them becoming gravity wells in your Obsidian graph.\n- **Dataview:** Use the community plugin [Dataview](https://blacksmithgu.github.io/obsidian-dataview/) to query the YAML frontmatter the agent automatically injects (e.g., `type: source`, `tags: [diary]`).\n\n## Multi-Format Ingest\n\nDrop any supported file directly into `ingest` — no separate conversion step needed:\n\n```bash\n# These all work — auto-converted at ingest time\ningest report.pdf\ningest meeting-notes.docx\ningest slides.pptx\ningest data.xlsx\ningest page.html\ningest raw/mixed-folder/          # recursively finds all supported files\n```\n\n**Supported formats:**\n`.md` `.pdf` `.docx` `.pptx` `.xlsx` `.xls` `.html` `.htm` `.txt` `.csv` `.json` `.xml` `.rst` `.rtf` `.epub` `.ipynb` `.yaml` `.yml` `.tsv` `.wav` `.mp3`\n\nNon-markdown files are auto-converted via [markitdown](https://github.com/microsoft/markitdown). Use `--no-convert` to skip auto-conversion and process only `.md` files.\n\n### arXiv Papers (Advanced)\n\nFor arXiv papers, use `tools/pdf2md.py` for higher-fidelity conversion:\n\n```bash\npython tools/pdf2md.py 2401.12345                      # by arXiv ID\npython tools/pdf2md.py https://arxiv.org/abs/2401.12345 # by URL\npython tools/pdf2md.py paper.pdf --backend marker       # complex multi-column PDFs\n```\n\nThen ingest the resulting `.md`:\n```\ningest raw/papers/my-paper.md\n```\n\n### Batch Directory Conversion (Advanced)\n\nTo pre-convert an entire directory (useful for bulk imports):\n```bash\npython tools/file_to_md.py --input_dir raw/imports/\npython tools/file_to_md.py --input_dir raw/imports/ --delete_source  # remove originals\n```\n\n### Optional Dependencies\n\n| Package | Install | Used for |\n|---|---|---|\n| [markitdown](https://github.com/microsoft/markitdown) | `pip install markitdown` | Auto-conversion of non-.md files (required for multi-format ingest) |\n| [arxiv2md](https://github.com/ryansingman/arxiv2md) | `pip install arxiv2markdown` | arXiv papers via structured source |\n| [Marker](https://github.com/VikParuchuri/marker) | `pip install marker-pdf` | Complex academic PDFs with multi-column layouts |\n| [PyMuPDF4LLM](https://github.com/pymupdf/RAG) | `pip install pymupdf4llm` | Fast PDF extraction (no GPU needed) |\n| [tqdm](https://github.com/tqdm/tqdm) | `pip install tqdm` | Progress bar for batch directory conversion |\n\n## Tips\n\n- Just drop files (PDF, DOCX, etc.) into `raw/` and `ingest` them — conversion is automatic\n- For arXiv papers, `tools/pdf2md.py` gives higher-fidelity output than generic markitdown conversion\n- Query answers are shown first — the agent then asks if you want to file them as synthesis pages. Your explorations compound just like ingested sources\n- The wiki is a git repo — version history for free\n- Standalone Python scripts in `tools/` work without a coding agent (require `ANTHROPIC_API_KEY`)\n\n## Tech Stack\n\nNetworkX + Louvain + Claude + vis.js. No server, no database, runs entirely locally. Everything is plain markdown files.\n\n## Related\n\n- [graphify](https://github.com/safishamsi/graphify) — graph-based knowledge extraction skill (inspiration for the graph layer)\n- [Vannevar Bush's Memex (1945)](https://en.wikipedia.org/wiki/Memex) — the original vision this resembles\n\n## License\n\nMIT License — see [LICENSE](LICENSE) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSamurAIGPT%2Fllm-wiki-agent","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSamurAIGPT%2Fllm-wiki-agent","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSamurAIGPT%2Fllm-wiki-agent/lists"}