https://github.com/tetra-plg/boiling-brain

BoilingBrain — opinionated LLM Wiki template for personal knowledge bases maintained by domain expert agents (Claude Code)
https://github.com/tetra-plg/boiling-brain

agent-orchestration claude-code knowledge-management llm obsidian template wiki

Last synced: about 2 months ago
JSON representation

BoilingBrain — opinionated LLM Wiki template for personal knowledge bases maintained by domain expert agents (Claude Code)

Host: GitHub
URL: https://github.com/tetra-plg/boiling-brain
Owner: tetra-plg
License: mit
Created: 2026-04-30T06:35:44.000Z (about 2 months ago)
Default Branch: develop
Last Pushed: 2026-05-02T07:56:37.000Z (about 2 months ago)
Last Synced: 2026-05-02T12:28:59.971Z (about 2 months ago)
Topics: agent-orchestration, claude-code, knowledge-management, llm, obsidian, template, wiki
Language: Shell
Size: 163 KB
Stars: 3
Watchers: 0
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md

Awesome Lists containing this project

README

# BoilingBrain

![status: alpha](https://img.shields.io/badge/status-alpha-yellow) ![license: MIT](https://img.shields.io/badge/license-MIT-blue) ![claude code](https://img.shields.io/badge/built%20for-Claude%20Code-purple)

![BoilingBrain wiki graph — domain clusters auto-emerge from the linking structure](docs/graph.png)

> Bootstrap template for an **LLM Wiki** — a personal knowledge base curated by you and maintained by LLM agents.

## Status

First public release (v1.0). The template works end-to-end and has been used to scaffold real vaults. Future releases will iterate on the bootstrap interview, agent prompts, and domain-deduction heuristics based on community feedback. See [CHANGELOG.md](./CHANGELOG.md) for milestones. Bug reports and generic-improvement PRs welcome — see [CONTRIBUTING.md](./CONTRIBUTING.md).

## What is an LLM Wiki?

A vault where:

- **You** curate sources (notes, transcripts, articles, repo docs) into an immutable `raw/` folder.
- **Domain expert agents** ingest those sources and write the wiki layer (`wiki/sources/`, `wiki/concepts/`, `wiki/entities/`, etc.) — one agent per domain you care about.
- **Slash commands** (`/ingest`, `/query`, `/lint`, `/evolve-agent`, …) orchestrate the agents.
- The wiki is **always derivable from `raw/`** — no orphan knowledge, no hallucinated links.

The pattern follows Karpathy's "knowledge compilation" idea: keep the source of truth compact, hash-addressed and immutable; let LLMs maintain a queryable layer on top.

## How does this differ from Karpathy's LLM Wiki?

Karpathy's LLM Wiki is a **concept**: notes maintained by an LLM, with the LLM filling cross-references and refactoring on demand. BoilingBrain is an **opinionated, runnable implementation** of that concept — with concrete trade-offs you can either accept or fork. Specifically :

| Dimension | Karpathy's sketch | BoilingBrain |
|---|---|---|
| **Source of truth** | Notes in a folder, LLM rewrites freely | `raw/` is immutable, hash-addressed (`source_sha256`); the wiki layer is always derivable |
| **Agent topology** | One LLM doing everything | One **expert agent per domain**, declared at bootstrap, with deliverable signatures (cheatsheets, syntheses, diagrams) and per-domain prompts that evolve over time via `/evolve-agent` |
| **Idempotence** | Manual | `/ingest` is hash-based and idempotent — re-running doesn't duplicate; `--force` re-applies new agent reflexes to existing sources |
| **Multimodal** | Text-only typically | First-class video pipeline (`/ingest-video`): download → transcribe → frame extraction (mode A declarative or mode B image-diff induction) → markdown transcription of visuals (tables, Mermaid, LaTeX) so queries don't re-OCR each time |
| **Queries at scale** | Load and read | **Tiered loading**: every page carries `summary_l0` (≤140 chars) + `summary_l1` (~50-150 words). Agents scan a domain via TOC L0, descend to L1 then L2 only when relevant — sublinear in body size |
| **External code/docs** | Out of scope | `/sync-repos` snapshots GitHub repos by SHA into `raw/tracked-repos//`, immutable, never overwritten |
| **Self-improvement** | Implicit, ad-hoc | Explicit human-curated loop: each agent appends `Evolution suggestions` per ingest, `/evolve-agent ` reviews + applies the diff, archives integrated suggestions |
| **Architectural decisions** | Mixed into notes | ADR-lite in `wiki/decisions/` with a fixed structure (Problem → Options → Decision → Why → Open questions) |

Said otherwise: Karpathy says "let LLMs maintain a wiki." BoilingBrain says "*here's what the wiki layout, the agent contract, the ingest protocol and the evolution loop need to look like for that to actually scale past a few weeks of use.*" The opinions come from real usage — fork them if they don't fit.

## Prerequisites

- **[Claude Code](https://claude.com/claude-code)** — the CLI agent that drives the interview and the wiki workflows. The template is built around its slash-commands and `AskUserQuestion` tool.
- **[Obsidian](https://obsidian.md/)** — the markdown editor where your vault lives day-to-day. Wiki pages use `[[wikilinks]]` and the bootstrap generates an `.obsidian/` config (graph filter + per-domain colors) so the graph view shows your `wiki/` clean of `raw/` and `cache/` noise.
- **`gh` CLI** (optional) — only needed if you want to clone via `gh repo clone` and create a remote vault repo automatically at the end of the interview.

After bootstrap, **open the cloned folder in Obsidian** ("Open folder as vault") and switch to the **graph view** (icon in the left ribbon, or `Ctrl/Cmd+G`). You'll see your domains laid out by color — that's the auto-generated `.obsidian/graph.json` doing its job.

## Quick start

```bash
gh repo clone tetra-plg/boiling-brain ~/my-vault
cd ~/my-vault
claude
```

Then in Claude Code:

```
Read BOOTSTRAP.md and run the prompt.
```

*Works in any language — the bootstrap interview adapts to your phrasing.*

The interview takes 5-10 minutes. At the end your vault is personalized, the template's git history is reset, and you can run your first `/ingest`. Then open `~/my-vault` in Obsidian and check the graph view.

Detailed flow → [How to bootstrap](#how-to-bootstrap) below.

## Why a template?

This repo is the **scaffolding**, not a usable instance. It contains:

- Generic slash commands (`/ingest`, `/ingest-video`, `/query`, `/save`, `/lint`, `/evolve-agent`, optional `/sync-repos`).
- Generic scripts (`scan-raw.sh`, `transcribe.sh`, `sample-frames.sh`, `diff-frames.py`, `extract-frames.sh`, `backfill-summaries.py`, `enrich-hub.py`, optional `sync-repos.sh`).
- Templated files (`*.tpl`) with `{{placeholder}}` markers that need to be filled with **your** name, role, domains, and projects.
- A unified `domain-expert.md.tpl` that gets instantiated **once per domain** you declare at bootstrap time, with domain-specific deliverables, observation triggers and frame-visual formats.

## Do NOT push on this repo directly

This template is not meant to be cloned and edited by hand. Each placeholder has dependencies on others (e.g. domain count drives how many agent files exist; the hub-pivot domain gets a special marker; `/sync-repos` is included only if you declare GitHub repos to track).

A **bootstrap prompt** (`BOOTSTRAP.md`, shipped at the root of this repo) walks you through a structured interview, generates the resolved files, and writes them to your machine.

Among other questions, the bootstrap asks whether you want to track GitHub repos (i.e. snapshot their docs by SHA into `raw/`); if yes, `/sync-repos` is included. Otherwise it's removed.

At the end of the interview, the `.git` will be removed or replace by your own git remote.

## How to bootstrap

See [Quick start](#quick-start) above for the commands. A few details on what happens during and after the bootstrap:

- `BOOTSTRAP.md` and `PLACEHOLDERS.md` are moved into `wiki/decisions/` as ADR traces — you can re-read them later to understand how your vault was generated.
- The template's `.git/` is replaced by a fresh history (clean start, no leftover template commits).
- An optional GitHub remote is created via `gh repo create` if you confirm during the interview.

If you really want to clone manually and skip the bootstrap, read every `*.tpl` file, fill placeholders by hand, and rename to drop `.tpl`. You will probably forget the cross-file consistency checks the bootstrap prompt enforces.

## Architecture in 60 seconds

```
.claude/
agents/ # one -expert.md per domain you declared
agent-memory/ # one MEMORY.md per agent (state of the domain)
commands/ # slash commands (generic, no per-instance customization)
rules/ # transverse conventions auto-loaded by Claude Code via `paths` frontmatter
template-version # which template version this vault is aligned with (used by /update-vault)

raw/ # IMMUTABLE source files. Never modified after first write.
cache/ # transient artifacts (downloaded videos, audio, frames). gitignored.

wiki/ # the LLM-maintained layer.
index.md # minimal portal: overview, radar, log, domains
log.md # chronological journal
radar.md # open questions and points of attention
overview.md # your portrait
domains/ # one hub per domain
sources/ entities/ concepts/ syntheses/ cheatsheets/ diagrams/ decisions/

scripts/ # utilities (transcription, frame extraction, image-diff, summary backfill, hub enrichment, ...)

CLAUDE.md # project-wide LLM instructions (slim ≤200 lines, points to .claude/rules/ and .claude/commands/)
tracked-repos.config.json # OPTIONAL: list of GitHub repos to snapshot via /sync-repos
```

## `.claude/rules/` — transverse conventions

Conventions that apply across the wiki (frontmatter format, slug rules, raw immutability) live in `.claude/rules/*.md`. Each rule has a `paths:` field in its frontmatter — Claude Code auto-loads the rule when the current working file matches one of those paths. This mirrors the [Anthropic-recommended pattern](https://www.anthropic.com/) (Boris Cherny, "CLAUDE.md best practices", 24 March 2026).

Three rules ship by default:

- `frontmatter.md` (paths: `wiki/sources/**`, `wiki/concepts/**`, `wiki/syntheses/**`, `wiki/decisions/**`, `wiki/entities/**`, `wiki/cheatsheets/**`, `wiki/diagrams/**`, `wiki/domains/**`) — frontmatter YAML rules, including the hard rule that `source_sha256` must always be computed via `shasum -a 256 ` (never a placeholder).
- `pages-wiki.md` (paths: `wiki/**`) — kebab-case slugs, `[[wikilinks]]`, `[!warning]` / `[!question]` callouts, page sizing.
- `raw-vs-cache.md` (paths: `raw/**`, `cache/**`) — strict immutability of `raw/`, transient nature of `cache/`.

`.claude/rules/` is **upstream-tracked** — `/update-vault` propagates new and updated rules to existing vaults automatically.

## Tiered loading

Every wiki page carries two extra frontmatter fields:

- `summary_l0` — single line, ≤140 chars. Telegraphic. Used as TOC entry when an agent scans an entire domain.
- `summary_l1` — 2-5 sentences (~50-150 words). Used when the agent decides whether to load the full body.

This lets agents (and you, via `/query`) navigate the wiki without paying the full body cost on every page they consider.

## Slash commands shipped

| Command | Purpose |
|---|---|
| `/ingest [path]` | Batch idempotent ingestion of files from `raw/` via domain experts. Hash-based, no duplicates. |
| `/ingest-video ` | Pipeline: download → transcribe → ingest transcript → propose frame extraction (mode A / B / skip). |
| `/query ` | Answer from indexed pages with citations; optionally archive the synthesis. |
| `/save ` | Archive the current synthesis into `wiki/syntheses/.md`. |
| `/lint [domain]` | Detect contradictions, orphans, missing cross-references, gaps. |
| `/evolve-agent ` | Curated update to a domain expert's prompt, fed by accumulated `.suggestions.md`. |
| `/sync-repos [names]` *(optional)* | Snapshot GitHub repos by SHA into `raw/tracked-repos/` (or any `dest` declared per source). |
| `/update-vault` | Cherry-pick improvements from the upstream template into your vault instance (versioned migration machine). |
| `/create-issue [type]` | Sanitize a draft and open an issue on the upstream template repo (auto-strips wikilinks, vault-specific slugs, private paths). Always validated by you before `gh issue create`. |

## Workflow loop

1. Drop a source into `raw/` (note, transcript, PDF, repo doc snapshot).
2. Run `/ingest`. Main context proposes a domain expert; you confirm via `AskUserQuestion`. Agent writes `wiki/sources/`, `wiki/concepts/`, `wiki/entities/`, optionally cheatsheets / syntheses / diagrams. Open questions land in `wiki/radar.md`.
3. Tomorrow morning, ask "show me the radar" — Claude reads `radar.md` and the accumulated `.suggestions.md` of each agent, proposes the day's priorities.
4. After a few ingestions in a domain, run `/evolve-agent ` to fold accumulated suggestions back into the expert's prompt.
5. Use `/query` whenever you need to answer something from the corpus. Substantial answers can be archived via `/save`.

## Who is this for?

**For you if:**

- You're a **solo dev, researcher, PM or writer** building a personal knowledge base across 3-6 domains.
- You already use **Claude Code** (CLI, desktop app, IDE extension, or claude.ai/code) and **Obsidian** — or want to.
- You want a **thinking partner** that grounds its answers in *your* sources (with citations), not a chatbot guessing from training data.
- You'd rather **have an opinion than a blank page**: hash-keyed `raw/`, one expert agent per domain, mandatory frontmatter — these constraints feel like a feature, not a friction.
- You want a vault that **scales past a few weeks** without falling into the "200 unlinked notes" trap.
- You enjoy **owning your wiki layer** (the LLM writes it, you curate the prompts and the diff).

**Probably not for you if:**

- You want a **team or shared wiki** — BoilingBrain is single-tenant by design (one agent set per vault).
- You want a **one-click hosted product** (Mem, NotebookLM, ChatGPT projects) where you upload notes and the storage is abstracted away — here the wiki is yours, plain markdown, you read and edit it directly.
- You're expecting a **vector embeddings / RAG** retrieval layer — there's none, on purpose. For a personal knowledge base at this scale, hash-keyed sources + LLM-maintained wikilinks are simpler and effective; full RAG is overkill.
- You don't use Claude Code (Codex, Cursor, Aider support is on the roadmap — contributions welcome, see open issues).
- You want a **no-code, no-config** experience — every action goes through a slash command and you'll edit YAML frontmatter.

## FAQ

### Do I need to know which domains I want before bootstrapping?

Roughly. The bootstrap interview will ask you to list 3-6 domains with a one-line description each, mark one as "hub pivot" if you have a domain that feeds the others, and indicate per-domain deliverables (cheatsheets? diagrams? syntheses?). You can add domains later by writing a new agent file by hand — it's just a copy of an existing one with the placeholders filled differently.

### Why one agent per domain instead of a single generic agent?

Because each domain has different observation triggers, deliverable signatures, and visual frame formats. A poker source needs a 13×13 range grid; an astrophysics source needs LaTeX equations and EXIF metadata; a management source has confidentiality concerns. A single prompt that tries to cover all of these dilutes the signal. The unified `domain-expert.md.tpl` solves this by being templated rather than generic at runtime.

### What about the `/evolve-agent` loop — is it dangerous?

No. `/evolve-agent ` reads the suggestions accumulated by an expert across ingestions, proposes a **diff**, and waits for your approval before writing. The previous prompt is preserved (suggestions integrated are archived in a separate file). It's an **explicit, human-curated** improvement loop, not silent self-modification.

### Can I keep my `raw/` folder separate from this repo?

Recommended actually — `raw/` is gitignored by default (see `.gitignore`). Your sources are personal; commit only the `wiki/` layer if at all. Many users keep the whole vault in a private repo separate from the template clone.

### How do I handle large videos?

The `LLMWIKI_VIDEO_CACHE` environment variable points to a directory where video files are stored. Default is `cache/videos/` (in-vault). Override to an external SSD or a dedicated cache directory if you process many or large videos. The wiki itself never references `cache/` — only `raw/transcripts/` and `raw/frames/` are referenced.

### How do I add my own domain-specific tools?

Drop them in `.claude/commands/` (slash-commands) or `scripts/` (utilities). They live in your instance and never need to be merged back into the template.

**Concrete example:** in one of the early instances of this template, the user added `/extract-range-grid` — a poker-specific OCR pipeline for range matrices — alongside the core pipeline. The slash-command (`.claude/commands/extract-range-grid.md`) and the Python script (`scripts/extract-range-grid.py`) live in their vault, not here. Yours can do the same for whatever your domains need: a LaTeX renderer, a k8s manifest validator, a financial-statement parser — anything that's a natural extension of one of your domains.

### How do I clip a web page into the vault?

The recommended flow uses the **[Obsidian Web Clipper](https://obsidian.md/clipper)** browser extension (Chrome/Firefox/Safari):

1. **Clip** — click the extension on any article, Reddit post, or documentation page. It saves a markdown file (with YAML frontmatter: `title`, `source`, `published`, `author`, `tags`) into your vault's `Clippings/` folder by default.
2. **Move to `raw/`** — drag the file from `Clippings/` into the right sub-folder of `raw/`:
- `raw/notes/` for personal takes, Reddit threads, short reads
- `raw/articles/` for long-form articles and documentation pages (create the folder if it doesn't exist yet)
- `raw/pdfs/` if you clipped a PDF-backed page (keep the PDF there too)
3. **Ingest** — run `/ingest raw/notes/your-clip.md` (or just `/ingest` to sweep all new files). The main context scans the file, proposes a domain expert, and you confirm before the agent writes the wiki pages.

The `Clippings/` folder is gitignored — it's a staging area, not part of the vault. `raw/` is your immutable archive once a file lands there.

> **Tip:** if you use multiple devices, keep `Clippings/` synced via Obsidian Sync or iCloud, then move files to `raw/` only when you're at your main workstation where Claude Code runs.

### What about secrets / credentials?

Don't commit them. Use the `.gitignore` (already excludes `cache/`, `raw/`, `Clippings/`). For repo-syncing via `/sync-repos`, authentication relies on `gh auth login` — no tokens stored in the vault.

## License

MIT. See `LICENSE` (added by GitHub when the repo was created).

## Contributing

This template is opinionated. The opinions come from real usage. If you have a generic improvement (better hub structure, better tiered-loading default, a missing slash command), feel free to open an issue or PR. Domain-specific tooling — keep it in your own instance and document it in your README.

---

*Generated as part of the Phase 2 scaffolding of the LLM Wiki bootstrap.*

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/tetra-plg/boiling-brain

Awesome Lists containing this project

README