{"id":50053578,"url":"https://github.com/notoriouslab/vault-curate","last_synced_at":"2026-06-02T06:00:28.086Z","repository":{"id":349140625,"uuid":"1201212574","full_name":"notoriouslab/vault-curate","owner":"notoriouslab","description":"Hybrid semantic search and AI curation for your vault. Combines BM25 keyword, on-device WebGPU embeddings, and fuzzy title matching. Multilingual, with particularly strong Chinese/CJK support. Opt-in AI features auto-generate note descriptions and topic-grouped Maps of Content. Local-first, no API keys.","archived":false,"fork":false,"pushed_at":"2026-05-21T09:35:38.000Z","size":2829,"stargazers_count":90,"open_issues_count":0,"forks_count":17,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-21T16:56:38.461Z","etag":null,"topics":["bge","chinese-nlp","chinese-search","embeddings","hybrid-search","obsidian","obsidian-md","obsidian-plugin","ollama","pkm","rag","semantic-search","traditional-chinese","transformers-js","webgpu"],"latest_commit_sha":null,"homepage":"https://community.obsidian.md/plugins/vault-curate","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/notoriouslab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-04T11:14:03.000Z","updated_at":"2026-05-21T09:35:40.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/notoriouslab/vault-curate","commit_stats":null,"previous_names":["notoriouslab/vault-search","notoriouslab/vault-curate"],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/notoriouslab/vault-curate","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/notoriouslab%2Fvault-curate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/notoriouslab%2Fvault-curate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/notoriouslab%2Fvault-curate/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/notoriouslab%2Fvault-curate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/notoriouslab","download_url":"https://codeload.github.com/notoriouslab/vault-curate/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/notoriouslab%2Fvault-curate/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33808702,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-02T02:00:07.132Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bge","chinese-nlp","chinese-search","embeddings","hybrid-search","obsidian","obsidian-md","obsidian-plugin","ollama","pkm","rag","semantic-search","traditional-chinese","transformers-js","webgpu"],"created_at":"2026-05-21T11:06:50.253Z","updated_at":"2026-06-02T06:00:28.074Z","avatar_url":"https://github.com/notoriouslab.png","language":"TypeScript","funding_links":[],"categories":["原生中文插件，欢迎支持"],"sub_categories":["链接与知识管理"],"readme":"\u003cdiv align=\"center\"\u003e\n\n# Vault Curate\n\n[![Release](https://img.shields.io/github/v/release/notoriouslab/vault-curate?style=flat-square)](https://github.com/notoriouslab/vault-curate/releases)\n[![License](https://img.shields.io/github/license/notoriouslab/vault-curate?style=flat-square)](LICENSE)\n[![Obsidian Desktop](https://img.shields.io/badge/Obsidian-Desktop-7C3AED?style=flat-square\u0026logo=obsidian)](https://obsidian.md/)\n[![WebGPU Accelerated](https://img.shields.io/badge/WebGPU-Accelerated-FF6A00?style=flat-square)]()\n[![Ollama Optional](https://img.shields.io/badge/Ollama-Optional-000?style=flat-square)](https://ollama.com/)\n[![Last Commit](https://img.shields.io/github/last-commit/notoriouslab/vault-curate?style=flat-square)](https://github.com/notoriouslab/vault-curate)\n\n**High-quality Chinese-friendly semantic search for Obsidian, with optional AI curation.**\n\nTraditional Chinese · Simplified Chinese · CJK · local-first · hybrid retrieval (BM25 + embeddings + fuzzy) · WebGPU on-device · no API keys\n\n[繁體中文](./README.zh-TW.md)\n\n![Vault Curate](./docs/intro.jpg)\n\n\u003c/div\u003e\n\n---\n\n\u003e ⓘ **Previously published as `vault-search`** (plugin id and repository renamed). A different plugin authored by a separate developer now occupies the `vault-search` id — see the [Upgrading from vault-search](#upgrading-from-vault-search) section below before installing if you used earlier versions.\n\n## Why Vault Curate?\n\nObsidian's built-in search is literal: think \"prayer\" but your note says \"devotional\" and you'll miss it. Most semantic-search plugins use generic multilingual models, which tend to under-perform on Chinese content.\n\n[Andrej Karpathy shared](https://venturebeat.com/data/karpathy-shares-llm-knowledge-base-architecture-that-bypasses-rag-with-an/) his vision of LLM-maintained knowledge bases — letting AI \"compile\" your notes into structured wikis. Compelling, but it asks you to hand over full editorial control. **Vault Curate takes a different stance: AI should help you *see*, not think for you.**\n\n### Three differentiators\n\n| Feature | How it works |\n|---|---|\n| **Chinese semantic quality beats generic multilingual models** | Ships with `bge-small-zh-v1.5` (Chinese-only training). In head-to-head testing on Chinese names, religious terms, and colloquial phrases, generic MiniLM-style multilingual models miss most of the matches; Vault Curate consistently recalls the right notes. |\n| **Zero-config to run, WebGPU accelerated** | ~110 MB model downloads once. WebGPU indexing: 342 notes / 5,004 chunks in about **1m23s** (WASM fallback still works, around 27 minutes). |\n| **AI curation is opt-in, never silent** | Description generation, MOC clustering, and frontmatter rewrites all require explicit opt-in. Nothing runs LLMs in the background and nothing rewrites your notes without you asking. |\n\n---\n\n## Quick Start\n\n1. In Obsidian, go to **Settings → Community plugins** and search for **Vault Curate**\n2. After enabling, the **Welcome to Vault Curate** modal opens. Under **Embedding provider**, pick **Built-in (on-device, WebGPU)** and click **Index my vault now**\n3. After the ~110 MB model download and WebGPU indexing finish, click the sidebar compass icon and start searching\n\n\u003e ⚠️ Vault Curate is currently going through Obsidian's community review. If you can't find it in Community plugins yet, use the [Manual install](#manual-install) path below.\n\n---\n\n## Installation\n\n**Requirements**\n- [Obsidian](https://obsidian.md/) desktop (v1.0.0+)\n- Advanced paths only: a local [Ollama](https://ollama.com/) instance or any OpenAI-compatible server\n\n### From Community plugins (recommended)\n\n1. Open **Settings → Community plugins** in Obsidian\n2. Make sure **Restricted mode** is off, click **Community plugins → Browse**\n3. Search **Vault Curate** → **Install** → **Enable**\n4. The **Welcome to Vault Curate** modal opens automatically on first launch\n\n### Manual install\n\n1. Download `main.js`, `manifest.json`, `styles.css`, `worker.js`, `ort-wasm-simd-threaded.wasm` from [Releases](https://github.com/notoriouslab/vault-curate/releases)\n2. Copy them into `.obsidian/plugins/vault-curate/` in your vault\n3. Enable in **Settings → Community plugins**\n\n\u003e **Tip:** If your vault is Git-tracked, add `.obsidian/plugins/*/data.json` and `.obsidian/plugins/*/index.sqlite` to `.gitignore`.\n\n---\n\n## Upgrading from vault-search\n\nIf you used the earlier `vault-search` plugin, follow this path:\n\n1. **Open your vault folder** and locate `.obsidian/plugins/vault-search/`\n2. **Delete that folder directly** from the filesystem. ⚠️ Do *not* use Community plugins → Uninstall — a different plugin now occupies the `vault-search` id and may insert itself when you uninstall.\n3. Install Vault Curate via the [Installation](#installation) steps above\n4. **Enable**. The **Welcome to Vault Curate** modal will guide you through rebuilding the index.\n\nEmbeddings are not reused across versions — a from-scratch rebuild takes ~1–2 minutes on WebGPU for a few hundred notes. Frontmatter descriptions and tags already in your notes are preserved (they live in the `.md` files, not in the index).\n\nIf you had keybindings set on `vault-search:*` commands, redo them under `vault-curate:*` in **Settings → Hotkeys** (9 commands total — see [Commands](#commands)).\n\n---\n\n## Features\n\n### Search (Hybrid Fusion)\n\nThree signals combined via [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) (k=60):\n\n| Path | Catches |\n|---|---|\n| **BM25** (pure TS, CJK trigram) | Exact phrases, keyword combinations |\n| **Semantic embedding** | Different wording, same meaning |\n| **Fuzzy title** (Jaro–Winkler) | Typos, spelling variants |\n\nTwo entry points:\n\n- Cmd/Ctrl+P → `Vault Curate: Semantic search (modal)` for quick jump\n- Sidebar → **Search** tab for persistent results\n\n![Search results + Canvas drag](./docs/search-canvas.png)\n\n### Discover\n\nDiscover works on **notes**, not query strings — it surfaces semantically related **Cold notes** you haven't touched recently:\n\n- **Current note**: when you open a file, related notes appear automatically, with Cold notes visually highlighted (\"you haven't read this one\")\n- **Global**: Cold notes most related to your entire Hot pool — intentional blind-spot mining\n- Results can be exported to a topic-grouped Map of Content via **Generate MOC** (falls back to a flat MOC if results are too few or too similar)\n\n![Discover sidebar — current note](./docs/discover-current-note.png)\n\n### Hot / Cold auto-tiering\n\nNotes are auto-classified by **internal links + recency**:\n\n- **Hot**: linked to / recently touched\n- **Cold**: orphan / untouched for a while\n\nThe \"recent\" cutoff is tunable in **Settings → Advanced → Hot window (days)**. Cold notes don't get buried in Discover — they're exactly the content you should be re-seeing.\n\n### Find similar notes\n\nRight-click any `.md` → **VC: Find similar notes** → results show up in the sidebar; you can drag them straight to Canvas.\n\n### AI curation (off by default)\n\nTurn it on under **Settings → AI Curation → Enable AI curation** to unlock three actions:\n\n- Generate a description + tags into a single note's frontmatter\n- Run description generation across the sidebar's search / discover results\n- Generate a **topic-grouped MOC** via HDBSCAN clustering + LLM naming\n\nThe LLM provider is configured separately under **Settings → AI Curation** (local Ollama or any OpenAI-compatible endpoint).\n\n---\n\n## Commands\n\nFrom Command Palette (Cmd/Ctrl+P), type `Vault Curate:` to see them all.\n\n| Command | What it does | Requires |\n|---|---|---|\n| `Semantic search (modal)` | Modal-style semantic search with quick jump | always available |\n| `Open search panel` | Open the sidebar panel | always available |\n| `Find similar notes` | Find semantically related notes to the active `.md` | always available |\n| `Rebuild index` | Wipe the existing index and re-index everything | always available |\n| `Update index` | Incremental update (re-index files with newer mtime) | always available |\n| `Discover related Cold notes` | Global discover: Cold notes most related to your Hot pool | always available |\n| `Generate description for active note` | LLM-write description + tags to the active file's frontmatter | AI curation on |\n| `Generate descriptions for current results` | Batch description for the current sidebar results | AI curation on |\n| `Generate MOC (topic-grouped)` | HDBSCAN cluster + LLM-name each group | AI curation on |\n\nRight-click menus expose two of these directly on a `.md`:\n\n- **VC: Find similar notes**\n- **VC: Generate description** (AI curation on)\n\n---\n\n## Settings\n\nThe settings panel is split into three sections:\n\n### Quick setup\n\n| Setting | Default | Note |\n|---|---|---|\n| Embedding provider | Built-in (on-device, WebGPU) | One of three: Built-in / Ollama / OpenAI-compatible |\n| Excluded folders | (empty) | Folder globs that won't be indexed |\n\nChanging the embedding provider or model triggers a confirmation modal — the index is wiped and rebuilt.\n\n### AI Curation\n\n| Setting | Default | Note |\n|---|---|---|\n| Enable AI curation | off | When off, description / MOC commands stay hidden |\n| LLM provider | Ollama | The endpoint used for description + MOC naming |\n| LLM model | qwen3:1.7b | Recommended default; any Ollama model works |\n\n### Advanced\n\nCollapsible `\u003cdetails\u003e` block: top results / min score / Hot window (days) / default search scope (Hot / Cold / All) / chunk size + overlap / synonym list / auto-index toggle / rebuild + update buttons / index stats.\n\n---\n\n## Privacy\n\nThree embedding modes, picked from **Quick setup → Embedding provider**:\n\n| Mode | Where embeddings run | Where note text goes |\n|---|---|---|\n| **Built-in** | On-device WebGPU / WASM | Stays on your device |\n| **Ollama (local daemon)** | Local Ollama daemon on 127.0.0.1 | Stays on your device |\n| **OpenAI-compatible API** | Any endpoint you point it at — could be local (LM Studio, llama.cpp, …) **or** remote (OpenAI etc.) | Depends on the endpoint you choose; may leave your device |\n\nThe same applies to AI curation (description / MOC naming), which uses an independently-configured LLM endpoint.\n\n**No telemetry. No usage tracking. Nothing is sent to any server unless you configure a remote endpoint.**\n\n### Audit disclosures\n\nThe Obsidian Developer Dashboard's automated audit may flag the following items on this plugin. They are intentional and disclosed here for transparency:\n\n- **Vault enumeration** (`vault.getMarkdownFiles()`): The indexer needs to walk the full list of markdown files in your vault to build the semantic embedding index. The `excludePatterns` setting (Settings → Advanced) lets you scope this — e.g. excluding `_templates/`, `.trash/`, or any folder you don't want indexed. No file is read until it's in the included set.\n- **Dynamic code execution** (`new Function` in bundled `@huggingface/transformers`): The Hugging Face Transformers library uses `new Function` internally to create type-safe method dispatchers during model loading. Vault Curate's own source code contains **zero** `eval()` or `new Function()`. We bundle the upstream library as-is to avoid divergence; the dynamic dispatch happens only inside the embedding model's tokenizer/inference setup, not on any vault content.\n- **Direct filesystem access**: The bundled `sql.js` ships an Emscripten output with a Node.js fallback path that imports `node:fs` / `node:crypto`. These branches are dead code in Obsidian's renderer process (gated by `process.type !== \"renderer\"`). As of v1.0.3, the esbuild config strips those `require()` strings from the released bundle so the audit no longer sees them.\n\n### 🔒 About API key storage\n\nVault Curate, like every Obsidian plugin, stores its settings (including any OpenAI API key) as plain text in `\u003cvault\u003e/.obsidian/plugins/vault-curate/data.json`. This is Obsidian's plugin storage mechanism, not a vault-curate-specific design choice.\n\nIf your vault syncs to a cloud service (iCloud / Dropbox / Google Drive) or pushes to a public Git repository, you should:\n\n1. Add `.obsidian/plugins/vault-curate/data.json` to your sync exclusion list or `.gitignore`\n2. Or use the **Built-in** model / **Ollama** path — neither requires an API key\n\n---\n\n## Tech Stack\n\n- **TypeScript** + **esbuild** (two-stage bundle for worker + main)\n- **sql.js** (SQLite via WASM) for the storage layer — replaces v0.x's `data.json` / `index.json`\n- **Pure-TS BM25+** (`src/storage/bm25.ts`) for CJK-aware full-text search (no native FTS5 dependency)\n- **`@huggingface/transformers`** + **`bge-small-zh-v1.5` q8** (~110 MB, WebGPU/WASM) for on-device embeddings\n- **`hdbscan-ts`** for topic clustering (MOC)\n- **Reciprocal Rank Fusion** (k=60) combining BM25 + semantic + fuzzy\n- **Optional**: [Ollama](https://ollama.com/) / any OpenAI-compatible endpoint for higher-end embedding or LLM models\n\n---\n\n## Development\n\n```bash\ngit clone https://github.com/notoriouslab/vault-curate.git\ncd vault-curate\nnpm install\nnpm run dev    # watch mode\nnpm run build  # production build\nnpm test       # vitest unit tests (59 tests)\n```\n\n---\n\n## License\n\n[MIT](./LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnotoriouslab%2Fvault-curate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnotoriouslab%2Fvault-curate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnotoriouslab%2Fvault-curate/lists"}