{"id":49675478,"url":"https://github.com/openclaw/notcrawl","last_synced_at":"2026-05-07T02:01:31.076Z","repository":{"id":353223595,"uuid":"1218415805","full_name":"openclaw/notcrawl","owner":"openclaw","description":"Local-first Notion crawler into SQLite and normalized Markdown","archived":false,"fork":false,"pushed_at":"2026-05-05T09:26:18.000Z","size":454,"stargazers_count":76,"open_issues_count":1,"forks_count":5,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-05T11:23:50.150Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/openclaw.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":["moltbot"]}},"created_at":"2026-04-22T21:19:58.000Z","updated_at":"2026-05-05T09:24:06.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/openclaw/notcrawl","commit_stats":null,"previous_names":["vincentkoc/notcrawl","openclaw/notcrawl"],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/openclaw/notcrawl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openclaw%2Fnotcrawl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openclaw%2Fnotcrawl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openclaw%2Fnotcrawl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openclaw%2Fnotcrawl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/openclaw","download_url":"https://codeload.github.com/openclaw/notcrawl/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openclaw%2Fnotcrawl/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32719572,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-07T00:29:05.620Z","status":"online","status_checked_at":"2026-05-07T02:00:07.170Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-05-07T02:00:43.169Z","updated_at":"2026-05-07T02:01:31.069Z","avatar_url":"https://github.com/openclaw.png","language":"Go","funding_links":["https://github.com/sponsors/moltbot"],"categories":[],"sub_categories":[],"readme":"\u003cimg src=\"docs/notcrawl_banner.jpg\" alt=\"notcrawl banner\"/\u003e\n\n# 🗞️ notcrawl\n\n`notcrawl` mirrors Notion workspace data into local SQLite and normalized\nMarkdown so you can search, query, diff, and share your Notion memory without\ndepending on the Notion UI.\n\nIt has two ingestion paths:\n\n- `desktop`: read-only snapshots of the local Notion desktop cache\n- `api`: official Notion API sync with rate-limit aware crawling\n\nSQLite is the canonical archive. Markdown is the durable human/agent surface.\nGit share mode publishes normalized snapshots that other machines can subscribe\nto without holding Notion credentials.\n\n## Current Scope\n\n- local SQLite storage with FTS5\n- read-only local desktop cache ingestion from macOS Notion\n- official API page/block/user/comment ingestion\n- Notion database metadata and row ingestion through the official API\n- current Notion data-source API support plus legacy database endpoint support\n- normalized Markdown export organized by Unicode-safe workspace, teamspace, and page paths\n- CSV/TSV export for crawled Notion database rows\n- compressed JSONL git-share snapshots plus import/update workflows\n- terminal archive browser for quick local page/database inspection\n- archive status, activity reporting, and SQLite maintenance commands\n- read-only SQL access for ad hoc inspection\n\n## Install\n\n```bash\nbrew tap vincentkoc/tap\nbrew install notcrawl\n```\n\nYou can also download archives, `.deb`, or `.rpm` packages from the\n[latest release](https://github.com/vincentkoc/notcrawl/releases/latest).\n\n## Quick Start\n\nUse the local Notion Desktop cache:\n\n```bash\nnotcrawl init\nnotcrawl doctor\nnotcrawl status\nnotcrawl report\nnotcrawl sync --source desktop\nnotcrawl export-md\nnotcrawl search \"launch plan\"\nnotcrawl tui\n```\n\nOr use the official Notion API:\n\n```bash\nexport NOTION_TOKEN=\"secret_...\"\nnotcrawl sync --source api\nnotcrawl databases\nnotcrawl export-db --database DATABASE_ID --format csv --output roadmap.csv\nnotcrawl export-db --all --dir exports/csv\n```\n\nDefault paths:\n\n- config: `~/.notcrawl/config.toml`\n- database: `~/.notcrawl/notcrawl.db`\n- cache: `~/.notcrawl/cache`\n- Markdown archive: `~/.notcrawl/pages`\n- git share repo: `~/.notcrawl/share`\n\n## Commands\n\n- `init` writes a starter config\n- `doctor` checks config, SQLite, desktop cache, and token presence\n- `status` prints archive counts, last sync time, and database/WAL size\n- `metadata --json`, `status --json`, and `doctor --json` expose crawlkit\n  control/status payloads for launchers, automation, and CI\n- `report` summarizes recent page, database, space, and comment activity\n- `maintain` rebuilds FTS, optimizes SQLite indexes, and can run `VACUUM`\n- `sync` ingests from `desktop`, `api`, or `all`\n- `export-md` renders normalized Markdown files from SQLite\n- `databases` lists crawled Notion databases\n- `export-db` exports one crawled Notion database, or all databases with `--all --dir`, to CSV or TSV\n- `search` searches page and comment text through FTS5\n- `tui` opens the terminal archive browser for pages and databases\n- `sql` runs read-only SQL against the archive\n- `publish` exports SQLite tables and Markdown into a git share repo\n- `subscribe` clones a share repo and imports the latest snapshot\n- `update` pulls and imports a subscribed share repo\n\n## Shared crawlkit surfaces\n\n`notcrawl` uses `crawlkit` for standard config paths, SQLite open/read helpers,\nsnapshot packing/import, git-backed archive sharing, output formatting, status\npayloads, and the shared terminal explorer. Notion API/Desktop parsing,\nMarkdown rendering, page/comment/database schemas, and Notion FTS bodies remain\nowned by `notcrawl`.\n\nThe TUI follows the gitcrawl-style three-pane model: workspace/teamspace/page or\ndatabase groups on the left, pages/databases in the middle, and a readable\ndocument preview plus comments and metadata on the right. It supports pane\nfocus, sortable headers, mouse selection, right-click actions, and a\nlocal/remote footer.\n\n## Distribution\n\nRelease packaging is managed with GoReleaser. Tagged releases build tarballs,\nchecksums, `.deb`, `.rpm`, GitHub release notes, and a Homebrew tap update.\n\nSee [`docs/distribution.md`](docs/distribution.md) for release operations.\n\n## Safety Model\n\nDesktop mode is read-only. It snapshots Notion's local SQLite database before\nreading it and never writes to Notion application storage.\n\nAPI mode uses the official Notion API. It stores raw API payloads alongside\nnormalized rows so renderers can improve without recrawling.\n\nSecrets are never exported into Markdown or git-share snapshots.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenclaw%2Fnotcrawl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenclaw%2Fnotcrawl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenclaw%2Fnotcrawl/lists"}