{"id":51193941,"url":"https://github.com/bokuweb/lawrenceanum","last_synced_at":"2026-06-27T18:03:40.910Z","repository":{"id":356732060,"uuid":"1232109538","full_name":"bokuweb/lawrenceanum","owner":"bokuweb","description":null,"archived":false,"fork":false,"pushed_at":"2026-06-26T17:25:09.000Z","size":2078,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-26T19:14:57.267Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://bokuweb.github.io/lawrenceanum/","language":"Rust","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bokuweb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"docs/roadmap.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-07T15:43:00.000Z","updated_at":"2026-06-26T17:25:16.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/bokuweb/lawrenceanum","commit_stats":null,"previous_names":["bokuweb/lawrenceanum"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/bokuweb/lawrenceanum","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bokuweb%2Flawrenceanum","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bokuweb%2Flawrenceanum/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bokuweb%2Flawrenceanum/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bokuweb%2Flawrenceanum/manifests","owne
r_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bokuweb","download_url":"https://codeload.github.com/bokuweb/lawrenceanum/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bokuweb%2Flawrenceanum/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34862646,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-27T02:00:06.362Z","response_time":126,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-27T18:03:39.814Z","updated_at":"2026-06-27T18:03:40.898Z","avatar_url":"https://github.com/bokuweb.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# lawrenceanum\n\nA static-hosted JSON API + WASM-SQLite-powered SPA for Japanese statute data\n(法令), built on top of e-Gov 法令API. 
GitHub Actions periodically pulls the\nupstream data, the Rust CLI (`lawpub`) normalizes it into stable JSON, and the\nresult is served from GitHub Pages.\n\nDetailed design: [docs/plan.md](docs/plan.md).\n\n## What you get\n\n- Static JSON API at `https://\u003cowner\u003e.github.io/\u003crepo\u003e/...`\n  - `index.json`, `manifest.json`, `health.json`\n  - `laws/index.json`, `laws/{law_id}/{current,versions,timeline}.json`\n  - `laws/{law_id}/revisions/{rev_id}.json`, `laws/{law_id}/articles/{art_id}.json`\n  - `updates/latest.json`, `updates/{YYYY-MM-DD}.json`\n  - `kanpo/{YYYY-MM-DD}/index.json`\n  - `sitemap.xml`, `robots.txt`, `laws/all.ndjson`\n- A React SPA at the same origin that consumes those JSON files\n  - HashRouter, deep-linkable to any law / article / version\n  - Browser-side full-text search via **WASM SQLite (sql.js) + FTS5**\n  - Cross-reference graph: `第○条` → article links, backlinks panel,\n    cross-law jumps for `民法第七百九条` style references\n\n## Local quickstart\n\n```bash\ncargo build --release -p lawpub-cli\n./target/release/lawpub update --public public --cache .cache --provider mock\n./target/release/lawpub validate --public public\n\n# Serve the SPA on top\ncd figma \u0026\u0026 pnpm install\npnpm dev          # http://localhost:5173/   (a custom Vite middleware\n                  # serves ../public/*.json so the SPA reads live JSON)\n# or production build:\npnpm build        # writes index.html + assets/ next to the JSON\n```\n\nGenerated files under `public/`:\n\n```\npublic/\n├── index.json / manifest.json / health.json / sitemap.xml / robots.txt\n├── laws/\n│   ├── index.json\n│   ├── all.ndjson\n│   └── {law_id}/\n│       ├── current.json\n│       ├── versions.json\n│       ├── timeline.json\n│       ├── revisions/{rev_id}.json\n│       └── articles/{art_id}.json\n├── updates/{latest.json,{YYYY-MM-DD}.json}\n├── kanpo/{YYYY-MM-DD}/index.json\n├── schema/{law-document,manifest,updates}.json\n├── search.db                             
     # SQLite + FTS5\n├── index.html / assets/                       # SPA build output\nstate/latest.json                              # cron-managed pointer\n```\n\n## CLI surface\n\n```text\nlawpub update         --public public --cache .cache [--provider http|mock] [--date YYYY-MM-DD] [--force]\nlawpub fetch-update   --date YYYY-MM-DD --cache .cache\nlawpub fetch-range    --from YYYY-MM-DD --to YYYY-MM-DD --cache .cache [--provider http|mock]\nlawpub fetch-bulk     --category N [--limit M] --cache .cache [--provider http|mock]\nlawpub build-json     --input .cache --output public\nlawpub build-index    --output public\nlawpub kanpo-fetch    --date YYYY-MM-DD --cache .cache\nlawpub kanpo-link     --output public\nlawpub validate       --public public\nlawpub status         --public public --cache .cache\n```\n\nThe provider defaults to `http` and uses `https://laws.e-gov.go.jp/api/1` (v1\nAPI; v2 has a different path scheme — `/api/2/laws`, `/api/2/law_data/{id}`).\nOverride with `LAWPUB_PROVIDER` and `LAWPUB_EGOV_BASE_URL`.\n\n## Workspace layout\n\n| crate | purpose |\n|---|---|\n| `crates/egov-client`     | e-Gov fetcher (`HttpProvider`, `MockProvider`) |\n| `crates/law-normalizer`  | LawXML → normalized `LawDocument` |\n| `crates/kanpo-client`    | 官報 site scraper (Phase 3, mock for now) |\n| `crates/kanpo-linker`    | amendment ↔ 官報 PDF matching with confidence score |\n| `crates/search-index`    | bigram tokenizer + SQLite FTS5 builder + ref-graph extractor |\n| `crates/lawpub-cli`      | the `lawpub` binary |\n\n## Browser search (WASM SQLite + FTS5 over Cloudflare R2)\n\n`lawpub` emits `public/search.db` (SQLite + FTS5, ~1.5 GB at full bulk) at\nbuild time. The SPA reads it through **sql.js-httpvfs** (sqlite.org's\nEmscripten WASM build + an HTTP-Range VFS). 
Each query downloads only the\nSQLite **pages** (4 KB) it needs — typically 100-300 KB / query — so the\n1.5 GB DB stays remote.\n\nHosting options:\n\n| Option | search.db location | When to use |\n|---|---|---|\n| **GitHub Pages only (default)** | `public/search.db` (same origin) | OK for tiny demos (\u003c50 MB), hard limit 100 MB git |\n| **Cloudflare R2 (recommended)**  | `https://\u003cr2-pub\u003e/search.db` via `VITE_SEARCH_DB_URL` | Production / full bulk. R2 free tier (10 GB storage + free egress) covers personal use indefinitely |\n| Turso / D1                       | Their HTTP API | Only if edge-replicated reads matter |\n\n### R2 setup (one-time)\n\n1. Sign up for Cloudflare (free). R2 dashboard → **Create bucket** (e.g.\n   `lawrenceanum`).\n2. Bucket settings → **Public access** → enable \"r2.dev subdomain\". Note the\n   public URL `https://pub-\u003chash\u003e.r2.dev`.\n3. Bucket settings → **CORS policy** → allow your Pages origin:\n\n   ```json\n   [\n     {\n       \"AllowedOrigins\": [\"https://\u003cowner\u003e.github.io\"],\n       \"AllowedMethods\": [\"GET\"],\n       \"AllowedHeaders\": [\"range\", \"if-match\", \"if-none-match\"],\n       \"ExposeHeaders\": [\"content-length\", \"content-range\", \"etag\"],\n       \"MaxAgeSeconds\": 86400\n     }\n   ]\n   ```\n\n4. R2 → **Manage R2 API tokens** → create token with **Object Read \u0026 Write**\n   on that single bucket.\n5. 
GitHub repo → Settings → Secrets and variables → Actions → add:\n\n   | Secret | Example |\n   |---|---|\n   | `R2_ACCOUNT_ID`       | your account id |\n   | `R2_ACCESS_KEY_ID`    | from step 4 |\n   | `R2_SECRET_ACCESS_KEY`| from step 4 |\n   | `R2_BUCKET`           | `lawrenceanum` |\n   | `R2_ENDPOINT`         | `https://\u003cR2_ACCOUNT_ID\u003e.r2.cloudflarestorage.com` |\n   | `R2_PUBLIC_URL`       | `https://pub-\u003chash\u003e.r2.dev` |\n\nWhen all of `R2_BUCKET` / `R2_ENDPOINT` are set, the workflow uploads\n`search.db` to R2 after `validate`, removes it from the Pages artifact, and\nbuilds the SPA with `VITE_SEARCH_DB_URL=$R2_PUBLIC_URL/search.db`. With the\nsecrets unset, everything still works (search.db stays in `public/`).\n\n- Indexed at the article level. The FTS5 virtual table has columns\n  `law_id` / `article_id` / `article_no` / `caption` / `title_tokens` /\n  `content_tokens`.\n- Japanese is pre-tokenized as **character bigrams**\n  (`crates/search-index::tokenize` and\n  `figma/src/app/data/search-engine::tokenize` are kept in lockstep).\n- Queries go through the same bigram tokenizer; FTS5 `snippet()` produces\n  highlighted excerpts.\n- A `meta` table stores `built_at` / `law_count` / `article_count` /\n  `ref_count`.\n- A `refs` table stores cross-references between articles:\n\n  ```sql\n  CREATE TABLE refs (\n    from_law_id TEXT, from_article_id TEXT,\n    to_law_id   TEXT, to_article_id   TEXT,\n    ref_text TEXT,\n    ref_type TEXT  -- 'self_article' | 'previous_article' | 'next_article' | 'cross_law'\n  );\n  ```\n\n  Extraction uses Aho-Corasick (`MatchKind::LeftmostLongest`) to keep build\n  time linear in body length × match count even with thousands of laws.\n\nThe browser exposes `getOutgoingRefs` / `getIncomingRefs` / `getRefsForLaw` and\nthe Browse detail view linkifies article text in place. Clicking a reference\nscrolls to `#article_id`; cross-law references navigate to\n`/laws/{other_id}#{article_id}`. 
Each article header also lists incoming\nreferences as backlinks.\n\n`/search` lazy-loads sql-wasm (~320KB gzip) + `search.db` on first navigation;\nfalling back to a mock filter when the DB is unreachable so local dev still\nworks.\n\nInspired by ellisii's [`jp-tokenizer-bigram`](../ellisii/crates/jp-tokenizer-bigram/)\nand [`store-sqlite`](../ellisii/crates/store-sqlite/).\n\n## Web UI (static SPA)\n\n`figma/` doubles as the design source-of-truth and the actual UI implementation\n(Vite + React + Tailwind v4 + shadcn/ui). It builds straight into the same\n`public/` directory the JSON lives in.\n\n- `base: './'` so assets are relative — works on any GitHub Pages sub-path\n- `outDir: ../public`, `emptyOutDir: false` so the JSON survives a Vite build\n- `publicDir: false` to avoid copying assets into themselves\n- Dev mode: `lawpubJsonDevServer` Vite middleware serves `../public/*.json`\n  on the fly so `pnpm dev` sees live data without a separate server\n- Lazy-loaded chart bundle (recharts ≈ 420 KB) via `React.lazy`, kept out of\n  the initial dashboard render\n\n### CI step order\n\n1. `lawpub update` writes JSON via atomic `public.tmp/` → rename\n2. `lawpub kanpo-link` overlays 官報 matches on each `timeline.json`,\n   recomputes `manifest.json`\n3. **Change detection**: read `state/last_run.json.changed`; if `false`, skip\n   the rest\n4. `pnpm build` adds `index.html` + `assets/` to `public/` (JSON untouched)\n5. `lawpub validate` cross-checks every manifest entry's sha256\n6. `actions/configure-pages` → `actions/upload-pages-artifact`\n7. `git commit \u0026\u0026 git push` (`public/` plus `state/latest.json`)\n8. 
Separate `deploy` job runs `actions/deploy-pages`\n\n## Auto-update via GitHub Actions\n\n`update-law-data.yml` is driven by three triggers:\n\n| Trigger | Behaviour |\n|---|---|\n| `schedule` (JST 06:30 / 12:30 / 18:30 / 00:30) | Pull latest e-Gov diff, commit + deploy if anything changed |\n| `push` (merge to `main`) | Rebuild SPA over the existing committed `public/` and redeploy. **No** e-Gov fetch, **no** auto-commit |\n| `workflow_dispatch` | Pick `provider` / `date` / `force` / `from_date` / `to_date` / `bulk_category` / `bulk_limit` |\n\nAuto-commits use `GITHUB_TOKEN`, which by GitHub policy does not re-trigger\nworkflows — so a cron auto-commit cannot create a deploy loop.\n\n### Change detection (no-op suppression)\n\n`lawpub update` writes `state/last_run.json` (gitignored) on every run:\n\n```json\n{\n  \"version\": 1,\n  \"ran_at\": \"2026-05-09T03:30:00Z\",\n  \"provider\": \"http\",\n  \"dates\": [\"2026-05-06\", \"2026-05-07\", \"2026-05-08\", \"2026-05-09\"],\n  \"new_xmls\": 0,\n  \"errors\": [],\n  \"changed\": false\n}\n```\n\nIf the sha256-deduped revision store (`.cache/revisions/`) gained no new XMLs\n**and** `public/manifest.json` already exists, the run reports `changed=false`\nand every downstream step (build / commit / deploy) is skipped. So idle hours\non the e-Gov side do not bloat git history.\n\n### Failure handling\n\n- HttpProvider retries each request three times with exponential backoff. A\n  failed date is logged in `errors` and other dates keep going (plan §14).\n- `public/` is replaced atomically via `public.tmp/` → `public.bak/` →\n  rename. 
A failure mid-swap is rolled back from the backup.\n- `concurrency: update-law-json` serializes overlapping schedule + dispatch\n  runs.\n\n### Manual triggers\n\n```bash\n# Single date (overrides the auto state-based range)\ngh workflow run update-law-data.yml -f date=2026-05-01\n\n# Range backfill (fill in dates before cron started)\ngh workflow run update-law-data.yml \\\n  -f from_date=2024-04-01 -f to_date=2026-05-09\n\n# Bulk fetch (one-shot collection of every law in a category)\n#   1 = 憲法・法律 (Constitution and Acts)\n#   2 = 政令・勅令 (Cabinet Orders and Imperial Ordinances)\n#   3 = 府省令・規則 (Ministerial Ordinances and Rules)\ngh workflow run update-law-data.yml -f bulk_category=1\ngh workflow run update-law-data.yml -f bulk_category=2 -f bulk_limit=500\n\n# Force a redeploy without touching e-Gov\ngh workflow run update-law-data.yml -f force=true\n```\n\n### One-time amendment-history backfill (e-Gov API v2)\n\n`/api/1/lawdata/{id}` (currently used by bulk/cron) only returns the law's *current*\nsnapshot, with no historical revisions. To populate the timeline with actual amendment\nhistory (e.g. 民法 has ~33 revisions going back to the Heisei era), we use e-Gov API v2's\n`/law_revisions/{id}` endpoint. This is a one-time backfill done **locally** (not in\nActions) because it makes ~9000 requests and would be slow / risky in CI.\n\n```bash\n# 1. Smoke-test on a few laws first. IDs come from public/laws/index.json (auto-committed\n#    by Actions), so this works on a fresh checkout without .cache.\n./target/release/lawpub fetch-revisions --from-public ./public --limit 5\n\n# 2. Full backfill. Concurrency 2 is e-Gov-friendly (CloudFront rate-limits at ~4+).\n#    Resumes if interrupted; existing per-law JSONs are skipped (use --force to redo).\n./target/release/lawpub fetch-revisions --from-public ./public --concurrency 2\n\n#    Alternative: when .cache/revisions/ is already populated locally:\n# ./target/release/lawpub fetch-revisions --all --concurrency 2\n\n# 3. 
Pack the per-law JSONs into a single jsonl for shipping.\n./target/release/lawpub bundle-revisions-meta --mode pack \\\n  --dir .cache/revisions_meta --file .cache/revisions_meta.jsonl\n\n# 4. Upload to R2 via wrangler (uses your `wrangler login` session — no R2\n#    access key needed locally). CI's \"Restore revisions_meta from R2\" step\n#    later pulls the same object back via the S3 API.\nexport R2_BUCKET=\u003cbucket\u003e\npnpm install               # installs wrangler (root devDependency)\npnpm upload-revisions-meta # = wrangler r2 object put \"$R2_BUCKET/revisions_meta.jsonl\" ...\n\n# 5. Trigger a force rebuild so build-json picks up the new meta.\ngh workflow run \"Update law JSON\" -f force=true\n```\n\nThe upload uses `wrangler` (a root `devDependency`); `pnpm upload-revisions-meta`\nwraps `wrangler r2 object put ... --remote`. CI reads the object back with the\nS3 API + `R2_*` secrets — upload and download paths differ but hit the same\nR2 object.\n\nAfter this, the cron path (`lawpub update`) refreshes the meta for *only* the\nlaws updated that day, so the timeline stays fresh without re-running the\nfull backfill.\n\nPriority is `bulk_category \u003e from_date/to_date \u003e date \u003e automatic state-based`.\nBulk runs do thousands of requests × 200 ms throttle, so the workflow's\n`timeout-minutes` is 360. If a bulk run dies partway through, the in-job\n`.cache/revisions/` still holds whatever it managed to fetch and `build-json`\nwill produce a partial `public/`.\n\n## Status\n\nUp and running on Pages. The cron is incremental from the moment it starts;\nhistorical revisions only accumulate going forward unless you explicitly\nbackfill via `bulk_category=N` or `from_date=…/to_date=…`. 
There is no e-Gov\nAPI v1 endpoint that returns historical revisions of a single law (only the current\nversion + a daily-update list), so without the one-time API v2 backfill, deeper history\nrequires the daily snapshots to keep stacking up over time.\n\n## License\n\n- **Personal use**: Non-commercial personal use (including use, modification, and\n  redistribution for learning, research, verification, or hobby purposes) is free of charge.\n- **Commercial use**: Use for commercial purposes requires prior inquiry and permission.\n\nSee [LICENSE](LICENSE) for details.\n\nNote that the statute data handled by this repository is derived from public data such as the e-Gov 法令API.\nThe terms of use for the data itself are governed by each provider's terms.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbokuweb%2Flawrenceanum","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbokuweb%2Flawrenceanum","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbokuweb%2Flawrenceanum/lists"}