{"id":50725008,"url":"https://github.com/zero3kw/jp-address-datasets","last_synced_at":"2026-06-10T03:03:39.551Z","repository":{"id":363292893,"uuid":"1262677499","full_name":"zero3kw/jp-address-datasets","owner":"zero3kw","description":null,"archived":false,"fork":false,"pushed_at":"2026-06-08T08:28:15.000Z","size":13,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-08T10:14:10.694Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zero3kw.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-08T08:04:32.000Z","updated_at":"2026-06-08T08:28:18.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/zero3kw/jp-address-datasets","commit_stats":null,"previous_names":["zero3kw/jp-address-datasets"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/zero3kw/jp-address-datasets","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zero3kw%2Fjp-address-datasets","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zero3kw%2Fjp-address-datasets/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zero3kw%2Fjp-address-datasets/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zero3kw%2Fjp-address-datasets/ma
nifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zero3kw","download_url":"https://codeload.github.com/zero3kw/jp-address-datasets/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zero3kw%2Fjp-address-datasets/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34134642,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-10T02:00:07.152Z","response_time":89,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-10T03:03:37.969Z","updated_at":"2026-06-10T03:03:39.543Z","avatar_url":"https://github.com/zero3kw.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# jp-address-datasets\n\nA repository that collects test data for evaluating Japanese address-processing tools (geocoders, address normalizers, and the like).\n\n## Uses\n\n- **Benchmarking**: measuring geocoder throughput\n- **Normalization accuracy evaluation**: comparing the output of existing libraries\n- **Public datasets**: reproducible test sets with explicit sources and licenses\n\n## Datasets\n\n| ID | Name | Source | Records | License |\n|----|------|------|-----:|------------|\n| `jat` | Japanese Address Testdata | [t-sagara/Japanese-Address-testdata](https://github.com/t-sagara/Japanese-Address-testdata) | 76 | [MIT](https://github.com/t-sagara/Japanese-Address-testdata/blob/main/LICENSE) |\n| `nja` | Normalize Japanese Addresses test corpus | 
[geolonia/normalize-japanese-addresses](https://github.com/geolonia/normalize-japanese-addresses) | 7,191 | [MIT](https://github.com/geolonia/normalize-japanese-addresses/blob/master/LICENSE.txt) |\n| `school` | National Land Numerical Information: Schools (P29; kindergartens through universities and specialized training colleges) | [Ministry of Land, Infrastructure, Transport and Tourism, National Land Numerical Information](https://nlftp.mlit.go.jp/ksj/gml/datalist/KsjTmplt-P29.html) | 53,842 | [National Land Numerical Information Terms of Use](https://nlftp.mlit.go.jp/ksj/other/agreement.html) |\n| `nta-houjin` | Corporate Number Publication Site: registered addresses from the full dataset | [National Tax Agency](https://www.houjin-bangou.nta.go.jp/download/zenken/) | 4,456,565 | [Terms of Use](https://www.houjin-bangou.nta.go.jp/riyokiyaku/) |\n| `abr` | Address Base Registry | [Digital Agency ABR](https://dataset.address-br.digital.go.jp/) | 211,761,061 | [Terms of Use](https://www.digital.go.jp/policies/base_registry_address_tos) |\n\nBefore using any dataset, always check the license and terms of use of its original source.\n\n## Prerequisites\n\nAll scripts run inside a Docker container.\n\n### Running from inside a container (DevContainer, etc.)\n\nIn environments that call the host's Docker, paths inside the container must be replaced with the corresponding host paths:\n\n```bash\nHOST_DIR=/Users/foo/path/to/jp-address-datasets make download\n```\n\n## Usage\n\n### Download everything\n\n```bash\nmake download\n```\n\n### Download individual datasets\n\n```bash\nmake download-jat\nmake download-nja\nmake download-nta-houjin\nmake download-school\nmake download-abr\n```\n\n#### Narrowing the ABR download scope\n\n| Environment variable | Effect |\n|---|---|\n| `PREF=13` | Download only the given prefecture (2-digit code) |\n| `SKIP_RSDTDSP=1` | Skip the residential indication masters (rsdtdsp_blk + rsdtdsp_rsdt) |\n| `SKIP_PARCEL=1` | Skip the parcel number master (about 1,900 files) |\n| `PARALLEL=16` | Number of parallel downloads (default 16) |\n\n```bash\n# Tokyo only, skipping the residential indication and parcel data, for a lightweight smoke test\nPREF=13 SKIP_RSDTDSP=1 SKIP_PARCEL=1 make download-abr\n```\n\n### Lint\n\n```bash\nmake lint\n```\n\n## File layout\n\nFiles under `data/` are split into raw (data as obtained from the source) and prc (extracted addresses):\n\n- `data/raw/{dataset}.{ext}`: source data as-is (CSV / GeoJSON)\n- `data/prc/{dataset}.txt`: address strings only, one per line (deduplicated)\n\n| Dataset | Raw | Addresses only |\n|---|---|---|\n| jat | `raw/jat.csv` | `prc/jat.txt` |\n| nja | `raw/nja.csv` | `prc/nja.txt` |\n| nta-houjin | `raw/nta-houjin.csv` | `prc/nta-houjin.txt` |\n| school | `raw/school.geojson` | `prc/school.txt` |\n| abr (pref) | 
`raw/abr/mt_city/*.csv` | `prc/abr_pref.txt` |\n| abr (city) | `raw/abr/mt_city/*.csv` | `prc/abr_city.txt` |\n| abr (town) | `raw/abr/mt_town_fullset/*.csv` | `prc/abr_town.txt` |\n| abr (blk) | `raw/abr/mt_rsdtdsp_blk/*.csv` | `prc/abr_blk.txt` |\n| abr (rsdt) | `raw/abr/mt_rsdtdsp_rsdt/*.csv` | `prc/abr_rsdt.txt` |\n| abr (parcel) | `raw/abr/mt_parcel/*.csv` | `prc/abr_parcel.txt` |\n\n## License\n\n- **Scripts and documentation**: MIT License ([LICENSE](LICENSE))\n- **Data**: governed by the license of each original source (see the dataset table above)\n\nIf you redistribute the data, be sure to comply with each source's attribution requirements (credit notices, etc.).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzero3kw%2Fjp-address-datasets","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzero3kw%2Fjp-address-datasets","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzero3kw%2Fjp-address-datasets/lists"}