{"id":24370338,"url":"https://github.com/bitlap/geocoding","last_synced_at":"2025-04-04T13:13:54.752Z","repository":{"id":39992131,"uuid":"132923880","full_name":"bitlap/geocoding","owner":"bitlap","description":":globe_with_meridians: Geocoding technology providing address normalization and similarity calculation.","archived":false,"fork":false,"pushed_at":"2024-08-02T15:39:24.000Z","size":5889,"stargazers_count":267,"open_issues_count":11,"forks_count":94,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-03-28T12:09:43.379Z","etag":null,"topics":["address","geocoding","kotlin","segmentation","similarity"],"latest_commit_sha":null,"homepage":"","language":"Kotlin","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bitlap.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-05-10T16:03:26.000Z","updated_at":"2025-03-26T02:42:04.000Z","dependencies_parsed_at":"2022-08-09T15:48:17.735Z","dependency_job_id":"4a3470da-170c-4438-a532-b2cf91006ba9","html_url":"https://github.com/bitlap/geocoding","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bitlap%2Fgeocoding","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bitlap%2Fgeocoding/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bitlap%2Fgeocoding/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bitlap%2Fgeocoding/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bitlap","download
_url":"https://codeload.github.com/bitlap/geocoding/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247182420,"owners_count":20897381,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["address","geocoding","kotlin","segmentation","similarity"],"created_at":"2025-01-19T04:24:11.884Z","updated_at":"2025-04-04T13:13:54.733Z","avatar_url":"https://github.com/bitlap.png","language":"Kotlin","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n[![Project stage](https://img.shields.io/badge/Project%20Stage-Production%20Ready-brightgreen.svg)](https://github.com/bitlap/bitlap/wiki/Project-Stages)\n[![Java 8 CI](https://github.com/IceMimosa/geocoding/actions/workflows/java8.yml/badge.svg)](https://github.com/IceMimosa/geocoding/actions/workflows/java8.yml)\n[![Maven Central](https://img.shields.io/maven-central/v/org.bitlap/geocoding)](https://central.sonatype.com/artifact/org.bitlap/geocoding)\n\n# Introduction\nThis project aims to **normalize** irregular (or unsegmented) text addresses as far as possible, and to **compute the similarity** between two addresses.\n\nThe geocoding technique consists of the following main steps:\n * Standard address library\n * Address normalization\n * Similarity calculation\n\n## pom\n\n```xml\n\u003cdependencies\u003e\n    \u003cdependency\u003e\n        \u003cgroupId\u003eorg.bitlap\u003c/groupId\u003e\n        \u003cartifactId\u003egeocoding\u003c/artifactId\u003e\n        \u003cversion\u003e1.3.1\u003c/version\u003e\n    \u003c/dependency\u003e\n\u003c/dependencies\u003e\n```\n\n# 1. 
Test data\n\nEntry point: the `Geocoding` class\n * normalizing: normalizes an address\n * analyze: parses an address into a tokenized document\n * similarity: computes the similarity of two addresses\n * similarityWithResult: computes the similarity and returns a result with additional detail\n\n## 1.1 Normalization\n\n```java\n\u003e\u003e Input: 山东青岛市北区山东省青岛市市北区水清沟街道九江路20号大都会3号楼2单元1303\n\u003e\u003e Output:\nAddress(\n\tprovinceId=370000000000, province=山东省, \n\tcityId=370200000000, city=青岛市, \n\tdistrictId=370203000000, district=市北区, \n\tstreetId=370203030000, street=水清沟街道, \n\ttownId=null, town=null, \n\tvillageId=null, village=null, \n\troad=九江路, \n\troadNum=20号, \n\tbuildingNum=3号楼2单元1303, \n\ttext=大都会\n)\n```\n\n```java\n\u003e\u003e Input: 上海上海宝山区宝山区【新沪路58弄11-802  水韵华庭 】 (水韵华庭附近)\n\u003e\u003e Output: \nAddress(\n\tprovinceId=310000000000, province=上海, \n\tcityId=310100000000, city=上海市, \n\tdistrictId=310113000000, district=宝山区, \n\tstreetId=null, street=null, \n\ttownId=null, town=null, \n\tvillageId=null, village=null, \n\troad=新沪路, \n\troadNum=58弄, \n\tbuildingNum=11-802, \n\ttext=水韵华庭水韵华庭附近\n)\n```\n\n* Fields of the returned object\n    * province fields: province\n    * city fields: city\n    * district fields: district or county\n    * street fields: subdistrict\n    * town fields: township\n    * village fields: village\n    * road: road\n    * roadNum: road number\n    * buildingNum: building number\n    * text: the part of the address left unmatched after normalization. It usually contains the name of a residential compound, shopping mall, or similar\n\n\u003e Note: if the `text` result is unsatisfactory, for example because it contains duplicates or is inaccurate, word segmentation can be used to clean it up\n\n## 1.2 Similarity\n\n```java\n\u003e\u003e Input:\n  浙江金华义乌市南陈小区8幢2号\n  浙江金华义乌市稠城街道浙江省义乌市宾王路99号后面南陈小区8栋2号\n\u003e\u003e Output: \n  0.8451542547285166\n```\n\n```java\n\u003e\u003e Input:\n  山东省沂水县四十里堡镇东艾家庄村206号\n  浙江金华义乌市南陈小区8幢2号\n\u003e\u003e Output:\n  0.0\n```\n\n## 1.3 Using a custom address file\n\n```kotlin\n// Load a custom address file\nval geocoding = GeocodingX(\"region_2021.dat\")\n\n// Add a custom district, \"临平区\"\ngeocoding.addRegionEntry(330113000000, 330100000000, \"临平区\", RegionType.District, \"\", true)\n\n// Save the custom dictionary file\ngeocoding.save(\"xxx.dat\")\n```\n\n## 1.4 Adding custom addresses\n\n```kotlin\n// 100000000000 is the ID for China\nGeocoding.addRegionEntry(88888888, 100000000000, \"尼玛省\", RegionType.Province)\nGeocoding.addRegionEntry(8888888, 88888888, \"尼玛市\", RegionType.City)\nGeocoding.addRegionEntry(888888, 8888888, 
\"泥煤市\", RegionType.District)\n\n\u003e\u003e Input: 中国尼玛省尼玛市泥煤市泥煤大道888号xxx\n\u003e\u003e Output:\nAddress(\n\tprovinceId=88888888, province=尼玛省, \n\tcityId=8888888, city=尼玛市, \n\tdistrictId=888888, district=泥煤市, \n\tstreetId=null, street=null, \n\ttownId=null, town=null, \n\tvillageId=null, village=null, \n\troad=泥煤大道, \n\troadNum=888号, \n\tbuildingNum=null, \n\ttext=xxx\n)\n```\n\n\u003e Tip: the parent city ID can be looked up in the national standard address library\n\n# 2. Notes\n\n## 2.1 Standard address library\nThe project currently uses the [~~Taobao logistics 4-level address~~][1] standard address library (the link has expired; the actual endpoint can be obtained from the Taobao shipping-address page). The `national standard address library` can also be used (corresponding GitHub repository: [China 5-level administrative regions MySQL database][3]).\n* [National standard address library 2023](http://www.stats.gov.cn/sj/tjbz/tjyqhdmhcxhfdm/2023)\n* [National standard address library 2022](http://www.stats.gov.cn/sj/tjbz/tjyqhdmhcxhfdm/2022)\n* [National standard address library 2021](http://www.stats.gov.cn/sj/tjbz/tjyqhdmhcxhfdm/2021)\n\n### Notes on importing the China 5-level administrative regions MySQL database\n\n[Reference documentation](https://github.com/bitlap/geocoding/blob/master/src/test/java/org/bitlap/geocoding/region/README.md)\n\n## 2.2 Standard address libraries (compatible with this project)\n\n| Library file | Description | Reference | Thanks |\n|---|---|---|---|\n| region_2021.dat | National standard address library 2021 | [ISSUE-163](https://github.com/bitlap/geocoding/issues/163) | [TsLenMo](https://github.com/TsLenMo), [weijiang.lin](https://github.com/linweijiang) |\n\nUsage: download the file onto the `classpath`, then use the custom `GeocodingX` class.\n\n## 2.3 Normalization\n1. First, extract the road, building number, and similar information using regular expressions\n2. Match the province, city, district, and so on\n    1. Build an **inverted index** over the standard address library\n    2. Starting from the beginning of the text, match all terms using a **longest-match-first** strategy\n    3. Validate all matches against the parent-child relationships of the standard administrative regions\n\n## 2.4 Similarity calculation\n1. Normalize the two input addresses\n2. Assign different weights to the province, city, district, and other fields\n3. Apply semantic processing to the road number and building number, and assign weights\n4. Tokenize the remaining text (`text`) with **IK Analyzer**\n5. 
Compute the similarity of the two result sets using **cosine similarity**\n\n\nThis project is based on [address-semantic-search][4]; it simplifies the workflow and fixes various irregularity-related errors, making it easier to use.\n\n## Acknowledgements\n\n* Python wrapper library: [casuallyName/Geocoding](https://github.com/casuallyName/Geocoding)\n\n\n## Release Log\n\n[Change Log](./CHANGES.md)\n\n## LICENSE\n\nMIT\n\n[1]:https://lsp.wuliu.taobao.com/locationservice/addr/output_address_town.do\n[3]:https://github.com/kakuilan/china_area_mysql\n[4]:https://github.com/liuzhibin-cn/address-semantic-search\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbitlap%2Fgeocoding","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbitlap%2Fgeocoding","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbitlap%2Fgeocoding/lists"}