{"id":45884055,"url":"https://github.com/Towdium/PinIn","last_synced_at":"2026-03-13T02:01:11.375Z","repository":{"id":50471009,"uuid":"246580302","full_name":"Towdium/PinIn","owner":"Towdium","description":" Java library for Chinese text match using Pinyin - 用于各类汉语拼音匹配问题的 Java 库 ","archived":false,"fork":false,"pushed_at":"2023-01-17T16:09:05.000Z","size":8114,"stargazers_count":46,"open_issues_count":1,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2023-03-09T05:35:48.506Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Towdium.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-03-11T13:35:21.000Z","updated_at":"2023-02-28T03:46:27.000Z","dependencies_parsed_at":"2023-01-31T03:32:14.682Z","dependency_job_id":null,"html_url":"https://github.com/Towdium/PinIn","commit_stats":null,"previous_names":[],"tags_count":null,"template":null,"template_full_name":null,"purl":"pkg:github/Towdium/PinIn","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Towdium%2FPinIn","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Towdium%2FPinIn/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Towdium%2FPinIn/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Towdium%2FPinIn/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Towdium","download_url":"https://codeload.github.com/Towdium/PinIn/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/Towdium%2FPinIn/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30454982,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-12T21:31:01.033Z","status":"online","status_checked_at":"2026-03-13T02:00:07.565Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-02-27T15:00:30.025Z","updated_at":"2026-03-13T02:01:11.363Z","avatar_url":"https://github.com/Towdium.png","language":"Java","funding_links":[],"categories":["人工智能","Java"],"sub_categories":["自然语言处理"],"readme":"# PinIn [![Download][7]][8]\n\nA Java library for solving a wide range of Chinese Pinyin matching problems. It provides an NFA-based implementation for on-the-fly matching and a suffix-tree-like implementation for indexed matching. It can also convert Chinese characters to Pinyin strings in ASCII, Unicode and Zhuyin (phonetic symbol) formats.\n\n## Features\n\n- Highly flexible combinations of abbreviated spellings\n- 7 fuzzy spelling options\n- Supports full Pinyin (Quanpin), double Pinyin (Shuangpin: Ziranma, Xiaohe) and Zhuyin (Daqian)\n- Provides both on-the-fly matching logic and cache-based matching logic\n- Allows configuration changes at runtime (including fuzzy spelling and keyboard)\n\n\u003e For “中国”, accepted search strings include, but are not limited to, “中国”, “中guo”, “zhongguo”, “zhong国”, “zhong1国”, “zh1国” and “zh国”.\n  Depending on the fuzzy spelling settings, “zong国”, “z国” and others are accepted as well.\n\n\u003e Shuangpin input is still in testing, and shape-based codes are not (and will not be) supported. When too many candidates share the same code, tones can be used as auxiliary codes.\n\nFor the principles and design ideas, see the [Pinyin Search Revisited (再谈拼音搜索)][5] series.\n\n## Performance\n\nThe benchmarks use a [test sample][1] exported from the Enigmatica modpack. It contains about 37k entries of mixed Chinese and English, about 400k characters, and is about 900 KB in size. The results are as follows:\n\n__Partial match__\n\n| Matching logic | Build time | Warm-up time | Search time | Memory usage |\n|:------:|:------:|:--------:|:-------:|-------|\n| TreeSearcher | 210ms | N/A | 0.19ms | 9.50MB |\n| SimpleSearcher | 27ms | N/A | 9.1ms | 1.84MB |\n| CachedSearcher | 28ms | 16ms | 0.55ms | See note |\n| Linear scan with Pinyin matching | N/A | N/A | 23ms | N/A |\n| Linear scan with contains | N/A | N/A | 0.53ms | N/A |\n\n__Prefix match__\n\n| Matching logic | Build time | Warm-up time | Search time | Memory usage |\n|:------:|:------:|:--------:|:-------:|-------|\n| TreeSearcher | 62.5ms | N/A | 0.083ms | 2.80MB |\n| SimpleSearcher | 30ms | N/A | 2.4ms | 1.84MB |\n| CachedSearcher | 28ms | 2.8ms | 0.10ms | See note |\n| Linear scan with Pinyin matching | N/A | N/A | 8.8ms | N/A |\n| Linear scan with startsWith | N/A | N/A | 0.53ms | N/A |\n\n\u003e Note: the memory usage and search speed of `CachedSearcher` can vary noticeably between scenarios; they generally fall between those of `TreeSearcher` and `SimpleSearcher`.\n\nFor `TreeSearcher` and `CachedSearcher`, some constant parameters can be tuned further to balance speed against memory consumption.\n\n## Examples\n\nYou can easily import PinIn into your Gradle project using [JitPack][8].\n\n```groovy\nrepositories {\n  maven { url 'https://jitpack.io' }\n}\n\ndependencies {\n  implementation 'com.github.Towdium:PinIn:Version'\n}\n```\n\nThe code below shows how to use some of PinIn's basic interfaces. For more examples, see the [test code][2].\n\n```java\n// CONTAIN, UNICODE and PHONETIC are assumed to be statically imported\npublic static void main(String[] args) {\n    // context\n    PinIn p = new PinIn();\n\n    // direct match\n    boolean result1 = p.contains(\"测试文本\", \"ceshi\");\n\n    // indexed match: entries are registered on the searcher, not on the context\n    Searcher\u003cInteger\u003e searcher = new TreeSearcher\u003c\u003e(CONTAIN, p);\n    searcher.put(\"测试文本\", 0);\n    boolean result2 = searcher.search(\"ceshi\").contains(0);\n\n    // fuzzy spelling\n    p.config().fSh2S(true).commit();  // don't forget to commit config\n    boolean result3 = p.contains(\"测试文本\", \"cesi\");\n\n    // pinyin format\n    Char c = p.genChar('圆');\n    Pinyin y = c.pinyins()[0];\n    String s1 = y.format(UNICODE);  // yuán\n    String s2 = y.format(PHONETIC);  // ㄩㄢˊ\n}\n```\n\n## Acknowledgements\n\nThis project depends on [Fastutil][6]. The shadow Jar bundles a trimmed-down implementation; when using the plain Jar, users need to set up the dependency themselves.\n\nThe built-in Pinyin data comes from [Terra Pinyin (地球拼音)][3] and [pinyin-data][4].\n\nHave fun!\n\n[1]: /src/test/resources/me/towdium/pinin/small.txt\n[2]: /src/test/java/me/towdium/pinin/PinInTest.java\n[3]: https://github.com/rime/rime-terra-pinyin\n[4]: https://github.com/mozillazg/pinyin-data\n[5]: https://www.towdium.me/2019/11/05/pinyin-search-again-1/\n[6]: http://fastutil.di.unimi.it/\n[7]: https://jitpack.io/v/Towdium/PinIn.svg\n[8]: 
https://jitpack.io/#Towdium/PinIn\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTowdium%2FPinIn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FTowdium%2FPinIn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTowdium%2FPinIn/lists"}