# Project Overview

> [English documentation](README_EN.md)

[word-checker](https://github.com/houbb/word-checker/) is a spell checker for words. It supports spelling detection for English words as well as for Chinese text.

It also ships common algorithms such as edit distance.

[![Maven Central](https://maven-badges.herokuapp.com/maven-central/com.github.houbb/word-checker/badge.svg)](http://mvnrepository.com/artifact/com.github.houbb/word-checker)
[![Build Status](https://www.travis-ci.org/houbb/word-checker.svg?branch=master)](https://www.travis-ci.org/houbb/word-checker?branch=master)
[![Coverage Status](https://coveralls.io/repos/github/houbb/word-checker/badge.svg?branch=master)](https://coveralls.io/github/houbb/word-checker?branch=master)
[![](https://img.shields.io/badge/license-Apache2-FF0080.svg)](https://github.com/houbb/word-checker/blob/master/LICENSE.txt)
[![Open Source Love](https://badges.frapsoft.com/os/v2/open-source.svg?v=103)](https://github.com/houbb/word-checker)

# Features

### Edit distance

- Minimum edit distance between any two words

- Step-by-step derivation of the minimum edit distance between any two words

### English word correction

- Quickly detects whether a word is misspelled

- Returns the best match

- Returns a list of candidate corrections, with a configurable maximum list size

- Error messages support i18n

- Normalizes letter case and full-width/half-width characters

- Supports custom dictionaries

- Ships with a built-in English dictionary of 270,000+ words

- Configurable edit distance for English words

### Basic Chinese spelling detection
# Changelog

> [CHANGELOG](https://github.com/houbb/word-checker/blob/master/CHANGELOG.md)

# Quick Start

## JDK version

JDK 1.7+

## Maven dependency

```xml
<dependency>
    <groupId>com.github.houbb</groupId>
    <artifactId>word-checker</artifactId>
    <version>1.2.0</version>
</dependency>
```

## Edit distance

The edit distance methods live in the utility class `EditDistanceHelper`.

### Minimum edit distance

`EditDistanceHelper.minDistance(source, target)` returns the minimum edit distance between any two words.

```java
Assert.assertEquals(3, EditDistanceHelper.minDistance("horse", "ros"));
Assert.assertEquals(5, EditDistanceHelper.minDistance("intention", "execution"));
```

### Minimum edit distance derivation

`EditDistanceHelper.minDistanceList(source, target)` returns the derivation of the minimum edit distance between any two words. This is the sequence of words from `source` to `target`, with one edit per step.

```java
Assert.assertEquals("[horse, hors, hos, ros]", EditDistanceHelper.minDistanceList("horse", "ros").toString());
Assert.assertEquals("[intention, intenution, intecution, inecution, ixecution, execution]", EditDistanceHelper.minDistanceList("intention", "execution").toString());
```

## Correction example

Given an input, the tool automatically returns the best correction.

```java
final String speling = "speling";
Assert.assertEquals("spelling", WordCheckerHelper.correct(speling));
```
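One simple way to connect `correct` to the edit distance above is brute force. Compute the Levenshtein distance (insert, delete and replace each cost 1, which matches the `horse`/`ros` = 3 and `intention`/`execution` = 5 figures) from the input to every dictionary word. Keep the words within a maximum distance, then rank them by distance and then by frequency. This is a swapped-in illustration under those assumptions, not word-checker's algorithm. It scans the whole dictionary on every query, which would be slow for a 270,000-word dictionary. The class `NearestWordSketch`, its constructor arguments and the sample frequencies are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical brute-force corrector for illustration, not word-checker's implementation.
final class NearestWordSketch {

    private final Map<String, Integer> frequency; // word -> occurrence count
    private final int maxDistance;                // largest edit distance accepted

    NearestWordSketch(Map<String, Integer> frequency, int maxDistance) {
        this.frequency = new HashMap<>(frequency);
        this.maxDistance = maxDistance;
    }

    // Classic Levenshtein DP: dp[i][j] = edits to turn source[0, i) into target[0, j).
    static int minDistance(String source, String target) {
        int m = source.length();
        int n = target.length();
        int[][] dp = new int[m + 1][n + 1];
        for (int i = 0; i <= m; i++) dp[i][0] = i; // delete i characters
        for (int j = 0; j <= n; j++) dp[0][j] = j; // insert j characters
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                int cost = source.charAt(i - 1) == target.charAt(j - 1) ? 0 : 1;
                dp[i][j] = Math.min(dp[i - 1][j - 1] + cost,            // keep or replace
                        Math.min(dp[i - 1][j] + 1, dp[i][j - 1] + 1)); // delete or insert
            }
        }
        return dp[m][n];
    }

    boolean isCorrect(String word) {
        return frequency.containsKey(word);
    }

    // Dictionary words within maxDistance: nearest first, then most frequent, then alphabetical.
    List<String> correctList(String word, int limit) {
        if (isCorrect(word)) {
            return Collections.singletonList(word);
        }
        final Map<String, Integer> distance = new HashMap<>();
        for (String candidate : frequency.keySet()) {
            int d = minDistance(word, candidate);
            if (d <= maxDistance) {
                distance.put(candidate, d);
            }
        }
        List<String> candidates = new ArrayList<>(distance.keySet());
        Collections.sort(candidates, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                int byDistance = Integer.compare(distance.get(a), distance.get(b));
                if (byDistance != 0) {
                    return byDistance;
                }
                int byFrequency = Integer.compare(frequency.get(b), frequency.get(a));
                return byFrequency != 0 ? byFrequency : a.compareTo(b);
            }
        });
        return candidates.subList(0, Math.min(limit, candidates.size()));
    }

    // Best candidate, or the input itself when nothing is close enough.
    String correct(String word) {
        List<String> best = correctList(word, 1);
        return best.isEmpty() ? word : best.get(0);
    }
}
```

With a toy dictionary `{good=80, goo=20, goon=5}` and a maximum distance of 1, `correctList("goox", 2)` yields `[good, goo]`. Raising the maximum distance to 2 lets farther words in, and they are always ranked after nearer ones.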
# Core API

The core API lives in the utility class `WordCheckerHelper`.

`WordCheckerHelper` automatically corrects long texts that mix Chinese and English, and it handles single words as well.

| Function | Method | Parameters | Return value | Notes |
|:----|:----|:----|:----|:----|
| Check whether the text is spelled correctly | isCorrect(string) | Text to check | boolean | Returns true only if everything is spelled correctly |
| Return the best correction | correct(string) | Word or text to check | String | Returns the input unchanged if no correction is found |
| Return candidate corrections for each token | correctMap(string) | Text to check | `Map<String, List<String>>` | Maps each token to all of its candidate corrections |
| Return candidate corrections for each token, with a limit | correctMap(string, int limit) | Text to check, maximum list size | `Map<String, List<String>>` | Each list has at most `limit` entries |
| Return candidate corrections | correctList(string) | Word to check | `List<String>` | All matching corrections |
| Return candidate corrections, with a limit | correctList(string, int limit) | Word to check, maximum list size | `List<String>` | At most `limit` entries |

## English examples

> See [WordCheckerHelperTest.java](https://github.com/houbb/word-checker/tree/master/src/test/java/com/github/houbb/word/checker/util/WordCheckerHelperTest.java)

### Check spelling

```java
final String hello = "hello";
final String speling = "speling";
Assert.assertTrue(WordCheckerHelper.isCorrect(hello));
Assert.assertFalse(WordCheckerHelper.isCorrect(speling));
```

### Best correction

```java
final String hello = "hello";
final String speling = "speling";
Assert.assertEquals("hello", WordCheckerHelper.correct(hello));
Assert.assertEquals("spelling", WordCheckerHelper.correct(speling));
```

### Default candidate list

```java
final String word = "goox";
List<String> stringList = WordCheckerHelper.correctList(word);
Assert.assertEquals("[good, goo, goon, goof, gook, goop, goos, gox, goog, gool, goor]", stringList.toString());
```

### Candidate list with a size limit

```java
final String word = "goox";
final int limit = 2;
List<String> stringList = WordCheckerHelper.correctList(word, limit);
Assert.assertEquals("[good, goo]", stringList.toString());
```

## Chinese spelling correction

### Check spelling

```java
final String right = "正确";
final String error = "万变不离其中";

Assert.assertTrue(WordCheckerHelper.isCorrect(right));
Assert.assertFalse(WordCheckerHelper.isCorrect(error));
```

### Best correction

```java
final String right = "正确";
final String error = "万变不离其中";

Assert.assertEquals("正确", WordCheckerHelper.correct(right));
Assert.assertEquals("万变不离其宗", WordCheckerHelper.correct(error));
```
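For fixed phrases such as idioms, one simple model of the behavior above uses a table that maps known misspelled phrases to their corrections. The table is applied with a left-to-right longest match, which mirrors the wrong/right pairing used by the custom Chinese dictionary file. This is an assumption for illustration only: it catches nothing outside the table, and word-checker's Chinese correction may work differently. The class `PhraseFixSketch` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical confusion-phrase corrector for illustration, not word-checker's implementation.
final class PhraseFixSketch {

    private final Map<String, String> fixes; // misspelled phrase -> correct phrase
    private final int maxLength;             // length of the longest misspelled phrase

    PhraseFixSketch(Map<String, String> fixes) {
        this.fixes = new HashMap<>(fixes);
        int longest = 0;
        for (String wrong : fixes.keySet()) {
            longest = Math.max(longest, wrong.length());
        }
        this.maxLength = longest;
    }

    // True when correcting the text changes nothing.
    boolean isCorrect(String text) {
        return correct(text).equals(text);
    }

    // Scans left to right; at each position replaces the longest known misspelled phrase.
    String correct(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            String match = null;
            for (int len = Math.min(maxLength, text.length() - i); len > 0; len--) {
                String piece = text.substring(i, i + len);
                if (fixes.containsKey(piece)) {
                    match = piece;
                    break;
                }
            }
            if (match == null) {
                out.append(text.charAt(i)); // no known phrase starts here: copy one char
                i++;
            } else {
                out.append(fixes.get(match)); // replace the whole phrase
                i += match.length();
            }
        }
        return out.toString();
    }
}
```

Built with the single pair `万变不离其中` → `万变不离其宗`, `correct("万变不离其中")` returns `万变不离其宗`, while `正确` passes through unchanged.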
### Default candidate list

```java
final String word = "万变不离其中";

List<String> stringList = WordCheckerHelper.correctList(word);
Assert.assertEquals("[万变不离其宗]", stringList.toString());
```

### Candidate list with a size limit

```java
final String word = "万变不离其中";
final int limit = 1;

List<String> stringList = WordCheckerHelper.correctList(word, limit);
Assert.assertEquals("[万变不离其宗]", stringList.toString());
```

## Long mixed Chinese and English text

### Scenario

In real-world spelling correction, the best user experience is to accept a long text, possibly mixing Chinese and English, and apply all of the functions above to it.

### Check spelling

```java
final String hello = "hello 你好";
final String speling = "speling 你好 以毒功毒";
Assert.assertTrue(WordCheckers.isCorrect(hello));
Assert.assertFalse(WordCheckers.isCorrect(speling));
```

### Best correction

```java
final String hello = "hello 你好";
final String speling = "speling 你好以毒功毒";
Assert.assertEquals("hello 你好", WordCheckers.correct(hello));
Assert.assertEquals("spelling 你好以毒攻毒", WordCheckers.correct(speling));
```

### Correction map per token

Returns the candidate corrections for each token.

```java
final String hello = "hello 你好";
final String speling = "speling 你好以毒功毒";
Assert.assertEquals("{hello=[hello],  =[ ], 你=[你], 好=[好]}", WordCheckers.correctMap(hello).toString());
Assert.assertEquals("{ =[ ], speling=[spelling, spewing, sperling, seeling, spieling, spiling, speeling, speiling, spelding], 你=[你], 好=[好], 以毒功毒=[以毒攻毒]}", WordCheckers.correctMap(speling).toString());
```

### Correction map per token with a size limit

Same as above, but limits how many candidates are returned for each token.

```java
final String hello = "hello 你好";
final String speling = "speling 你好以毒功毒";

Assert.assertEquals("{hello=[hello],  =[ ], 你=[你], 好=[好]}", WordCheckers.correctMap(hello, 2).toString());
Assert.assertEquals("{ =[ ], speling=[spelling, spewing], 你=[你], 好=[好], 以毒功毒=[以毒攻毒]}", WordCheckers.correctMap(speling, 2).toString());
```
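To picture how a long mixed text might be processed, consider a minimal approach. Split the input into runs of ASCII letters, and make every other character its own token. Correct each English run, then pass everything else through. This is only an illustrative assumption. The library evidently segments Chinese into words and corrects whole phrases (note the `以毒功毒` key in the `correctMap` output above), which this sketch does not attempt. `MixedTextSketch` and its `WordCorrector` interface are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mixed-text driver for illustration, not word-checker's implementation.
final class MixedTextSketch {

    // Any single-word corrector, e.g. a dictionary-based one.
    interface WordCorrector {
        String correct(String word);
    }

    private MixedTextSketch() {}

    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Runs of ASCII letters become one token; every other char (UTF-16 unit) is its own token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int start = i;
            if (isAsciiLetter(text.charAt(i))) {
                while (i < text.length() && isAsciiLetter(text.charAt(i))) {
                    i++;
                }
            } else {
                i++;
            }
            tokens.add(text.substring(start, i));
        }
        return tokens;
    }

    // Corrects English tokens with the supplied corrector and keeps all other tokens verbatim.
    static String correct(String text, WordCorrector corrector) {
        StringBuilder out = new StringBuilder();
        for (String token : tokenize(text)) {
            out.append(isAsciiLetter(token.charAt(0)) ? corrector.correct(token) : token);
        }
        return out.toString();
    }
}
```

`tokenize("hello 你好")` produces `[hello,  , 你, 好]`, the same keys as the first `correctMap` example. With a corrector that maps `speling` to `spelling`, `correct("speling 你好", corrector)` returns `spelling 你好`.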
# Normalization

User input comes in many forms, so the tool normalizes it before checking.

## Letter case

Uppercase letters are normalized to lowercase.

```java
final String word = "stRing";

Assert.assertTrue(WordCheckerHelper.isCorrect(word));
```

## Full-width and half-width characters

Full-width characters are normalized to half-width.

```java
final String word = "stｒing";

Assert.assertTrue(WordCheckerHelper.isCorrect(word));
```

# Custom English dictionary

## File configuration

You can create the file `resources/data/define_word_checker_en.txt` in your project's resources directory, with content like the following:

```
my-long-long-define-word,2
my-long-long-define-word-two
```

Put each word on its own line.

On each line, the first column is the word and the second column is its occurrence count, separated by a comma `,`.

The higher the count, the higher the word ranks among returned corrections. The count defaults to 1.

User-defined dictionaries take precedence over the built-in dictionary.

## Test code

Once the words are defined, spell checking picks them up.

```java
final String word = "my-long-long-define-word";
final String word2 = "my-long-long-define-word-two";

Assert.assertTrue(WordCheckerHelper.isCorrect(word));
Assert.assertTrue(WordCheckerHelper.isCorrect(word2));
```

# Custom Chinese dictionary

## File configuration

You can create the file `resources/data/define_word_checker_zh.txt` in your project's resources directory, with content like the following:

```
默守成规 墨守成规
```

Separate the two phrases with an ASCII space: the misspelled phrase comes first, followed by the correct one.
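Both custom dictionary formats are plain line-oriented text. The sketch below parses them as described above. English lines are `word,count`, with the count defaulting to 1 when omitted. Chinese lines are `wrong right` pairs separated by an ASCII space. To stay self-contained it reads from a `String` rather than from `resources/data/...`. The class `CustomDictSketch` and its methods are hypothetical, and word-checker's actual loader may handle trimming, blank lines or malformed entries differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parsers for the two custom dictionary formats, not word-checker's loader.
final class CustomDictSketch {

    private CustomDictSketch() {}

    // English: one word per line, optionally followed by ",count" (default 1).
    static Map<String, Integer> parseEnglish(String content) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String raw : content.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            int comma = line.indexOf(',');
            if (comma < 0) {
                counts.put(line, 1);
            } else {
                counts.put(line.substring(0, comma).trim(),
                        Integer.parseInt(line.substring(comma + 1).trim()));
            }
        }
        return counts;
    }

    // Chinese: "wrong right" per line, split at the first ASCII space.
    static Map<String, String> parseChinese(String content) {
        Map<String, String> fixes = new LinkedHashMap<>();
        for (String raw : content.split("\\r?\\n")) {
            String line = raw.trim();
            int space = line.indexOf(' ');
            if (space > 0) { // lines without a separator are ignored
                fixes.put(line.substring(0, space), line.substring(space + 1).trim());
            }
        }
        return fixes;
    }
}
```

Parsing the English example above gives `{my-long-long-define-word=2, my-long-long-define-word-two=1}`. The `LinkedHashMap` keeps the file order, which makes the result easy to inspect.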
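# NLP Open-Source Suite

[pinyin: Chinese characters to pinyin](https://github.com/houbb/pinyin)

[pinyin2hanzi: pinyin to Chinese characters](https://github.com/houbb/pinyin2hanzi)

[segment: high-performance Chinese word segmentation](https://github.com/houbb/segment)

[opencc4j: Traditional/Simplified Chinese conversion](https://github.com/houbb/opencc4j)

[nlp-hanzi-similar: Chinese character similarity](https://github.com/houbb/nlp-hanzi-similar)

[word-checker: spell checking](https://github.com/houbb/word-checker)

[sensitive-word: sensitive word detection](https://github.com/houbb/sensitive-word)

# Related Blog Posts

[NLP: approaches to Chinese spelling detection](https://houbb.github.io/2020/01/20/nlp-chinese-spelling-correct)

[NLP: a roundup of Chinese spelling detection and correction algorithms](https://houbb.github.io/2020/01/20/nlp-chinese-spelling-correct-02)

[NLP: how to make an English spelling algorithm 1,000,000 times faster?](https://houbb.github.io/2020/01/20/nlp-chinese-spelling-correct-03-100w-faster)

[NLP: papers on Chinese spelling detection and correction](https://houbb.github.io/2020/01/20/nlp-chinese-spelling-correct-paper)

[Chinese and English spell checking and correction in Java? But I only know how to write CRUD!](https://houbb.github.io/2020/01/20/nlp-chinese-word-checker)

[An algorithm that makes English spell checking 1000 times faster?](https://houbb.github.io/2020/01/20/nlp-chinese-word-checker-02-1000x)

[Word spelling correction 03: LeetCode 72, Edit Distance](https://houbb.github.io/2020/01/20/nlp-chinese-word-checker-03-edit-distance-intro)

# Roadmap

- [x] Support English tokenization to process whole English sentences

- Support spelling detection with Chinese word segmentation

- Introduce Chinese correction algorithms that handle homophones and visually similar characters

- Support mixed Chinese and English spelling detection

- [ ] Other common edit distance algorithms

# Acknowledgements

[Words](https://github.com/atebits/Words) provided the original English word data.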