{"id":37615758,"url":"https://github.com/jiangnanboy/jcorrector","last_synced_at":"2026-01-16T10:30:39.776Z","repository":{"id":43307716,"uuid":"467529303","full_name":"jiangnanboy/jcorrector","owner":"jiangnanboy","description":"jcorrector 中文文本纠错工具， Text Error Correction Tool，Spelling Check","archived":false,"fork":false,"pushed_at":"2025-02-28T14:13:45.000Z","size":5086,"stargazers_count":66,"open_issues_count":6,"forks_count":16,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-05-18T13:49:37.946Z","etag":null,"topics":["java","language-model","macbert","spelling-check","text-error-correction"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jiangnanboy.png","metadata":{"files":{"readme":"README.MD","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-03-08T13:45:16.000Z","updated_at":"2025-04-26T08:33:26.000Z","dependencies_parsed_at":"2023-02-10T15:01:07.783Z","dependency_job_id":null,"html_url":"https://github.com/jiangnanboy/jcorrector","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/jiangnanboy/jcorrector","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jiangnanboy%2Fjcorrector","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jiangnanboy%2Fjcorrector/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jiangnanboy%2Fjcorrector/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jiangnanboy%2Fjcorrector/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jiangnanboy","download_url":"https://codeload.github.com/jiangnanboy/jcorrector/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jiangnanboy%2Fjcorrector/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28478050,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T06:30:42.265Z","status":"ssl_error","status_checked_at":"2026-01-16T06:30:16.248Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["java","language-model","macbert","spelling-check","text-error-correction"],"created_at":"2026-01-16T10:30:39.554Z","updated_at":"2026-01-16T10:30:39.733Z","avatar_url":"https://github.com/jiangnanboy.png","language":"Java","funding_links":[],"categories":["人工智能"],"sub_categories":["自然语言处理"],"readme":"# jcorrector\n\n中文文本纠错工具。音似、形似错字（或变体字）纠正，可用于中文拼音、笔画输入法的错误纠正。项目为java开发。\n\n**jcorrector**\n\n1.利用n-gram语言模型检测错别字位置，通过拼音音似特征、笔画五笔编辑距离特征及语言模型句子概率值特征纠正错别字。\n\n2.利用深度学习模型（如macbert等）进行中文end2end拼写纠错。\n\n\n**Guide**\n\n- [Question](#Question)\n- [Solution](#Solution)\n- [Feature](#Feature)\n- [Usage](#usage)\n- [Test](#Test)\n- [Dataset](#Dataset)\n- [Neural-Net](#Neural-Net)\n- [Todo](#Todo)\n- [QQ](#QQ)\n- [Cite](#Cite)\n- [License](#License)\n- [Reference](#reference)\n\n## Question\n\n中文文本纠错任务，常见错误类型包括：\n\n- 谐音字词，如 配副眼睛-配副眼镜\n- 混淆音字词，如 流浪织女-牛郎织女\n- 字词顺序颠倒，如 伍迪艾伦-艾伦伍迪\n- 字词补全，如 爱有天意-假如爱有天意\n- 形似字错误，如 高梁-高粱\n- 中文拼音全拼，如 xingfu-幸福\n- 中文拼音缩写，如 sz-深圳\n- 语法错误，如 想象难以-难以想象\n\n当然，针对不同业务场景，这些问题并不一定全部存在，比如输入法中需要处理前四种，搜索引擎需要处理所有类型，语音识别后文本纠错只需要处理前两种，\n其中'形似字错误'主要针对五笔或者笔画手写输入等。本项目重点解决其中的谐音、混淆音、形似字错误、中文拼音全拼、语法错误带来的纠错任务。\n\n\n## Solution\n### 规则的解决思路\n1. 中文纠错分为两步走，第一步是错误检测，第二步是错误纠正；\n2. 错误检测部分先通过hanlp分词器切词，由于句子中含有错别字，所以切词结果往往会有切分错误的情况，这样从字粒度和词粒度两方面检测错误，\n整合这两种粒度的疑似错误结果，形成疑似错误位置候选集；\n3. 错误纠正部分，是遍历所有的疑似错误位置，并使用音似、形似词典替换错误位置的词，然后通过语言模型计算句子概率值，对所有候选集结果比较并排序，得到最优纠正词。\n\n\n## Feature\n### 模型\n* berkeleylm：berkeleylm统计语言模型工具，规则方法，语言模型纠错，利用混淆集，扩展性强\n\n### 错误检测\n* 字粒度：语言模型句子概率值检测某字的似然概率值低于句子文本平均值，则判定该字是疑似错别字的概率大。\n* 词粒度：切词后不在词典中的词是疑似错词的概率大。\n\n### 错误纠正\n* 通过错误检测定位所有疑似错误后，取所有疑似错字的音似、形似候选词，\n* 使用候选词替换，基于语言模型得到类似翻译模型的候选排序结果，得到最优纠正词。\n\n#### ngram模型\n* berkeleylm源码已集成到项目中\n\n## Usage\n见【examples/ngram】\n\n模型下载\n\n\u003c!DOCTYPE html\u003e\n\u003chtml\u003e\n\u003chead\u003e\n\u003c/head\u003e\n\u003cbody\u003e\n\u003ctable style=\"width: 80%;\"\u003e\n  \u003ctr\u003e\n      \u003ctd style=\"width: 20%;\"\u003e\u003cdiv align=\"center\"\u003e\u003cstrong\u003e模型\u003c/strong\u003e\u003c/div\u003e\u003c/td\u003e\n      \u003ctd style=\"width: 30%;\"\u003e\u003cdiv align=\"center\"\u003e\u003cstrong\u003eHuggingFace\u003c/strong\u003e\u003c/div\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \n  \u003ctr\u003e\n      \u003ctd\u003e\u003ccenter\u003ejcorrector_ngram\u003c/center\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003ccenter\u003e🤗\u003ca href=\"https://huggingface.co/jiangnanboy/jcorrector_ngram\"\u003ejcorrector_ngram\u003c/a\u003e\u003c/center\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\n\u003c/table\u003e\n\u003c/body\u003e\n\u003c/html\u003e\n\n### 文本纠错\n\n``` java\nimport sy.core.spelling.Corrector;\n\nCorrector corrector = new Corrector();\nString result = corrector.correct(\"少先队员因该为老人让坐\");\nSystem.out.println(result);\n```\n\noutput:\n```\n[{\"endIdx\":6,\"correct\":\"应该\",\"startIdx\":4,\"type\":\"拼写错误\",\"error\":\"因该\"},{\"endIdx\":11,\"correct\":\"让座\",\"startIdx\":9,\"type\":\"拼写错误\",\"error\":\"让坐\"}]\n```\n\n### 错误检测\n\n``` java\nimport sy.core.spelling.Detector;\n\nDetector detector = new Detector();\nString result = detector.detect(sentence);\nSystem.out.println(result);\n```\n\noutput:\n```\n[[因该, 4, 6, confusion], [让坐, 9, 11, confusion]]\n```\n\n默认字粒度、词粒度的纠错都打开，可以通过enableCharError()和enableWordError()进行设置。\n\n### 加载自定义混淆集\n\n通过加载自定义混淆集，支持用户纠正已知的错误\n\n示例[LoadDetectorDict.setCustomConfusionDict(\"/path\")](sy/spelling/LoadDetectorDict.java)\n\n### 成语、专名纠错\n\n见【examples/proper】\n\n``` java\nList\u003cString\u003e testLine = List.of(\n                \"报应接中迩来\",\n                \"这块名表带带相传\",\n                \"这块名表代代相传\",\n                \"他贰话不说把牛奶喝完了\",\n                \"这场比赛我甘败下风\",\n                \"这场比赛我甘拜下封\",\n                \"这家伙还蛮格尽职守的\",\n                \"报应接中迩来\",  // 接踵而来\n                \"人群穿流不息\",\n                \"这个消息不径而走\",\n                \"这个消息不胫儿走\",\n                \"眼前的场景美仑美幻简直超出了人类的想象\",\n                \"看着这两个人谈笑风声我心理不由有些忌妒\",\n                \"有了这一番旁证博引\",\n                \"有了这一番旁针博引\",\n                \"这群鸟儿迁洗到远方去了\",\n                \"这群鸟儿千禧到远方去了\",\n                \"美国前总统特琅普给普京点了一个赞，特朗普称普金做了一个果断的决定\"\n        );\n        for(String line : testLine) {\n            System.out.println(properCorrector.correct(line));\n        }\n```\n\noutput:\n```\n[{\"endIdx\":6,\"correct\":\"接踵而来\",\"startIdx\":2,\"type\":\"专名错误\",\"error\":\"接中迩来\"}]\n[{\"endIdx\":8,\"correct\":\"代代相传\",\"startIdx\":4,\"type\":\"专名错误\",\"error\":\"带带相传\"}]\n[]\n[{\"endIdx\":5,\"correct\":\"二话不说\",\"startIdx\":1,\"type\":\"专名错误\",\"error\":\"贰话不说\"}]\n[{\"endIdx\":9,\"correct\":\"甘拜下风\",\"startIdx\":5,\"type\":\"专名错误\",\"error\":\"甘败下风\"}]\n[{\"endIdx\":9,\"correct\":\"甘拜下风\",\"startIdx\":5,\"type\":\"专名错误\",\"error\":\"甘拜下封\"}]\n[{\"endIdx\":9,\"correct\":\"恪尽职守\",\"startIdx\":5,\"type\":\"专名错误\",\"error\":\"格尽职守\"}]\n[{\"endIdx\":6,\"correct\":\"接踵而来\",\"startIdx\":2,\"type\":\"专名错误\",\"error\":\"接中迩来\"}]\n[{\"endIdx\":6,\"correct\":\"川流不息\",\"startIdx\":2,\"type\":\"专名错误\",\"error\":\"穿流不息\"}]\n[{\"endIdx\":8,\"correct\":\"不胫而走\",\"startIdx\":4,\"type\":\"专名错误\",\"error\":\"不径而走\"}]\n[{\"endIdx\":8,\"correct\":\"不胫而走\",\"startIdx\":4,\"type\":\"专名错误\",\"error\":\"不胫儿走\"}]\n[{\"endIdx\":9,\"correct\":\"美轮美奂\",\"startIdx\":5,\"type\":\"专名错误\",\"error\":\"美仑美幻\"}]\n[{\"endIdx\":10,\"correct\":\"谈笑风生\",\"startIdx\":6,\"type\":\"专名错误\",\"error\":\"谈笑风声\"}]\n[{\"endIdx\":9,\"correct\":\"旁征博引\",\"startIdx\":5,\"type\":\"专名错误\",\"error\":\"旁证博引\"}]\n[{\"endIdx\":9,\"correct\":\"旁征博引\",\"startIdx\":5,\"type\":\"专名错误\",\"error\":\"旁针博引\"}]\n[{\"endIdx\":6,\"correct\":\"迁徙\",\"startIdx\":4,\"type\":\"专名错误\",\"error\":\"迁洗\"}]\n[{\"endIdx\":6,\"correct\":\"迁徙\",\"startIdx\":4,\"type\":\"专名错误\",\"error\":\"千禧\"}]\n[{\"endIdx\":8,\"correct\":\"特朗普\",\"startIdx\":5,\"type\":\"专名错误\",\"error\":\"特琅普\"},{\"endIdx\":23,\"correct\":\"普京\",\"startIdx\":21,\"type\":\"专名错误\",\"error\":\"普金\"}]\n```\n### 基于模板中文语法纠错\n\n见【examples/gec】\n\n``` java\nString templatePath = GecCheck.class.getClassLoader().getResource(PropertiesReader.get(\"template\")).getPath().replaceFirst(\"/\", \"\");\nGecCheck gecRun = new GecCheck();\ngecRun.init(templatePath);\nString sentence;\nwhile (true) {\n    System.out.println(\"Please input a sentence:\");\n    Scanner scanner = new Scanner(System.in);\n    sentence = scanner.next();\n    String infoStr = gecRun.checkCorrect(sentence);\n    if(StringUtils.isNotBlank(infoStr)) {\n        System.out.println(infoStr);\n    }\n}\n```\n\noutput:\n```\n爸爸看完小品后忍俊不禁笑了起来。\n爸爸看完小品后忍俊不禁。\n[{\"correct\":\"忍俊不禁\",\"start\":7,\"end\":15,\"error\":\"忍俊不禁笑了起来\"}]\n\n孙中山辞职后不再过问政治，决心尽瘁社会上事业，开始着手社会革命。\n孙中山辞职后不再过问政治，决心尽瘁社会上事业，开始社会革命。\n[{\"correct\":\"开始\",\"start\":23,\"end\":27,\"error\":\"开始着手\"}]\n\n在俄国社会民主工党第二次代表大会中，列宁提出效仿民意党，建立一套围绕少数“职业革命家”为核心、党员对核心高度服从的集权化的组织模式，即民主集中制。\n在俄国社会民主工党第二次代表大会中，列宁提出效仿民意党，建立一套少数“职业革命家”为核心、党员对核心高度服从的集权化的组织模式，即民主集中制。\n[{\"correct\":\"少数“职业革命家”为核心\",\"start\":32,\"end\":46,\"error\":\"围绕少数“职业革命家”为核心\"}]\n```\n\n#### 训练\n\nngram语言模型的训练及加载见sy/core/ngram/NGramModel\n\n## Test\n以下是部分测试结果：\n\n### detector:\n```\n少先队员因该为老人让坐\n[[因该, 4, 6, confusion], [让坐, 9, 11, confusion]]\n\n机七学习是人工智能领遇最能体现智能的一个分知\n[[机, 0, 1, char], [领, 9, 10, char], [遇, 10, 11, char], [分, 20, 21, char], [知, 21, 22, char]]\n\n一只小鱼船浮在平净的河面上\n[[平净, 7, 9, word]]\n\n我的家乡是有明的渔米之乡\n[[渔米之乡, 8, 12, confusion]]\n```\n\n### corrector:\n```\n少先队员因该为老人让坐\n[{\"endIdx\":6,\"correct\":\"应该\",\"startIdx\":4,\"type\":\"拼写错误\",\"error\":\"因该\"},{\"endIdx\":11,\"correct\":\"让座\",\"startIdx\":9,\"type\":\"拼写错误\",\"error\":\"让坐\"}]\n\n真麻烦你了。希望你们好好的跳无\n[{\"endIdx\":14,\"correct\":[\"条\",\"桃\",\"跳\",\"调\",\"挑\",\"逃\",\"眺\",\"兆\",\"窕\",\"佻\",\"笤\"],\"startIdx\":13,\"type\":\"拼写错误\",\"error\":\"跳\"},{\"endIdx\":15,\"correct\":[\"舞\",\"物\",\"污\",\"武\",\"务\",\"屋\",\"无\",\"乌\",\"于\",\"恶\",\"伍\",\"误\",\"午\",\"雾\",\"吴\",\"悟\",\"亡\",\"勿\",\"巫\",\"五\",\"坞\",\"梧\",\"芜\",\"捂\",\"侮\",\"呜\",\"诬\",\"晤\",\"蜈\",\"鹉\"],\"startIdx\":14,\"type\":\"拼写错误\",\"error\":\"无\"}]\n\n机七学习是人工智能领遇最能体现智能的一个分知\n[{\"endIdx\":1,\"correct\":[\"其\",\"继\",\"既\",\"冀\",\"吉\",\"奇\",\"鸡\",\"急\",\"几\",\"给\",\"即\",\"基\",\"骑\",\"祭\",\"挤\",\"借\",\"激\",\"寄\",\"吃\",\"疾\",\"际\",\"击\",\"积\",\"棘\",\"己\",\"忌\",\"籍\",\"寂\",\"集\",\"系\",\"机\",\"计\",\"济\",\"剂\",\"季\",\"迹\",\"稽\",\"脊\",\"记\",\"辑\",\"荠\",\"肌\",\"极\",\"饥\",\"级\",\"及\",\"箕\",\"期\",\"居\",\"嫉\",\"畸\",\"绩\",\"讥\",\"叽\",\"唧\",\"鲫\",\"技\",\"妓\",\"圾\",\"纪\"],\"startIdx\":0,\"type\":\"拼写错误\",\"error\":\"机\"},{\"endIdx\":11,\"correct\":[\"域\",\"语\",\"与\",\"于\",\"羽\",\"愈\",\"郁\",\"育\",\"偶\",\"浴\",\"淤\",\"喻\",\"芋\",\"雨\",\"尉\",\"宇\",\"愉\",\"愚\",\"吁\",\"遇\",\"狱\",\"逾\",\"欲\",\"寓\",\"隅\",\"榆\",\"裕\",\"娱\",\"御\",\"鱼\",\"余\",\"藕\",\"豫\",\"迂\",\"誉\",\"玉\",\"屿\",\"蔚\",\"予\",\"谷\",\"渔\",\"预\",\"舆\"],\"startIdx\":10,\"type\":\"拼写错误\",\"error\":\"遇\"},{\"endIdx\":21,\"correct\":[\"粪\",\"纷\",\"芬\",\"愤\",\"坟\",\"粉\",\"分\",\"奋\",\"氛\",\"焚\",\"忿\",\"吩\",\"份\"],\"startIdx\":20,\"type\":\"拼写错误\",\"error\":\"分\"},{\"endIdx\":22,\"correct\":[\"支\",\"值\",\"指\",\"质\",\"枝\",\"芝\",\"氏\",\"旨\",\"殖\",\"治\",\"织\",\"汁\",\"肢\",\"植\",\"掷\",\"制\",\"址\",\"挚\",\"直\",\"至\",\"只\",\"知\",\"滞\",\"侄\",\"秩\",\"致\",\"趾\",\"脂\",\"置\",\"吱\",\"帜\",\"智\",\"纸\",\"职\",\"识\",\"窒\",\"执\",\"志\",\"稚\",\"征\",\"之\",\"止\",\"蜘\"],\"startIdx\":21,\"type\":\"拼写错误\",\"error\":\"知\"}]\n\n一只小鱼船浮在平净的河面上\n[{\"endIdx\":9,\"correct\":[\"平静\",\"平经\",\"冯净\",\"屏净\",\"坪净\",\"评净\",\"平精\",\"凭净\",\"平净\",\"萍净\",\"瓶净\",\"平京\",\"乒净\",\"苹净\",\"平景\",\"平晶\",\"平靖\",\"平井\",\"平挣\",\"平争\",\"平峥\",\"平铮\",\"平劲\",\"平径\",\"平惊\",\"平鲸\",\"平睁\",\"平镜\",\"平颈\",\"平敬\",\"平茎\",\"平睛\",\"平诤\",\"平警\",\"平兢\",\"平荆\",\"平筝\",\"平境\",\"平竟\",\"平狰\",\"平阱\",\"平更\",\"平竞\"],\"startIdx\":7,\"type\":\"拼写错误\",\"error\":\"平净\"}]\n\n我的家乡是有明的渔米之乡\n[{\"endIdx\":12,\"correct\":\"鱼米之乡\",\"startIdx\":8,\"type\":\"拼写错误\",\"error\":\"渔米之乡\"}]\n\n遇到逆竟时，我们必须勇于面对，而且要愈挫愈勇，这样我们才能朝著成功之路前进。\n[{\"endIdx\":31,\"correct\":\"朝着\",\"startIdx\":29,\"type\":\"拼写错误\",\"error\":\"朝著\"}]\n\n人生就是如此，经过磨练才能让自己更加拙壮，才能使自己更加乐观。\n[{\"endIdx\":11,\"correct\":\"磨炼\",\"startIdx\":9,\"type\":\"拼写错误\",\"error\":\"磨练\"},{\"endIdx\":20,\"correct\":[\"茁壮\",\"拙装\",\"拙庄\",\"旺壮\",\"卓壮\",\"捉壮\",\"出壮\",\"着壮\",\"咄壮\",\"屈壮\",\"汪壮\",\"酌壮\",\"浊壮\",\"拙状\",\"桌壮\",\"拙壮\",\"拙桩\",\"啄壮\",\"琢壮\",\"灼壮\",\"拙妆\",\"倔壮\",\"掘壮\",\"拙撞\"],\"startIdx\":18,\"type\":\"拼写错误\",\"error\":\"拙壮\"}]\n```\n\n## Dataset\n\n数据集来自人民日报2014版，将每句进行分字处理。\n\n## Neural-Net\n利用深度学习进行端到端中文拼写纠错。\n\n   ### Macbert\n    \n    这里利用java加载macbert模型，并进行中文拼写纠错，具体见【https://github.com/jiangnanboy/macbert-java-onnx】\n    \n   ##### usage\n   \n   模型下载\n\u003c!DOCTYPE html\u003e\n\u003chtml\u003e\n\u003chead\u003e\n\u003c/head\u003e\n\u003cbody\u003e\n\u003ctable style=\"width: 80%;\"\u003e\n  \u003ctr\u003e\n      \u003ctd style=\"width: 20%;\"\u003e\u003cdiv align=\"center\"\u003e\u003cstrong\u003e模型\u003c/strong\u003e\u003c/div\u003e\u003c/td\u003e\n      \u003ctd style=\"width: 30%;\"\u003e\u003cdiv align=\"center\"\u003e\u003cstrong\u003eHuggingFace\u003c/strong\u003e\u003c/div\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \n  \u003ctr\u003e\n      \u003ctd\u003e\u003ccenter\u003ejcorrector_macbert_v1\u003c/center\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003ccenter\u003e🤗\u003ca href=\"https://huggingface.co/jiangnanboy/jcorrector_macbert_v1\"\u003ejcorrector_macbert_v1\u003c/a\u003e\u003c/center\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\n  \u003ctr\u003e\n      \u003ctd\u003e\u003ccenter\u003ejcorrector_macbert_v2\u003c/center\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003ccenter\u003e🤗\u003ca href=\"https://huggingface.co/jiangnanboy/jcorrector_macbert_v2\"\u003ejcorrector_macbert_v2\u003c/a\u003e\u003c/center\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\n\u003ctr\u003e\n      \u003ctd\u003e\u003ccenter\u003ejcorrector_macbert_v3\u003c/center\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003ccenter\u003e🤗\u003ca href=\"https://huggingface.co/jiangnanboy/jcorrector_macbert_v3\"\u003ejcorrector_macbert_v3\u003c/a\u003e\u003c/center\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\n\u003c/table\u003e\n\u003c/body\u003e\n\u003c/html\u003e\n   \n   1.examples/dl/MacBertCorrect\n    \n   ```\n    String text = \"今天新情很好。\";\n    Pair\u003cBertTokenizer, Map\u003cString, OnnxTensor\u003e\u003e pair = null;\n    try {\n        pair = parseInputText(text);\n    } catch (Exception e) {\n        e.printStackTrace();\n    }\n    var predString = predCSC(pair);\n    List\u003cPair\u003cString, String\u003e\u003e resultList = getErrors(predString, text);\n    for(Pair\u003cString, String\u003e result : resultList) {\n        System.out.println(text + \" =\u003e \" + result.getLeft() + \" \" + result.getRight());\n    }\n   ```\n    \n   2.output\n   \n   ```\n    String text = \"今天新情很好。\";\n    \n    tokens -\u003e [[CLS], 今, 天, 新, 情, 很, 好, 。, [SEP]]\n    今天新情很好。 =\u003e 今天心情很好。 新,心,2,3\n    \n    String text = \"你找到你最喜欢的工作，我也很高心。\";\n    \n    tokens -\u003e [[CLS], 你, 找, 到, 你, 最, 喜, 欢, 的, 工, 作, ，, 我, 也, 很, 高, 心, 。, [SEP]]\n    你找到你最喜欢的工作，我也很高心。 =\u003e 你找到你最喜欢的工作，我也很高兴。 心,兴,15,16\n   ```\n\n## Todo\n\n此项目会持续优化，后期会持续尝试加入一些其它的纠错模型，特别是深度学习方面。\n\n(1).升级有漏洞的jar包\n\n(2).剔除长期没更新有漏洞的jar包，重写相关方法\n\n(3).对berttokenizer进行改写\n\n## contact\n\n如有搜索、推荐、nlp以及大数据挖掘等问题或合作，可联系我：\n\n1、我的github项目介绍：https://github.com/jiangnanboy\n\n2、我的博客园技术博客：https://www.cnblogs.com/little-horse/\n\n3、我的QQ号:2229029156\n\n## Cite\n\n如果你在研究中使用了jcorrector，请按如下格式引用：\n\n```latex\n@{jcorrector,\n  author = {Shi Yan},\n  title = {jcorrector: Text Error Correction Tool},\n  year = {2022},\n  url = {https://github.com/jiangnanboy/jcorrector},\n}\n```\n\n## License\njcorrector 的授权协议为 Apache License 2.0，可免费用做商业用途。请在产品说明中附加jcorrector的链接和授权协议。\n\n## Reference\nhttps://github.com/shibing624/pycorrector\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjiangnanboy%2Fjcorrector","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjiangnanboy%2Fjcorrector","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjiangnanboy%2Fjcorrector/lists"}