{"id":13470313,"url":"https://github.com/NLPchina/ansj_seg","last_synced_at":"2025-03-26T11:31:25.700Z","repository":{"id":4542578,"uuid":"5683140","full_name":"NLPchina/ansj_seg","owner":"NLPchina","description":"ansj分词.ict的真正java实现.分词效果速度都超过开源版的ict. 中文分词,人名识别,词性标注,用户自定义词典","archived":false,"fork":false,"pushed_at":"2023-11-19T06:15:23.000Z","size":352518,"stargazers_count":6518,"open_issues_count":53,"forks_count":2314,"subscribers_count":654,"default_branch":"master","last_synced_at":"2025-03-25T22:09:56.156Z","etag":null,"topics":["ansj","chinese","java","nlp"],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NLPchina.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2012-09-05T05:56:24.000Z","updated_at":"2025-03-24T09:37:12.000Z","dependencies_parsed_at":"2023-12-25T13:06:43.189Z","dependency_job_id":null,"html_url":"https://github.com/NLPchina/ansj_seg","commit_stats":{"total_commits":602,"total_committers":38,"mean_commits":"15.842105263157896","dds":0.3438538205980066,"last_synced_commit":"50787ca2c7031d2a18c5393a97d1a54432790785"},"previous_names":["ansjsun/ansj_seg"],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NLPchina%2Fansj_seg","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NLPchina%2Fansj_seg/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NLPchina%2Fansj_seg/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NLPchi
na%2Fansj_seg/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NLPchina","download_url":"https://codeload.github.com/NLPchina/ansj_seg/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245644307,"owners_count":20649175,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ansj","chinese","java","nlp"],"created_at":"2024-07-31T16:00:28.195Z","updated_at":"2025-03-26T11:31:25.690Z","avatar_url":"https://github.com/NLPchina.png","language":"Java","readme":"Ansj Chinese Word Segmentation\n==================\n\n [![1.X Build Status](https://travis-ci.org/NLPchina/ansj_seg.svg?branch=master)](https://travis-ci.org/NLPchina/ansj_seg) [![Gitter](https://badges.gitter.im/NLPchina/ansj_seg.svg)](https://gitter.im/NLPchina/ansj_seg?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge)\n\n\n#####  Documentation\n* Developer docs: [3.x and earlier](http://nlpchina.github.io/ansj_seg/), [5.x and later](https://github.com/NLPchina/ansj_seg/wiki)\n\n\n\n##### Overview\n\u003e This is a Java implementation of Chinese word segmentation based on n-gram + CRF + HMM.\n\n\u003e Segmentation runs at roughly 2 million characters per second (measured on a MacBook Air), with accuracy above 96%.\n\n\u003e Implemented features include Chinese word segmentation, Chinese person-name recognition, user-defined dictionaries, keyword extraction, automatic summarization, and keyword tagging.\n\n\u003e It can be applied to natural language processing tasks and suits any project with high demands on segmentation quality.\n\n\n\n\n#####  Maven\n\n````\n        \n        \u003cdependency\u003e\n            \u003cgroupId\u003eorg.ansj\u003c/groupId\u003e\n            \u003cartifactId\u003eansj_seg\u003c/artifactId\u003e\n            \u003cversion\u003e5.1.1\u003c/version\u003e\n        \u003c/dependency\u003e\n    \n````\n\n#####  Demo\n\nIf you have just downloaded the library and only want to try it out, call this simple interface:\n\n\u003cpre\u003e\u003ccode\u003e\n String str = \"欢迎使用ansj_seg,(ansj中文分词)在这里如果你遇到什么问题都可以联系我.我一定尽我所能.帮助大家.ansj_seg更快,更准,更自由!\" ;\n System.out.println(ToAnalysis.parse(str));\n \n 欢迎/v,使用/v,ansj/en,_,seg/en,,,(,ansj/en,中文/nz,分词/n,),在/p,这里/r,如果/c,你/r,遇到/v,什么/r,问题/n,都/d,可以/v,联系/v,我/r,./m,我/r,一定/d,尽我所能/l,./m,帮助/v,大家/r,./m,ansj/en,_,seg/en,更快/d,,,更/d,准/a,,,更/d,自由/a,!\n\u003c/code\u003e\u003c/pre\u003e\n\n\n#####  Join Us\n\nI thought about this for a long time; with or without help, here is the list. If you are interested and enthusiastic, please get in touch.\n\n* Flesh out the documentation; add usage examples and explanations\n* Add more rule-based Recognition modules, for example [ID-card number recognition](https://github.com/NLPchina/ansj_seg/blob/master/src/main/java/org/ansj/recognition/impl/IDCardRecognition.java); still unfinished: `time recognition`, `IP address recognition`, `email recognition`, `URL recognition`, `part-of-speech recognition`, etc...\n* Provide a better-optimized CRF model to replace ansj's default model.\n* Add test cases; many places are not fully tested. If you are interested, please help!\n* Refactor the person-name recognition model; add further models such as organization-name recognition.\n* Add syntactic and grammatical analysis\n* Implement LSTM-based segmentation\n* Fill in any remaining gaps...\n","funding_links":[],"categories":["Java","搜索相关","其他_NLP自然语言处理","人工智能","Chinese NLP Toolkits 中文NLP工具"],"sub_categories":["其他_文本生成、文本对话","Chinese Word Segment 中文分词"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNLPchina%2Fansj_seg","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FNLPchina%2Fansj_seg","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNLPchina%2Fansj_seg/lists"}