Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mayabot/mynlp
一个生产级、高性能、模块化、可扩展的中文NLP工具包。(中文分词、平均感知机、fastText、拼音、新词发现、分词纠错、BM25、人名识别、命名实体、自定义词典)
https://github.com/mayabot/mynlp
fasttext nlp pinyin segment starspace
Last synced: 3 months ago
JSON representation
一个生产级、高性能、模块化、可扩展的中文NLP工具包。(中文分词、平均感知机、fastText、拼音、新词发现、分词纠错、BM25、人名识别、命名实体、自定义词典)
- Host: GitHub
- URL: https://github.com/mayabot/mynlp
- Owner: mayabot
- License: apache-2.0
- Created: 2017-12-10T05:35:01.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2023-12-21T16:14:59.000Z (about 1 year ago)
- Last Synced: 2024-07-27T08:44:12.124Z (6 months ago)
- Topics: fasttext, nlp, pinyin, segment, starspace
- Language: Java
- Homepage: https://mynlp.mayabot.com/
- Size: 6.63 MB
- Stars: 671
- Watchers: 30
- Forks: 90
- Open Issues: 18
-
Metadata Files:
- Readme: README.adoc
- Changelog: CHANGES.md
- License: LICENSE
Awesome Lists containing this project
README
= Mynlp: 高性能、可扩展的中文NLP工具包
:version: 4.0.0
:icons: fontimage:https://img.shields.io/github/license/mayabot/mynlp.svg[]
image:https://maven-badges.herokuapp.com/maven-central/com.mayabot.mynlp/mynlp/badge.svg[link=https://maven-badges.herokuapp.com/maven-central/com.mayabot.mynlp/mynlp]
image:https://img.shields.io/github/release/mayabot/mynlp/all.svg[link=https://github.com/mayabot/mynlp/releases/latest]
image:https://img.shields.io/github/repo-size/mayabot/mynlp[link=https://github.com/mayabot/mynlp]
image:https://img.shields.io/github/issues-raw/mayabot/mynlp.svg[link=https://github.com/mayabot/mynlp/issues]image::https://cdn.mayabot.com/mynlp/mynlp-banner.png[,500,align=center,link=https://mynlp.mayabot.com]
[NOTE]
访问完整在线文档link:https://mynlp.mayabot.com/[ mynlp.mayabot.com]== QQ群(2):747892793
== 安装
该章节介绍如何安装和简单使用mynlp的基础功能。
mynlp已经发布在Maven中央仓库中,所以只需要在Maven或者Gradle中引入mynlp.jar依赖即可。
.Gradle
[subs="attributes+"]
----
compile 'com.mayabot.mynlp:mynlp:{version}'
----.Maven
[source,xml,subs="attributes+"]
----com.mayabot.mynlp
mynlp
{version}----
因为资源文件较大,所以mynlp.jar包默认不包括资源文件(词典和模型文件)依赖。
懒人方案,通过引用mynlp-all依赖默认提供的资源词典,满足大部分需求。
.依赖 mynlp-all
[subs="attributes+"]
----
compile 'com.mayabot.mynlp:mynlp-all:{version}'
----=== 词典和模型资源
.词典&模型资源列表
[cols="6,^1,^1,4"]
|===
|Gradle 坐标 | mynlp-all依赖 |文件大小 |说明|com.mayabot.mynlp.resource:mynlp-resource-coredict:1.0.0
|Y
|18.2M
|核心词典(20w+词,500w+二元)|com.mayabot.mynlp.resource:mynlp-resource-pos:1.0.0
|Y
|17.5M
|词性标注模型(感知机模型)|com.mayabot.mynlp.resource:mynlp-resource-ner:1.0.0
|Y
|13.4M
|命名实体识别(人名识别、其他NER)|com.mayabot.mynlp.resource:mynlp-resource-pinyin:1.1.0
|Y
|272K
|拼音词典、拼音切分模型|com.mayabot.mynlp.resource:mynlp-resource-transform:1.0.0
|Y
|478K
|繁简体词典|com.mayabot.mynlp.resource:mynlp-resource-cws:1.0.0
|N
|62.4M
|感知机分词模型|com.mayabot.mynlp.resource:mynlp-resource-custom:1.0.0
|N
|2.19M
|自定义扩展词库|===
根据实际的需要,按需引入资源包。
[source]
.一个Gradle引用的例子
----
compile 'com.mayabot.mynlp:mynlp:3.2.0'// 核心词典
implementation 'com.mayabot.mynlp.resource:mynlp-resource-coredict:1.0.0'// 词性标注
implementation 'com.mayabot.mynlp.resource:mynlp-resource-pos:1.0.0'// 命名实体
implementation 'com.mayabot.mynlp.resource:mynlp-resource-ner:1.0.0'// 拼音
implementation 'com.mayabot.mynlp.resource:mynlp-resource-pinyin:1.1.0'// 繁简体转换
implementation 'com.mayabot.mynlp.resource:mynlp-resource-transform:1.0.0'// 感知机分词模型
// implementation 'com.mayabot.mynlp.resource:mynlp-resource-cws:1.0.0'// 自定义扩展词库
// implementation 'com.mayabot.mynlp.resource:mynlp-resource-custom:1.0.0'
----== 访问完整在线文档
link:https://mynlp.mayabot.com/[mynlp.mayabot.com]
== 致谢以下优秀开源项目
- HanLP
- ansj_segmynlp实现参考了他们算法实现和部分代码