https://github.com/liuzl/ling
Natural Language Processing Toolkit in Golang
corenlp lemmatization nlp normalization opencc spacy tokenization
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/liuzl/ling
- Owner: liuzl
- License: apache-2.0
- Created: 2018-02-06T06:07:13.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2020-05-09T03:11:08.000Z (over 4 years ago)
- Last Synced: 2024-09-29T13:01:16.980Z (4 months ago)
- Topics: corenlp, lemmatization, nlp, normalization, opencc, spacy, tokenization
- Language: Go
- Homepage:
- Size: 496 KB
- Stars: 63
- Watchers: 4
- Forks: 4
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# ling is a golang toolkit for natural language processing
[![GoDoc](https://godoc.org/github.com/liuzl/ling?status.svg)](https://godoc.org/github.com/liuzl/ling) [![Go Report Card](https://goreportcard.com/badge/github.com/liuzl/ling)](https://goreportcard.com/report/github.com/liuzl/ling)

# Implementation references
## Similar NLP tools
* [Stanford CoreNLP](https://stanfordnlp.github.io/CoreNLP/index.html) Java
* [spaCy](https://spacy.io/) Python
* [lingo](https://github.com/chewxy/lingo) Golang
## Multilingual text tokenization
* [Unicode Standard Annex #29](http://www.unicode.org/reports/tr29/)
* [blevesearch segment](https://github.com/liuzl/segment)
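
To give a feel for what multilingual tokenization involves, here is a minimal sketch that uses only the Go standard library. It is not the UAX #29 algorithm and not `ling`'s or `segment`'s API: the `Tokenize` function is a hypothetical helper that groups letters and digits into words, emits each Han character and each punctuation rune as its own token, and drops whitespace. Real word-boundary rules (combining marks, apostrophes, scripts such as Thai) are considerably more involved.

```go
package main

import (
	"fmt"
	"unicode"
)

// Tokenize is a hypothetical, simplified tokenizer (an assumption for
// illustration, not part of ling). It approximates word boundaries with
// Unicode character classes instead of the full UAX #29 rules.
func Tokenize(text string) []string {
	var tokens []string
	var word []rune

	// flush emits the pending run of letters/digits, if any.
	flush := func() {
		if len(word) > 0 {
			tokens = append(tokens, string(word))
			word = word[:0]
		}
	}

	for _, r := range text {
		switch {
		case unicode.Is(unicode.Han, r):
			// Han text has no spaces, so treat every character as a token.
			flush()
			tokens = append(tokens, string(r))
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			word = append(word, r)
		case unicode.IsSpace(r):
			flush()
		default:
			// Punctuation, symbols, and anything else become single tokens.
			flush()
			tokens = append(tokens, string(r))
		}
	}
	flush()
	return tokens
}

func main() {
	for _, tok := range Tokenize("Hello, 世界 2024!") {
		fmt.Printf("%q\n", tok)
	}
}
```

Splitting Han text character by character is a common fallback when no dictionary is available; a dictionary-based segmenter would produce better word units.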
## Text normalization
* [Text normalization in Go](https://blog.golang.org/normalization)
## Lemmatization
> Stemming and lemmatization

* [Stemming and lemmatization](https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html)
* [Lemmatization Lists](http://www.lexiconista.com/datasets/lemmatization/) *Datasets by MBM*
* [The UniMorph Project](https://unimorph.github.io/)
* Traditional/Simplified Chinese conversion
  * [gocc](https://github.com/liuzl/gocc) Golang version of OpenCC
  * [OpenCC](https://github.com/BYVoid/OpenCC)
  * [Chinese-Character Jian<=>Fan converting library in Go](https://github.com/go-cc/cc-jianfan)
  * [Traditional and Simplified Chinese Conversion in Go](https://github.com/siongui/gojianfan)
  * [Han unification](https://en.wikipedia.org/wiki/Han_unification)
## Tagging
* Regex tagger
  * [commonregex](https://github.com/mingrammer/commonregex), a collection of common regular expressions for Go.
  * [xurls](https://github.com/mvdan/xurls), a Go package of regular expressions for URLs.
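
The idea behind a regex tagger can be sketched with the standard `regexp` package alone. The `Tag` type, the `TagText` function, and both patterns below are assumptions made for illustration; they are not `ling`'s API, and the patterns are far looser than the ones shipped in `commonregex` or `xurls`.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// Tag marks a labeled span of the input (hypothetical type, not ling's).
type Tag struct {
	Label string
	Start int // byte offset, inclusive
	End   int // byte offset, exclusive
	Text  string
}

// Deliberately simple patterns, for illustration only.
var patterns = []struct {
	label string
	re    *regexp.Regexp
}{
	{"URL", regexp.MustCompile(`https?://\S+`)},
	{"EMAIL", regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)},
}

// TagText runs every pattern over the text and returns the matches
// sorted by their start offset.
func TagText(text string) []Tag {
	var tags []Tag
	for _, p := range patterns {
		for _, loc := range p.re.FindAllStringIndex(text, -1) {
			tags = append(tags, Tag{
				Label: p.label,
				Start: loc[0],
				End:   loc[1],
				Text:  text[loc[0]:loc[1]],
			})
		}
	}
	sort.Slice(tags, func(i, j int) bool { return tags[i].Start < tags[j].Start })
	return tags
}

func main() {
	text := "Mail bob@example.com or see https://example.org/docs today"
	for _, t := range TagText(text) {
		fmt.Printf("%-5s [%d:%d] %s\n", t.Label, t.Start, t.End, t.Text)
	}
}
```

Keeping byte offsets alongside the label lets later stages map tags back onto tokens; overlapping matches from different patterns are not resolved in this sketch.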
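## Natural language detection

`getlang` is much slower than `franco`: in the profiling run listed at the end of this section it took roughly 2.3 times as long (11.58s vs 5.12s).
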
* [getlang](https://github.com/rylans/getlang)
* [franco](https://github.com/liuzl/franco)
* [test scripts](https://github.com/liuzl/org_name_parser/blob/master/parse/pprof.sh)
  * franco: Duration: 5.12s, 26.93%
  * getlang: Duration: 11.58s, 59.54%