https://github.com/cshum/levi-chinese
Chinese text processing plugins for Levi
- Host: GitHub
- URL: https://github.com/cshum/levi-chinese
- Owner: cshum
- License: mit
- Created: 2015-09-10T03:07:39.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2015-09-19T02:57:47.000Z (over 9 years ago)
- Last Synced: 2025-03-01T17:06:42.710Z (about 2 months ago)
- Language: JavaScript
- Homepage:
- Size: 230 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Levi Chinese
Chinese text processing plugins for [Levi](https://github.com/cshum/levi).
Levi Chinese aims to facilitate Chinese support in [Levi](https://github.com/cshum/levi) full-text search.
This is under active development but I am no expert in Chinese NLP.
Any comments or PRs are appreciated.

```
npm install levi-chinese
```

Levi Chinese provides text processing plugins `chinese.converter()` and `chinese.segmenter()`.
Mount them under the default plugins of Levi.

```js
var levi = require('levi')
var chinese = require('levi-chinese')

var lv = levi('db')
  .use(levi.tokenizer())
  .use(levi.stemmer())
  .use(levi.stopword())
  .use(chinese.converter()) // chinese plugin
  .use(chinese.segmenter()) // chinese plugin

lv.pipeline('Lorem Ipsum is dummy text我是拖拉機學院手扶拖拉機專業的。', function (err, tokens) {
  // tokens
  // ['lorem', 'ipsum', 'dummi', 'text',
  //  '手扶拖拉机', '拖拉机', '学院', '专业']
})
```

### chinese.converter()
Convert Traditional Chinese into Simplified Chinese text tokens.
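
As a rough illustration of dictionary-based conversion, the sketch below maps Traditional characters to Simplified ones one character at a time. The `t2s` table and the `toSimplified` helper are hypothetical names with a hand-written four-entry dictionary, not part of levi-chinese, and the per-character lookup is an assumption about how such a dictionary might be applied.

```js
// Hypothetical four-entry dictionary, for illustration only.
// It stands in for a full Traditional-to-Simplified dictionary.
var t2s = {
  '機': '机',
  '學': '学',
  '專': '专',
  '業': '业'
}

// Replace each character that has a dictionary entry; leave the rest as-is.
function toSimplified (text) {
  return Array.from(text).map(function (ch) {
    return Object.prototype.hasOwnProperty.call(t2s, ch) ? t2s[ch] : ch
  }).join('')
}

console.log(toSimplified('拖拉機學院')) // 拖拉机学院
console.log(toSimplified('專業 text')) // 专业 text
```

Characters without an entry, such as Latin text or characters shared by both scripts, pass through unchanged.
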
The conversion is based on the dictionary from [Tongwen](http://tongwen.openfoundry.org/).

### chinese.segmenter()
Chinese word segmentation using [nodejieba](https://github.com/yanyiwu/nodejieba).
nodejieba requires native bindings, so the segmenter only works on Node.js.

## License
MIT