https://github.com/kemingy/handict
- Host: GitHub
- URL: https://github.com/kemingy/handict
- Owner: kemingy
- Created: 2019-07-06T07:22:11.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2020-04-10T02:19:22.000Z (about 5 years ago)
- Last Synced: 2025-03-11T06:54:42.421Z (about 2 months ago)
- Topics: chinese-word-segmentation, mmseg, tokenization, tokenizer
- Language: Python
- Size: 2.1 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# Handict

Yet another Chinese word segmentation tool.
## Tutorial
**Install**: `pip install handict`
```python
from handict import Handict

han = Handict('path_of_user_dict_file')  # the same format as the jieba dict
han.segment('中文自然语言处理太难了')
```

## References
* http://technology.chtsai.org/mmseg/
* http://yongsun.me/2013/06/simple-implementation-of-mmseg-with-python/
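
The MMSEG references above build on dictionary-based maximum matching. As a rough illustration of that core idea (not Handict's actual implementation; the dictionary and function names here are hypothetical), a minimal forward maximum matching segmenter might look like this:

```python
# Hypothetical sketch of forward maximum matching, the simplest idea
# behind MMSEG-style dictionary segmenters. Not the Handict API.

def fmm_segment(text, dictionary, max_word_len=4):
    """Greedily match the longest dictionary word at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a match is found;
        # fall back to a single character when nothing matches.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words

# Toy dictionary; a real one (e.g. the jieba dict format) lists
# word, frequency, and part-of-speech tag per line.
toy_dict = {'中文', '自然语言', '自然', '语言', '处理', '太难'}
print(fmm_segment('中文自然语言处理太难了', toy_dict))
# ['中文', '自然语言', '处理', '太难', '了']
```

Full MMSEG adds disambiguation rules (e.g. preferring the chunk with the greatest total word length) on top of this greedy matching.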