https://github.com/kemingy/handict
- Host: GitHub
- URL: https://github.com/kemingy/handict
- Owner: kemingy
- Created: 2019-07-06T07:22:11.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2020-04-10T02:19:22.000Z (about 5 years ago)
- Last Synced: 2025-03-11T06:54:42.421Z (about 2 months ago)
- Topics: chinese-word-segmentation, mmseg, tokenization, tokenizer
- Language: Python
- Size: 2.1 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# Handict

Yet another Chinese word segmentation tool.
## Tutorial
**Install**: `pip install handict`
```python
from handict import Handict

han = Handict('path_of_user_dict_file')  # the same format as the jieba dict
han.segment('中文自然语言处理太难了')
```

## References
* http://technology.chtsai.org/mmseg/
* http://yongsun.me/2013/06/simple-implementation-of-mmseg-with-python/
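
The MMSEG references above build on dictionary-based maximum matching. As a rough illustration of that core idea (not Handict's actual implementation; the dictionary and function names here are hypothetical), a minimal forward maximum matching segmenter might look like this:

```python
# Hypothetical sketch of forward maximum matching, the simplest idea
# behind MMSEG-style dictionary segmenters. Not the Handict API.

def fmm_segment(text, dictionary, max_word_len=4):
    """Greedily match the longest dictionary word at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a match is found;
        # fall back to a single character when nothing matches.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words

# Toy dictionary; a real one (e.g. the jieba dict format) lists
# word, frequency, and part-of-speech tag per line.
toy_dict = {'中文', '自然语言', '自然', '语言', '处理', '太难'}
print(fmm_segment('中文自然语言处理太难了', toy_dict))
# ['中文', '自然语言', '处理', '太难', '了']
```

Full MMSEG adds disambiguation rules (e.g. preferring the chunk with the greatest total word length) on top of this greedy matching.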