https://github.com/kemingy/handict


chinese-word-segmentation mmseg tokenization tokenizer


# Handict

![Python package](https://github.com/kemingy/handict/workflows/Python%20package/badge.svg)

Yet another Chinese word segmentation tool, based on the MMSEG algorithm.

## Tutorial

**Install**: `pip install handict`

```python
from handict import Handict

han = Handict('path_of_user_dict_file')  # path to a user dictionary in the same format as a jieba dict
han.segment('中文自然语言处理太难了')
```
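
The dictionary file follows jieba's format: one entry per line, with the word, an optional frequency, and an optional part-of-speech tag separated by spaces. Below is a minimal sketch of preparing such a file before passing its path to `Handict`. The file name `user_dict.txt`, the helper `write_user_dict`, and the entries are hypothetical, and whether Handict needs the optional fields is not confirmed here.

```python
import tempfile
from pathlib import Path

# Hypothetical entries in jieba's dict format: "word frequency pos_tag".
entries = [
    ("自然语言", 100, "n"),
    ("处理", 200, "v"),
    ("中文", 300, "nz"),
]


def write_user_dict(path, entries):
    """Write (word, freq, tag) tuples as space-separated lines, one entry per line."""
    lines = [f"{word} {freq} {tag}" for word, freq, tag in entries]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return str(path)


# Hypothetical location; any readable path should do.
dict_path = write_user_dict(Path(tempfile.gettempdir()) / "user_dict.txt", entries)
# han = Handict(dict_path)  # then call han.segment(...) as in the tutorial
```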

## References

* http://technology.chtsai.org/mmseg/
* http://yongsun.me/2013/06/simple-implementation-of-mmseg-with-python/
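
As background to the references above, MMSEG builds on dictionary-based maximum matching. The sketch below shows only the simplest form, greedy forward maximum matching, on a toy dictionary. It is an illustration and not Handict's implementation: the function name `forward_max_match`, the `max_len` parameter, and the vocabulary are assumptions, and the ambiguity-resolution rules of the full algorithm are omitted.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy left-to-right segmentation: take the longest dictionary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, for illustration only.
vocab = {"中文", "自然", "语言", "自然语言", "处理"}
tokens = forward_max_match("中文自然语言处理太难了", vocab)
print(tokens)
```

Characters that match no dictionary entry are emitted one at a time, which is why a good dictionary matters for output quality.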