Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/taozhijiang/chinese_correct_wsd
Simple Chinese error correction and word sense disambiguation
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/taozhijiang/chinese_correct_wsd
- Owner: taozhijiang
- Created: 2015-08-19T02:31:31.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2015-08-19T02:55:34.000Z (about 9 years ago)
- Last Synced: 2024-05-02T09:15:21.469Z (6 months ago)
- Language: Python
- Size: 691 KB
- Stars: 283
- Watchers: 11
- Forks: 132
- Open Issues: 3
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- StarryDivineSky - taozhijiang/chinese_correct_wsd
README
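Automatically corrects homophone errors in user-entered sentences.
Depends on pynlpir:
https://github.com/tsroten/pynlpir
The model data was trained on a customer-service corpus scraped from an e-commerce site (referred to only as "某东" in the original). The scraped corpus may be shared later so that others can use it for training.
Run ./proc.py to train and generate the data.
Usage:
Start the server with ./server.py or /usr/bin/python3 server.py, then run ./client_run.py or /usr/bin/python3 client_run.py XXX to test the correction.
In the output, 原语句 marks the original sentence and 纠正句 marks the corrected sentence.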
➜ utf-8 ./client_run.py "我想买哥苹果手机" "对京东新人度大打折扣"
原语句:我想买哥苹果手机
纠正句:我想买个苹果手机
原语句:对京东新人度大打折扣
纠正句:对京东信任度大打折扣
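The homophone-correction idea shown in the session above can be illustrated with a small, self-contained sketch. This is not the repository's implementation: it skips pynlpir segmentation and the client/server split entirely, and the homophone groups and bigram counts below are hypothetical toy values standing in for statistics that would come from a trained corpus. It simply tries every same-sounding substitution and keeps the sentence whose adjacent character pairs are most frequent.

```python
import math
from itertools import product

# Hypothetical toy data: characters that share a toneless pinyin reading.
HOMOPHONE_GROUPS = [
    ["哥", "个"],  # ge
    ["新", "信"],  # xin
    ["人", "任"],  # ren
]
# Map each character to the list of characters it may be confused with.
CANDIDATES = {ch: group for group in HOMOPHONE_GROUPS for ch in group}

# Hypothetical character-bigram counts; a real system would derive
# these from a large corpus rather than hard-code them.
BIGRAM_COUNTS = {
    "买个": 50, "个苹": 5, "苹果": 80, "手机": 90,
    "新人": 35, "信任": 40, "任度": 30,
}


def score(sentence):
    """Sum of add-one-smoothed log counts over adjacent character pairs."""
    return sum(
        math.log(BIGRAM_COUNTS.get(sentence[i:i + 2], 0) + 1)
        for i in range(len(sentence) - 1)
    )


def correct(sentence):
    """Return the highest-scoring homophone variant of the sentence.

    Enumerates every combination of candidates, which is exponential in
    the number of ambiguous characters and only suitable for short inputs.
    """
    options = [CANDIDATES.get(ch, [ch]) for ch in sentence]
    variants = ("".join(chars) for chars in product(*options))
    return max(variants, key=score)


for original in ["我想买哥苹果手机", "对京东新人度大打折扣"]:
    print("原语句:" + original)
    print("纠正句:" + correct(original))
```

With these toy counts the sketch reproduces the two corrections from the session: 哥 becomes 个, and 新人 becomes 信任. Note that 新人 is itself a valid word, so the second case is only resolved because the following pair 任度 is also taken into account.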
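Chinese word sense disambiguation follows the same approach, but the results are fairly poor; this will be studied further when time permits. It uses the 同义词词林 (Tongyici Cilin) synonym thesaurus, which is not included in the repository; the files in the utils directory are for reference only.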