https://github.com/howl-anderson/rasa_chinese
rasa_chinese: a Rasa component extension package built specifically for the Chinese language, providing a number of Chinese-specific components
- Host: GitHub
- URL: https://github.com/howl-anderson/rasa_chinese
- Owner: howl-anderson
- License: apache-2.0
- Created: 2020-12-18T08:37:29.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2023-05-11T15:55:22.000Z (over 2 years ago)
- Last Synced: 2025-03-26T22:11:20.153Z (8 months ago)
- Topics: rasa, rasa-chatbot, rasa-chinese
- Language: Python
- Homepage:
- Size: 15.5 MB
- Stars: 145
- Watchers: 6
- Forks: 36
- Open Issues: 9
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# rasa_chinese
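rasa_chinese is a [rasa](https://github.com/RasaHQ/rasa) component extension package built specifically for the Chinese language. It provides several Chinese-specific components.
**This package has been endorsed by the Rasa team, and the official Rasa blog recommends it to Chinese-language Rasa users.**
## Installation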
```bash
pip install rasa_chinese
```
## Components currently included
### LanguageModelTokenizer
A tokenizer component based on HuggingFace's transformers.
Usage in the pipeline:
```yaml
pipeline:
- name: "rasa_chinese.nlu.tokenizers.lm_tokenizer.LanguageModelTokenizer"
```
The tokenization used by LanguageModelTokenizer must match the tokenization used by LanguageModelFeaturizer.
If you set parameters for LanguageModelFeaturizer in the pipeline, you must set the same parameters for LanguageModelTokenizer, as shown here:
```yaml
pipeline:
- name: "rasa_chinese.nlu.tokenizers.lm_tokenizer.LanguageModelTokenizer"
  # The parameters below must be exactly the same as those of LanguageModelFeaturizer
  model_name: "roberta"
  model_weights: "roberta-base"
- name: LanguageModelFeaturizer
  model_name: "roberta"
  model_weights: "roberta-base"
```
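Because the two components have to agree, it can help to check a pipeline before training. The following is a minimal sketch of such a check, not part of rasa_chinese or Rasa: the `find_mismatches` helper and the `SHARED_KEYS` tuple are hypothetical names, and the sketch assumes that `model_name` and `model_weights` (the two parameters shown in the example) are the only settings that need to match. The pipeline is written as the list of dictionaries that the YAML would load into.

```python
from typing import Any, Dict, List

TOKENIZER = "rasa_chinese.nlu.tokenizers.lm_tokenizer.LanguageModelTokenizer"
FEATURIZER = "LanguageModelFeaturizer"
# Assumption: only the two parameters shown in the README need to agree.
SHARED_KEYS = ("model_name", "model_weights")


def find_mismatches(pipeline: List[Dict[str, Any]]) -> List[str]:
    """Return the shared keys whose values differ between the two components."""
    tokenizer = next((c for c in pipeline if c.get("name") == TOKENIZER), None)
    featurizer = next((c for c in pipeline if c.get("name") == FEATURIZER), None)
    if tokenizer is None or featurizer is None:
        return []  # one of the components is absent, so there is nothing to compare
    # A key missing on both sides counts as a match (both fall back to defaults).
    return [k for k in SHARED_KEYS if tokenizer.get(k) != featurizer.get(k)]


# The pipeline from the YAML example, as it would look after loading.
pipeline = [
    {"name": TOKENIZER, "model_name": "roberta", "model_weights": "roberta-base"},
    {"name": FEATURIZER, "model_name": "roberta", "model_weights": "roberta-base"},
]

mismatches = find_mismatches(pipeline)
print(mismatches or "tokenizer and featurizer settings match")
```

A non-empty result lists the parameters that were set differently on the tokenizer and the featurizer.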