https://github.com/thinkwee/eda_zh_bert
Chinese version code for the paper "EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks"
- Host: GitHub
- URL: https://github.com/thinkwee/eda_zh_bert
- Owner: thinkwee
- Created: 2019-07-25T09:44:10.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2019-07-25T09:50:32.000Z (almost 6 years ago)
- Last Synced: 2024-12-29T13:24:10.161Z (5 months ago)
- Topics: augmentation, bert, chinese-nlp, eda, nlp, nlp-toolkit
- Language: Python
- Homepage:
- Size: 7.81 KB
- Stars: 11
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
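# Introduction
- Original repository: [eda_nlp](https://github.com/jasonwei20/eda_nlp); original paper: [EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks](https://arxiv.org/abs/1901.11196)
- Adapted to support Chinese. By default, the data is processed into the input format expected by BERT models, i.e. sentence pairs; see the sample in ```/data/original.csv```
- Intended for data augmentation. It includes synonym replacement, random insertion, random deletion, and word-order shuffling (random swap). It works at least for BERT + classification tasks; other models and tasks have not been tested.

# Dependencies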
- synonyms
- pandas
- Please install these manually.

# Input/Output Examples
- See ```./data/```

# Running
- ```./run_augment.sh```
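To show what the four operations listed in the introduction do to a Chinese sentence, here is a minimal, self-contained sketch. It is not the code in this repository: it assumes the sentence has already been segmented into words, and it uses a hypothetical toy synonym table (`TOY_SYNONYMS`) in place of the `synonyms` package. The `alpha` parameter mirrors the paper's per-operation change rate; cycling through the operations to produce `num_aug` outputs is a simplification, and `augment_pair` is a hypothetical helper illustrating one way to handle BERT sentence pairs.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy synonym table; the repository itself relies on the `synonyms` package.
TOY_SYNONYMS: Dict[str, List[str]] = {
    "喜欢": ["喜爱", "爱"],
    "电影": ["影片"],
    "好看": ["精彩"],
}


def synonym_replacement(words: List[str], n: int, rng: random.Random) -> List[str]:
    """Replace up to n words that have an entry in the synonym table."""
    new_words = list(words)
    candidates = [i for i, w in enumerate(new_words) if w in TOY_SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        new_words[i] = rng.choice(TOY_SYNONYMS[new_words[i]])
    return new_words


def random_insertion(words: List[str], n: int, rng: random.Random) -> List[str]:
    """n times, insert a synonym of a random word at a random position."""
    new_words = list(words)
    for _ in range(n):
        candidates = [w for w in new_words if w in TOY_SYNONYMS]
        if not candidates:
            break  # nothing to draw synonyms from
        synonym = rng.choice(TOY_SYNONYMS[rng.choice(candidates)])
        new_words.insert(rng.randint(0, len(new_words)), synonym)
    return new_words


def random_swap(words: List[str], n: int, rng: random.Random) -> List[str]:
    """n times, swap the words at two distinct random positions."""
    new_words = list(words)
    if len(new_words) < 2:
        return new_words
    for _ in range(n):
        i, j = rng.sample(range(len(new_words)), 2)
        new_words[i], new_words[j] = new_words[j], new_words[i]
    return new_words


def random_deletion(words: List[str], p: float, rng: random.Random) -> List[str]:
    """Drop each word with probability p, but never return an empty sentence."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]


def eda(words: List[str], alpha: float = 0.1, num_aug: int = 4, seed: int = 0) -> List[List[str]]:
    """Produce num_aug augmented copies, cycling through the four operations."""
    rng = random.Random(seed)
    n = max(1, int(alpha * len(words)))  # number of words each operation changes
    ops = [
        lambda w: synonym_replacement(w, n, rng),
        lambda w: random_insertion(w, n, rng),
        lambda w: random_swap(w, n, rng),
        lambda w: random_deletion(w, alpha, rng),
    ]
    return [ops[k % len(ops)](words) for k in range(num_aug)]


def augment_pair(words_a: List[str], words_b: List[str], num_aug: int = 4,
                 seed: int = 0) -> List[Tuple[List[str], List[str]]]:
    """Hypothetical helper: augment both sentences of a BERT sentence pair independently."""
    return list(zip(eda(words_a, num_aug=num_aug, seed=seed),
                    eda(words_b, num_aug=num_aug, seed=seed + 1)))


if __name__ == "__main__":
    sentence = ["我", "喜欢", "这部", "电影", "很", "好看"]  # pre-segmented input
    for augmented in eda(sentence):
        print(" ".join(augmented))
```

Each operation copies its input rather than mutating it, so the original sentence can be reused for every augmented copy; passing a seeded `random.Random` keeps runs reproducible.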