https://github.com/rz-zhang/SeqMix
The repository for our EMNLP'20 paper SeqMix: Augmenting Active Sequence Labeling via Sequence Mixup.
- Host: GitHub
- URL: https://github.com/rz-zhang/SeqMix
- Owner: rz-zhang
- Created: 2020-09-28T09:42:26.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2021-09-05T07:20:20.000Z (over 3 years ago)
- Last Synced: 2024-08-03T09:07:26.725Z (5 months ago)
- Language: Python
- Size: 1.85 MB
- Stars: 42
- Watchers: 3
- Forks: 6
- Open Issues: 3
# SeqMix
The repository of our EMNLP'20 paper
**SeqMix: Augmenting Active Sequence Labeling via Sequence Mixup**
[[paper]](https://rongzhizhang.org/pdf/emnlp20_SeqMix.pdf) [[slides]](https://rongzhizhang.org/slides/EMNLP20_SeqMix_Slides.pdf)
![Illustration of the three variants of SeqMix](SeqMix.png)

# Requirements
- pytorch-transformers==1.2.0
- torch==1.2.0
- seqeval==0.0.5
- tqdm==4.31.1
- nltk==3.4.5
- Flask==1.1.1
- Flask-Cors==3.0.8
- pytorch_pretrained_bert==0.6.2

Install the required packages:
```
pip install -r requirements.txt
```

# Key Parameters
- `data_dir`: location of the data; the CoNLL-03 dataset is provided in this repository
- `max_seq_length`: maximum length of each sequence
- `num_train_epochs`: number of training epochs
- `train_batch_size`: batch size during model training
- `active_policy`: query policy for active learning (`random`, `lc`, or `nte`)
- `augment_method`: augmentation method (`soft`, `slack`, or `lf`)
- `augment_rate`: augmentation rate
- `hyper_alpha`: parameter of the Beta distribution

# Run
## Active learning part
Random Sampling
```
python active_learn.py --active_policy=random
```
Least Confidence Sampling
```
python active_learn.py --active_policy=lc
```
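
For orientation, here is a minimal sketch of how the uncertainty-based policies (`lc` here and `nte` below) are commonly scored. It uses the textbook formulations and assumes per-token label probabilities are already available as nested lists. The function names `least_confidence`, `normalized_token_entropy`, and `select_queries` are hypothetical and are not taken from `active_learn.py`, whose implementation may differ.

```python
import math


def least_confidence(token_probs):
    """Return 1 - P(most likely label sequence).

    Assumption: tokens are treated as independent, so the sequence
    probability is the product of the per-token maximum probabilities.
    """
    log_p = sum(math.log(max(dist)) for dist in token_probs)
    return 1.0 - math.exp(log_p)


def normalized_token_entropy(token_probs):
    """Return the label entropy averaged over the tokens of one sequence."""
    total = 0.0
    for dist in token_probs:
        total -= sum(p * math.log(p) for p in dist if p > 0.0)
    return total / len(token_probs)


def select_queries(pool_probs, score_fn, k):
    """Return the indices of the k most uncertain sequences in the pool."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: score_fn(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy pool: two sequences of two tokens each, with two possible labels.
confident = [[0.9, 0.1], [0.8, 0.2]]
uncertain = [[0.5, 0.5], [0.5, 0.5]]
picked = select_queries([confident, uncertain], least_confidence, k=1)
```

Both scores rank the second toy sequence as more uncertain, so it is the one selected for annotation.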
Normalized Token Entropy Sampling
```
python active_learn.py --active_policy=nte
```

## SeqMix part
Whole sequence mixup
```
python active_learn.py --augment_method=soft
```
Sub-sequence mixup
```
python active_learn.py --augment_method=slack
```
Label-constrained sub-sequence mixup
```
python active_learn.py --augment_method=lf
```
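
The sketch below shows the interpolation step that mixup-style augmentation builds on, and where `hyper_alpha` would enter. It assumes the standard mixup formulation: a coefficient drawn from Beta(`hyper_alpha`, `hyper_alpha`) is used to blend token embeddings and one-hot labels. The helper names and toy values, including `8.0`, are hypothetical. The later steps of the method (mapping mixed embeddings back to tokens, filtering generated sequences) are omitted, so this is not the repository's implementation.

```python
import random


def sample_lambda(hyper_alpha, rng=random):
    """Draw a mixing coefficient from Beta(hyper_alpha, hyper_alpha)."""
    return rng.betavariate(hyper_alpha, hyper_alpha)


def interpolate(u, v, lam):
    """Element-wise convex combination lam * u + (1 - lam) * v."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(u, v)]


def mix_sequences(emb_a, emb_b, labels_a, labels_b, lam):
    """Mix two equal-length sequences token by token.

    emb_*:    one embedding vector per token
    labels_*: one one-hot label vector per token
    """
    mixed_emb = [interpolate(a, b, lam) for a, b in zip(emb_a, emb_b)]
    mixed_labels = [interpolate(a, b, lam) for a, b in zip(labels_a, labels_b)]
    return mixed_emb, mixed_labels


# Toy example: two tokens, 2-dimensional embeddings, three label classes.
rng = random.Random(0)           # seeded for reproducibility
lam = sample_lambda(8.0, rng)    # 8.0 is an arbitrary illustrative value
emb_a = [[1.0, 0.0], [0.0, 1.0]]
emb_b = [[0.0, 1.0], [1.0, 0.0]]
labels_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
labels_b = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
mixed_emb, mixed_labels = mix_sequences(emb_a, emb_b, labels_a, labels_b, lam)
```

Because the labels are mixed with the same coefficient as the embeddings, each mixed label vector remains a valid probability distribution.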