https://github.com/tanyuqian/learning-data-manipulation
NeurIPS 2019 - Learning Data Manipulation for Augmentation and Weighting
https://github.com/tanyuqian/learning-data-manipulation
bert data-augmentation data-manipulation meta-learning
Last synced: 6 months ago
JSON representation
NeurIPS 2019 - Learning Data Manipulation for Augmentation and Weighting
- Host: GitHub
- URL: https://github.com/tanyuqian/learning-data-manipulation
- Owner: tanyuqian
- Created: 2019-10-24T01:56:10.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2020-09-05T23:17:25.000Z (over 5 years ago)
- Last Synced: 2025-04-14T12:57:15.473Z (about 1 year ago)
- Topics: bert, data-augmentation, data-manipulation, meta-learning
- Language: Python
- Homepage:
- Size: 75.2 KB
- Stars: 109
- Watchers: 4
- Forks: 16
- Open Issues: 5
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Learning Data Manipulation
This repo contains preliminary code of the following paper:
[Learning Data Manipulation for Augmentation and Weighting](http://www.cs.cmu.edu/~zhitingh/data/neurips19_data_manip_preprint.pdf)
Zhiting Hu*, Bowen Tan*, Ruslan Salakhutdinov, Tom Mitchell, Eric P. Xing
NeurIPS 2019 (equal contribution)
## Requirements
- `python3.6`
- `pytorch==1.0.1`
- `pytorch_pretrained_bert==0.6.1`
- `torchvision==0.2.2`
## Code
* ```baseline_main.py```: Vanilla BERT Classifier.
* ```ren_main.py```: Described in [(Ren et al.)](https://arxiv.org/pdf/1803.09050.pdf).
* ```weighting_main.py```: Our weighting algorithm.
* ```augmentation_main.py```: Our augmentation algorithm.
## Running
Running scripts for experiments are available in [scripts/](scripts/).
## Results
All the detailed training logs are availble in [results/](results/).
*(Note: The result numbers may be slightly different from those in the paper due to slightly different implementation details and random seeds, while the improvements over comparison methods are consistent.)*
### low data
##### SST-5
|Base Model: BERT|Ren et al.| Weighting | Augmentation |
|:-:|:-:|:-:|:-:|
| 33.32 ± 4.04 | 36.09 ± 2.26 | 36.51 ± 2.54 | 37.55 ± 2.63 |
##### CIFAR-10
| | Pretrained | Not Pretrained |
|------------------|----------------|----------------|
|Base Model: ResNet| 34.58 ± 4.13 | 24.68 ± 3.29 |
| Ren et al. | 23.29 ± 5.95 | 22.26 ± 2.80 |
| Weighting | 36.75 ± 3.09 | 26.47 ± 1.69 |
### imbalanced data
##### SST-2
|| 20 : 1000 | 50 : 1000 | 100 : 1000
|:-:|:-:|:-:|:-:|
|Base Model: BERT| 54.91 ± 5.98 | 67.73 ± 9.20 | 75.04 ± 4.51 |
|Ren et al.| 74.61 ± 3.54 | 76.89 ± 5.07 | 80.73 ± 2.19 |
|Weighting| 75.08 ± 4.98 | 79.35 ± 2.59 | 81.82 ± 1.88 |
##### CIFAR-10
| | 20 : 1000 | 50 : 1000 | 100 : 1000 |
|------------------|--------------|--------------|--------------|
|Base Model: ResNet| 70.65 ± 4.98 | 79.52 ± 4.81 | 86.12 ± 3.37 |
| Ren et al. | 76.68 ± 5.35 | 77.34 ± 7.38 | 78.57 ± 5.61 |
| Weighting | 79.07 ± 5.02 | 82.65 ± 5.13 | 87.63 ± 3.72 |