Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ngxbac/Kaggle-Jigsaw
Last synced: 9 days ago
- Host: GitHub
- URL: https://github.com/ngxbac/Kaggle-Jigsaw
- Owner: ngxbac
- Created: 2019-05-31T06:25:03.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2019-06-27T06:35:08.000Z (over 5 years ago)
- Last Synced: 2024-08-01T13:32:19.287Z (3 months ago)
- Language: Python
- Size: 115 KB
- Stars: 24
- Watchers: 4
- Forks: 11
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Kaggle-Jigsaw
Our solution is described [here](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/discussion/97425#latest-562308).
# Preprocessing
### Extract data for BERT/GPT2/XLNET
```bash
bash bin/extract_data.sh
```

### Extract features (11 features)
```bash
bash bin/extract_features.sh
```

### Create targets
```bash
bash bin/extract_target.sh
```

# Train models
```bash
seed=17493
depth=12 # 11 or 12 for BERT base; 23 or 24 for BERT large
maxlen=220
batch_size=32
accumulation_steps=4
model_name=bert # or gpt2, xlnet

CUDA_VISIBLE_DEVICES=3 python main_catalyst.py train --seed=$seed \
--depth=$depth \
--maxlen=$maxlen \
--batch_size=$batch_size \
--accumulation_steps=$accumulation_steps \
--model_name=$model_name
```

# Predictions
Use the same settings as in the training phase.
For example:
```
seed=17493
depth=12
maxlen=220
batch_size=32
accumulation_steps=4
model_name=bert
```

Then
```bash
python make_submission.py
```
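
Since prediction has to use the same settings as training, one way to keep the two from drifting apart is to define the values once and build the training command from them. The following is a minimal sketch and is not part of this repository: `SETTINGS`, `build_train_command`, and `effective_batch_size` are hypothetical names, and only the flags shown in the training command above are used. The effective batch size assumes that `accumulation_steps` means standard gradient accumulation (one optimizer step every `accumulation_steps` batches), which this README does not state. How `make_submission.py` reads its settings is not documented here, so the sketch only covers the training side.

```python
import shlex

# Values copied from the training example in this README.
SETTINGS = {
    "seed": 17493,
    "depth": 12,
    "maxlen": 220,
    "batch_size": 32,
    "accumulation_steps": 4,
    "model_name": "bert",
}


def build_train_command(settings):
    """Return the argument list for `main_catalyst.py train` (hypothetical helper)."""
    args = ["python", "main_catalyst.py", "train"]
    # One --name=value flag per setting, in the order the settings are defined.
    args += [f"--{name}={value}" for name, value in settings.items()]
    return args


def effective_batch_size(settings):
    """Samples per optimizer step, ASSUMING standard gradient accumulation."""
    return settings["batch_size"] * settings["accumulation_steps"]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_train_command(SETTINGS)))
    print("effective batch size:", effective_batch_size(SETTINGS))
```

The printed command can be pasted into a shell (prefixed with `CUDA_VISIBLE_DEVICES=...` as needed), and the same `SETTINGS` values can then be copied into the prediction step.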