https://github.com/myurasov/kaggle-toxic
Experiments with Kaggle "Toxic Comment Classification Challenge" (https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge)
- Host: GitHub
- URL: https://github.com/myurasov/kaggle-toxic
- Owner: myurasov
- Created: 2020-12-29T07:56:36.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2021-02-26T07:50:23.000Z (about 4 years ago)
- Last Synced: 2025-04-06T16:45:34.949Z (27 days ago)
- Language: Jupyter Notebook
- Size: 483 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Code for Toxic Comment Classification Challenge
https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/
## Setup
### Download Dataset
`$ docker/docker.sh "src/download_dataset.sh"`
A `kaggle.json` file containing a valid API token must be placed in the application directory before downloading data or uploading submissions.
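Before running the download script, it can help to confirm that the token file is in place. The following is a minimal sketch, not part of this repository: the helper name `check_kaggle_token` is hypothetical, and it assumes the standard Kaggle token layout (a JSON object with `username` and `key` fields). Pass the directory you treat as the application directory.

```python
import json
from pathlib import Path


def check_kaggle_token(app_dir="."):
    """Return True if <app_dir>/kaggle.json exists and looks like a Kaggle API token.

    Hypothetical helper for illustration; it only inspects the file locally
    and does not contact Kaggle to verify that the token is accepted.
    """
    token_path = Path(app_dir) / "kaggle.json"
    if not token_path.is_file():
        return False
    try:
        token = json.loads(token_path.read_text())
    except json.JSONDecodeError:
        return False
    # Kaggle API tokens are JSON objects with non-empty "username" and "key" fields.
    return isinstance(token, dict) and all(token.get(k) for k in ("username", "key"))


if __name__ == "__main__":
    ok = check_kaggle_token(".")
    print("kaggle.json looks valid" if ok else "kaggle.json is missing or malformed")
```

The check is purely structural, so a revoked or mistyped token will still pass it and fail later at download time.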
### Download Pre-trained BERT
`$ docker/docker.sh "src/download_bert.sh"`
### Prepare Dataset for BERT
`$ docker/docker.sh "src/preprocess_for_bert.py"`
### Train BERT-based Classifier
`$ docker/docker.sh "src/train_bert.py [arguments]"`
Options available:
-h, --help
--run RUN
--max_items MAX_ITEMS
--epochs EPOCHS
--batch BATCH
--lr_start LR_START
--val_split VAL_SPLIT
--early_stop_patience EARLY_STOP_PATIENCE
--samples_per_epoch SAMPLES_PER_EPOCH
To run with Horovod:
`$ docker/docker.sh --gpus='"device=0,1,2,###"' "horovodrun -np ### src/train_bert_hvd.py"`
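For readers who want to see how the training options listed above might be wired up, here is a rough `argparse` sketch. Only the flag names are taken from this README. The types, defaults, and help texts are assumptions made for illustration and may not match what `src/train_bert.py` actually does.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the flag names listed in the README.

    Types, defaults, and help strings are illustrative assumptions,
    not values taken from the repository.
    """
    parser = argparse.ArgumentParser(description="Train a BERT-based classifier (sketch)")
    parser.add_argument("--run", type=str, default="default", help="run name (assumed)")
    parser.add_argument("--max_items", type=int, default=None, help="cap on items used (assumed)")
    parser.add_argument("--epochs", type=int, default=10, help="number of epochs (assumed default)")
    parser.add_argument("--batch", type=int, default=32, help="batch size (assumed default)")
    parser.add_argument("--lr_start", type=float, default=1e-5, help="initial learning rate (assumed default)")
    parser.add_argument("--val_split", type=float, default=0.1, help="validation fraction (assumed default)")
    parser.add_argument("--early_stop_patience", type=int, default=3, help="early-stopping patience (assumed default)")
    parser.add_argument("--samples_per_epoch", type=int, default=None, help="samples per epoch (assumed)")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--run", "demo", "--epochs", "3", "--val_split", "0.2"])
print(args.run, args.epochs, args.val_split)
```

Run the real script with `--help` to see the authoritative types and defaults.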
### Generate Submission with BERT-based Classifier
`$ docker/docker.sh "src/infer_bert.py"`
Options available:
-h, --help
--model_file MODEL_FILE
--submission_file SUBMISSION_FILE
--batch BATCH
--max_items MAX_ITEMS
## Starting Jupyter Lab and TensorBoard
`$ docker/docker-forever.sh [--jupyter_port=####|8888] [--tensorboard_port=####|6006]`
## @see
- https://github.com/google-research/bert
- https://arxiv.org/pdf/1810.04805.pdf - BERT
- https://arxiv.org/pdf/1902.00751.pdf - Adapter-BERT
- https://arxiv.org/pdf/1909.11942.pdf - ALBERT