Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Toxic Classifier
https://github.com/gatenlp/toxicclassifier
- Host: GitHub
- URL: https://github.com/gatenlp/toxicclassifier
- Owner: GateNLP
- License: apache-2.0
- Created: 2021-01-24T20:12:11.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2021-10-13T16:41:40.000Z (over 3 years ago)
- Last Synced: 2024-12-06T22:35:50.693Z (about 1 month ago)
- Language: Python
- Size: 8.32 MB
- Stars: 0
- Watchers: 13
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# ToxicClassifier
Toxic Classifiers developed for the GATE Cloud. Two models are available:
- kaggle: trained on the [Kaggle Toxic Comments Challenge dataset](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/overview).
- olid: trained on the [OLIDv1 dataset from OffensEval 2019](https://sites.google.com/site/offensevalsharedtask/olid) ([paper](https://aclanthology.org/N19-1144/)).

We fine-tuned a `Roberta-base` model using the [`simpletransformers`](https://simpletransformers.ai/) toolkit.
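The repository does not include the training script itself, so the following is only a minimal sketch of how a `roberta-base` binary classifier can be fine-tuned with `simpletransformers`; the DataFrame contents and hyper-parameters here are made up for illustration.

```python
# Minimal sketch (not the project's actual training script): fine-tune
# roberta-base as a binary toxic/non-toxic classifier with simpletransformers.
import pandas as pd
from simpletransformers.classification import ClassificationArgs, ClassificationModel

# Hypothetical training data: a DataFrame with "text" and "labels" columns
# (0 = non-toxic, 1 = toxic), e.g. derived from the Kaggle or OLID data.
train_df = pd.DataFrame(
    [["have a nice day", 0], ["you are an idiot", 1]],
    columns=["text", "labels"],
)

model_args = ClassificationArgs(num_train_epochs=1, overwrite_output_dir=True)
model = ClassificationModel("roberta", "roberta-base", args=model_args, use_cuda=False)
model.train_model(train_df)
```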
## Requirements
`python=3.8`
`pandas`
`tqdm`
`pytorch`
`simpletransformers`

## Pre-defined environments
### conda

`conda env create -f environment.yml`

### pip

`pip install -r requirements.txt`
(If the above does not work, or if you want to use GPUs, you can follow the `simpletransformers` installation instructions: [https://simpletransformers.ai/docs/installation/](https://simpletransformers.ai/docs/installation/).)
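A quick way to confirm that the installation worked, and whether PyTorch can see a GPU for the `-g` option described below, is a check along these lines (not part of the repository):

```python
# Sanity check: confirm the key dependencies import cleanly
# and report whether PyTorch can see a CUDA GPU.
import torch
from simpletransformers.classification import ClassificationModel  # noqa: F401

print("torch", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```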
## Models
1. Download the models from [the latest release of this repository](https://github.com/GateNLP/ToxicClassifier/releases/latest) (currently `kaggle.tar.gz` and `olid.tar.gz` are available)
2. Decompress the file inside `models/en/` (which will create `models/en/kaggle` or `models/en/olid` respectively)
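If you prefer to script these two steps, the sketch below assumes GitHub's standard `releases/latest/download/<asset>` URL pattern and the asset names listed above; it is not part of the repository.

```python
# Sketch: fetch a released model archive and unpack it into models/en/.
# Assumes GitHub's releases/latest/download/<asset> URL pattern and the
# asset names mentioned above (kaggle.tar.gz, olid.tar.gz).
import tarfile
import urllib.request
from pathlib import Path

MODEL = "kaggle"  # or "olid"
url = (
    "https://github.com/GateNLP/ToxicClassifier/releases/latest/"
    f"download/{MODEL}.tar.gz"
)
dest_dir = Path("models/en")
dest_dir.mkdir(parents=True, exist_ok=True)

archive_path = dest_dir / f"{MODEL}.tar.gz"
urllib.request.urlretrieve(url, archive_path)

with tarfile.open(archive_path) as tar:
    tar.extractall(dest_dir)  # creates models/en/kaggle (or models/en/olid)
```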
## Basic Usage

`python __main__.py -t "This is a test"` (should return 0 = non-toxic)
`python __main__.py -t "Bastard!"` (should return 1 = toxic)
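Since the models were trained with `simpletransformers`, the same check can presumably also be done directly from Python; the snippet below is a sketch based on the `ClassificationModel.predict` API rather than on `__main__.py` itself.

```python
# Sketch (not the repository's __main__.py): load the extracted kaggle model
# with simpletransformers and classify two example texts.
from simpletransformers.classification import ClassificationModel

model = ClassificationModel("roberta", "models/en/kaggle", use_cuda=False)
predictions, raw_outputs = model.predict(["This is a test", "Bastard!"])
print(predictions)  # expected: [0, 1] (0 = non-toxic, 1 = toxic)
```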
## Options
- `t`: text
- `l`: language (currently only supports "en")
- `c`: classifier (currently supports "kaggle" and "olid" -- default="kaggle")
- `g`: gpu (default=False)
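For example, classifying the same text with the OLID model would presumably be run as `python __main__.py -t "This is a test" -c olid`.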
## Output

The output consists of the predicted class and the probability of each class.
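The repository's exact output format is not reproduced here, but as an illustration only, per-class probabilities of this kind are typically derived from the raw scores returned by the model with a softmax:

```python
# Illustration only (not the project's code): turn raw scores from
# predict() into per-class probabilities with a softmax.
import numpy as np

def softmax(scores):
    exps = np.exp(scores - np.max(scores))
    return exps / exps.sum()

raw_outputs = np.array([[2.1, -1.3]])   # hypothetical raw scores for one text
probabilities = softmax(raw_outputs[0])
predicted_class = int(np.argmax(probabilities))
print(predicted_class, probabilities)   # e.g. 0 [0.967... 0.032...]
```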
## REST Service

Pre-built Docker images are available for a REST service that accepts text and returns a classification according to the relevant model; see the "packages" section for more details.
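The endpoint, port, and payload format are documented with the Docker packages rather than in this README, so the following call is purely hypothetical and only illustrates the general shape of such a request.

```python
# Purely hypothetical sketch of querying a locally running container;
# the real URL, port, and payload format must be taken from the package docs.
import requests

response = requests.post(
    "http://localhost:8080/classify",  # placeholder host, port, and path
    json={"text": "This is a test"},   # placeholder payload shape
)
print(response.status_code, response.text)
```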