https://github.com/gatenlp/toxicclassifier

Toxic Classifier

Host: GitHub
URL: https://github.com/gatenlp/toxicclassifier
Owner: GateNLP
License: apache-2.0
Created: 2021-01-24T20:12:11.000Z (almost 4 years ago)
Default Branch: main
Last Pushed: 2021-10-13T16:41:40.000Z (over 3 years ago)
Last Synced: 2024-12-06T22:35:50.693Z (about 1 month ago)
Language: Python
Size: 8.32 MB
Stars: 0
Watchers: 13
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# ToxicClassifier

Toxic Classifiers developed for the GATE Cloud. Two models are available:

- kaggle: trained on the [Kaggle Toxic Comments Challenge dataset](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/overview).

- olid: trained on the [OLIDv1 dataset from OffensEval 2019](https://sites.google.com/site/offensevalsharedtask/olid) ([paper](https://aclanthology.org/N19-1144/)).

Both models are fine-tuned versions of `roberta-base`, trained with the [`simpletransformers`](https://simpletransformers.ai/) toolkit.

## Requirements

- `python=3.8`

- `pandas`

- `tqdm`

- `pytorch`

- `simpletransformers`

## Pre-defined environments

With conda:

`conda env create -f environment.yml`

With pip:

`pip install -r requirements.txt`

If the above does not work, or if you want to use GPUs, follow the installation steps of `simpletransformers`: [https://simpletransformers.ai/docs/installation/](https://simpletransformers.ai/docs/installation/)

## Models

1. Download the models from [the latest release of this repository](https://github.com/GateNLP/ToxicClassifier/releases/latest) (currently `kaggle.tar.gz` and `olid.tar.gz` are available).

2. Decompress each archive inside `models/en/` (this creates `models/en/kaggle` or `models/en/olid` respectively).
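
The decompression step can also be scripted. The following is a minimal sketch using Python's standard `tarfile` module. The helper name `extract_model` is hypothetical and not part of this repository, and it assumes the archive has already been downloaded from the release page.

```python
import tarfile
from pathlib import Path


def extract_model(archive_path, dest_dir="models/en"):
    """Unpack a downloaded model archive (e.g. kaggle.tar.gz) into dest_dir.

    Hypothetical helper, not part of the ToxicClassifier repository.
    Only use it on archives from a source you trust.
    """
    dest = Path(dest_dir)
    # Create models/en/ if it does not exist yet.
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest)
    # Return the names of the entries now present in dest_dir.
    return sorted(p.name for p in dest.iterdir())
```

Calling `extract_model("kaggle.tar.gz")` from the repository root should leave the model files under `models/en/kaggle`, matching step 2.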

## Basic Usage

`python __main__.py -t "This is a test"` (should return 0 = non-toxic)

`python __main__.py -t "Bastard!"` (should return 1 = toxic)
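
To call the classifier from another Python program, one option is to wrap the command line shown above. This is a sketch only: the helper names `build_command` and `classify` are hypothetical, it assumes it is run from the repository root with the models installed, and it makes no assumption about the exact output format, so the printed text is returned unparsed.

```python
import subprocess
import sys


def build_command(text, script="__main__.py"):
    """Build the command line from the usage examples for one input text."""
    return [sys.executable, script, "-t", text]


def classify(text, script="__main__.py"):
    """Run the classifier script and return whatever it prints to stdout.

    Hypothetical wrapper: the output is returned as a plain string because
    its exact layout is not documented here.
    """
    result = subprocess.run(
        build_command(text, script),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script fails
    )
    return result.stdout.strip()
```

Passing the text as a separate list element avoids shell quoting problems with inputs that contain quotes or spaces.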

## Options

- `t`: text

- `l`: language (currently only supports "en")

- `c`: classifier (currently supports "kaggle" and "olid" -- default="kaggle")

- `g`: gpu (default=False)
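
For illustration, a parser accepting these options could be written with `argparse` as below. This is not the repository's actual parser: the destination names, the `en` default for `-l`, and the treatment of `-g` as a boolean switch are assumptions made for the sketch.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the documented options (assumed layout)."""
    parser = argparse.ArgumentParser(description="Toxic text classifier (sketch)")
    parser.add_argument("-t", dest="text", required=True, help="text to classify")
    # Assumption: "en" is the default, since it is the only supported language.
    parser.add_argument("-l", dest="language", default="en", choices=["en"],
                        help="language of the text")
    parser.add_argument("-c", dest="classifier", default="kaggle",
                        choices=["kaggle", "olid"], help="model to use")
    # Assumption: -g is a switch that is off unless given.
    parser.add_argument("-g", dest="gpu", action="store_true",
                        help="run on GPU (default: False)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.text, args.language, args.classifier, args.gpu)
```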

## Output

The output consists of the predicted class and the probability of each class.
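
As a sketch of how a predicted class and per-class probabilities relate to a classifier's raw scores: assuming the model produces one raw score (logit) per class, a softmax converts the scores to probabilities and the class with the highest probability is the prediction. Whether this repository computes its probabilities exactly this way is an assumption, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw per-class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def summarize(logits):
    """Return (predicted_class, probabilities) for one text.

    Class indices follow the usage examples: 0 = non-toxic, 1 = toxic.
    """
    probs = softmax(logits)
    predicted = max(range(len(probs)), key=probs.__getitem__)
    return predicted, probs
```

For example, `summarize([0.0, 0.0])` gives both classes a probability of 0.5, and any input where the second score is larger yields predicted class 1.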

## REST Service

Pre-built Docker images are available for a REST service that accepts text and returns a classification from the relevant model. See the "Packages" section of the GitHub repository for more details.