https://github.com/jonad/toxicity_comments


- Host: GitHub
- URL: https://github.com/jonad/toxicity_comments
- Owner: jonad
- Created: 2020-06-04T22:57:17.000Z
- Default Branch: master
- Last Pushed: 2022-12-27T15:35:12.000Z
- Last Synced: 2025-05-08T21:13:44.331Z
- Language: Python
- Size: 3.05 MB
- Stars: 3
- Watchers: 1
- Forks: 1
- Open Issues: 11
- Metadata Files:
  - Readme: README.md


# Recurrent Neural Network for Sentence-Level Text Classification

This project builds and evaluates recurrent neural network models for sentence-level text classification. The final models detect toxicity in short texts as well as the type of toxicity, which falls into the following categories: severe toxicity, obscene, identity attack, insult, and threat.

The final models can be used for filtering online posts and comments, policing social media, and educating users.




### Links

- [The deployed models](TODO)

### Sections

- [Dataset Summary](#dataset-summary)

- [Exploratory Data Analysis](#exploratory-data-analysis)

- [Models](#models)

- [Training](#training)

- [Evaluation](#evaluation)

- [Testing](#testing)

## Dataset Summary

[back to top](#sections)

- The dataset of 1.8+ million user comments was downloaded from the Kaggle competition [Jigsaw Unintended Bias in Toxicity Classification](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification).

- Each comment was hand-labeled by human raters for its level of toxicity.

- The dataset also includes labels for the following toxicity types: severe toxicity, obscene, threat, insult, and identity attack.
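The README does not show how the raw labels were turned into classes. A minimal sketch, assuming the column names of the competition's `train.csv` (`target` for overall toxicity, plus `severe_toxicity`, `obscene`, `identity_attack`, `insult`, `threat`) and a 0.5 threshold, could read the file with the standard `csv` module. The helper `read_examples`, the sample rows, and reusing 0.5 for the subtype columns are illustrative assumptions, not the repository's code.

```python
import csv
import io

# Label columns as named in the competition's train.csv; "target" is the overall toxicity score.
LABEL_COLUMNS = ["target", "severe_toxicity", "obscene", "identity_attack", "insult", "threat"]


def read_examples(lines, threshold=0.5):
    """Yield (comment_text, {label: 0 or 1}) pairs from an iterable of CSV lines.

    Each raw label is the fraction of raters who flagged the comment. It is binarized at
    `threshold`; 0.5 matches the competition's rule for `target`, and reusing it for the
    subtype columns is an assumption.
    """
    for row in csv.DictReader(lines):
        labels = {col: int(float(row[col]) >= threshold) for col in LABEL_COLUMNS}
        yield row["comment_text"], labels


if __name__ == "__main__":
    # Two made-up rows in the same column layout, used instead of the real 1.8M-row file.
    sample = io.StringIO(
        "id,target,comment_text,severe_toxicity,obscene,identity_attack,insult,threat\n"
        '1,0.7,"you are awful, go away",0.1,0.0,0.0,0.6,0.0\n'
        "2,0.0,thanks for sharing,0.0,0.0,0.0,0.0,0.0\n"
    )
    for text, labels in read_examples(sample):
        print(text, labels)
```

With the real file, open it with `open("train.csv", newline="", encoding="utf-8")` so the `csv` module correctly handles comments that contain quoted commas or newlines.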







## Exploratory Data Analysis

[back to top](#sections)

### Toxicity Class Distribution

![](./images/data_distribution.png)
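A class distribution plot like this one summarizes, for each label, how many comments are positive. A minimal sketch of that computation, assuming each comment's labels are already binarized into a `{label: 0 or 1}` dict (the helper name `class_distribution` is hypothetical):

```python
from collections import Counter


def class_distribution(label_dicts):
    """Fraction of comments that are positive for each label, from an iterable of {label: 0/1} dicts."""
    positives = Counter()
    n = 0
    for labels in label_dicts:
        n += 1
        for name, value in labels.items():
            positives[name] += value  # adding 0 still records the label with a zero count
    return {name: positives[name] / n for name in positives}


if __name__ == "__main__":
    example = [{"target": 1, "threat": 0}, {"target": 0, "threat": 0}]
    print(class_distribution(example))
```

Reporting fractions rather than raw counts makes labels of very different frequency comparable on one chart.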

### Correlation Heatmap of Toxicity Types







![](./images/correlations.png)
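The README does not say which correlation coefficient the heatmap uses. A minimal sketch, assuming Pearson correlation computed on each pair of label columns (for example the raw rater fractions), could build the matrix behind such a heatmap as follows; `pearson` and `correlation_matrix` are hypothetical helper names.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences (nan if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / (sd_x * sd_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a {column_name: values} mapping, as a nested dict."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}


if __name__ == "__main__":
    toy = {"insult": [0.9, 0.1, 0.5], "obscene": [0.8, 0.0, 0.4], "threat": [0.0, 0.3, 0.1]}
    for name, row in correlation_matrix(toy).items():
        print(name, {k: round(v, 2) for k, v in row.items()})
```

The matrix is symmetric with ones on the diagonal, which is why correlation heatmaps are often drawn as a single triangle.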







## Models

[back to top](#sections)

### Long Short-Term Memory Model (LSTM)

![](./images/lstm.jpg)
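The diagram shows the architecture only at a high level. A minimal NumPy sketch of a standard LSTM forward pass over one comment is given below. The stacked gate layout, the initialization, the embedding and hidden sizes, and the six-output sigmoid head (`classify`, one output for toxicity plus one per subtype) are assumptions for illustration, not details taken from the repository.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_lstm(input_dim, hidden_dim, rng):
    """Random LSTM parameters; rows of W are stacked as [input, forget, candidate, output] gates."""
    W = rng.normal(0.0, 0.1, size=(4 * hidden_dim, input_dim + hidden_dim))
    b = np.zeros(4 * hidden_dim)
    b[hidden_dim:2 * hidden_dim] = 1.0  # forget-gate bias of 1, a common initialization choice
    return W, b


def lstm_forward(xs, W, b):
    """Run an LSTM over xs with shape [T, input_dim]; return hidden states with shape [T, hidden_dim]."""
    hidden_dim = b.shape[0] // 4
    h = np.zeros(hidden_dim)  # hidden state
    c = np.zeros(hidden_dim)  # cell state (the LSTM's long-term memory)
    states = []
    for x in xs:
        z = W @ np.concatenate([x, h]) + b
        i = sigmoid(z[:hidden_dim])                    # input gate
        f = sigmoid(z[hidden_dim:2 * hidden_dim])      # forget gate
        g = np.tanh(z[2 * hidden_dim:3 * hidden_dim])  # candidate cell update
        o = sigmoid(z[3 * hidden_dim:])                # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        states.append(h)
    return np.stack(states)


def classify(h_last, W_out, b_out):
    """Independent sigmoid per label, so one comment can be flagged for several types at once."""
    return sigmoid(W_out @ h_last + b_out)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, embed_dim, hidden_dim, n_labels = 5, 8, 16, 6  # 6 = toxicity + the five subtypes
    xs = rng.normal(size=(T, embed_dim))  # stand-in for the word embeddings of a 5-token comment
    W, b = init_lstm(embed_dim, hidden_dim, rng)
    hs = lstm_forward(xs, W, b)
    probs = classify(hs[-1], rng.normal(0.0, 0.1, size=(n_labels, hidden_dim)), np.zeros(n_labels))
    print(hs.shape, probs.round(3))
```

Using the last hidden state as the sentence representation is the simplest choice; it is what the bidirectional and attention variants below try to improve on.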




### Bidirectional Long Short-Term Memory Model (BiLSTM)




![](./images/bilstm.jpg)
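A bidirectional LSTM runs one LSTM left to right and a second one right to left, then concatenates their states at each token. A minimal NumPy sketch, assuming the same stacked gate layout as a standard LSTM and separate parameters per direction; `bilstm_forward` and `sentence_vector` are hypothetical names, and how the repository pools the outputs is not stated.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_forward(xs, W, b):
    """Unidirectional LSTM over xs [T, input_dim]; gate rows stacked as [input, forget, candidate, output]."""
    n = b.shape[0] // 4
    h, c, states = np.zeros(n), np.zeros(n), []
    for x in xs:
        z = W @ np.concatenate([x, h]) + b
        i, f, o = sigmoid(z[:n]), sigmoid(z[n:2 * n]), sigmoid(z[3 * n:])
        c = f * c + i * np.tanh(z[2 * n:3 * n])
        h = o * np.tanh(c)
        states.append(h)
    return np.stack(states)


def bilstm_forward(xs, fwd, bwd):
    """Concatenate a left-to-right and a right-to-left LSTM at every time step -> [T, 2 * hidden_dim]."""
    h_fwd = lstm_forward(xs, *fwd)
    h_bwd = lstm_forward(xs[::-1], *bwd)[::-1]  # reverse back so row t lines up with token t
    return np.concatenate([h_fwd, h_bwd], axis=1)


def sentence_vector(H, hidden_dim):
    """Last forward state plus first backward state: each has read the whole sentence once."""
    return np.concatenate([H[-1, :hidden_dim], H[0, hidden_dim:]])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, embed_dim, hidden_dim = 5, 8, 16

    def params():  # hypothetical random initialization, one set per direction
        return rng.normal(0.0, 0.1, size=(4 * hidden_dim, embed_dim + hidden_dim)), np.zeros(4 * hidden_dim)

    xs = rng.normal(size=(T, embed_dim))
    H = bilstm_forward(xs, params(), params())
    print(H.shape, sentence_vector(H, hidden_dim).shape)  # (5, 32) (32,)
```

The backward pass lets every token's representation depend on the words after it, which helps with phrases whose toxicity is only clear from later context.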




### BiLSTM with Attention Mechanism

![](./images/attention.jpg)
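The README does not specify which attention variant is used. The sketch below shows one common formulation, additive attention pooling: each BiLSTM output gets a relevance score, the scores are turned into weights with a softmax, and the sentence vector is the weighted average of the token states. It assumes NumPy, and the parameter names `W_a`, `b_a`, `v` are hypothetical.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # shift for numerical stability; does not change the result
    e = np.exp(z)
    return e / e.sum()


def attention_pool(H, W_a, b_a, v):
    """Additive attention over BiLSTM outputs.

    H:   [T, d] hidden states (d = 2 * hidden_dim for a BiLSTM)
    W_a: [a, d], b_a: [a], v: [a]  learned parameters (hypothetical names)
    Returns the context vector [d] and the attention weights [T].
    """
    scores = np.tanh(H @ W_a.T + b_a) @ v  # one relevance score per token
    alpha = softmax(scores)                # weights are positive and sum to 1
    context = alpha @ H                    # weighted average of the token states
    return context, alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d, a = 6, 32, 16
    H = rng.normal(size=(T, d))  # stand-in for BiLSTM outputs of a 6-token comment
    context, alpha = attention_pool(H, rng.normal(size=(a, d)), np.zeros(a), rng.normal(size=a))
    print(context.shape, alpha.round(3))
```

Besides pooling, the weights `alpha` can be inspected per comment to see which tokens the model relied on when flagging it.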







## Training

[back to top](#sections)

### Learning Curves

![](./images/training.png)
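The README does not state the training loss. For a multi-label sigmoid output, mean binary cross-entropy is the usual choice, and a learning curve records it per epoch on the training and validation splits. A minimal sketch under that assumption; `bce_loss` and `best_epoch` are hypothetical helpers, and early stopping is an illustrative choice rather than something the README confirms.

```python
import math


def bce_loss(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over every (example, label) pair in a multi-label batch."""
    total, count = 0.0, 0
    for labels, probs in zip(y_true, y_prob):
        for y, p in zip(labels, probs):
            p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            count += 1
    return total / count


def best_epoch(val_losses):
    """Index of the epoch with the lowest validation loss (where early stopping would keep the weights)."""
    return min(range(len(val_losses)), key=val_losses.__getitem__)


if __name__ == "__main__":
    print(bce_loss([[1, 0, 0]], [[0.8, 0.1, 0.3]]))
    print(best_epoch([0.52, 0.41, 0.38, 0.40]))
```

A validation curve that turns upward while the training curve keeps falling is the usual sign of overfitting, which is what selecting the best epoch guards against.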







## Evaluation

[back to top](#sections)




### ROC-AUC Toxicity

![](./images/toxicity.png)

### ROC-AUC Severe Toxicity

![](./images/severe_toxicity.png)

### ROC-AUC Obscene

![](./images/obscene.png)

### ROC-AUC Identity Attack

![](./images/identity_attack.png)

### ROC-AUC Insult

![](./images/insult.png)

### ROC-AUC Threat

![](./images/threat.png)
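Each plot above is a ROC curve for one label. A minimal sketch of how such a curve and its area can be computed for a single label: sort comments by predicted score, sweep the threshold from high to low, record the false and true positive rates, and integrate with the trapezoidal rule. The helper names are hypothetical; in practice `sklearn.metrics.roc_auc_score(y_true, y_score)` computes the same area.

```python
def roc_curve(labels, scores):
    """ROC points (FPR, TPR) for one binary label, sweeping the threshold from high to low."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC needs at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score so ties become one diagonal step.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2 for k in range(1, len(fpr)))


if __name__ == "__main__":
    fpr, tpr = roc_curve([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
    print(auc(fpr, tpr))  # 0.75
```

Running this once per label (toxicity and each subtype) gives one curve and one area per plot; averaging the six areas is a simple way to summarize a model in one number.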




## Testing

[back to top](#sections)




![](./images/test1.png)

![](./images/t2.png)

![](./images/t3.png)
