https://github.com/ritvik19/toxic-comment-classification

Host: GitHub
URL: https://github.com/ritvik19/toxic-comment-classification
Owner: Ritvik19
Created: 2020-05-10T06:53:40.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2020-07-28T16:48:19.000Z (almost 6 years ago)
Last Synced: 2025-03-16T19:48:25.314Z (about 1 year ago)
Language: Jupyter Notebook
Size: 9.7 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Toxic-Comment-Classification

Discussing things you care about can be difficult. The threat of abuse and harassment online means that many people stop expressing themselves and give up on seeking different opinions. Platforms struggle to effectively facilitate conversations, leading many communities to limit or completely shut down user comments.

Some characteristics that can signify that a text is toxic:

* Has a non-neutral tone
* Has an exaggerated tone to underscore a point about a group of people
* Is rhetorical and meant to imply a statement about a group of people
* Is disparaging or inflammatory
* Suggests a discriminatory idea against a protected class of people, or seeks confirmation of a stereotype
* Makes disparaging attacks/insults against a specific person or group of people
* Is based on an outlandish premise about a group of people
* Disparages a characteristic that is not fixable and not measurable
* Isn't grounded in reality
* Is based on false information, or contains absurd assumptions
* Uses sexual content (incest, bestiality, pedophilia) for shock value

**Problem Statement:** to build a multi-headed model capable of detecting different types of toxicity, such as threats, obscenity, insults, and identity-based hate

**Sources:** [Kaggle-Toxic Comment Classification Challenge](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/) and [Kaggle-Jigsaw Unintended Bias in Toxicity Classification](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/)

**Project Objective:** a model to perform advanced sentiment analysis

___

### Approach Summary

**Performance Measure:** Area Under the Receiver Operating Characteristic curve (AUROC), averaged over the toxicity labels
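For reference, here is a stdlib-only sketch of this metric. For a single label, AUROC equals the Mann-Whitney U statistic divided by `n_pos * n_neg`, i.e. the probability that a randomly chosen toxic comment is scored above a randomly chosen clean one. The mean is then taken over the label columns, as in the Kaggle challenge's mean column-wise ROC AUC. The function names are hypothetical and not taken from the repository.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC for one binary label via the Mann-Whitney U statistic.

    Ties in the scores receive their average rank, so a tied
    positive/negative pair contributes 1/2.
    """
    n = len(y_score)
    # Sort indices by score so ranks can be assigned in ascending order.
    order = sorted(range(n), key=lambda i: y_score[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j across a run of tied scores.
        while j + 1 < n and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC is undefined when only one class is present")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def mean_auroc(Y_true, Y_score) -> float:
    """Average the per-label AUROC over all label columns."""
    n_labels = len(Y_true[0])
    per_label = [
        auroc([row[k] for row in Y_true], [row[k] for row in Y_score])
        for k in range(n_labels)
    ]
    return sum(per_label) / n_labels


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In practice, `sklearn.metrics.roc_auc_score` computes the same per-label value; for a multi-label indicator matrix, passing `average="macro"` gives the mean over labels.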

**Feature Extraction:** TF-IDF with sublinear term-frequency scaling and smoothed IDF
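The weighting can be spelled out in a few lines. This sketch assumes the scikit-learn definitions: sublinear TF replaces a raw count `c` with `1 + ln(c)`, smoothed IDF is `ln((1 + n_docs) / (1 + df)) + 1`, and each document vector is then L2-normalized. The whitespace tokenizer and the function name are simplifications for illustration, not the project's code.

```python
import math
from collections import Counter


def sublinear_smoothed_tfidf(docs):
    """Return one {term: weight} dict per document.

    tf  = 1 + ln(count)                    (sublinear term frequency)
    idf = ln((1 + n_docs) / (1 + df)) + 1  (smoothed inverse document frequency)
    Each document vector is then scaled to unit L2 norm.
    """
    # Naive lowercase + whitespace tokenization (an assumption for brevity).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {t: (1 + math.log(c)) * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return vectors


vecs = sublinear_smoothed_tfidf(["you are rude rude rude", "you are nice"])
print(vecs[0])
```

Sublinear scaling keeps a word repeated three times from counting three times as much as a single occurrence, which damps the effect of repeated insults or spam. With scikit-learn the equivalent configuration is `TfidfVectorizer(sublinear_tf=True, smooth_idf=True)`; `smooth_idf=True` and `norm="l2"` are already its defaults, although its default tokenizer differs from the one used here.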

**Algorithm:** One-vs-Rest (OvR) Logistic Regression
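A minimal sketch, assuming scikit-learn, of how the TF-IDF features and a one-vs-rest logistic regression might be chained for multi-label output. In the multi-label setting OvR fits one independent binary classifier per label column. The Kaggle challenge has six such columns (`toxic`, `severe_toxic`, `obscene`, `threat`, `insult`, `identity_hate`). The toy comments and the two-label subset below are hypothetical, and the hyperparameters are library defaults rather than the project's settings.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: two of the label columns, six made-up comments.
LABELS = ["toxic", "insult"]
texts = [
    "you are a wonderful person",
    "thanks for the helpful edit",
    "you are an idiot",
    "shut up you stupid idiot",
    "great article, well sourced",
    "what a stupid idiotic comment",
]
# Multi-label indicator matrix: one row per comment, one 0/1 column per label.
Y = np.array([
    [0, 0],
    [0, 0],
    [1, 1],
    [1, 1],
    [0, 0],
    [1, 0],
])

model = make_pipeline(
    # Sublinear TF with smoothed IDF (smooth_idf=True is already the default).
    TfidfVectorizer(sublinear_tf=True, smooth_idf=True),
    # Fits one independent binary LogisticRegression per label column.
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(texts, Y)

new_comments = ["you stupid idiot", "thanks, great edit"]
# Shape (n_comments, n_labels): an independent probability for each label.
probs = model.predict_proba(new_comments)
for comment, row in zip(new_comments, probs):
    print(comment, {label: round(float(p), 3) for label, p in zip(LABELS, row)})
```

Because each label gets its own classifier, a comment can be flagged as both `toxic` and `insult` at once, which is what the "multi-headed" problem statement asks for.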

___

### Performance Summary

Approach | Algorithm | Mean AUROC | Mean Accuracy | Mean F1
:---|:---|---:|---:|---:
Sampled Data | Logistic Regression | 0.9745 | 0.7812 | 0.8926
Sampled Data | Bagging Classifier | 0.9680 | 0.7616 | 0.8793
Complete Data | Logistic Regression | 0.9717 | 0.8687 | 0.9046
Complete Data | Stacking Classifier | 0.9729 | 0.7940 | 0.8903
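The README does not state how the Mean Accuracy and Mean F1 columns are computed. One plausible reading, sketched here purely as an assumption, is to threshold each label's predicted probability at 0.5, score every label column separately, and average the per-label scores. The function names and the 0.5 threshold are hypothetical.

```python
def per_label_scores(y_true, y_prob, threshold=0.5):
    """Accuracy and F1 for one binary label after thresholding probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    # F1 = 2TP / (2TP + FP + FN); taken as 0 when there are no true or predicted positives.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


def mean_label_scores(Y_true, Y_prob, threshold=0.5):
    """Macro-average accuracy and F1 over the label columns."""
    n_labels = len(Y_true[0])
    scores = [
        per_label_scores([r[k] for r in Y_true], [r[k] for r in Y_prob], threshold)
        for k in range(n_labels)
    ]
    mean_acc = sum(a for a, _ in scores) / n_labels
    mean_f1 = sum(f for _, f in scores) / n_labels
    return mean_acc, mean_f1


Y_true = [[1, 0], [0, 0], [1, 1], [0, 1]]
Y_prob = [[0.9, 0.2], [0.6, 0.1], [0.7, 0.4], [0.2, 0.8]]
print(mean_label_scores(Y_true, Y_prob))
```

Unlike AUROC, both of these scores depend on the chosen threshold, so they can move substantially between models whose AUROC values are nearly identical.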

ecosyste.ms
