https://github.com/gaurav-van/toxic-comment-web_app

Data Science Project to classify a comment into several toxicity categories. This Repository is used for deployment of the project.
classification data-science datacleaning exploratory-data-analysis machine-learning nlp nlp-machine-learning python streamlit

Last synced: 2 months ago

Host: GitHub
URL: https://github.com/gaurav-van/toxic-comment-web_app
Owner: Gaurav-Van
Created: 2022-08-20T13:06:45.000Z (almost 4 years ago)
Default Branch: main
Last Pushed: 2024-05-10T04:54:24.000Z (about 2 years ago)
Last Synced: 2025-02-02T15:14:08.969Z (over 1 year ago)
Topics: classification, data-science, datacleaning, exploratory-data-analysis, machine-learning, nlp, nlp-machine-learning, python, streamlit
Language: Python
Homepage:
Size: 4.05 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Threat model: threat_model.pkl

# Toxic-Comment-App

Note: This Repository is required for deployment of this project on Streamlit Cloud.

Web App Link: https://gaurav-van-toxic-comment-web-app-app-24y37c.streamlitapp.com/

Project Repo: https://github.com/Gaurav-Van/Data_Science__Machine_Learning-Projects

Classifies comments into six toxicity categories, including the neutral case of each, using concepts from NLP and ML:
- Toxic
- Severe Toxic
- Threat
- Obscene
- Insult
- Identity Hate

## Concept Used
Instead of multiclass classification, a separate binary classification is performed for each category.
1. Data Collection - From Kaggle: https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge

2. Data Pre-Processing - Text pre-processing using regular expressions

* Removing \n characters
* Removing alphanumeric characters
* Removing punctuation
* Removing non-ASCII characters
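
The four cleaning steps above could be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `clean_comment` is hypothetical, and "alphanumeric" is assumed here to mean any token that contains a digit.

```python
import re
import string


def clean_comment(text: str) -> str:
    """Apply the four cleaning steps in order (hypothetical helper)."""
    # 1. Replace newline characters with a space.
    text = re.sub(r"\n", " ", text)
    # 2. Drop every token that contains a digit (assumed meaning of "alphanumeric").
    text = re.sub(r"\w*\d\w*", " ", text)
    # 3. Replace ASCII punctuation with a space.
    text = re.sub(f"[{re.escape(string.punctuation)}]", " ", text)
    # 4. Replace non-ASCII characters with a space.
    text = re.sub(r"[^\x00-\x7F]", " ", text)
    # Collapse the whitespace left behind by the substitutions.
    return re.sub(r"\s+", " ", text).strip()


cleaned = clean_comment("Hello,\nworld! abc123 café")
print(cleaned)
```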

3. EDA - Performing data analysis to discover issues and trends in the data

- Bar charts of each category revealed a problem: class imbalance. Solution: build a separate dataset for each category [ id, comment_text, category ] in which the frequency of 0s equals the frequency of 1s.
- This resolves the class imbalance and sets up the binary classification of each category.
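
One way to build such a balanced per-category dataset is to undersample the 0s down to the number of 1s. The sketch below assumes undersampling (the README does not say how the frequencies were equalised) and uses plain Python rows for brevity; the helper name `balance_category` is hypothetical, and the same idea carries over to a pandas DataFrame.

```python
import random


def balance_category(rows, category, seed=0):
    """Return [id, comment_text, category] rows with as many 0s as 1s."""
    positives = [r for r in rows if r[category] == 1]
    negatives = [r for r in rows if r[category] == 0]
    # Keep the same number of rows from each class (assumption: undersampling).
    n = min(len(positives), len(negatives))
    rng = random.Random(seed)
    sample = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(sample)
    # Keep only the three columns named in the README.
    return [
        {"id": r["id"], "comment_text": r["comment_text"], category: r[category]}
        for r in sample
    ]


# Toy data: 2 positive and 8 negative rows for the "threat" category.
rows = [
    {"id": i, "comment_text": f"comment {i}", "threat": 1 if i < 2 else 0, "insult": 0}
    for i in range(10)
]
threat_rows = balance_category(rows, "threat")
```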

4. Model Building

* Vectorization: TF-IDF with a unigram approach
* Models tried for each category: KNN, Logistic Regression, SVM, CNB, BNB, DT and RF
* Model selected: Logistic Regression
* Exporting the trained ML models as 6 pickle files [ one for each category ]
* Exporting the fitted vectorizers as 6 pickle files [ one for each category ]
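
For a single category, the vectorize, train and export steps might look like the sketch below. It assumes scikit-learn; the toy comments, the hyperparameters, the helper name `train_category` and the vectorizer file name `threat_vect.pkl` are illustrative assumptions, not taken from the project.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def train_category(texts, labels):
    """Fit a unigram TF-IDF vectorizer and a logistic regression model."""
    vectorizer = TfidfVectorizer(ngram_range=(1, 1))  # unigrams only
    features = vectorizer.fit_transform(texts)
    model = LogisticRegression(max_iter=1000)
    model.fit(features, labels)
    return vectorizer, model


# Toy, already balanced data for one category (illustrative only).
texts = [
    "i will hurt you", "you will regret this", "i will find you",
    "have a nice day", "thanks for the help", "great article",
]
labels = [1, 1, 1, 0, 0, 0]
vectorizer, model = train_category(texts, labels)

with tempfile.TemporaryDirectory() as out_dir:
    model_path = Path(out_dir) / "threat_model.pkl"
    vect_path = Path(out_dir) / "threat_vect.pkl"  # hypothetical file name
    # Export both objects as pickle files, then load them back.
    model_path.write_bytes(pickle.dumps(model))
    vect_path.write_bytes(pickle.dumps(vectorizer))
    restored_model = pickle.loads(model_path.read_bytes())
    restored_vect = pickle.loads(vect_path.read_bytes())

# Probability of the positive class for a new comment.
probability = restored_model.predict_proba(
    restored_vect.transform(["i will hurt you"])
)[0, 1]
```

The vectorizer has to be exported alongside the model because a new comment must be transformed with the same vocabulary and IDF weights the model was trained on.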

5. Deployment - Building the web app with Streamlit and deploying it on Streamlit Cloud
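
A Streamlit front end for the six exported pairs could be structured roughly as below. This is a sketch under assumptions, not the deployed app: the file naming scheme (`<category>_model.pkl`, `<category>_vect.pkl`), the helper names and the page layout are all hypothetical.

```python
import pickle

CATEGORIES = ["toxic", "severe_toxic", "threat", "obscene", "insult", "identity_hate"]


def load_artifacts(directory="."):
    """Load one (vectorizer, model) pair per category; file names are hypothetical."""
    artifacts = {}
    for name in CATEGORIES:
        with open(f"{directory}/{name}_vect.pkl", "rb") as f:
            vectorizer = pickle.load(f)
        with open(f"{directory}/{name}_model.pkl", "rb") as f:
            model = pickle.load(f)
        artifacts[name] = (vectorizer, model)
    return artifacts


def score_comment(text, artifacts):
    """Return {category: probability of the positive class} for one comment."""
    scores = {}
    for name, (vectorizer, model) in artifacts.items():
        features = vectorizer.transform([text])
        scores[name] = float(model.predict_proba(features)[0][1])
    return scores


def main():
    # Imported here so the helpers above can be used without Streamlit installed.
    import streamlit as st

    st.title("Toxic Comment Classifier")
    text = st.text_area("Enter a comment")
    if st.button("Classify") and text.strip():
        for name, probability in score_comment(text, load_artifacts()).items():
            st.write(f"{name}: {probability:.2f}")
```

Saved as `app.py` with a final `main()` call at module level, this would be started locally with `streamlit run app.py`.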
