
https://github.com/gaurav-van/toxic-comment-web_app

Data science project to classify a comment into several toxicity categories. This repository is used to deploy the project.

classification data-science datacleaning exploratory-data-analysis machine-learning nlp nlp-machine-learning python streamlit


# Toxic-Comment-App



Note: This repository is required to deploy this project on Streamlit Cloud.




Web App Link: https://gaurav-van-toxic-comment-web-app-app-24y37c.streamlitapp.com/


Project Repo: https://github.com/Gaurav-Van/Data_Science__Machine_Learning-Projects

Classifying comments into six different categories, including the neutral (non-toxic) case of each, using concepts of NLP and ML:
- Toxic
- Severe Toxic
- Threat
- Obscene
- Insult
- Identity Hate


## Concepts Used
Instead of a single multiclass classifier, a separate binary classification is performed for each category.
1. Data Collection - From Kaggle: https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge


2. Data Pre-Processing - Text pre-processing using regular expressions

* Removing `\n` characters
* Removing alphanumeric words (words containing digits)
* Removing punctuation
* Removing non-ASCII characters
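The four cleaning steps above can be sketched with Python's `re` module. This is a minimal illustration, not the project's actual code: the function name `clean_comment`, the exact regular expressions, and the final whitespace collapsing are assumptions.

```python
import re


def clean_comment(text: str) -> str:
    """Hypothetical cleaning function mirroring the README's four steps."""
    # 1. Replace newline characters with spaces.
    text = text.replace("\n", " ")
    # 2. Remove alphanumeric words, i.e. tokens containing a digit such as "abc123".
    text = re.sub(r"\w*\d\w*", " ", text)
    # 3. Replace punctuation (anything that is not a word character or whitespace).
    text = re.sub(r"[^\w\s]", " ", text)
    # 4. Drop non-ASCII characters.
    text = re.sub(r"[^\x00-\x7f]", "", text)
    # Extra step (assumption): collapse repeated whitespace left by the substitutions.
    return re.sub(r"\s+", " ", text).strip()
```

Because step 3 uses `\w`, which matches Unicode letters in Python 3, accented letters survive punctuation removal and are only dropped by the non-ASCII step.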

3. EDA - Performing data analysis to discover issues and trends in the data

- Problem: bar charts of each category show a class imbalance. Solution: create a separate dataset for each category with columns [id, comment_text, category], in which the frequency of 0s equals the frequency of 1s.
- This resolves the class imbalance and allows a binary classifier to be trained for each category.
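A minimal sketch of the per-category balancing, assuming the training rows are dicts keyed by the Kaggle column names (`id`, `comment_text` and the six label columns) and that balancing is done by randomly downsampling the majority class. The README does not say how the 0s and 1s were equalized, and the helper names here are hypothetical.

```python
import random

# Label columns of the Kaggle Jigsaw training data.
CATEGORIES = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def balanced_category_dataset(rows, category, seed=0):
    """Return [id, comment_text, category] rows with as many 1s as 0s.

    Hypothetical helper: balances by randomly downsampling the majority class.
    """
    subset = [
        {"id": r["id"], "comment_text": r["comment_text"], category: int(r[category])}
        for r in rows
    ]
    positives = [r for r in subset if r[category] == 1]
    negatives = [r for r in subset if r[category] == 0]
    n = min(len(positives), len(negatives))
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    balanced = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(balanced)
    return balanced


def build_category_datasets(rows, seed=0):
    # One balanced dataset per category, keyed by category name.
    return {c: balanced_category_dataset(rows, c, seed) for c in CATEGORIES}
```

Downsampling throws data away; oversampling the minority class would be an alternative way to equalize the counts, but nothing in the README indicates which was used.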

4. Model Building

* Vectorization: TF-IDF with a unigram approach
* Models tried for each category: KNN, Logistic Regression, SVM, CNB (Complement Naive Bayes), BNB (Bernoulli Naive Bayes), DT (Decision Tree) and RF (Random Forest)
* **Model Selected**: Logistic Regression
* Exporting the trained ML models as 6 pickle files [one for each category]
* Exporting the trained TF-IDF vectorizers as 6 pickle files [one for each category]
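To make the vectorization and model steps concrete, here is a dependency-free sketch of unigram TF-IDF followed by logistic regression trained with stochastic gradient descent, plus pickle export for one category. It is an illustration under assumptions, not the project's code: the README does not name the library it used, and the class names, hyperparameters (`lr`, `epochs`) and pickle file names are hypothetical. The IDF uses the smoothed formula idf(t) = ln((1 + n) / (1 + df(t))) + 1, which is scikit-learn's default, and each vector is L2-normalized.

```python
import math
import os
import pickle
from collections import Counter


def _tokens(doc):
    # Unigram tokenization: lowercase and split on whitespace (an assumption).
    return doc.lower().split()


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class UnigramTfidf:
    """Hypothetical minimal TF-IDF vectorizer over single words (unigrams)."""

    def fit(self, docs):
        n = len(docs)
        # Document frequency: number of documents containing each term.
        df = Counter(t for d in docs for t in set(_tokens(d)))
        terms = sorted(df)
        self.vocab = {t: i for i, t in enumerate(terms)}
        # Smoothed IDF, the same formula as scikit-learn's default.
        self.idf = [math.log((1 + n) / (1 + df[t])) + 1 for t in terms]
        return self

    def transform(self, docs):
        # Returns sparse vectors as {feature_index: weight}, L2-normalized.
        vectors = []
        for d in docs:
            counts = Counter(t for t in _tokens(d) if t in self.vocab)
            vec = {self.vocab[t]: c * self.idf[self.vocab[t]] for t, c in counts.items()}
            norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
            vectors.append({i: v / norm for i, v in vec.items()})
        return vectors


class LogisticRegressionSGD:
    """Hypothetical binary logistic regression trained with plain SGD on log-loss."""

    def __init__(self, lr=0.5, epochs=200):
        self.lr = lr
        self.epochs = epochs

    def _prob(self, x):
        z = self.b + sum(self.w.get(i, 0.0) * v for i, v in x.items())
        return _sigmoid(z)

    def fit(self, X, y):
        self.w, self.b = {}, 0.0
        for _ in range(self.epochs):
            for x, target in zip(X, y):
                # Gradient of the log-loss with respect to the linear score z.
                err = self._prob(x) - target
                self.b -= self.lr * err
                for i, v in x.items():
                    self.w[i] = self.w.get(i, 0.0) - self.lr * err * v
        return self

    def predict(self, X):
        return [int(self._prob(x) >= 0.5) for x in X]


def train_category(texts, labels):
    # One vectorizer and one model per category, fitted on that category's data.
    vectorizer = UnigramTfidf().fit(texts)
    model = LogisticRegressionSGD().fit(vectorizer.transform(texts), labels)
    return vectorizer, model


def export_category(category, vectorizer, model, out_dir="."):
    # Writes two pickle files per category (hypothetical naming scheme).
    for kind, obj in (("vectorizer", vectorizer), ("model", model)):
        with open(os.path.join(out_dir, f"{category}_{kind}.pkl"), "wb") as fh:
            pickle.dump(obj, fh)
```

Fitting one vectorizer per category on that category's balanced dataset is consistent with the six vectorizer pickles mentioned in the README. In practice, library implementations such as scikit-learn's `TfidfVectorizer` and `LogisticRegression` would replace these hand-written classes.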

5. Deployment - Building a web app with Streamlit and deploying it on Streamlit Cloud
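On the deployment side, the app has to load the six vectorizer/model pairs and run every binary classifier on the user's comment. The sketch below assumes each pickled vectorizer exposes `transform` and each model exposes `predict` (as scikit-learn objects do), uses hypothetical file names `<category>_vectorizer.pkl` and `<category>_model.pkl`, and leaves pre-processing as a parameter because the comment presumably needs the same cleaning as the training data.

```python
import os
import pickle

# Label columns of the Kaggle Jigsaw data, one binary model per column.
CATEGORIES = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def load_models(model_dir="."):
    """Load one (vectorizer, model) pair per category from hypothetical pickle files."""
    models = {}
    for category in CATEGORIES:
        pair = []
        for kind in ("vectorizer", "model"):
            with open(os.path.join(model_dir, f"{category}_{kind}.pkl"), "rb") as fh:
                pair.append(pickle.load(fh))
        models[category] = tuple(pair)
    return models


def classify(comment, models, preprocess=lambda s: s):
    """Run every binary classifier on one comment; returns {category: 0 or 1}."""
    cleaned = preprocess(comment)
    return {
        category: int(model.predict(vectorizer.transform([cleaned]))[0])
        for category, (vectorizer, model) in models.items()
    }


def run_app(model_dir="."):
    """Streamlit front end; call this from a script started with `streamlit run`."""
    import streamlit as st  # imported lazily so the helpers above work without Streamlit

    models = load_models(model_dir)
    st.title("Toxic Comment Classifier")
    comment = st.text_area("Enter a comment")
    if st.button("Classify") and comment.strip():
        for category, flag in classify(comment, models).items():
            st.write(f"{category}: {'yes' if flag else 'no'}")
```

A Streamlit entry script (for example a hypothetical `app.py` launched with `streamlit run app.py`) would only need to call `run_app()`. Since Streamlit reruns the script on every interaction, reloading the pickles each time is wasteful; caching `load_models` with `st.cache_resource` is one option in recent Streamlit versions.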