https://github.com/xinbinhuang/toxicflaskapp
A toxic comment classification Flask App
deep-learning deplyment devops docker flask python test-driven-development text-classification
- Host: GitHub
- URL: https://github.com/xinbinhuang/toxicflaskapp
- Owner: xinbinhuang
- License: other
- Created: 2018-12-05T07:30:32.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2020-12-18T04:50:21.000Z (over 5 years ago)
- Last Synced: 2023-03-04T12:32:35.092Z (about 3 years ago)
- Topics: deep-learning, deplyment, devops, docker, flask, python, test-driven-development, text-classification
- Language: Python
- Size: 36.8 MB
- Stars: 4
- Watchers: 0
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# Toxic Comment Classification Flask App
This project was originally developed for the **NLP Project 3 workshop at the Vancouver School of AI** ([GitHub](https://github.com/SchoolofAI-Vancouver/NLP_Project_3) and [Meetup page](https://www.meetup.com/Vancouver-School-of-AI/events/256839305/)) to demonstrate how to deploy a toxic comment classification model using Heroku and Flask. To keep things simple, we built it hackathon-style and did not put much effort into enforcing good practices.
Purely for fun, I am going to take it a step further by adding test-driven development (TDD), DevOps (CI/CD), containers (Docker), and deployment (I am sure this was in the original workshop, but I mention it to make it sound fancy :).
## Project Overview
Build a classification model that can distinguish between toxic and non-toxic comments and use the model in a real-life application.
The meetups serve as guidance. The goal is for all attendees to build a good machine learning model that can be used in a real-life application. We encourage all attendees to apply creativity to this project. There are no limits.
## Installation Requirements
All code is written in Python. Please use [this guide](http://nbviewer.jupyter.org/github/johannesgiorgis/school_of_ai_vancouver/blob/master/intro_to_data_science_tools/01_introduction_to_conda_and_jupyter_notebooks.ipynb) to get Python and Jupyter Notebook up and running.
## Project Setup
### Training
If you want to walk through the whole process from training to deployment, you first need to download the data used to train the model.
To download the data, run:
```bash
python src/download.py
```
This will download the training data and the pre-trained embedding file to:
- ./assets/data/train.csv
- ./assets/embedding/fasttext-crawl-300d-2m/crawl-300d-2M.vec
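The `.vec` file is plain text: a header line with the vocabulary size and vector dimension, then one line per word followed by its vector components. A minimal sketch of reading that format into a dictionary is shown here. The function name `load_vec` and the `max_words` cap are illustrative assumptions and are not taken from this repository's code.

```python
import io


def load_vec(lines, max_words=None):
    """Parse fastText .vec text lines into a {word: [floats]} dict.

    `lines` is any iterable of strings, e.g. an open file handle.
    """
    lines = iter(lines)
    # Header line: "<vocab_size> <dimension>"
    _, dim = map(int, next(lines).split())
    embeddings = {}
    for line in lines:
        if max_words is not None and len(embeddings) >= max_words:
            break
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip lines that do not match the declared dimension
        embeddings[parts[0]] = [float(v) for v in parts[1:]]
    return embeddings, dim


# Tiny in-memory stand-in for crawl-300d-2M.vec (3 dimensions instead of 300).
sample = io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
embeddings, dim = load_vec(sample)
print(dim, embeddings["world"])
```

With the real file you would pass `open(path, encoding="utf-8")` instead of the `StringIO` object. The full file holds about two million vectors, so a cap such as `max_words` can keep memory use manageable while experimenting.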
To train the model, run:
```bash
python src/train_classifier.py
```
This will train a pooled GRU with fastText embeddings. The text preprocessor and the model will be serialized and stored in:
- ./assets/model/preprocessor.pkl
- ./assets/model/model.h5
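The preprocessor is saved alongside the model because prediction has to tokenize and index text exactly as training did. A minimal sketch of that round trip with `pickle` follows. The `fit_vocab` and `encode` helpers and the dictionary layout are hypothetical stand-ins, not the preprocessor implemented in `src/train_classifier.py`, and the sketch writes to a temporary directory rather than `./assets/model/`.

```python
import os
import pickle
import tempfile


def fit_vocab(texts):
    """Assign each distinct word an integer id; 0 is reserved for padding/unknown."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab) + 1)
    return vocab


def encode(text, vocab, max_len):
    """Map words to ids, truncate to max_len, and right-pad with zeros."""
    ids = [vocab.get(word, 0) for word in text.lower().split()][:max_len]
    return ids + [0] * (max_len - len(ids))


# Hypothetical preprocessor state: everything needed to encode new text.
state = {"vocab": fit_vocab(["good boy", "School of AI is awesome"]), "max_len": 5}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "preprocessor.pkl")
    with open(path, "wb") as f:
        pickle.dump(state, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored state encodes text identically to the original.
print(encode("AI is good", restored["vocab"], restored["max_len"]))
```

Only load pickle files you created yourself or otherwise trust, since unpickling can execute arbitrary code.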
*Note*: Training took about 1 hour on an Intel Core i7-HQ CPU.
### Prediction
To try out a few predictions, run:
```bash
python src/predict.py
# output:
# Corgi is stupid - Toxicity: [0.99293655]
# good boy - Toxicity: [0.02075008]
# School of AI is awesome - Toxicity: [0.01223523]
# F**K - Toxicity: [0.90747666]
```
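Each score in the sample output is a one-element array whose value appears to lie between 0 and 1, so turning it into a toxic / non-toxic decision needs a cutoff. A minimal sketch follows, assuming a threshold of 0.5. The `label_comment` helper and the threshold are illustrative choices, not taken from `src/predict.py`; the scores are copied from the sample output.

```python
def label_comment(score, threshold=0.5):
    """Return 'toxic' if the score reaches the threshold, else 'non-toxic'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    return "toxic" if score >= threshold else "non-toxic"


# Scores copied from the sample output of src/predict.py.
sample_scores = {
    "Corgi is stupid": [0.99293655],
    "good boy": [0.02075008],
    "School of AI is awesome": [0.01223523],
}

# Each score is a one-element list, so take element 0 before comparing.
labels = {text: label_comment(score[0]) for text, score in sample_scores.items()}
for text, label in labels.items():
    print(f"{text} - {label}")
```

A lower threshold flags more comments at the cost of more false positives, so the right value depends on how the application handles flagged text.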
### Deployment
TODO
## Resources
The project uses data from Kaggle's [Toxic Comment Classification Challenge](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge). The data can be found [here](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/data).
If you are struggling to implement some of the concepts discussed at the meetup, check out [the slides notebook](https://github.com/SchoolofAI-Vancouver/NLP_Project_2/blob/master/src/slides.ipynb) for guidance. There are also many [kernels](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/kernels) specific to the toxic comment challenge that you can refer to for inspiration or help.
Alternatively, ask for assistance on Slack. That's what this community is all about :)
## Meetup Contributors
- [Akshi Chaudhary](https://github.com/akshi8)
- [Johannes Harmse](https://github.com/johannesharmse)
- [Xinbin Huang](https://github.com/xinbinhuang)
- [Peter Lin](https://github.com/peter0083)
- [Johannes G](https://github.com/johannesgiorgis)