https://github.com/danieldacosta/disaster-webapp
A multi-label text classifier built with a Random Forest Classifier and MultiOutputClassifier from Sklearn.
disaster-messages random-forest-classifier sklearn webapplication
- Host: GitHub
- URL: https://github.com/danieldacosta/disaster-webapp
- Owner: DanielDaCosta
- License: mit
- Created: 2020-04-24T00:40:41.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2024-10-29T03:00:13.000Z (over 1 year ago)
- Last Synced: 2024-10-29T04:16:41.734Z (over 1 year ago)
- Topics: disaster-messages, random-forest-classifier, sklearn, webapplication
- Language: Python
- Homepage:
- Size: 10.7 MB
- Stars: 6
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Multi Label Text Classifier Project
This project is a multi-label text classifier built with a Random Forest Classifier wrapped in MultiOutputClassifier from Scikit-learn.
The dataset consists of disaster messages that are classified into 36 different classes. The model aims to classify an input message into these different classes.
A Web Application was developed, allowing you to analyze the dataset and write your own message to be classified.
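As a rough illustration of the model described above, the sketch below wraps a `RandomForestClassifier` in a `MultiOutputClassifier` so that one forest is fitted per label column. The `TfidfVectorizer` step, the `build_model` helper, the toy messages, and the two label columns are assumptions made for this example; the repository's actual pipeline and its 36 categories may be set up differently.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline


def build_model(n_estimators=10, random_state=0):
    """Hypothetical helper: text features followed by one forest per label."""
    forest = RandomForestClassifier(
        n_estimators=n_estimators,
        class_weight="balanced",  # setting named in the Dataset section
        random_state=random_state,
    )
    return Pipeline([
        ("tfidf", TfidfVectorizer()),           # assumed feature extractor
        ("clf", MultiOutputClassifier(forest)),  # fits one classifier per label column
    ])


# Toy data for illustration only: two label columns instead of 36.
messages = [
    "we need water and food",
    "the road is blocked by debris",
    "please send drinking water",
    "bridge collapsed near the village",
    "no clean water in the shelter",
    "roads are flooded and closed",
]
labels = [[1, 0], [0, 1], [1, 0], [0, 1], [1, 0], [0, 1]]

model = build_model()
model.fit(messages, labels)

# One row per input message, one column per label.
pred = model.predict(["we have no water"])
```

Each column of `pred` holds the 0/1 decision for one label, which is how a single message can be assigned to several categories at once.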
## Dataset
The dataset consists of disaster messages classified into 36 different classes. The dataset is highly imbalanced, with a different distribution for each class. To mitigate this, a class-weighted approach was used: the classifier is made aware of the imbalance by incorporating class weights into the cost function.
In the **Random Forest** model, the parameter *class_weight* was set to *'balanced'*, which uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data.
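To make "inversely proportional to class frequencies" concrete, the sketch below reproduces the formula that scikit-learn documents for the `'balanced'` setting, `n_samples / (n_classes * count)`, in plain Python. The `balanced_class_weights` helper and the toy label vector are illustrative assumptions, not code from this repository.

```python
from collections import Counter


def balanced_class_weights(y):
    """Return {label: n_samples / (n_classes * count_of_label)}."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Toy label column: nine negatives and one positive.
y = [0] * 9 + [1]
weights = balanced_class_weights(y)
```

With this toy column the rare class receives a weight of 5.0 and the common class about 0.56, so errors on the rare class count roughly nine times as much.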
## Web Application

**Message Classifier**

## Usage
- data/ : ETL folder. Data preparation. To load the data from scratch:
```python process_data.py disaster_messages.csv disaster_categories.csv DisasterResponse.db ```
- models/ : Machine Learning models. To train the model:
```python train_classifier.py ../data/DisasterResponse.db classifier.pkl```
- app/ : Contains the scripts for the web application. To run the application, go into the app/ folder and run the command:
``` python run.py```
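The data preparation step above takes two CSV files and a database file as arguments. A minimal sketch of that kind of ETL flow, using only the standard library, could look like the following. The column names (`id`, `message`, `categories`), the table name, the helper functions, and the sample rows are all hypothetical; the real `process_data.py` may merge and clean the data differently.

```python
import csv
import io
import sqlite3


def load_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def merge_on_id(messages, categories):
    """Inner-join two lists of row dicts on their shared 'id' column."""
    by_id = {row["id"]: row for row in categories}
    return [{**msg, **by_id[msg["id"]]} for msg in messages if msg["id"] in by_id]


def save_to_sqlite(rows, conn, table="messages"):
    """Write merged rows to a SQLite table (hypothetical table name)."""
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(id INTEGER PRIMARY KEY, message TEXT, categories TEXT)"
    )
    conn.executemany(
        f"INSERT OR REPLACE INTO {table} VALUES (?, ?, ?)",
        [(int(r["id"]), r["message"], r["categories"]) for r in rows],
    )
    conn.commit()


# Made-up sample data standing in for the two input CSV files.
messages_csv = "id,message\n1,We need water\n2,Road is blocked\n"
categories_csv = "id,categories\n1,water\n2,transport\n"

conn = sqlite3.connect(":memory:")  # the real script writes to a .db file
merged = merge_on_id(load_rows(messages_csv), load_rows(categories_csv))
save_to_sqlite(merged, conn)
stored = conn.execute(
    "SELECT id, message, categories FROM messages ORDER BY id"
).fetchall()
```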
### File Structure
```
.
├── LICENSE
├── README.md
├── app
│ ├── run.py # Flask file that runs app
│ └── templates
│ ├── go.html # classification result page of web app
│ └── master.html # main page of web app
├── data
│ ├── DisasterResponse.db # database to save clean data to
│ ├── disaster_categories.csv # data to process
│ ├── disaster_messages.csv # data to process
│ └── process_data.py
├── models
│ ├── classifier.pkl # saved model
│ └── train_classifier.py
└── requirements.txt
```
## Installation
```
pip install -r requirements.txt
```
## Development
Other model architectures were also explored. You can check a solution to the same problem using an **RNN with Keras** in this other GitHub repo: [Multi-Label Text classification problem with Keras](https://github.com/DanielDaCosta/RNN-Keras/blob/master/ML-Pipeline-RNN.ipynb)
## Acknowledgments and References
Special thanks to [Figure Eight](https://appen.com/) for the dataset.
- https://towardsdatascience.com/another-twitter-sentiment-analysis-bb5b01ebad90
- https://www.kaggle.com/gunesevitan/nlp-with-disaster-tweets-eda-cleaning-and-bert#3.-Target-and-N-grams
- https://towardsdatascience.com/accuracy-precision-recall-or-f1-331fb37c5cb9