https://github.com/danieldacosta/disaster-webapp

# Multi Label Text Classifier Project
This project is a multi-label text classifier built with a Random Forest classifier wrapped in `MultiOutputClassifier` from Scikit-learn.

The dataset consists of disaster messages labeled with 36 different classes. Given an input message, the model predicts which of these classes apply; because the task is multi-label, a single message can belong to several classes at once.
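The multi-label setup can be sketched as follows. This is a minimal illustration, not the project's actual training script: the toy messages, the three class names, the TF-IDF vectorizer and the hyperparameters are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Toy data, invented for illustration only (not taken from the real dataset).
messages = [
    "we need water and food",
    "the road is blocked by the flood",
    "please send food to the shelter",
    "flood water is rising near the bridge",
]
# One row per message, one binary column per class. Three hypothetical
# classes are used here instead of the project's 36: [food, water, flood].
labels = np.array([
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
])

# MultiOutputClassifier fits one independent Random Forest per label column.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),  # assumed text featurizer
    ("clf", MultiOutputClassifier(
        RandomForestClassifier(n_estimators=10, random_state=0)
    )),
])
pipeline.fit(messages, labels)

# The prediction has one row per input message and one column per class.
prediction = pipeline.predict(["we have no food"])
print(prediction.shape)
```

Each column of the prediction is a separate yes/no decision, which is what allows one message to receive several classes at once.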

A Web Application was developed, allowing you to analyze the dataset and write your own message to be classified.

## Dataset
The dataset consists of disaster messages classified into 36 different classes. It is highly imbalanced, with a different distribution for each class. To mitigate this, a class-weighted approach was used: the classifier is made aware of the imbalance by incorporating class weights into the training objective.

In the **Random Forest** model, the parameter *class_weight* was set to *'balanced'*, which uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data.
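To make the weighting concrete, the sketch below reproduces the formula that Scikit-learn documents for the *'balanced'* mode, `n_samples / (n_classes * np.bincount(y))`, in plain Python. The helper name `balanced_class_weights` and the toy labels are hypothetical; in the model itself the same effect would come from passing `class_weight='balanced'` to `RandomForestClassifier`.

```python
from collections import Counter


def balanced_class_weights(y):
    """Return one weight per class: n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Toy labels for a single imbalanced binary class: 8 negatives, 2 positives.
y = [0] * 8 + [1] * 2
weights = balanced_class_weights(y)
print(weights)  # {0: 0.625, 1: 2.5}
```

The rare positive class receives a weight four times larger than the frequent negative class, so errors on it count for more during training.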

## Web Application

**Message Classifier**

## Usage

- `data/`: ETL folder for data preparation. To load the data from scratch:

```
python process_data.py disaster_messages.csv disaster_categories.csv DisasterResponse.db
```

- `models/`: Machine learning models. To train the model:

```
python train_classifier.py ../data/DisasterResponse.db classifier.pkl
```

- `app/`: Scripts for the web application. To run the application, go into the `app/` folder and run:

```
python run.py
```
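For readers unfamiliar with this kind of ETL step, here is a rough sketch of turning raw category strings into binary columns and storing the result in SQLite. It is not the contents of `process_data.py`: the `name-flag;name-flag` layout of the category string, the table name `messages`, the helper `split_categories` and the toy rows are all assumptions for illustration.

```python
import sqlite3

# Toy rows, invented for illustration. The "name-flag;name-flag" layout of
# the category string is an assumption about the raw CSV files.
raw_rows = [
    (1, "we need water and food", "water-1;food-1;flood-0"),
    (2, "the road is flooded", "water-0;food-0;flood-1"),
]


def split_categories(category_string):
    """Turn 'water-1;food-0' into {'water': 1, 'food': 0}."""
    pairs = (item.rsplit("-", 1) for item in category_string.split(";"))
    return {name: int(flag) for name, flag in pairs}


# Column order is taken from the first row's category string.
category_names = list(split_categories(raw_rows[0][2]))

# In-memory database for this sketch; a file path would be used to persist it.
conn = sqlite3.connect(":memory:")
columns = ", ".join(f"{name} INTEGER" for name in category_names)
conn.execute(
    f"CREATE TABLE messages (id INTEGER PRIMARY KEY, message TEXT, {columns})"
)

for row_id, message, category_string in raw_rows:
    flags = split_categories(category_string)
    values = [row_id, message] + [flags[name] for name in category_names]
    placeholders = ", ".join("?" for _ in values)
    conn.execute(f"INSERT INTO messages VALUES ({placeholders})", values)
conn.commit()

# Read the cleaned table back, as a training step might do.
stored = conn.execute("SELECT * FROM messages ORDER BY id").fetchall()
print(stored)
```

Storing one binary column per class gives the training step a label matrix it can pass directly to a multi-output classifier.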

### File Structure

```
.
├── LICENSE
├── README.md
├── app
│   ├── run.py                      # Flask file that runs app
│   └── templates
│       ├── go.html                 # classification result page of web app
│       └── master.html             # main page of web app
├── data
│   ├── DisasterResponse.db         # database to save clean data to
│   ├── disaster_categories.csv     # data to process
│   ├── disaster_messages.csv       # data to process
│   └── process_data.py
├── models
│   ├── classifier.pkl              # saved model
│   └── train_classifier.py
└── requirements.txt
```

## Installation

```
pip install -r requirements.txt
```

## Development
Other model architectures were also explored. A solution to the same problem using an **RNN with Keras** is available in this other GitHub repo: [Multi-Label Text classification problem with Keras](https://github.com/DanielDaCosta/RNN-Keras/blob/master/ML-Pipeline-RNN.ipynb)

## Acknowledgments and References
Special thanks to [Figure Eight](https://appen.com/) for the dataset.
- https://towardsdatascience.com/another-twitter-sentiment-analysis-bb5b01ebad90
- https://www.kaggle.com/gunesevitan/nlp-with-disaster-tweets-eda-cleaning-and-bert#3.-Target-and-N-grams
- https://towardsdatascience.com/accuracy-precision-recall-or-f1-331fb37c5cb9