https://github.com/danieldacosta/disaster-webapp

A multi-label text classifier built with a Random Forest classifier wrapped in MultiOutputClassifier from scikit-learn.

Host: GitHub
URL: https://github.com/danieldacosta/disaster-webapp
Owner: DanielDaCosta
License: mit
Created: 2020-04-24T00:40:41.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2024-10-29T03:00:13.000Z (over 1 year ago)
Last Synced: 2024-10-29T04:16:41.734Z (over 1 year ago)
Topics: disaster-messages, random-forest-classifier, sklearn, webapplication
Language: Python
Homepage:
Size: 10.7 MB
Stars: 6
Watchers: 2
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

# Multi Label Text Classifier Project
This project is a multi-label text classifier built with a Random Forest classifier wrapped in MultiOutputClassifier from scikit-learn.

The dataset consists of disaster messages that are classified into 36 different classes. The model aims to classify an input message into these different classes.
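As a rough illustration of the setup described above, the following minimal sketch wraps a Random Forest in `MultiOutputClassifier` so that one binary classifier is fitted per label. The toy messages, the three label names, and the TF-IDF feature step are assumptions made for the example; the repository's actual features and its 36 categories may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Toy stand-in for the disaster messages (hypothetical examples).
messages = [
    "we need water and food",
    "the bridge collapsed after the earthquake",
    "people are trapped and need medical help",
    "flood water is rising near the school",
    "send food and medical supplies",
    "earthquake destroyed houses, we need shelter",
]

# One column per label; three hypothetical labels instead of the real 36.
label_names = ["food", "medical", "infrastructure"]
labels = np.array([
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
])

# TF-IDF features are an assumption; the README does not name the vectorizer.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultiOutputClassifier(
        RandomForestClassifier(n_estimators=50, random_state=0)
    )),
])
pipeline.fit(messages, labels)

# predict returns one row per message and one 0/1 column per label.
prediction = pipeline.predict(["we need food"])
print(dict(zip(label_names, prediction[0])))
```

Because each label gets its own forest, a single message can be assigned to several classes at once or to none.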

A Web Application was developed, allowing you to analyze the dataset and write your own message to be classified.

## Dataset
The dataset consists of disaster messages classified into 36 different classes. It is highly imbalanced, with a different label distribution for each class. To mitigate this, a class-weighted approach was used: the classifier is made aware of the imbalance by incorporating class weights into the cost function.

In the **Random Forest** model, the parameter *class_weight* was set to *'balanced'*, which uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data.
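To make the weighting concrete, the sketch below computes the `'balanced'` weights for a single hypothetical label column with 8 negatives and 2 positives, using scikit-learn's documented formula `n_samples / (n_classes * np.bincount(y))`. The data and the one-column feature matrix are made up for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical binary label column: 8 negatives, 2 positives.
y = np.array([0] * 8 + [1] * 2)

# 'balanced' weight for class c = n_samples / (n_classes * count(c)).
manual = len(y) / (2 * np.bincount(y))

# scikit-learn's helper applies the same formula.
weights = compute_class_weight(
    class_weight="balanced", classes=np.array([0, 1]), y=y
)
# The common class 0 gets weight 0.625 and the rare class 1 gets weight 2.5.

# Passing the same setting to the forest lets it derive the weights from y.
X = np.arange(len(y)).reshape(-1, 1)  # placeholder feature, illustration only
forest = RandomForestClassifier(
    n_estimators=10, class_weight="balanced", random_state=0
)
forest.fit(X, y)
```

The rarer a class is, the larger its weight, so errors on rare classes count for more during training.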

## Web Application

**Message Classifier**

## Usage

- data/ : ETL folder for data preparation. To load the data from scratch:

```
python process_data.py disaster_messages.csv disaster_categories.csv DisasterResponse.db
```

- models/ : Machine learning models. To train the model:

```
python train_classifier.py ../data/DisasterResponse.db classifier.pkl
```

- app/ : Scripts for the web application. To run the application, go into the app/ folder and run:

```
python run.py
```
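For orientation, the ETL step in the first item above might look roughly like the sketch below: merge the two CSV files, expand the category strings into one 0/1 column per class, and store the result in SQLite. This is not the code in `process_data.py`. The column names (`id`, `message`, `categories`), the `name-0;name-1` category encoding, and the table name `messages` are all assumptions, and small in-memory frames stand in for the CSV files.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-ins for disaster_messages.csv and disaster_categories.csv.
messages = pd.DataFrame(
    {"id": [1, 2], "message": ["we need water", "road is blocked"]}
)
categories = pd.DataFrame(
    {"id": [1, 2], "categories": ["water-1;roads-0", "water-0;roads-1"]}
)

# Join messages and labels on the shared id column (assumed key).
df = messages.merge(categories, on="id")

# Split "water-1;roads-0" into one column per category.
expanded = df["categories"].str.split(";", expand=True)
expanded.columns = [value.split("-")[0] for value in expanded.iloc[0]]
for column in expanded.columns:
    # Keep only the trailing 0/1 flag and store it as an integer.
    expanded[column] = expanded[column].str.split("-").str[-1].astype(int)

clean = pd.concat([df.drop(columns="categories"), expanded], axis=1)
clean = clean.drop_duplicates()

# An in-memory database keeps the example self-contained;
# a real run would pass a file path such as DisasterResponse.db.
conn = sqlite3.connect(":memory:")
clean.to_sql("messages", conn, index=False, if_exists="replace")
stored = pd.read_sql("SELECT * FROM messages", conn)
conn.close()
```

The training step would then read this table back and split it into the message text and the label columns.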

### File Structure

```
.
├── LICENSE
├── README.md
├── app
│   ├── run.py                      # Flask file that runs app
│   └── templates
│       ├── go.html                 # classification result page of web app
│       └── master.html             # main page of web app
├── data
│   ├── DisasterResponse.db         # database to save clean data to
│   ├── disaster_categories.csv     # data to process
│   ├── disaster_messages.csv       # data to process
│   └── process_data.py
├── models
│   ├── classifier.pkl              # saved model
│   └── train_classifier.py
└── requirements.txt
```

## Installation

```
pip install -r requirements.txt
```
## Development
Other model architectures were also explored. A solution to the same problem using an **RNN with Keras** is available in this other GitHub repo: [Multi-Label Text classification problem with Keras](https://github.com/DanielDaCosta/RNN-Keras/blob/master/ML-Pipeline-RNN.ipynb)

## Acknowledgments and References
Special thanks to [Figure Eight](https://appen.com/) for the dataset.
- https://towardsdatascience.com/another-twitter-sentiment-analysis-bb5b01ebad90
- https://www.kaggle.com/gunesevitan/nlp-with-disaster-tweets-eda-cleaning-and-bert#3.-Target-and-N-grams
- https://towardsdatascience.com/accuracy-precision-recall-or-f1-331fb37c5cb9
