Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/hrolive/disaster-response-pipeline
A machine learning pipeline that categorizes disaster-related messages so that they can be sent to the appropriate disaster relief agency
flask machine-learning natural-language-processing nltk pandas plotly python scikit-learn sql sqlalchemy
- Host: GitHub
- URL: https://github.com/hrolive/disaster-response-pipeline
- Owner: HROlive
- Created: 2020-04-12T17:19:20.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2021-06-11T15:13:31.000Z (over 3 years ago)
- Last Synced: 2024-11-09T13:32:31.861Z (3 months ago)
- Topics: flask, machine-learning, natural-language-processing, nltk, pandas, plotly, python, scikit-learn, sql, sqlalchemy
- Language: Jupyter Notebook
- Homepage:
- Size: 9.34 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Disaster Response Pipeline
![Banner](screenshots/banner.jpg)
## Table of Contents
1. [Description](#description)
2. [Dependencies](#dependencies)
3. [Installing](#installing)
4. [Executing Program](#executing-program)
5. [Exploration Files](#exploration-files)
6. [Web App Pipeline Files](#web-app-pipeline-files)
7. [Screenshots](#screenshots)
## Description
In this project, we analyze disaster data from Figure Eight, a dataset that contains real messages and tweets sent during disaster events. We also build a model for an API that categorizes these event messages so that they can be sent to the appropriate disaster relief agency.
The project also includes a web app where an emergency worker can input a new message, get classification results in several categories, and view visualizations of the data.
This project is divided into the following key sections:
1. An ETL pipeline that loads the messages and categories datasets, merges the two datasets, cleans the data, and stores it in an SQLite database (a sketch follows this list);
2. A machine learning pipeline that uses NLTK, as well as scikit-learn's Pipeline and GridSearchCV to output a final model that predicts the classes for 36 different categories (multi-output classification);
3. A web app that shows the classification results and visualizations for any user-given input, in real time.
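As a rough illustration of step 1, a minimal ETL sketch might look like the following. The file names match the commands in the Executing Program section, but the table name, column names, and cleaning steps are assumptions based on the standard Figure Eight data format, not the repository's exact code:

```python
import sys
import pandas as pd
from sqlalchemy import create_engine

def etl(messages_csv, categories_csv, db_path):
    # Load and merge the two datasets on their shared id column
    messages = pd.read_csv(messages_csv)
    categories = pd.read_csv(categories_csv)
    df = messages.merge(categories, on="id")

    # Split the single "categories" string (e.g. "related-1;offer-0;...")
    # into 36 separate binary columns
    cats = df["categories"].str.split(";", expand=True)
    cats.columns = [value.split("-")[0] for value in cats.iloc[0]]
    for col in cats.columns:
        cats[col] = cats[col].str[-1].astype(int)
    df = pd.concat([df.drop(columns=["categories"]), cats], axis=1)

    # Drop duplicates and store the clean table in an SQLite database
    # ("messages" as table name is an assumption)
    df = df.drop_duplicates()
    engine = create_engine(f"sqlite:///{db_path}")
    df.to_sql("messages", engine, index=False, if_exists="replace")

if __name__ == "__main__":
    etl(*sys.argv[1:4])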
## Dependencies
* Python 3.5+
* Machine Learning Libraries: NumPy, Pandas, Scikit-Learn
* Natural Language Processing Libraries: NLTK
* SQLite Database Libraries: SQLAlchemy
* Model Loading and Saving Library: Pickle
* Web App and Data Visualization: Flask, Plotly
## Installing
To clone the git repository:
```
git clone https://github.com/HROlive/disaster-response-pipeline.git
```
## Executing Program
1. You can run the following commands in the project's directory to set up the database and to train and save the model:
   - To run the ETL pipeline that cleans the data and stores the processed data in the database:
     `python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/disaster_response_db.db`
   - To run the ML pipeline that loads data from the database, trains the classifier, and saves the classifier as a pickle file:
     `python models/train_classifier.py data/disaster_response_db.db models/classifier.pkl`
2. Run the following command in the app's directory to launch the web app:
   `python run.py`
3. Go to http://0.0.0.0:3001/
## Exploration Files
In the **data** and **models** folders you can find two Jupyter notebooks that walk through how the model works step by step:
1. **ETL Preparation Notebook**: learn everything about the implemented ETL pipeline
2. **ML Pipeline Preparation Notebook**: look at the machine learning pipeline developed with NLTK and Scikit-Learn

You can use the **ML Pipeline Preparation Notebook** to re-train the model or tune it through a dedicated Grid Search section.
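For orientation, a pipeline of the kind described in step 2 of the key sections (NLTK tokenization, scikit-learn's Pipeline and GridSearchCV, multi-output classification) could be sketched as below. The tokenizer, base estimator, and parameter grid are illustrative assumptions, not the notebook's exact settings:

```python
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

nltk.download(["punkt", "wordnet"], quiet=True)

def tokenize(text):
    # Normalize, tokenize, and lemmatize a raw message
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(tok.lower().strip())
            for tok in word_tokenize(text)]

# TF-IDF features feeding a multi-output classifier:
# one binary output per disaster category (36 in total)
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(tokenizer=tokenize)),
    ("clf", MultiOutputClassifier(RandomForestClassifier())),
])

# A small, illustrative parameter grid for GridSearchCV
param_grid = {
    "clf__estimator__n_estimators": [50, 100],
    "tfidf__max_df": [0.75, 1.0],
}
model = GridSearchCV(pipeline, param_grid, cv=3)
# model.fit(X_train, Y_train) would train and select the best parameters
```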
## Web App Pipeline Files
- **app/templates/***: templates/HTML files for the web app
- **data/process_data.py**: Extract, Transform, Load (ETL) pipeline used for data cleaning, feature extraction, and storing data in an SQLite database
- **models/train_classifier.py**: a machine learning pipeline that loads data, trains a model, and saves the trained model as a .pkl file for later use
- **app/run.py**: this file can be used to launch the Flask web app used to classify disaster messages
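To give a rough picture of what `run.py` does, a minimal Flask sketch might look like this. The route name, table name, template name, and column layout are assumptions for illustration only:

```python
import pickle

import pandas as pd
from flask import Flask, render_template, request
from sqlalchemy import create_engine

app = Flask(__name__)

# Load the cleaned data and the trained classifier once at startup.
# Note: unpickling requires any custom tokenizer used at training
# time to be importable in this module.
engine = create_engine("sqlite:///data/disaster_response_db.db")
df = pd.read_sql_table("messages", engine)  # table name is an assumption
model = pickle.load(open("models/classifier.pkl", "rb"))

@app.route("/go")
def go():
    # Classify the message typed into the web form and map each of the
    # 36 category columns to its predicted 0/1 label
    query = request.args.get("query", "")
    labels = model.predict([query])[0]
    # Category columns assumed to start after the id/message/genre fields
    results = dict(zip(df.columns[4:], labels))
    return render_template("go.html", query=query,
                           classification_result=results)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=3001)
```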
## Screenshots
1. The homepage shows some graphs about the training dataset, provided by Figure Eight
![Homepage](screenshots/homepage.png)
2. After we input the desired message and click **Classify Message**, we can see the categories to which the message belongs, highlighted in green
![Corona](screenshots/corona_food_test.png)
![Rain](screenshots/rain_test.png)