{"id":19545382,"url":"https://github.com/heschmat/project-disaster-response","last_synced_at":"2025-02-26T05:43:18.474Z","repository":{"id":237254830,"uuid":"347023491","full_name":"heschmat/project-disaster-response","owner":"heschmat","description":"Analyze disaster data from Figure Eight to build a model for an API that classifies disaster messages.","archived":false,"fork":false,"pushed_at":"2021-03-14T14:02:27.000Z","size":2145,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-08T19:26:07.129Z","etag":null,"topics":["class-imbalance","deployment","etl-pipeline","machine-learning","machine-learning-pipeline"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/heschmat.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-03-12T10:14:26.000Z","updated_at":"2021-03-25T11:14:12.000Z","dependencies_parsed_at":"2024-04-30T15:47:53.385Z","dependency_job_id":"55f5bcaf-df59-4da5-bc9b-9c86f6b0a220","html_url":"https://github.com/heschmat/project-disaster-response","commit_stats":null,"previous_names":["heschmat/project-disaster-response"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heschmat%2Fproject-disaster-response","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heschmat%2Fproject-disaster-response/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hes
chmat%2Fproject-disaster-response/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heschmat%2Fproject-disaster-response/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/heschmat","download_url":"https://codeload.github.com/heschmat/project-disaster-response/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240801040,"owners_count":19859727,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["class-imbalance","deployment","etl-pipeline","machine-learning","machine-learning-pipeline"],"created_at":"2024-11-11T03:38:19.325Z","updated_at":"2025-02-26T05:43:18.439Z","avatar_url":"https://github.com/heschmat.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Disaster Response Project\n\nAnalyze disaster data from [Figure Eight](https://appen.com/) to build a model for an API that classifies disaster messages.\n\n\n# Project Components\nThere are three components in this project.\n\n1. ETL Pipeline\n`process_data.py` runs the data cleaning pipeline:\n\n- Loads the messages and categories datasets\n- Merges the two datasets\n- Cleans the data\n- Stores it in a SQLite database\n\n2. 
ML Pipeline\n`train_classifier.py` builds a machine learning pipeline that:\n\n- Loads data from the SQLite database\n- Splits the dataset into training and test sets\n- Builds a text processing and machine learning pipeline\n- Trains and tunes a model using GridSearchCV\n- Outputs results on the test set\n- Exports the final model as a pickle file\n\n3. Flask Web App\nThe results are displayed in a Flask web app. To run the app, go to the `app` directory and run `python run.py`, then open `http://localhost:3001/` in your browser.\n```\n(disaster_env) C\u003e python run.py\nData Done!\n * Serving Flask app \"run\" (lazy loading)\n * Environment: production\n   WARNING: This is a development server. Do not use it in a production deployment.\n   Use a production WSGI server instead.\n * Debug mode: on\n * Restarting with stat\nData Done!\n * Debugger is active!\n * Debugger PIN: 101-974-328\n * Running on http://0.0.0.0:3001/ (Press CTRL+C to quit)\n```\n\n\n# Project Interface\n1. To run the app, go to the `app` directory and run `python run.py` from the command line.\n\n2. To process a new dataset, go to the `datasets` directory. An example ETL run is `python process_data.py messages.csv categories.csv DisasterResponse.db`.\nHere, the first and second arguments after the script name are the messages and categories datasets. The last argument is the path of the database in which to save the cleaned data.\n```\n(disaster_env) C\u003e python process_data.py messages.csv categories.csv DisasterResponseDB.db\nLoading data...\n    MESSAGES: messages.csv\n    CATEGORIES: categories.csv\nCleaning data...\nSaving data...\n    DATABASE: DisasterResponseDB.db\nCleaned data saved to database!\n```\n\n3. To train the model, go to the `models` directory. 
To train the classifier, run `python train_classifier.py ../datasets/DisasterResponse.db classifier.pkl`.\nHere, the first argument is the path to the database where the cleaned data is stored, and the last argument is the file name under which to save the newly trained classifier.\n\n# Libraries\n- sqlalchemy\n- pandas\n- sklearn\n- flask\n- nltk\n\n# Challenges\nThere are 36 categories in this dataset; 33 of them are flagged in less than 20% of the messages, and 29 of them in less than 10%. For this reason, `accuracy` is not a viable metric for judging model performance: a model that never predicts a rare category would still score high accuracy on it. In this case, depending on the nature of the task, we may want to optimize for `precision` or `recall`, or simply use `f1score` as a single metric that takes both into account.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fheschmat%2Fproject-disaster-response","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fheschmat%2Fproject-disaster-response","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fheschmat%2Fproject-disaster-response/lists"}