This repository holds the final project for the Natural Language Processing portion of the Intel AI Learning and Certification Program.
- Host: GitHub
- URL: https://github.com/leftcoastnerdgirl/disaster_tweet_classification_nlp
- Owner: LeftCoastNerdGirl
- Created: 2024-08-25T21:06:54.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-25T21:35:54.000Z (about 1 year ago)
- Last Synced: 2025-01-31T02:15:58.119Z (9 months ago)
- Topics: decision-tree-classifier, knearest-neighbor-classifier, knn-classifier, logistic-regression, nlp, pandas, random-forest-classifier, sklearn, sklearn-library, support-vector-machine, svm-classifier
- Language: Jupyter Notebook
- Homepage:
- Size: 615 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
 
 
README
# Classification of Twitter Disaster Tweets
From the Kaggle website:
"Twitter has become an important communication channel in times of emergency. The ubiquitousness of smartphones enables people to announce an emergency they’re observing in real-time. Because of this, more agencies are interested in programatically monitoring Twitter (i.e. disaster relief organizations and news agencies).  
But, it’s not always clear whether a person’s words are actually announcing a disaster. Take this example:  

The author explicitly uses the word “ABLAZE” but means it metaphorically. This is clear to a human right away, especially with the visual aid. But it’s less clear to a machine. In this competition, you’re challenged to build a machine learning model that predicts which Tweets are about real disasters and which ones aren’t."   
https://www.kaggle.com/competitions/nlp-getting-started/overview  
The first step was to download the provided data files and explore them. I used Pandas to load them into DataFrames and then view the column headings and the data.  
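A minimal sketch of that loading step, assuming the standard Kaggle file names `train.csv` and `test.csv`:

```python
import pandas as pd

# Load the Kaggle competition files (file names assumed from the standard download).
train_df = pd.read_csv("train.csv")
test_df = pd.read_csv("test.csv")

# View the column headings and a sample of the data.
print(train_df.columns.tolist())   # e.g. ['id', 'keyword', 'location', 'text', 'target']
print(train_df.head())
```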
Next I cleaned up the data (a minimal cleaning sketch follows this list):  
  The keyword and location columns had many empty fields and did not add value to the analysis, so I removed them.  
  I used the kgptalkie preprocessing utilities to strip out characters and information that add no value to training, such as URLs and email addresses.  
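The notebook relies on the kgptalkie package for this cleanup; the sketch below is only a stand-in that drops the two columns and applies comparable regex-based cleaning, continuing from the `train_df` loaded above:

```python
import re

# Drop columns that are largely empty and add little value to the analysis.
train_df = train_df.drop(columns=["keyword", "location"])

def clean_text(text: str) -> str:
    """Regex stand-in for the kgptalkie-style cleaning used in the notebook."""
    text = text.lower()
    text = re.sub(r"http\S+|www\.\S+", " ", text)   # remove URLs
    text = re.sub(r"\S+@\S+", " ", text)            # remove email addresses
    text = re.sub(r"[^a-z0-9\s]", " ", text)        # remove special characters
    return re.sub(r"\s+", " ", text).strip()        # collapse whitespace

train_df["text"] = train_df["text"].apply(clean_text)
```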
I made a quick pie chart to confirm that the ratio of real disaster tweets to metaphorical uses of disaster words was balanced enough to train a good model. I also checked the basic text features using the same tool.  
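A quick way to produce that class-balance check, assuming the `target` column from the Kaggle data where 1 marks a real disaster:

```python
import matplotlib.pyplot as plt

# Pie chart of real vs. metaphorical disaster tweets.
counts = train_df["target"].value_counts().sort_index()   # index 0 = not disaster, 1 = disaster
counts.plot.pie(labels=["Not a disaster (0)", "Disaster (1)"], autopct="%1.1f%%")
plt.title("Class balance of the training tweets")
plt.ylabel("")
plt.show()
```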
Next I explored which classification model gives the highest accuracy. I imported the necessary modules from sklearn, set up the train/test split, ran each model, and displayed its confusion matrix.  
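A sketch of that comparison loop; the TF-IDF features, split size, and hyperparameters here are assumptions, so exact scores will differ from the figures listed below:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC, SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Vectorize the cleaned tweets and hold out a test set.
X = TfidfVectorizer().fit_transform(train_df["text"])
y = train_df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": LinearSVC(),
    "Kernel SVM": SVC(kernel="rbf"),
    "Random Forest": RandomForestClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
    "KNN": KNeighborsClassifier(),
}

# Fit each model, then report its accuracy and confusion matrix.
for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    print(f"{name}: {accuracy_score(y_test, preds):.2%}")
    print(confusion_matrix(y_test, preds))
```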
Results, from best to worst:  
Logistic Regression: 82.01%  
SVM: 81.81%  
Kernel SVM: 81.09%  
Random Forest: 78.53%  
Decision Tree: 73.15%  
KNN: 65.59%  
Conclusion of the model exploration: Logistic Regression provided the best results, though the SVM and Random Forest models performed well too.  
I then trained the final model using logistic regression and exported the results.
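The README does not say exactly how the results were exported; one plausible finish, refitting the winning model on all of the training data, saving the artifacts, and writing Kaggle-style predictions to CSV (`clean_text`, `train_df`, and `test_df` come from the earlier sketches):

```python
import joblib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Refit the winning model (logistic regression) on the full training set.
vectorizer = TfidfVectorizer()
X_full = vectorizer.fit_transform(train_df["text"])
final_model = LogisticRegression(max_iter=1000).fit(X_full, train_df["target"])

# Save the fitted vectorizer and model for later use.
joblib.dump(vectorizer, "tfidf_vectorizer.joblib")
joblib.dump(final_model, "disaster_tweet_model.joblib")

# Predict on the Kaggle test file and export the results as a submission CSV.
test_preds = final_model.predict(vectorizer.transform(test_df["text"].apply(clean_text)))
pd.DataFrame({"id": test_df["id"], "target": test_preds}).to_csv("submission.csv", index=False)
```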