https://github.com/elmezianech/kaggle-competition-nlp-disastertweets

🚀 Welcome to my Kaggle submission for "Natural Language Processing with Disaster Tweets." In this challenge, we explore tweets, using NLP to distinguish between those about real disasters and those that aren't. The goal is to build a robust model for accurate disaster-related tweet prediction. 🏆 Impressive F1 score of 0.79926 on the public leader
disaster-tweets kaggle kaggle-competition machine-learning nlp

# Kaggle-Competition-NLP-disasterTweets

Competition Overview:

🚀 Welcome to my submission for the Kaggle competition "Natural Language Processing with Disaster Tweets." The challenge is to use Natural Language Processing (NLP) to classify tweets as being about a real disaster or not. The objective is a robust model that predicts disaster-related content accurately.

Achievements:

🏆 Attained an impressive F1 score of 0.79926 on the public leaderboard.
🌟 Demonstrated the efficacy of SVM in accurately identifying disaster-related tweets.
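
For reference, the F1 score is the harmonic mean of precision and recall on the positive (disaster) class. The snippet below is a minimal sketch of that formula on made-up labels; the function name `f1_score_binary` and the sample data are illustrative assumptions, not code from the notebook.

```python
def f1_score_binary(y_true, y_pred):
    """F1 for the positive class (label 1) of a binary problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only: 1 = real disaster, 0 = not a disaster.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
score = f1_score_binary(y_true, y_pred)
print(round(score, 5))
```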

Notebook: https://www.kaggle.com/code/elmezianech/notebook86470c7043

Solution Highlights:

🔍 Data Exploration & Preprocessing:

Handled missing values explicitly, including a binary 'has_location' feature that flags whether a tweet includes a location.
Handled NaN values in the 'keyword' column so the data stays consistent.
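
The missing-value steps above could look roughly like this in pandas. This is a sketch under assumptions: the column names 'keyword' and 'location' come from the competition data, while the helper name `add_missing_value_features` and the `"no_keyword"` placeholder are hypothetical choices, since the README does not say how the NaN keywords were filled.

```python
import numpy as np
import pandas as pd


def add_missing_value_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a 'has_location' flag and no NaN in 'keyword'."""
    out = df.copy()
    # 1 when the tweet has a location, 0 when the field is missing.
    out["has_location"] = out["location"].notna().astype(int)
    # Assumption: missing keywords are replaced with a placeholder token.
    out["keyword"] = out["keyword"].fillna("no_keyword")
    return out


# Tiny made-up sample with the same column names as the competition data.
sample = pd.DataFrame(
    {
        "keyword": ["earthquake", np.nan],
        "location": [np.nan, "Paris"],
        "text": ["Huge earthquake downtown", "I love this song"],
    }
)
cleaned = add_missing_value_features(sample)
print(cleaned[["keyword", "has_location"]])
```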

✨ Text Preprocessing Mastery:

Used NLTK for text preprocessing: URL removal, special character handling, punctuation removal, tokenization, and stemming.
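
A minimal sketch of such a pipeline is shown below. It assumes NLTK's `RegexpTokenizer` and `PorterStemmer`, which need no extra data downloads; the notebook may use different NLTK components, and the `preprocess` helper and regular expressions are illustrative.

```python
import re

from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# Matches http(s) links and bare www links.
URL_RE = re.compile(r"https?://\S+|www\.\S+")

# Keeping only runs of lowercase letters and digits drops punctuation
# and special characters during tokenization.
_tokenizer = RegexpTokenizer(r"[a-z0-9]+")
_stemmer = PorterStemmer()


def preprocess(text: str) -> str:
    """Remove URLs, lowercase, tokenize, stem, and re-join the tokens."""
    text = URL_RE.sub(" ", text)
    tokens = _tokenizer.tokenize(text.lower())
    stems = [_stemmer.stem(token) for token in tokens]
    return " ".join(stems)


# Made-up tweet for illustration.
processed = preprocess("Flooding in Houston!!! http://t.co/abc123 #disaster")
print(processed)
```

Returning a single string rather than a token list keeps the output directly usable by a text vectorizer.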

🌐 Feature Extraction with TF-IDF:

Extracted features from the processed text using TF-IDF (Term Frequency-Inverse Document Frequency) vectorization.
TF-IDF weights each word by how often it appears in a tweet and how rare it is across all tweets, so distinctive words carry more weight in the model.
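
As an illustration, scikit-learn's `TfidfVectorizer` can produce these features. The corpus below is made up and the default vectorizer settings are an assumption; the notebook's actual settings are not stated here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up, already-preprocessed tweets.
corpus = [
    "forest fire near town",
    "fire crew evacuates town",
    "i love this song",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)  # sparse matrix: one row per tweet
vocab = vectorizer.vocabulary_        # maps each term to its column index

# "forest" occurs in one tweet and "fire" in two, so within the first
# tweet the rarer word "forest" gets the larger weight.
forest_weight = X[0, vocab["forest"]]
fire_weight = X[0, vocab["fire"]]
print(X.shape, forest_weight > fire_weight)
```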

🚄 Modeling with SVM:

Employed a robust Support Vector Machine (SVM) model for classification.
Fine-tuned hyperparameters using GridSearchCV, optimizing the SVM configuration.
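
The sketch below shows one way to combine TF-IDF, an SVM, and `GridSearchCV` in scikit-learn. The parameter grid, the use of a `Pipeline`, the `cv=2` setting, and the toy data are all assumptions for illustration; the README does not list the values searched in the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Toy data for illustration: 1 = real disaster, 0 = not a disaster.
texts = [
    "forest fire spreads near the town",
    "earthquake damages several buildings",
    "flood waters rise after heavy storm",
    "wildfire forces residents to evacuate",
    "i love this new song",
    "great pizza at the party tonight",
    "my exam was a total disaster lol",
    "watching a movie with friends",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Putting the vectorizer in the pipeline means TF-IDF is refit on each
# training fold, so validation folds do not leak into the vocabulary.
pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("svm", SVC())])

# Hypothetical grid; the values are assumptions, not the notebook's.
param_grid = {
    "svm__C": [0.1, 1, 10],
    "svm__kernel": ["linear", "rbf"],
}

# F1 is the competition metric, so it is used to pick the best setting.
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=2)
search.fit(texts, labels)

predictions = search.predict(["huge fire near the forest", "love this movie"])
print(search.best_params_, predictions)
```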

Next Steps:

Open to collaborative discussions and feedback for continuous improvement.
🙌 Happy coding! 🚀📊 #NLP #KaggleCompetition #DisasterTweetsPrediction #DataScienceWin