Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/parneet-sandhu/nlp-kaggle-competition
Natural Language Processing with Disaster Tweets: predict which Tweets are about real disasters and which ones are not. I achieved Rank 15 with 0.87618 accuracy.
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/parneet-sandhu/nlp-kaggle-competition
- Owner: Parneet-Sandhu
- Created: 2024-07-27T15:13:17.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-08-20T02:55:00.000Z (5 months ago)
- Last Synced: 2024-08-21T06:02:30.508Z (5 months ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 1.81 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# NLP-Kaggle-Competition
I achieved Rank 15 with 0.87618 accuracy.
# Natural Language Processing with Disaster Tweets
Predict which Tweets are about real disasters and which ones are not.
# Competition Description
Twitter has become an important communication channel in times of emergency.
The ubiquity of smartphones enables people to announce an emergency they’re observing in real time. Because of this, more agencies (e.g. disaster relief organizations and news agencies) are interested in programmatically monitoring Twitter. But it’s not always clear whether a person’s words are actually announcing a disaster. Take this example:
The author explicitly uses the word “ABLAZE” but means it metaphorically. This is clear to a human right away, especially with the visual aid. But it’s less clear to a machine.
In this competition, you’re challenged to build a machine learning model that predicts which Tweets are about real disasters and which ones aren’t. You’ll have access to a dataset of 10,000 tweets that were hand-classified. If this is your first time working on an NLP problem, we've created a quick tutorial to get you up and running.
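To make the task concrete, here is a minimal sketch of a binary tweet classifier using a hand-rolled multinomial Naive Bayes over word counts. This is only an illustrative baseline, not the model used in this repository. The function names (`tokenize`, `train`, `predict`) and the four example tweets are invented for the illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(tweets, labels):
    """Count words per class (0 = not a disaster, 1 = real disaster)."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for text, label in zip(tweets, labels):
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return the class with the higher log-posterior (Laplace smoothing).

    Assumes both classes appeared at least once in the training data.
    """
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][token] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


# Invented toy examples, not rows from the competition dataset.
tweets = [
    "Forest fire near the town, residents evacuated",
    "Earthquake damages buildings downtown",
    "This new album is fire",
    "My exam was a total disaster lol",
]
labels = [1, 1, 0, 0]

model = train(tweets, labels)
print(predict(model, "earthquake near the town"))
print(predict(model, "this album was a disaster"))
```

A bag-of-words model like this cannot tell metaphorical from literal usage unless the surrounding words differ, which is exactly the difficulty the competition description points out.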
Disclaimer: The dataset for this competition contains text that may be considered profane, vulgar, or offensive.
# Submission File
For each ID in the test set, you must predict 1 if the tweet is describing a real disaster, and 0 otherwise. The file should contain a header and have the following format:

    id,target
    0,0
    2,0
    ...

# Visualization Using YData Profiling
https://github.com/user-attachments/assets/9557c7ba-3b04-4f0a-b531-aec2c28572c1
Using YData Profiling, I was able to quickly get a complete understanding of the dataset. The tool gave me a clear snapshot of important statistics, highlighted any data quality issues like missing values or duplicates, and provided visualizations that made it easier to see how the data was distributed.
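The kinds of checks mentioned above (missing values, duplicates, distributions) can be approximated by hand to show what such a report summarizes. The sketch below uses only the Python standard library and is not the YData Profiling API. The column names (`id`, `keyword`, `location`, `text`, `target`), the `summarize` helper, and the sample rows are assumptions made up for illustration.

```python
import csv
import io
from collections import Counter

# Invented sample rows; the column names are assumptions for illustration.
SAMPLE_CSV = """id,keyword,location,text,target
1,,,Forest fire near the town,1
2,ablaze,,This new album is fire,0
3,ablaze,,This new album is fire,0
4,,London,Earthquake damages buildings,1
"""


def summarize(csv_file, text_column="text", target_column="target"):
    """Return row count, missing values per column, duplicate texts, and label counts."""
    rows = list(csv.DictReader(csv_file))
    columns = list(rows[0].keys()) if rows else []
    # A cell counts as missing when it is empty or whitespace only.
    missing = {
        col: sum(1 for row in rows if not (row[col] or "").strip())
        for col in columns
    }
    # Every repeat of an already-seen text counts as one duplicate.
    text_counts = Counter(row[text_column] for row in rows)
    duplicate_texts = sum(n - 1 for n in text_counts.values() if n > 1)
    target_counts = dict(Counter(row[target_column] for row in rows))
    return {
        "rows": len(rows),
        "missing": missing,
        "duplicate_texts": duplicate_texts,
        "target_counts": target_counts,
    }


summary = summarize(io.StringIO(SAMPLE_CSV))
print(summary)
```

To run it against a real file, pass an open file handle instead of the `io.StringIO` wrapper, for example `summarize(open("train.csv", newline="", encoding="utf-8"))`, where `train.csv` is a hypothetical path.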