https://github.com/parneet-sandhu/nlp-kaggle-competition

Natural Language Processing with Disaster Tweets: predict which Tweets are about real disasters and which ones are not. I achieved Rank 15 with 0.87618 accuracy.

# NLP-Kaggle-Competition
I achieved Rank 15 with 0.87618 accuracy.
# Natural Language Processing with Disaster Tweets
Predict which Tweets are about real disasters and which ones are not

# Competition Description
Twitter has become an important communication channel in times of emergency.
The ubiquitousness of smartphones enables people to announce an emergency they’re observing in real-time. Because of this, more agencies are interested in programmatically monitoring Twitter (e.g. disaster relief organizations and news agencies). But, it’s not always clear whether a person’s words are actually announcing a disaster. Take this example:
The author explicitly uses the word “ABLAZE” but means it metaphorically. This is clear to a human right away, especially with the visual aid. But it’s less clear to a machine.
In this competition, you’re challenged to build a machine learning model that predicts which Tweets are about real disasters and which ones aren’t. You’ll have access to a dataset of 10,000 tweets that were hand-classified. If this is your first time working on an NLP problem, we've created a quick tutorial to get you up and running.
Disclaimer: The dataset for this competition contains text that may be considered profane, vulgar, or offensive.
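
The “ABLAZE” example above can be made concrete with a naive keyword matcher. The sketch below is not the model used in this repository; the keyword list and both example tweets are hypothetical and exist only to show why matching words alone mislabels metaphorical usage.

```python
# Hypothetical keyword list, invented for illustration only.
DISASTER_KEYWORDS = {"ablaze", "earthquake", "flood", "wildfire"}


def keyword_predict(tweet: str) -> int:
    """Return 1 if any disaster keyword appears in the tweet, else 0."""
    # Lowercase each token and strip surrounding punctuation before matching.
    words = {word.strip(".,!?#\"'").lower() for word in tweet.split()}
    return int(bool(words & DISASTER_KEYWORDS))


# Hypothetical tweets with hand-assigned labels: 1 = real disaster, 0 = not.
examples = [
    ("Forest fire near the highway, several homes ABLAZE", 1),
    ("Look at the sky last night, it was ABLAZE", 0),
]

predictions = [keyword_predict(text) for text, _ in examples]
labels = [label for _, label in examples]

for (text, label), predicted in zip(examples, predictions):
    status = "correct" if predicted == label else "wrong"
    print(f"{status}: predicted={predicted} label={label} text={text!r}")
```

Both tweets contain the keyword, so the matcher predicts 1 for each and gets the metaphorical one wrong. A useful model has to take the surrounding context into account, not just individual words.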

# Submission File
For each ID in the test set, you must predict 1 if the tweet is describing a real disaster, and 0 otherwise. The file should contain a header and have the following format:

    id,target
    0,0
    2,0
    ...
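
A minimal sketch of writing a file in this format with Python's standard `csv` module follows. The function name `write_submission` and the prediction for ID 3 are assumptions made for illustration; in practice the mapping would come from a trained model.

```python
import csv
import io


def write_submission(predictions: dict[int, int], out) -> None:
    """Write an `id,target` header followed by one row per test ID."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "target"])
    for tweet_id in sorted(predictions):
        target = predictions[tweet_id]
        # The competition expects a binary label for every ID.
        if target not in (0, 1):
            raise ValueError(f"target for id {tweet_id} must be 0 or 1")
        writer.writerow([tweet_id, target])


# Hypothetical predictions keyed by test-set ID.
buffer = io.StringIO()
write_submission({0: 0, 2: 0, 3: 1}, buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

To write to disk instead of an in-memory buffer, pass a file opened with `open("submission.csv", "w", newline="")`.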

# Visualization Using YData Profiling

https://github.com/user-attachments/assets/9557c7ba-3b04-4f0a-b531-aec2c28572c1

Using YData Profiling, I was able to quickly get a complete understanding of the dataset. The tool gave me a clear snapshot of important statistics, highlighted any data quality issues like missing values or duplicates, and provided visualizations that made it easier to see how the data was distributed.
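
The sketch below approximates by hand three of the checks mentioned above (missing values, duplicates, and label distribution) using only the standard library, then shows how a full report could be generated. The sample rows and the column names `id`, `keyword`, `text`, and `target` are assumptions for illustration, and `write_profile_report` is a hypothetical helper that requires `pandas` and `ydata-profiling` to be installed.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real dataset's columns may differ.
SAMPLE_CSV = """id,keyword,text,target
1,,Flood warning issued for the river valley,1
2,ablaze,The sky was ablaze at sunset,0
3,ablaze,The sky was ablaze at sunset,0
4,,Traffic is terrible today,0
"""


def summarize(rows):
    """Count missing values per column, duplicate texts, and label balance."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value == "":
                missing[column] += 1
    text_counts = Counter(row["text"] for row in rows)
    # Every repeat beyond the first occurrence counts as one duplicate.
    duplicate_texts = sum(count - 1 for count in text_counts.values())
    target_counts = Counter(row["target"] for row in rows)
    return {
        "rows": len(rows),
        "missing": dict(missing),
        "duplicate_texts": duplicate_texts,
        "target_counts": dict(target_counts),
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = summarize(rows)
print(summary)


def write_profile_report(csv_path: str, html_path: str) -> None:
    """Generate an HTML profiling report (needs pandas and ydata-profiling)."""
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    ProfileReport(df, title="Disaster Tweets").to_file(html_path)
```

The hand-rolled summary is only a quick sanity check. The profiling report adds per-column statistics and visualizations on top of these counts.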