https://github.com/veivel/f1-sentiment-analysis
A sentiment analysis project on tweets about Formula 1. To be reworked.
- Host: GitHub
- URL: https://github.com/veivel/f1-sentiment-analysis
- Owner: Veivel
- Created: 2022-06-16T08:48:23.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2022-08-24T02:55:30.000Z (about 3 years ago)
- Last Synced: 2023-03-04T13:52:15.015Z (over 2 years ago)
- Topics: data, f1, nlp-library, nlp-machine-learning
- Language: Python
- Homepage:
- Size: 151 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# F1 SENTIMENT ANALYSIS
-----
Understanding the Internet's Opinions on Formula 1, by Givarrel Veivel

-----
### WORK IN PROGRESS

**PROBLEM STATEMENT**: On the internet, it is much easier to pay attention either to toxicity and hate, or to opinions that merely mirror our own (confirmation bias). This project looks at the data more objectively to unveil what Reddit and Twitter actually think about different Formula 1 drivers, while also trying to make sense of the different factors behind opinions in F1.
This is a TEXT CLASSIFICATION and OPINION MINING project, where data is retrieved from replies to official @F1 tweets (and possibly Reddit comments in the future). Each tweet is classified by topic (the subject driver or team) and then labeled with its sentiment (positive vs. negative opinion).
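For illustration, one labeled example might look like the record below; the field names and values are hypothetical and not taken from this repository's data files.

```python
# Hypothetical shape of one labeled example (field names are illustrative).
example = {
    "text": "Great drive by Leclerc today!",  # reply text pulled from Twitter
    "topic": "leclerc",                       # subject driver or team
    "sentiment": "positive",                  # positive / neutral / negative
}
```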
-----
### DOCUMENTATION

(1) Pull replies from a specified tweet -> (2) Label opinion/sentiment -> (3) Clean the text content of the tweets (for a bag-of-words model) -> (4) Train & evaluate the model -> (5) Test the model on unlabeled data
First, I use `train/twitterer.py` to pull the training/test data. This works by retrieving tweets that are replies under a target tweet, although the method I use is limited to **recent** replies. Second, with the help of `train/sentiment_labeler.py`, I label each tweet as negative, positive, or neutral. Then I use `model.ipynb` to clean the text and train and evaluate the model, before using it to predict the sentiment of the test data.
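The exact retrieval code in `train/twitterer.py` is not reproduced here, but a minimal sketch of step (1) using Tweepy and the Twitter API v2 recent-search endpoint (which would explain why only recent replies can be pulled) might look like this; the bearer token and tweet ID are placeholders.

```python
# Minimal sketch: pull replies to a target tweet with Tweepy (Twitter API v2).
# BEARER_TOKEN and TARGET_TWEET_ID are placeholders; train/twitterer.py may differ.
import tweepy

BEARER_TOKEN = "YOUR_BEARER_TOKEN"
TARGET_TWEET_ID = "1537000000000000000"  # hypothetical @F1 tweet ID

client = tweepy.Client(bearer_token=BEARER_TOKEN)

# Recent search only covers roughly the last 7 days, hence "recent" replies only.
response = client.search_recent_tweets(
    query=f"conversation_id:{TARGET_TWEET_ID} is:reply",
    tweet_fields=["author_id", "created_at"],
    max_results=100,
)

for tweet in response.data or []:
    print(tweet.id, tweet.text)
```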
The biggest limitation is the amount of data I have and can obtain. I would need to try a different retrieval method, or perhaps a different platform, to gather more training and testing data.
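For steps (3) and (4), a minimal bag-of-words baseline with scikit-learn could look like the sketch below; the actual cleaning steps, classifier, and column names in `model.ipynb` are not shown here, so those details are assumptions.

```python
# Minimal bag-of-words baseline; vectorizer settings, classifier choice,
# file name, and column names are assumptions, not taken from model.ipynb.
import re
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

def clean(text: str) -> str:
    """Very light cleaning: lowercase, strip URLs, mentions, extra whitespace."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()

df = pd.read_csv("labeled_tweets.csv")  # hypothetical file with text + sentiment columns
X = df["text"].map(clean)
y = df["sentiment"]                     # negative / neutral / positive

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

pipeline = make_pipeline(
    CountVectorizer(ngram_range=(1, 2), min_df=2),  # bag of words (uni- and bigrams)
    LogisticRegression(max_iter=1000),
)
pipeline.fit(X_train, y_train)
print(classification_report(y_test, pipeline.predict(X_test)))
```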
-----
### TO-DO LIST
- Obtain tweet metadata (likes, retweets, etc.)
- Gather more training data (!!!)
- Implement RNN model
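
As a possible starting point for the RNN item above, a small LSTM classifier in Keras could look like the sketch below; the layer sizes, vocabulary cap, and framework choice are illustrative guesses, not decisions made in this repository.

```python
# Illustrative LSTM sentiment classifier in Keras; sizes are placeholder guesses.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 10_000  # assumed vocabulary cap

model = tf.keras.Sequential([
    layers.Embedding(VOCAB_SIZE, 64),        # token ids -> dense vectors
    layers.LSTM(32),                         # recurrent layer over the tweet
    layers.Dense(3, activation="softmax"),   # negative / neutral / positive
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```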