https://github.com/brayvid/tweet-sentiment-classifier
Flatiron School Data Science Bootcamp Phase 3 Project
- Host: GitHub
- URL: https://github.com/brayvid/tweet-sentiment-classifier
- Owner: brayvid
- Created: 2024-07-09T23:07:56.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-07-17T22:45:38.000Z (6 months ago)
- Last Synced: 2024-11-12T12:18:54.770Z (2 months ago)
- Topics: classification, data-science, kaggle, machine-learning, scikit-learn, sentiment-analysis, twitter
- Language: Jupyter Notebook
- Homepage: https://colab.research.google.com/github/brayvid/tweet-sentiment-classifier/blob/main/tweet_sentiment_classifier.ipynb
- Size: 21.6 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Tweet Sentiment Classifier
Blake Rayvid - https://github.com/brayvid
Flatiron School Data Science Bootcamp Phase 3 Project
Presentation slides
## Business problem
Brand reputation management
Monitor brand perception by correctly classifying new tweets as positive, negative or neutral.
- Analyze negative feedback for insights into product weaknesses and use this to drive improvements.
- Identify accounts with consistent positive sentiment and offer to collaborate.
- Time launches of new products during periods of high positive sentiment.
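The last two bullets reduce to simple aggregations over classifier output. A minimal pandas sketch, assuming predictions have been joined with hypothetical `account` and `date` columns (the Kaggle dataset itself has neither) and stored in a hypothetical `predicted` column; the `min_tweets` threshold is also an assumption:

```python
import pandas as pd


def sentiment_summary(df: pd.DataFrame, min_tweets: int = 3):
    """Summarize predicted sentiment per account and per day.

    Assumes hypothetical columns: 'account', 'date' (datetime64) and
    'predicted' (one of 'positive', 'negative', 'neutral').
    """
    flagged = df.assign(is_positive=df["predicted"].eq("positive"))

    # Per-account share of positive tweets, ignoring low-volume accounts
    per_account = (
        flagged.groupby("account")["is_positive"]
        .agg(["mean", "size"])
        .rename(columns={"mean": "positive_share", "size": "n_tweets"})
    )
    per_account = per_account[per_account["n_tweets"] >= min_tweets]
    per_account = per_account.sort_values("positive_share", ascending=False)

    # Daily share of positive tweets, for spotting high-sentiment windows
    daily = flagged.groupby(df["date"].dt.floor("D"))["is_positive"].mean()
    return per_account, daily
```

Accounts at the top of `per_account` are collaboration candidates; peaks in `daily` (optionally smoothed with `daily.rolling(7).mean()`) mark candidate launch windows.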
## Dataset
https://www.kaggle.com/datasets/yasserh/twitter-tweets-sentiment-dataset
- Three classes (positive, negative, neutral) in a column called `sentiment`.
- 27,000 tweets, formatted as strings, in the `text` column.
- `selected_text` is an additional column containing the substring of each tweet relevant to classification.
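A minimal loading sketch with pandas, assuming the CSV has been downloaded from Kaggle. The path argument is a placeholder, and the cleanup steps (dropping rows with a missing text field, keeping only the three known labels) are assumptions rather than the project's documented preprocessing:

```python
import pandas as pd

LABELS = ("negative", "neutral", "positive")


def load_tweets(path_or_buffer) -> pd.DataFrame:
    """Load the tweet sentiment CSV, keeping only the columns used here."""
    df = pd.read_csv(path_or_buffer, usecols=["text", "selected_text", "sentiment"])
    # Drop rows where either text field is missing (assumed cleanup step)
    df = df.dropna(subset=["text", "selected_text"])
    # Keep only the three expected sentiment labels
    df = df[df["sentiment"].isin(LABELS)].reset_index(drop=True)
    return df
```

`load_tweets(path)["sentiment"].value_counts()` is a quick check of class balance before splitting.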
## Results
I tried several model types, and a Support Vector Classifier (SVC) applied to `selected_text` yielded the best performance. Test set results are summarized below, with precision and recall scores per class and a confusion matrix. Test accuracy was 83%.
| Label | Precision | Recall |
|---|---|---|
| negative | 83% | 77% |
| neutral | 78% | 91% |
| positive | 93% | 80% |
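The README names an SVC trained on `selected_text`, and the next steps imply TF-IDF features. A minimal scikit-learn sketch of that kind of pipeline and of the reported metrics follows; the linear kernel, unigram+bigram features, 80/20 stratified split and random seed are all assumptions, so it is not expected to reproduce the exact numbers above:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

LABELS = ["negative", "neutral", "positive"]


def build_model() -> Pipeline:
    # TF-IDF features followed by an SVC; hyperparameters are assumptions
    return Pipeline([
        ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
        ("svc", SVC(kernel="linear", C=1.0)),
    ])


def train_and_evaluate(texts, labels, test_size=0.2, seed=42):
    # Stratify so each class keeps its proportion in the test set
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, stratify=labels, random_state=seed
    )
    model = build_model().fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Per-class precision/recall and a confusion matrix in a fixed label order
    report = classification_report(
        y_test, y_pred, labels=LABELS, output_dict=True, zero_division=0
    )
    cm = confusion_matrix(y_test, y_pred, labels=LABELS)
    return model, accuracy_score(y_test, y_pred), report, cm
```

For a DataFrame `df` with the dataset's columns, `train_and_evaluate(df["selected_text"], df["sentiment"])` returns the fitted pipeline, test accuracy, a per-class report (e.g. `report["negative"]["recall"]`) and the confusion matrix with rows and columns in `LABELS` order.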
## Next steps
- Try Word2Vec semantic embeddings instead of frequency-based TF-IDF features.
- Investigate dimensionality reduction with UMAP or t-SNE.
- Deploy to a web service to classify new tweets in real time.
This project shows how sentiment classification can support brand reputation management and provides a foundation for further development and deployment in a real-world setting.