https://github.com/brayvid/tweet-sentiment-classifier

Flatiron School Data Science Bootcamp Phase 3 Project

classification data-science kaggle machine-learning scikit-learn sentiment-analysis twitter



Host: GitHub
URL: https://github.com/brayvid/tweet-sentiment-classifier
Owner: brayvid
Created: 2024-07-09T23:07:56.000Z (12 months ago)
Default Branch: main
Last Pushed: 2024-07-17T22:45:38.000Z (12 months ago)
Last Synced: 2025-01-11T11:18:28.768Z (6 months ago)
Topics: classification, data-science, kaggle, machine-learning, scikit-learn, sentiment-analysis, twitter
Language: Jupyter Notebook
Homepage: https://colab.research.google.com/github/brayvid/tweet-sentiment-classifier/blob/main/tweet_sentiment_classifier.ipynb
Size: 21.6 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


README

# Tweet Sentiment Classifier

Blake Rayvid - https://github.com/brayvid

Flatiron School Data Science Bootcamp Phase 3 Project

## Brand reputation management

Monitor brand perception by correctly classifying new tweets as positive, negative or neutral.

Analyze negative feedback for insights into product weaknesses and use this to drive improvements.

Identify accounts with consistent positive sentiment and offer to collaborate.

Time launches of new products during periods of high positive sentiment.
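The goals above depend on aggregating classifier output by account and over time. Here is a minimal pandas sketch. It assumes a hypothetical table of predicted labels with `user` and `date` columns, which the dataset described in this README does not include, and it uses an arbitrary 80% threshold for "consistent" positive sentiment.

```python
import pandas as pd

# Hypothetical predictions: "user" and "date" are illustrative assumptions,
# not columns of the Kaggle dataset.
preds = pd.DataFrame(
    {
        "user": ["@alice", "@alice", "@bob", "@bob", "@carol", "@carol"],
        "date": pd.to_datetime(
            ["2024-07-01", "2024-07-01", "2024-07-02",
             "2024-07-02", "2024-07-03", "2024-07-03"]
        ),
        "sentiment": ["positive", "positive", "negative",
                      "neutral", "positive", "negative"],
    }
)

preds["is_positive"] = preds["sentiment"].eq("positive")

# Share of each account's tweets classified as positive (collaboration candidates).
account_positive_share = preds.groupby("user")["is_positive"].mean()

# Share of positive tweets per day (a rough signal for launch timing).
daily_positive_share = preds.groupby("date")["is_positive"].mean()

# Accounts positive at least 80% of the time (hypothetical threshold).
candidates = account_positive_share[account_positive_share >= 0.8].index.tolist()
print(candidates)
```

With real data you would likely also require a minimum number of tweets per account before trusting the share.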

## Dataset

https://www.kaggle.com/datasets/yasserh/twitter-tweets-sentiment-dataset

Three classes (positive, negative, neutral), stored in the `sentiment` column.

About 27,000 tweets, stored as strings in the `text` column.

`selected_text` is an additional column containing the substring of each tweet that is most relevant to its sentiment label.
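A minimal loading sketch for the three columns described above. It assumes the CSV has columns named exactly `text`, `selected_text` and `sentiment`. The real filename is not given here, so the example reads a few made-up rows from an in-memory string instead.

```python
import io

import pandas as pd


def load_tweets(source) -> pd.DataFrame:
    """Read the tweet CSV and keep the three columns described above.

    `source` is any path or file-like object accepted by pandas.read_csv.
    """
    df = pd.read_csv(source)
    # Keep only the documented columns and drop rows with missing values.
    df = df[["text", "selected_text", "sentiment"]].dropna()
    # Flag rows where selected_text really is a substring of text.
    df = df.assign(
        selected_is_substring=[
            sel in txt for sel, txt in zip(df["selected_text"], df["text"])
        ]
    )
    return df.reset_index(drop=True)


# Made-up rows in the same column layout, used instead of the real file.
sample_csv = io.StringIO(
    "text,selected_text,sentiment\n"
    "I love this phone,love,positive\n"
    "Battery died again today,Battery died,negative\n"
    "Going to the store,Going to the store,neutral\n"
    ",,neutral\n"
)
tweets = load_tweets(sample_csv)
print(tweets["sentiment"].value_counts())
```

The last sample row has empty `text` and `selected_text` fields. pandas reads these as missing values, so `dropna` removes the row.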

## Results
I tried several model types. A Support Vector Classifier (SVC) trained on the `selected_text` column performed best, with a test accuracy of 83%. Test-set precision and recall for each class are summarized below, along with a confusion matrix.
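The README does not list the exact pipeline settings. Here is a minimal scikit-learn sketch of the approach: TF-IDF features (the representation named under Next steps) feed an `SVC`. The linear kernel, the n-gram range and the toy training strings are assumptions, not the project's configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Assumed configuration: TF-IDF over unigrams and bigrams, linear-kernel SVC.
model = Pipeline(
    [
        ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
        ("svc", SVC(kernel="linear", C=1.0)),
    ]
)

# Toy stand-ins for selected_text values and their labels (not real data).
train_text = [
    "love it", "so happy", "great day", "awesome news",
    "hate this", "so sad", "terrible service", "awful day",
    "going to work", "at the store", "just got home", "watching tv",
]
train_labels = ["positive"] * 4 + ["negative"] * 4 + ["neutral"] * 4

model.fit(train_text, train_labels)
predictions = model.predict(["happy and great", "sad and awful"])
print(predictions)
```

On the real data you would fit on a training split of `selected_text` and `sentiment` and report metrics on a held-out split.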

| Label | Precision | Recall |
|---|---|---|
| negative | 83% | 77% |
| neutral | 78% | 91% |
| positive | 93% | 80% |
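Per-class precision and recall can be read off a confusion matrix. Precision is TP / (TP + FP), which is a diagonal entry divided by its column total. Recall is TP / (TP + FN), which is a diagonal entry divided by its row total. The sketch below uses made-up labels, not the project's test set, and checks the hand computation against scikit-learn.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

labels = ["negative", "neutral", "positive"]

# Toy labels for illustration only; these are not the project's results.
y_true = ["negative", "negative", "neutral", "neutral", "neutral",
          "positive", "positive", "positive"]
y_pred = ["negative", "neutral", "neutral", "neutral", "neutral",
          "positive", "positive", "neutral"]

# Rows are true labels and columns are predicted labels, both in `labels` order.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Precision: diagonal divided by column totals (everything predicted as that class).
precision = np.diag(cm) / cm.sum(axis=0)
# Recall: diagonal divided by row totals (everything truly in that class).
recall = np.diag(cm) / cm.sum(axis=1)

# scikit-learn computes the same per-class values directly.
p_sk, r_sk, _, _ = precision_recall_fscore_support(y_true, y_pred, labels=labels)

for name, p, r in zip(labels, precision, recall):
    print(f"{name:<9} precision={p:.0%} recall={r:.0%}")
```

In this toy case every misclassified tweet is predicted as neutral. That gives neutral perfect recall and lower precision, while negative and positive lose recall. This is one way a pattern like the table's can arise: there, neutral has the highest recall and the lowest precision.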

## Next steps

Try Word2Vec semantic embeddings instead of frequency-based TF-IDF features.
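One common way to turn word embeddings into one feature vector per tweet is to average the vectors of the tweet's known words. This is a sketch of that averaging step under stated assumptions. Tiny hypothetical 3-dimensional vectors stand in for a trained Word2Vec model, and tokenization is a plain lowercase whitespace split.

```python
import numpy as np


def average_embedding(text: str, vectors, dim: int) -> np.ndarray:
    """Mean vector of the in-vocabulary tokens of `text`.

    `vectors` is any mapping that supports `in` and `[...]` lookups.
    Tweets with no known tokens map to the zero vector.
    """
    tokens = [t for t in text.lower().split() if t in vectors]
    if not tokens:
        return np.zeros(dim)
    return np.mean([vectors[t] for t in tokens], axis=0)


# Hypothetical vectors standing in for trained Word2Vec output.
toy_vectors = {
    "love": np.array([0.9, 0.1, 0.0]),
    "great": np.array([0.8, 0.2, 0.0]),
    "hate": np.array([-0.9, 0.1, 0.0]),
    "phone": np.array([0.0, 0.0, 1.0]),
}

tweets = ["Love this phone", "I hate my phone", "no known words"]
X = np.vstack([average_embedding(t, toy_vectors, dim=3) for t in tweets])
print(X.shape)  # (3, 3)
```

The resulting dense matrix `X` would take the place of the TF-IDF matrix as classifier input. With gensim, a trained model's `model.wv` supports the same `in` and `[...]` lookups as the dict here. Its dimension is available as `model.wv.vector_size`.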

Investigate dimensionality reduction with UMAP or t-SNE.
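A minimal sketch of projecting TF-IDF vectors to two dimensions with scikit-learn's `TSNE`, using made-up tweets. scikit-learn's `TSNE` only provides `fit_transform`, so it suits visualizing a fixed set of tweets. It cannot embed new tweets without refitting. By contrast, the separate `umap-learn` package's `UMAP` class does offer `transform`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE

# Toy tweets (not real data) to keep the example self-contained.
tweets = [
    "love this phone", "great battery life", "so happy today",
    "hate the update", "screen broke again", "worst service ever",
    "going to the store", "meeting at noon",
]

# TfidfVectorizer returns a sparse matrix; convert it to dense because TSNE's
# PCA initialization (the default in recent scikit-learn) rejects sparse input.
X = TfidfVectorizer().fit_transform(tweets).toarray()

# perplexity must be smaller than the number of samples; 3 suits 8 toy tweets.
embedding = TSNE(n_components=2, perplexity=3, random_state=0).fit_transform(X)
print(embedding.shape)  # (8, 2)
```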

Deploy to a web service to classify new tweets in real time.
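Here is a minimal sketch of a JSON endpoint built only on Python's standard `http.server`. It accepts a body like `{"tweets": ["..."]}` and returns `{"labels": ["..."]}`. The model here is a placeholder trained on toy strings. Loading the project's fitted pipeline from disk (for example with joblib) is an assumption, and a production service would more likely use a web framework.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Placeholder model trained on toy data; swap in the real fitted pipeline.
MODEL = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
MODEL.fit(
    ["love it", "great day", "hate it", "awful day", "at home", "on the bus"],
    ["positive", "positive", "negative", "negative", "neutral", "neutral"],
)


def classify_payload(body: bytes) -> bytes:
    """Map a JSON body {"tweets": [...]} to a JSON body {"labels": [...]}."""
    tweets = json.loads(body)["tweets"]
    labels = MODEL.predict(tweets).tolist()
    return json.dumps({"labels": labels}).encode("utf-8")


class ClassifyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            response = classify_payload(self.rfile.read(length))
            status = 200
        except (ValueError, KeyError, TypeError):
            # Malformed JSON, a missing "tweets" key, or a non-list payload.
            response = json.dumps({"error": "expected {\"tweets\": [...]}"}).encode("utf-8")
            status = 400
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def run_server(port: int = 8000) -> None:
    """Serve POST requests on 127.0.0.1:port until interrupted."""
    HTTPServer(("127.0.0.1", port), ClassifyHandler).serve_forever()
```

Calling `run_server()` starts the service. Keeping the parsing and prediction in `classify_payload` lets that logic be tested without opening a socket.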

This project highlights the importance of sentiment analysis in brand reputation management and provides a foundation for further development and deployment in a real-world setting.
