{"id":19838670,"url":"https://github.com/brayvid/tweet-sentiment-classifier","last_synced_at":"2025-02-28T19:14:53.488Z","repository":{"id":247678796,"uuid":"826543526","full_name":"brayvid/tweet-sentiment-classifier","owner":"brayvid","description":"Flatiron School Data Science Bootcamp Phase 3 Project","archived":false,"fork":false,"pushed_at":"2024-07-17T22:45:38.000Z","size":22681,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-11T11:18:28.768Z","etag":null,"topics":["classification","data-science","kaggle","machine-learning","scikit-learn","sentiment-analysis","twitter"],"latest_commit_sha":null,"homepage":"https://colab.research.google.com/github/brayvid/tweet-sentiment-classifier/blob/main/tweet_sentiment_classifier.ipynb","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/brayvid.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-09T23:07:56.000Z","updated_at":"2024-07-27T17:59:13.000Z","dependencies_parsed_at":"2024-07-18T00:39:55.596Z","dependency_job_id":"fe35e356-4122-4324-be49-5034abcddbb3","html_url":"https://github.com/brayvid/tweet-sentiment-classifier","commit_stats":null,"previous_names":["brayvid/tweet-sentiment-classifier"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brayvid%2Ftweet-sentiment-classifier","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brayvid%2Ftweet-sentiment-classifier/tags"
,"releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brayvid%2Ftweet-sentiment-classifier/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brayvid%2Ftweet-sentiment-classifier/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/brayvid","download_url":"https://codeload.github.com/brayvid/tweet-sentiment-classifier/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241209528,"owners_count":19927734,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["classification","data-science","kaggle","machine-learning","scikit-learn","sentiment-analysis","twitter"],"created_at":"2024-11-12T12:18:38.929Z","updated_at":"2025-02-28T19:14:53.466Z","avatar_url":"https://github.com/brayvid.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Tweet Sentiment Classifier\n\n\u003ca href=\"https://colab.research.google.com/github/brayvid/tweet-sentiment-classifier/blob/main/tweet_sentiment_classifier.ipynb\" rel=\"Open in Colab\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"\" /\u003e\u003c/a\u003e\n\u003ch4\u003eBlake Rayvid - \u003ca href=https://github.com/brayvid\u003ehttps://github.com/brayvid\u003c/a\u003e\u003c/h4\u003e\nFlatiron School Data Science Bootcamp Phase 3 Project\n\u003ch4\u003e\u003ca href=\"slides.pdf\"\u003ePresentation slides\u003c/a\u003e\u003c/h4\u003e\n\n\n## Business problem\n\u003ch3\u003e\u003cu\u003eBrand reputation 
management\u003c/u\u003e\u003c/h3\u003e\n\u003ch4\u003eMonitor brand perception by correctly classifying new tweets as positive, negative, or neutral.\u003c/h4\u003e\n\u003cul\u003e\n\u003cli\u003eAnalyze negative feedback for insights into product weaknesses, and use those insights to drive improvements.\n\u003cli\u003eIdentify accounts with consistently positive sentiment and invite them to collaborate.\n\u003cli\u003eTime new product launches for periods of high positive sentiment.\n\u003c/ul\u003e\n\n## Dataset\n\u003ch4\u003e\u003ca href=\"https://www.kaggle.com/datasets/yasserh/twitter-tweets-sentiment-dataset\"\u003ehttps://www.kaggle.com/datasets/yasserh/twitter-tweets-sentiment-dataset\u003c/a\u003e\u003c/h4\u003e\n\u003cul\u003e\n\u003cli\u003eThree classes (positive, negative, neutral) in the \u003ccode\u003esentiment\u003c/code\u003e column.\n\u003cli\u003eAbout 27,000 tweets, stored as strings in the \u003ccode\u003etext\u003c/code\u003e column.\n\u003cli\u003e\u003ccode\u003eselected_text\u003c/code\u003e is an additional column containing the substring of each tweet that conveys its sentiment label.\n\u003c/ul\u003e\n\u003cimg src=\"images/wordcloud_pos.png\" width=\"900px\"\u003e\n\u003cimg src=\"images/wordcloud_neg.png\" width=\"900px\"\u003e\n\n## Results\nI tried several model types; a \u003ca href=\"https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html\"\u003eSupport Vector Classifier\u003c/a\u003e (SVC) applied to the \u003ccode\u003eselected_text\u003c/code\u003e column performed best. Test set results are summarized below: per-class precision and recall, followed by a confusion matrix. 
Test accuracy was 83%.\n\u003ctable\u003e\n\u003ctr\u003e\n\u003cth\u003eLabel\u003c/th\u003e\n\u003cth\u003ePrecision\u003c/th\u003e\n\u003cth\u003eRecall\u003c/th\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003enegative\u003c/td\u003e\n\u003ctd\u003e83%\u003c/td\u003e\n\u003ctd\u003e77%\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003eneutral\u003c/td\u003e\n\u003ctd\u003e78%\u003c/td\u003e\n\u003ctd\u003e91%\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003epositive\u003c/td\u003e\n\u003ctd\u003e93%\u003c/td\u003e\n\u003ctd\u003e80%\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\u003ccenter\u003e\u003cimg src=\"images/conf_mat_svc.png\" width=\"500px\"\u003e\u003c/center\u003e\n\n## Next steps\n\u003cul\u003e\n\u003cli\u003eTry Word2Vec semantic embeddings in place of frequency-based TF-IDF features.\n\u003cli\u003eInvestigate dimensionality reduction with UMAP or t-SNE.\n\u003cli\u003eDeploy the model as a web service that classifies new tweets in real time.\n\u003c/ul\u003e\n\n\nThis project highlights the importance of sentiment analysis in brand reputation management and provides a foundation for further development and deployment in a real-world setting.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrayvid%2Ftweet-sentiment-classifier","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbrayvid%2Ftweet-sentiment-classifier","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrayvid%2Ftweet-sentiment-classifier/lists"}