{"id":22158339,"url":"https://github.com/mondalbidisha/twitter-sentiment-analysis","last_synced_at":"2025-03-24T14:49:11.649Z","repository":{"id":46103730,"uuid":"428014474","full_name":"mondalbidisha/twitter-sentiment-analysis","owner":"mondalbidisha","description":"Twitter sentiment analysis using Tweepy and TextBlob libraries in Python 3, extracted tweets about a user-defined topic, and classified them as positive, negative, and neutral. ","archived":false,"fork":false,"pushed_at":"2021-11-14T19:17:40.000Z","size":1654,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-01-29T19:49:13.064Z","etag":null,"topics":["pytohn3","textblob","tweepy"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mondalbidisha.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-11-14T18:56:45.000Z","updated_at":"2024-05-19T10:16:14.000Z","dependencies_parsed_at":"2022-08-30T18:01:47.761Z","dependency_job_id":null,"html_url":"https://github.com/mondalbidisha/twitter-sentiment-analysis","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mondalbidisha%2Ftwitter-sentiment-analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mondalbidisha%2Ftwitter-sentiment-analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mondalbidisha%2Ftwitter-sentiment-analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/repositories/mondalbidisha%2Ftwitter-sentiment-analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mondalbidisha","download_url":"https://codeload.github.com/mondalbidisha/twitter-sentiment-analysis/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245294754,"owners_count":20591899,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["pytohn3","textblob","tweepy"],"created_at":"2024-12-02T03:22:49.945Z","updated_at":"2025-03-24T14:49:11.625Z","avatar_url":"https://github.com/mondalbidisha.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Twitter Sentiment Analysis\n\nStep 1: Install and Import Libraries\n\nBefore the analysis, install the textblob and tweepy libraries with the !pip install command in your Jupyter Notebook.\n\n```\n# Install Libraries\n\n!pip install textblob\n!pip install tweepy\n```\n\nNext, import the libraries that you will use in this sentiment analysis project.\n\n```\n# Import Libraries\n\nfrom textblob import TextBlob\nimport sys\nimport tweepy\nimport matplotlib.pyplot as plt\nimport pandas as pd\nimport numpy as np\nimport os\nimport nltk\nimport pycountry\nimport re\nimport string\nfrom wordcloud import WordCloud, STOPWORDS\nfrom PIL import Image\nfrom nltk.sentiment.vader import SentimentIntensityAnalyzer\nfrom langdetect import detect\nfrom nltk.stem import SnowballStemmer\nfrom sklearn.feature_extraction.text import 
CountVectorizer\n```\n\nTweepy supports both OAuth 1a (application-user) and OAuth 2 (application-only) authentication. Authentication is handled by the tweepy.AuthHandler class.\nOAuth 2 is a method of authentication where an application makes API requests without a user context. Use this method if you only need read-only access to public information.\nYou first register your client application and acquire a consumer key and secret. Then you create an AppAuthHandler instance, passing in your consumer key and secret.\nBefore authenticating, you need a Twitter Developer Account. If you don’t have one, you can apply by using this link. Getting a Twitter developer account usually takes a day or two, and sometimes longer, while Twitter reviews your application.\n\nStep 2: Authentication for Twitter API\n\n```\n# Authentication (OAuth 1a, using the access token and secret)\n\nconsumerKey = \"Type your consumer key here\"\nconsumerSecret = \"Type your consumer secret here\"\naccessToken = \"Type your access token here\"\naccessTokenSecret = \"Type your access token secret here\"\nauth = tweepy.OAuthHandler(consumerKey, consumerSecret)\nauth.set_access_token(accessToken, accessTokenSecret)\napi = tweepy.API(auth)\n```\n\nAfter authenticating, use tweepy to fetch the tweet text, then use TextBlob and the VADER analyzer from NLTK to calculate the positive, negative, neutral, polarity and compound parameters from the text.\n\nStep 3: Getting Tweets With Keyword or Hashtag\n\n```\n# Sentiment Analysis\n\ndef percentage(part,whole):\n    return 100 * float(part)/float(whole)\nkeyword = input(\"Please enter keyword or hashtag to search: \")\nnoOfTweet = int(input(\"Please enter how many tweets to analyze: \"))\n# Note: Tweepy 4.0 and later rename api.search to api.search_tweets\ntweets = tweepy.Cursor(api.search, q=keyword).items(noOfTweet)\npositive = 0\nnegative = 0\nneutral = 0\npolarity = 0\ntweet_list = []\nneutral_list = []\nnegative_list = []\npositive_list = []\n# The VADER analyzer needs its lexicon: run nltk.download('vader_lexicon') once\nfor tweet in tweets:\n \n # print(tweet.text)\n \n tweet_list.append(tweet.text)\n analysis = TextBlob(tweet.text)\n score = 
SentimentIntensityAnalyzer().polarity_scores(tweet.text)\n neg = score['neg']\n neu = score['neu']\n pos = score['pos']\n comp = score['compound']\n polarity += analysis.sentiment.polarity\n \n if neg \u003e pos:\n    negative_list.append(tweet.text)\n    negative += 1\n elif pos \u003e neg:\n    positive_list.append(tweet.text)\n    positive += 1\n elif pos == neg:\n    neutral_list.append(tweet.text)\n    neutral += 1\npositive = percentage(positive, noOfTweet)\nnegative = percentage(negative, noOfTweet)\nneutral = percentage(neutral, noOfTweet)\npolarity = percentage(polarity, noOfTweet)\npositive = format(positive, '.1f')\nnegative = format(negative, '.1f')\nneutral = format(neutral, '.1f')\n```\n\nIn this scenario, the user types a keyword or hashtag (lockdown2 london) and the number of tweets (2500) to fetch and analyse.\nThe number of tweets parameter is important because the Twitter API limits how many tweets you can fetch.\n\nAfter getting 2500 tweets about “lockdown2 india”, let’s look at how many tweets fall under each sentiment.\n\n```\n# Number of Tweets (Total, Positive, Negative, Neutral)\n\ntweet_list = pd.DataFrame(tweet_list)\nneutral_list = pd.DataFrame(neutral_list)\nnegative_list = pd.DataFrame(negative_list)\npositive_list = pd.DataFrame(positive_list)\nprint(\"total number: \",len(tweet_list))\nprint(\"positive number: \",len(positive_list))\nprint(\"negative number: \", len(negative_list))\nprint(\"neutral number: \",len(neutral_list))\n```\n\nOut of the 2500 tweets:\n\n1025 (41.0%) of tweets include positive sentiment\n\n580 (23.2%) of tweets include negative sentiment\n\n895 (35.8%) of tweets include neutral sentiment\n\n```\n# Creating Pie Chart\n\nlabels = ['Positive ['+str(positive)+'%]' , 'Neutral ['+str(neutral)+'%]','Negative ['+str(negative)+'%]']\n# format() returned strings above, so convert back to numbers for plotting\nsizes = [float(positive), float(neutral), float(negative)]\ncolors = ['yellowgreen', 'blue','red']\npatches, texts = plt.pie(sizes,colors=colors, 
startangle=90)\nplt.style.use('default')\nplt.legend(labels)\nplt.title(\"Sentiment Analysis Result for keyword= \"+keyword)\nplt.axis('equal')\nplt.show()\n```\n\nStep 4: Cleaning Tweets to Analyse Sentiment\n\nWhen you look at the tweet list, you can see some duplicated tweets, so you need to drop the duplicate records with the drop_duplicates function.\n\n```\ntweet_list.drop_duplicates(inplace = True)\n```\n\nThe new data frame has 1281 unique tweets.\nFirst, I create a new data frame (tw_list) with a new feature (text), then clean the text with lambda functions that remove RT markers, links and punctuation characters, and finally convert it to lowercase.\n\n```\n# Cleaning Text (RT, Punctuation etc)\n# Creating new dataframe and new features\n\ntw_list = pd.DataFrame(tweet_list)\ntw_list[\"text\"] = tw_list[0]\n# Removing RT, Punctuation etc\nremove_rt = lambda x: re.sub(r'RT @\\w+: ',\" \",x)\nrt = lambda x: re.sub(r\"(@[A-Za-z0-9]+)|([^0-9A-Za-z \\t])|(\\w+:\\/\\/\\S+)\",\" \",x)\ntw_list[\"text\"] = tw_list.text.map(remove_rt).map(rt)\ntw_list[\"text\"] = tw_list.text.str.lower()\ntw_list.head(10)\n```\n\nStep 5: Sentiment Analysis\n\nNow, I can use the cleaned text to calculate the polarity, subjectivity, sentiment, negative, positive, neutral and compound parameters again. 
For each calculated parameter, I add a new feature to my data frame.\n\n```\n# Calculating Negative, Positive, Neutral and Compound values\n\ntw_list[['polarity', 'subjectivity']] = tw_list['text'].apply(lambda Text: pd.Series(TextBlob(Text).sentiment))\n# Series.items() replaces iteritems(), which was removed in pandas 2.0\nfor index, row in tw_list['text'].items():\n score = SentimentIntensityAnalyzer().polarity_scores(row)\n neg = score['neg']\n neu = score['neu']\n pos = score['pos']\n comp = score['compound']\n if neg \u003e pos:\n    tw_list.loc[index, 'sentiment'] = \"negative\"\n elif pos \u003e neg:\n    tw_list.loc[index, 'sentiment'] = \"positive\"\n else:\n    tw_list.loc[index, 'sentiment'] = \"neutral\"\n # Store the raw scores for every tweet, whatever its label\n tw_list.loc[index, 'neg'] = neg\n tw_list.loc[index, 'neu'] = neu\n tw_list.loc[index, 'pos'] = pos\n tw_list.loc[index, 'compound'] = comp\ntw_list.head(10)\n```\n\nYou can split your data frame into 3 groups based on sentiment. To do this, create 3 new data frames (tw_list_negative, tw_list_positive, tw_list_neutral) by filtering the original tw_list data frame.\n\n```\n# Creating new data frames for all sentiments (positive, negative and neutral)\ntw_list_negative = tw_list[tw_list[\"sentiment\"]==\"negative\"]\ntw_list_positive = tw_list[tw_list[\"sentiment\"]==\"positive\"]\ntw_list_neutral = tw_list[tw_list[\"sentiment\"]==\"neutral\"]\n```\n\nLet’s count the values of the sentiment feature and see the totals and percentages.\n\n```\n# Function for count_values_in single columns\n\ndef count_values_in_column(data,feature):\n total=data.loc[:,feature].value_counts(dropna=False)\n percentage=round(data.loc[:,feature].value_counts(dropna=False,normalize=True)*100,2)\n return pd.concat([total,percentage],axis=1,keys=['Total','Percentage'])\n\n# Count_values for sentiment\n\ncount_values_in_column(tw_list,\"sentiment\")\n```\n\nYou can create a chart from the number of tweets per sentiment.\n\n```\n# create data for Pie Chart\n\npc = count_values_in_column(tw_list,\"sentiment\")\nnames= pc.index\nsize=pc[\"Percentage\"]\n```\n\n```\n# 
Create a circle for the center of the plot\n\nmy_circle=plt.Circle( (0,0), 0.7, color='white')\nplt.pie(size, labels=names, colors=['green','blue','red'])\np=plt.gcf()\np.gca().add_artist(my_circle)\nplt.show()\n```\n\nNow you can create a word cloud from the 1281 tweets to see which words are used most in them. To create a word cloud, let’s first define the function below, so you can reuse it for all tweets, positive tweets, negative tweets, etc.\n\n```\n# Function to Create Wordcloud\n\ndef create_wordcloud(text):\n mask = np.array(Image.open(\"cloud.png\"))\n stopwords = set(STOPWORDS)\n wc = WordCloud(background_color=\"white\",\n mask = mask,\n max_words=3000,\n stopwords=stopwords,\n repeat=True)\n wc.generate(str(text))\n wc.to_file(\"wc.png\")\n print(\"Word Cloud Saved Successfully\")\n path=\"wc.png\"\n display(Image.open(path))\n```\n\nAfter defining the function, you can look at the word cloud for all tweets:\n\n```\n# Creating wordcloud for all tweets\n\ncreate_wordcloud(tw_list[\"text\"].values)\n```\n\nWord cloud for tweets that have positive sentiment:\n\n```\n# Creating wordcloud for positive sentiment\n\ncreate_wordcloud(tw_list_positive[\"text\"].values)\n```\n\nWord cloud for tweets that have negative sentiment:\n\n```\n# Creating wordcloud for negative sentiment\n\ncreate_wordcloud(tw_list_negative[\"text\"].values)\n```\n\nLet’s calculate the tweet length and word count. 
This shows the density of words and characters used in tweets for each sentiment.\n\n```\n# Calculating tweet length and word count\n\ntw_list['text_len'] = tw_list['text'].astype(str).apply(len)\ntw_list['text_word_count'] = tw_list['text'].apply(lambda x: len(str(x).split()))\nround(pd.DataFrame(tw_list.groupby(\"sentiment\").text_len.mean()),2)\n```\n\n```\nround(pd.DataFrame(tw_list.groupby(\"sentiment\").text_word_count.mean()),2)\n```\n\nA count vectorizer lets you preprocess your text data before generating the vector representation, which makes it a highly flexible feature representation module for text. After applying the count vectorizer, you can analyze words in groups of two, three, or any size you want.\nApplying a stemmer reduces words to their root, so you can merge words that come from the same root. For example, these words all come from “connect”:\nconnect\nconnection\nconnected\nconnections\nconnects\nIf you apply the stemmer function, you can treat all of these words as the same.\n\n```\n# Removing Punctuation\n\ndef remove_punct(text):\n text = \"\".join([char for char in text if char not in string.punctuation])\n text = re.sub('[0-9]+', '', text)\n return text\ntw_list['punct'] = tw_list['text'].apply(lambda x: remove_punct(x))\n```\n\n```\n# Applying tokenization\ndef tokenization(text):\n    text = re.split(r'\\W+', text)\n    return text\ntw_list['tokenized'] = tw_list['punct'].apply(lambda x: tokenization(x.lower()))\n```\n\n```\n# Removing stopwords\n# Requires the NLTK stopwords corpus: run nltk.download('stopwords') once\n\nstopword = nltk.corpus.stopwords.words('english')\ndef remove_stopwords(text):\n    text = [word for word in text if word not in stopword]\n    return text\n\ntw_list['nonstop'] = tw_list['tokenized'].apply(lambda x: remove_stopwords(x))\n```\n\n```\n# Applying Stemmer\n\nps = nltk.PorterStemmer()\ndef stemming(text):\n    text = [ps.stem(word) for word in text]\n    return text\ntw_list['stemmed'] = tw_list['nonstop'].apply(lambda x: stemming(x))\n```\n\n```\n# Cleaning 
Text\n\ndef clean_text(text):\n    text_lc = \"\".join([word.lower() for word in text if word not in string.punctuation]) # remove punctuation\n    text_rc = re.sub('[0-9]+', '', text_lc)\n    tokens = re.split(r'\\W+', text_rc)    # tokenization\n    text = [ps.stem(word) for word in tokens if word not in stopword]  # remove stopwords and stemming\n    return text\ntw_list.head()\n```\n\nAfter applying the count vectorizer, the results show that the 1281 tweets contain 2966 unique words.\nIf you look at the data frame, you can see new features such as punct, tokenized, nonstop and stemmed.\n\nNow, you can apply the count vectorizer to see all 2966 unique words as new features.\n\n```\n# Applying Countvectorizer\n\ncountVectorizer = CountVectorizer(analyzer=clean_text)\ncountVector = countVectorizer.fit_transform(tw_list['text'])\nprint('{} Number of reviews has {} words'.format(countVector.shape[0], countVector.shape[1]))\n# Output: 1281 Number of reviews has 2966 words\n\n# get_feature_names_out() needs scikit-learn 1.0+; older versions use get_feature_names()\n# print(countVectorizer.get_feature_names_out())\n\ncount_vect_df = pd.DataFrame(countVector.toarray(), columns=countVectorizer.get_feature_names_out())\ncount_vect_df.head()\n```\n\nYou can sort the values in descending order to see the most used words.\n\n```\n# Most Used Words\n\ncount = pd.DataFrame(count_vect_df.sum())\ncountdf = count.sort_values(0,ascending=False).head(20)\ncountdf[1:11]\n```\n\nBuilding an n-gram model helps us predict the most probable word that might follow a given sequence. 
First, let’s create a function, then build n2_bigrams, n3_trigrams, etc.\n\n```\n# Function to ngram\n\ndef get_top_n_gram(corpus,ngram_range,n=None):\n vec = CountVectorizer(ngram_range=ngram_range,stop_words = 'english').fit(corpus)\n bag_of_words = vec.transform(corpus)\n sum_words = bag_of_words.sum(axis=0)\n words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]\n words_freq =sorted(words_freq, key = lambda x: x[1], reverse=True)\n return words_freq[:n]\n#n2_bigram\nn2_bigrams = get_top_n_gram(tw_list['text'],(2,2),20)\nn2_bigrams\n```\n\n```\n#n3_trigram\nn3_trigrams = get_top_n_gram(tw_list['text'],(3,3),20)\nn3_trigrams\n```\n\nFinally, you can analyze sentiment from tweets and see which words are used most and which words are used together.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmondalbidisha%2Ftwitter-sentiment-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmondalbidisha%2Ftwitter-sentiment-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmondalbidisha%2Ftwitter-sentiment-analysis/lists"}