https://github.com/mondalbidisha/twitter-sentiment-analysis

Twitter sentiment analysis using Tweepy and TextBlob libraries in Python 3, extracted tweets about a user-defined topic, and classified them as positive, negative, and neutral.
Host: GitHub
URL: https://github.com/mondalbidisha/twitter-sentiment-analysis
Owner: mondalbidisha
Created: 2021-11-14T18:56:45.000Z (over 4 years ago)
Default Branch: master
Last Pushed: 2021-11-14T19:17:40.000Z (over 4 years ago)
Last Synced: 2025-01-29T19:49:13.064Z (over 1 year ago)
Topics: pytohn3, textblob, tweepy
Language: Jupyter Notebook
Homepage:
Size: 1.58 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Twitter Sentiment Analysis

Step 1: Install and Import Libraries

Before the analysis, install the textblob and tweepy libraries using the !pip install command in your Jupyter Notebook.

```
# Install Libraries
!pip install textblob
!pip install tweepy
```

You need to import libraries that you will use in this sentiment analysis project.

```
# Import Libraries
import os
import re
import string
import sys

import matplotlib.pyplot as plt
import nltk
import numpy as np
import pandas as pd
import pycountry
import tweepy
from langdetect import detect
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.stem import SnowballStemmer
from PIL import Image
from sklearn.feature_extraction.text import CountVectorizer
from textblob import TextBlob
from wordcloud import WordCloud, STOPWORDS

# Download the NLTK data used later: the VADER lexicon and the English stopword list
nltk.download("vader_lexicon")
nltk.download("stopwords")
```

Tweepy supports both OAuth 1a (application-user) and OAuth 2 (application-only) authentication. Authentication is handled by the tweepy.AuthHandler class.

OAuth 2 is a method of authentication where an application makes API requests without the user context. Use this method if you just need read-only access to public information.

You first register your client application and acquire a consumer key and secret. Then you create an AppAuthHandler instance, passing in your consumer key and secret. Note that the code in Step 2 uses OAuthHandler with an access token instead, which is the OAuth 1a flow.

Before authenticating, you need a Twitter Developer Account. If you don't have one, you can apply for one. It usually takes a day or two, sometimes more, for Twitter to review your application.

Step 2: Authentication for Twitter API

```
# Authentication
consumerKey = "Type your consumer key here"
consumerSecret = "Type your consumer secret here"
accessToken = "Type your access token here"
accessTokenSecret = "Type your access token secret here"

auth = tweepy.OAuthHandler(consumerKey, consumerSecret)
auth.set_access_token(accessToken, accessTokenSecret)
api = tweepy.API(auth)
```
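Instead of typing the four keys into the notebook, you may prefer to read them from the environment. The following is a minimal sketch, not part of the original project: the environment variable names and the `load_credentials` helper are hypothetical, so rename them to fit your setup.

```python
import os

# Hypothetical environment variable names (assumption, not from the project).
ENV_NAMES = {
    "consumerKey": "TWITTER_CONSUMER_KEY",
    "consumerSecret": "TWITTER_CONSUMER_SECRET",
    "accessToken": "TWITTER_ACCESS_TOKEN",
    "accessTokenSecret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(environ=os.environ):
    """Return the four credentials as a dict, raising KeyError if any is missing."""
    missing = [env for env in ENV_NAMES.values() if not environ.get(env)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[env] for name, env in ENV_NAMES.items()}
```

The returned values can then be passed to `tweepy.OAuthHandler` and `set_access_token` in place of the placeholder strings.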

After authenticating, use tweepy to fetch the tweet text, TextBlob to calculate polarity, and the NLTK VADER analyzer to calculate the positive, negative, neutral and compound scores of each text.

Step 3: Getting Tweets With Keyword or Hashtag

```
# Sentiment Analysis
def percentage(part, whole):
    return 100 * float(part) / float(whole)

keyword = input("Please enter keyword or hashtag to search: ")
noOfTweet = int(input("Please enter how many tweets to analyze: "))

# Note: Tweepy 4.x renamed api.search to api.search_tweets
tweets = tweepy.Cursor(api.search, q=keyword).items(noOfTweet)

positive = 0
negative = 0
neutral = 0
polarity = 0
tweet_list = []
neutral_list = []
negative_list = []
positive_list = []

analyzer = SentimentIntensityAnalyzer()  # create once, reuse for every tweet

for tweet in tweets:
    # print(tweet.text)
    tweet_list.append(tweet.text)
    analysis = TextBlob(tweet.text)
    score = analyzer.polarity_scores(tweet.text)
    neg = score['neg']
    neu = score['neu']
    pos = score['pos']
    comp = score['compound']
    polarity += analysis.sentiment.polarity

    # Label the tweet by comparing the negative and positive scores
    if neg > pos:
        negative_list.append(tweet.text)
        negative += 1
    elif pos > neg:
        positive_list.append(tweet.text)
        positive += 1
    else:
        neutral_list.append(tweet.text)
        neutral += 1

positive = percentage(positive, noOfTweet)
negative = percentage(negative, noOfTweet)
neutral = percentage(neutral, noOfTweet)
polarity = percentage(polarity, noOfTweet)

# Format the percentages as strings with one decimal place
positive = format(positive, '.1f')
negative = format(negative, '.1f')
neutral = format(neutral, '.1f')
```
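The labelling rule in the loop compares only the negative and positive scores and falls back to neutral on a tie. A small sketch of that rule on its own, using made-up score dictionaries rather than real analyzer output (the `classify` helper and the sample values are assumptions for illustration):

```python
def classify(score):
    """Label a VADER-style score dict by comparing its 'neg' and 'pos' entries."""
    if score["neg"] > score["pos"]:
        return "negative"
    if score["pos"] > score["neg"]:
        return "positive"
    return "neutral"


# Made-up scores for illustration, not real analyzer output.
samples = [
    {"neg": 0.0, "neu": 0.6, "pos": 0.4},
    {"neg": 0.5, "neu": 0.5, "pos": 0.0},
    {"neg": 0.0, "neu": 1.0, "pos": 0.0},
]
labels = [classify(s) for s in samples]
print(labels)
```

Note that the rule never looks at the compound or neutral scores, so a tweet is labelled neutral only when its negative and positive scores are exactly equal.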

In this scenario, the user types a keyword or hashtag (for example, lockdown2 london) and the number of tweets to fetch and analyse (2500).

The number of tweets matters because the Twitter API limits how many requests you can make.

After getting 2500 tweets about “lockdown2 india”, let's look at how many tweets fall under each sentiment.

```
# Number of Tweets (Total, Positive, Negative, Neutral)
tweet_list = pd.DataFrame(tweet_list)
neutral_list = pd.DataFrame(neutral_list)
negative_list = pd.DataFrame(negative_list)
positive_list = pd.DataFrame(positive_list)

print("total number: ", len(tweet_list))
print("positive number: ", len(positive_list))
print("negative number: ", len(negative_list))
print("neutral number: ", len(neutral_list))
```

Of the 2500 tweets retrieved:

1025 (41.0%) of tweets include positive sentiment

580 (23.2%) of tweets include negative sentiment

895 (35.8%) of tweets include neutral sentiment
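The percentages follow directly from the counts. As a quick check, this sketch reproduces them with the same `100 * part / whole` formula and one-decimal formatting used earlier (the `counts` dictionary simply restates the figures above):

```python
# Counts reported above for the 2500 retrieved tweets.
counts = {"positive": 1025, "negative": 580, "neutral": 895}
total = sum(counts.values())

# Same arithmetic as percentage(part, whole), formatted to one decimal place.
shares = {label: format(100 * n / total, ".1f") for label, n in counts.items()}
print(total, shares)
```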

```
# Creating Pie Chart
labels = ['Positive [' + str(positive) + '%]', 'Neutral [' + str(neutral) + '%]', 'Negative [' + str(negative) + '%]']
# positive, neutral and negative were formatted as strings earlier, so convert them back to numbers
sizes = [float(positive), float(neutral), float(negative)]
colors = ['yellowgreen', 'blue', 'red']

plt.style.use('default')
patches, texts = plt.pie(sizes, colors=colors, startangle=90)
plt.legend(labels)
plt.title("Sentiment Analysis Result for keyword= " + keyword)
plt.axis('equal')
plt.show()
```

Step 4: Cleaning Tweets to Analyse Sentiment

If you look at the tweet list, you can see some duplicated tweets, so drop the duplicate records using the drop_duplicates function.

```
tweet_list.drop_duplicates(inplace=True)
```

Our new data frame has 1281 unique tweets.

First, create a new data frame (tw_list) with a new feature (text). Then clean the text with lambda functions that remove the RT prefix, mentions, links and punctuation characters, and finally convert it to lowercase.

```
# Cleaning Text (RT, Punctuation etc)
# Creating new dataframe and new features
tw_list = pd.DataFrame(tweet_list)
tw_list["text"] = tw_list[0]

# Removing RT, Punctuation etc
remove_rt = lambda x: re.sub(r'RT @\w+: ', " ", x)
rt = lambda x: re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", x)
tw_list["text"] = tw_list.text.map(remove_rt).map(rt)
tw_list["text"] = tw_list.text.str.lower()
tw_list.head(10)
```
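To see what the two substitutions do, here is a small sketch that applies them to a single made-up tweet using only the standard library. It assumes the intended character classes are `0-9` and `^0-9`; the `clean_tweet` helper and the final whitespace collapsing are additions for readability, not part of the original code.

```python
import re


def clean_tweet(text):
    """Strip the RT prefix, mentions, links and punctuation, then lowercase."""
    text = re.sub(r"RT @\w+: ", " ", text)
    text = re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", text)
    # Extra step (assumption): collapse the runs of spaces left by the substitutions.
    return " ".join(text.lower().split())


sample = "RT @user: Lockdown is back! https://t.co/abc #London"
print(clean_tweet(sample))
```

Each removed element is replaced by a space rather than deleted, which is why the cleaned text in the data frame contains runs of spaces.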

Step 5: Sentiment Analysis

Now you can use the cleaned text to calculate the polarity, subjectivity, sentiment, negative, positive, neutral and compound parameters again. Each calculated parameter becomes a new feature in the data frame.

```
# Calculating Negative, Positive, Neutral and Compound values
tw_list[['polarity', 'subjectivity']] = tw_list['text'].apply(lambda Text: pd.Series(TextBlob(Text).sentiment))

analyzer = SentimentIntensityAnalyzer()  # create once, reuse for every row

# Series.iteritems() was removed in pandas 2.0; items() is the equivalent
for index, row in tw_list['text'].items():
    score = analyzer.polarity_scores(row)
    neg = score['neg']
    neu = score['neu']
    pos = score['pos']
    comp = score['compound']

    if neg > pos:
        tw_list.loc[index, 'sentiment'] = "negative"
    elif pos > neg:
        tw_list.loc[index, 'sentiment'] = "positive"
    else:
        tw_list.loc[index, 'sentiment'] = "neutral"

    # Store the raw scores for every row, whatever its label
    tw_list.loc[index, 'neg'] = neg
    tw_list.loc[index, 'neu'] = neu
    tw_list.loc[index, 'pos'] = pos
    tw_list.loc[index, 'compound'] = comp

tw_list.head(10)
```

You can split your data frame into 3 groups based on sentiment. To do this, create 3 new data frames (tw_list_negative, tw_list_positive, tw_list_neutral) by filtering the original tw_list data frame.

```
# Creating new data frames for all sentiments (positive, negative and neutral)
tw_list_negative = tw_list[tw_list["sentiment"] == "negative"]
tw_list_positive = tw_list[tw_list["sentiment"] == "positive"]
tw_list_neutral = tw_list[tw_list["sentiment"] == "neutral"]
```

Let's count the values of the sentiment feature and see the totals and percentages.

```
# Function for count_values_in single columns
def count_values_in_column(data, feature):
    total = data.loc[:, feature].value_counts(dropna=False)
    percentage = round(data.loc[:, feature].value_counts(dropna=False, normalize=True) * 100, 2)
    return pd.concat([total, percentage], axis=1, keys=['Total', 'Percentage'])

# Count_values for sentiment
count_values_in_column(tw_list, "sentiment")
```
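The function above returns, for each label, its count and its share of all rows. A rough standard-library equivalent, shown on a made-up list of labels (the `count_values` helper and the sample data are hypothetical), makes the calculation explicit:

```python
from collections import Counter


def count_values(labels):
    """Map each label to (total, percentage of all labels rounded to 2 places)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, round(100 * n / total, 2)) for label, n in counts.most_common()}


# Made-up labels for illustration.
labels = ["positive", "neutral", "positive", "negative"]
print(count_values(labels))
```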

You can create a chart from the number of tweets for each sentiment.

```
# create data for Pie Chart
pc = count_values_in_column(tw_list, "sentiment")
names = pc.index
size = pc["Percentage"]
```

```
# Create a circle for the center of the plot
my_circle = plt.Circle((0, 0), 0.7, color='white')
plt.pie(size, labels=names, colors=['green', 'blue', 'red'])
p = plt.gcf()
p.gca().add_artist(my_circle)
plt.show()
```

Now you can create a word cloud from the 1281 tweets to see which words are used most in them. First, define the function below, so you can reuse it for all tweets, positive tweets, negative tweets, and so on.

```
# Function to Create Wordcloud
def create_wordcloud(text):
    # cloud.png is the mask image; it must exist in the working directory
    mask = np.array(Image.open("cloud.png"))
    stopwords = set(STOPWORDS)
    wc = WordCloud(background_color="white",
                   mask=mask,
                   max_words=3000,
                   stopwords=stopwords,
                   repeat=True)
    wc.generate(str(text))
    wc.to_file("wc.png")
    print("Word Cloud Saved Successfully")
    path = "wc.png"
    display(Image.open(path))  # display() is available in Jupyter/IPython
```

After defining the function, you can look at the word cloud for all tweets:

```
# Creating wordcloud for all tweets
create_wordcloud(tw_list["text"].values)
```

Word cloud for tweets with positive sentiment:

```
# Creating wordcloud for positive sentiment
create_wordcloud(tw_list_positive["text"].values)
```

Word cloud for tweets with negative sentiment:

```
# Creating wordcloud for negative sentiment
create_wordcloud(tw_list_negative["text"].values)
```

Let's calculate the length and word count of each tweet, so you can compare the average number of characters and words used in tweets of each sentiment.

```
# Calculating tweet's length and word count
tw_list['text_len'] = tw_list['text'].astype(str).apply(len)
tw_list['text_word_count'] = tw_list['text'].apply(lambda x: len(str(x).split()))

round(pd.DataFrame(tw_list.groupby("sentiment").text_len.mean()), 2)
```

```
round(pd.DataFrame(tw_list.groupby("sentiment").text_word_count.mean()), 2)
```

A count vectorizer lets you preprocess your text data before generating the vector representation, which makes it a highly flexible feature representation module for text. Once the text is vectorized, you can also analyze sequences of two, three, or any number of words (n-grams).

Applying a stemmer reduces each word to its root, so that words sharing a root can be treated as one. For example:

connect

connection

connected

connections

connects

all come from "connect". After stemming, you can treat all of these words as the same.
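To make the idea concrete, here is a deliberately simplified suffix-stripping sketch. It is a toy stand-in and not the Porter algorithm that the project uses through `nltk.PorterStemmer`; the `SUFFIXES` tuple and the `toy_stem` helper are assumptions chosen to fit this one word family.

```python
# Toy suffix list (assumption): longest suffixes first so "ions" wins over "s".
SUFFIXES = ("ions", "ion", "ed", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


words = ["connect", "connection", "connected", "connections", "connects"]
stems = {toy_stem(w) for w in words}
print(stems)
```

A real stemmer applies many more rules and conditions, so use a library stemmer for actual text rather than a suffix list like this.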

```
# Removing Punctuation
def remove_punct(text):
    text = "".join([char for char in text if char not in string.punctuation])
    text = re.sub('[0-9]+', '', text)  # also remove digits
    return text

tw_list['punct'] = tw_list['text'].apply(lambda x: remove_punct(x))
```

```
# Applying tokenization
def tokenization(text):
    text = re.split(r'\W+', text)
    return text

tw_list['tokenized'] = tw_list['punct'].apply(lambda x: tokenization(x.lower()))
```

```
# Removing stopwords (requires the NLTK "stopwords" corpus to be downloaded)
stopword = nltk.corpus.stopwords.words('english')

def remove_stopwords(text):
    text = [word for word in text if word not in stopword]
    return text

tw_list['nonstop'] = tw_list['tokenized'].apply(lambda x: remove_stopwords(x))
```

```
# Applying Stemmer
ps = nltk.PorterStemmer()

def stemming(text):
    text = [ps.stem(word) for word in text]
    return text

tw_list['stemmed'] = tw_list['nonstop'].apply(lambda x: stemming(x))
```

```
# Cleaning Text
def clean_text(text):
    text_lc = "".join([word.lower() for word in text if word not in string.punctuation])  # remove punctuation
    text_rc = re.sub('[0-9]+', '', text_lc)  # remove digits
    tokens = re.split(r'\W+', text_rc)  # tokenization
    text = [ps.stem(word) for word in tokens if word not in stopword]  # remove stopwords and stemming
    return text

tw_list.head()
```

If you look at the data frame, you can see new features such as punct, tokenized, nonstop and stemmed.

Now you can apply the count vectorizer to turn every unique word into a new feature. The result shows that the 1281 tweets contain 2966 unique words.

```
# Applying CountVectorizer
countVectorizer = CountVectorizer(analyzer=clean_text)
countVector = countVectorizer.fit_transform(tw_list['text'])
print('{} Number of reviews has {} words'.format(countVector.shape[0], countVector.shape[1]))
# Output: 1281 Number of reviews has 2966 words
# print(countVectorizer.get_feature_names_out())

# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
count_vect_df = pd.DataFrame(countVector.toarray(), columns=countVectorizer.get_feature_names_out())
count_vect_df.head()
```
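The result of the vectorizer is a document-term matrix: one row per tweet, one column per unique word, and each cell holding how often that word occurs in that tweet. A minimal standard-library sketch of the same idea, on three made-up documents and with plain whitespace splitting instead of the `clean_text` analyzer (both are assumptions for illustration):

```python
from collections import Counter

# Made-up documents for illustration.
docs = ["lockdown again in london", "london lockdown rules", "rules again"]

# The vocabulary is the sorted set of unique words across all documents.
vocab = sorted({word for doc in docs for word in doc.split()})

# One row per document, one column per vocabulary word.
matrix = [[Counter(doc.split())[word] for word in vocab] for doc in docs]

print(vocab)
for row in matrix:
    print(row)
```

Summing each column gives the total frequency of a word across all documents, which is what the next step sorts to find the most used words.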

You can sort the values in descending order to see the most used words.

```
# Most Used Words
count = pd.DataFrame(count_vect_df.sum())
countdf = count.sort_values(0, ascending=False).head(20)
countdf[1:11]
```

Counting n-grams shows which words occur together, and such counts are the basis for predicting the most probable word to follow a given sequence. First, let's create a function, then build n2_bigrams, n3_trigrams, and so on.

```
# Function to ngram
def get_top_n_gram(corpus, ngram_range, n=None):
    vec = CountVectorizer(ngram_range=ngram_range, stop_words='english').fit(corpus)
    bag_of_words = vec.transform(corpus)
    sum_words = bag_of_words.sum(axis=0)
    words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]
    words_freq = sorted(words_freq, key=lambda x: x[1], reverse=True)
    return words_freq[:n]

# n2_bigram
n2_bigrams = get_top_n_gram(tw_list['text'], (2, 2), 20)
n2_bigrams
```

```
# n3_trigram
n3_trigrams = get_top_n_gram(tw_list['text'], (3, 3), 20)
n3_trigrams
```
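To show what an n-gram count looks like without scikit-learn, here is a small standard-library sketch. The `top_ngrams` helper and the sample texts are hypothetical, and unlike the function above it does not remove English stop words.

```python
from collections import Counter


def top_ngrams(texts, n, k):
    """Return the k most frequent n-grams (as word tuples) across the texts."""
    counts = Counter()
    for text in texts:
        tokens = text.split()
        # Zip n shifted copies of the token list to form consecutive n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts.most_common(k)


# Made-up texts for illustration.
texts = ["stay at home", "please stay at home", "stay safe"]
print(top_ngrams(texts, 2, 2))
print(top_ngrams(texts, 3, 1))
```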

With these steps, you can analyze the sentiment of tweets and see which words are used most and which words are used together.