# twitter-sentiment-analysis
This is a demo PoC (proof of concept) for sentiment analysis of tweets.
The repo is divided into:
- [Notebooks](https://github.com/mratanusarkar/twitter-sentiment-analysis/tree/main/Notebooks)
- [Scripts](https://github.com/mratanusarkar/twitter-sentiment-analysis/tree/main/Runner)

Where you can find:
- various experiments with the Twitter API
- ways to scrape and collect tweet data using various kinds of search parameters
- analysis and visualizations using the collected data
- sentiment analysis and insights using NLP

These live in the **Notebooks** folder, with a script/module version of the same in the **Runner** folder for running background jobs.

# Features
This repo is still a work-in-progress.

Some of the features currently implemented are as follows:

- Tweet Scraper
- Twitter Word Cloud
- (more features coming soon...)

---

# Tweet Scraper
This is a scraper function used to gather and collect tweets, powered by [snscrape](https://github.com/JustAnotherArchivist/snscrape).

Compared to the Twitter API v2, this lets us collect an unlimited number of tweets without rate-limit restrictions and without needing API tokens and secrets.
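Under the hood, snscrape exposes a Python API that can be iterated directly. Below is a minimal sketch of that pattern (not the repo's code; the `TweetScraper` class presumably wraps a similar loop, and tweet attribute names vary between snscrape versions):

```python
# Minimal sketch of scraping tweets directly with snscrape (illustrative only).
import itertools

import pandas as pd
import snscrape.modules.twitter as sntwitter

query = '@isro'
limit = 100

rows = []
for tweet in itertools.islice(sntwitter.TwitterSearchScraper(query).get_items(), limit):
    rows.append({
        'id': tweet.id,
        'date': tweet.date,
        'username': tweet.user.username,
        'content': tweet.rawContent,  # named 'content' in older snscrape releases
        'url': tweet.url,
    })

df = pd.DataFrame(rows)
print(df.head())
```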

## usage:
Here is a sample usage:
```python
from module.scraper import TweetScraper

# create helper objects
tweet_scraper = TweetScraper()

# set parameters
query = '@isro'
limit = 1000

# scrape tweets
rawData = tweet_scraper.get_tweets(query, limit)
```
This will return a pandas dataframe containing the last 1000 tweets mentioning @isro.

See the parameter table below for more details on the function parameters.

## parameters

| Parameter | Data Type | Description | More Details |
|-----------|------------------|-----------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| query | string | Twitter search query as per https://twitter.com/search?q= | it can be a user mention like `@user`, a hashtag like `#tag`, a plain word like `text`, or a complex query combining terms with `AND`, `OR`, and parentheses `()`. Explore twitter.com/search-advanced to learn more. |
| limit | int | number of tweets you want to scrape | the run time grows with the number of tweets: roughly 1 s for 100 tweets, around 5 min for 10,000, and up to 1 h for 100,000 |
| return | pandas dataframe | a pandas dataframe with the tweets | as of now, the following data fields are collected: `id`, `date`, `username`, `content`, `view_count`, `like_count`, `reply_count`, `retweet_count`, `quote_Count`, `url` |
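Since the query string follows Twitter's standard search syntax, the same scraper call works for mentions, hashtags, and combined expressions. A few illustrative calls, assuming the `TweetScraper.get_tweets(query, limit)` interface shown above:

```python
from module.scraper import TweetScraper

tweet_scraper = TweetScraper()

# a user mention: tweets mentioning @isro
mentions_df = tweet_scraper.get_tweets('@isro', 500)

# a hashtag: tweets tagged #SSLVD2
hashtag_df = tweet_scraper.get_tweets('#SSLVD2', 500)

# a combined query: tweets containing "ISRO" and either hashtag
combined_df = tweet_scraper.get_tweets('ISRO (#SSLVD2 OR #ISRO)', 500)

# the returned dataframe exposes the collected fields listed in the table above
print(combined_df.columns.tolist())
```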

---

# Twitter Word Cloud
This is a visualization tool powered by [word_cloud](https://github.com/amueller/word_cloud).
Combined with the scraper function above, this tool gives you the capability to visualize what's happening on Twitter at a glance!
In short, it takes all the collected tweets and counts the most frequently occurring words, discarding common English words and non-English characters and applying pre-processing and data cleaning along the way.
In the end, you get a word cloud that gives insight into your search query.
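The cleaning-plus-counting step described above can be sketched roughly as follows. This is an illustrative outline using the `wordcloud` package's built-in `STOPWORDS`, not the repo's actual `TwitterWordCloud` implementation:

```python
import re

import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS

def build_word_cloud(tweet_texts, exclude_words=(), width=1000, height=500):
    # join all tweets, strip URLs and non-English characters, lowercase everything
    text = ' '.join(tweet_texts)
    text = re.sub(r'https?://\S+', ' ', text)
    text = re.sub(r'[^A-Za-z\s]', ' ', text).lower()

    # drop common English words plus any caller-supplied exclusions
    stopwords = set(STOPWORDS) | {w.lower() for w in exclude_words}

    wc = WordCloud(width=width, height=height, stopwords=stopwords,
                   background_color='white').generate(text)

    plt.figure(figsize=(width / 100, height / 100))
    plt.imshow(wc, interpolation='bilinear')
    plt.axis('off')
    plt.show()
```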

For example:
- you can input `@user` and see what's going on in that user's timeline in one go!
- you may input a trending `#hashtag` and see what Twitter has to say about the trend/issue/event in one go!
- it's left to the end user to decide how to use this tool to get powerful visualizations. The possibilities are limitless!

I am sharing a few use-cases below.

## sample use case:
Here is a sample word cloud generated with `limit: 10,000` and `query: ISRO (#SSLVD2 OR #ISRO)` at resolution `width, height: 1080, 720` during the SSLV-D2 launch on 10th Feb, 2023. You can clearly see, in just one snapshot, how Twitter looked during the launch that day!

![ISRO SLVD2 Launch](https://user-images.githubusercontent.com/34891206/219942847-9329f7b1-7913-4d23-9222-a0553f50d9ff.png)

## usage:
Here is a sample usage:
```python
from module.scraper import TweetScraper
from module.generator import TwitterWordCloud

# create helper objects
tweet_scraper = TweetScraper()
tweet_wc = TwitterWordCloud()

# set parameters
topic_title = 'ISRO During SSLV-D2 Launch'
query = 'ISRO (#SSLVD2 OR #ISRO)'
limit = 1000
exclude_words = ['amp', 'eval']

# scrape tweets
rawData = tweet_scraper.get_tweets(query, limit)
tweet_wc.generate_word_cloud_v2(rawData, topic_title, exclude_words, 1080, 720)
```

This will generate a word cloud using the last 1000 tweets made during the ISRO SSLV-D2 launch.
See the parameter tables below for more details on the function parameters.

## parameters

### function: generate_word_cloud()
a simple generator function with only one required parameter (the dataframe) for quick and easy word cloud generation.

The output image is (1000px, 500px) on a (15, 8) inch canvas.

| Parameter | Data Type | Description | More Details |
|---------------------|------------------|----------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------|
| rawData | pandas dataframe | pandas dataframe from scraper function | |
| force_exclude_words | list of strings  | words you wish to exclude from the word cloud              | after seeing an output, if there are words in the image that you wish to exclude, you can do so using this option        |
| return              | None             | it generates and displays the word cloud, and saves it as a PNG |                                                                                                                       |
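For completeness, a quick call to this simpler function might look like the following (illustrative only; the keyword name is assumed to match the parameter table above):

```python
from module.scraper import TweetScraper
from module.generator import TwitterWordCloud

tweet_scraper = TweetScraper()
tweet_wc = TwitterWordCloud()

# quick default output (1000px x 500px); only the dataframe is required
rawData = tweet_scraper.get_tweets('#ISRO', 500)
tweet_wc.generate_word_cloud(rawData, force_exclude_words=['amp'])
```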

### function: generate_word_cloud_v2()
a more customizable and generic function with the following parameters:

| Parameter | Required | Data Type | Description | More Details |
|---------------------|------------------|------------------|----------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------|
| rawData | Yes | pandas dataframe | pandas dataframe from scraper function | |
| topic_title | Yes | string | a short string describing the topic of tweets in the dataframe | the output files will have the same name as the topic |
| force_exclude_words | No, default []   | list of strings  | words you wish to exclude from the word cloud                   | after seeing an output, if there are words in the image that you wish to exclude, you can do so using this option        |
| width               | No, default 1000 | int              | width of the output image in pixels                             |                                                                                                                           |
| height              | No, default 500  | int              | height of the output image in pixels                            |                                                                                                                           |
| dpi                 | No, default 100  | int              | pixel density of the output image in dots per inch              |                                                                                                                           |
| return              | NA               | None             | it generates and displays the word cloud, and saves it as a PNG |                                                                                                                           |
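Putting the table together, a fully specified call might look like this. It is a sketch that assumes the arguments are passed positionally in the order listed above (as in the earlier usage example), with `dpi` added as the final argument:

```python
from module.scraper import TweetScraper
from module.generator import TwitterWordCloud

tweet_scraper = TweetScraper()
tweet_wc = TwitterWordCloud()

topic_title = 'ISRO During SSLV-D2 Launch'
rawData = tweet_scraper.get_tweets('ISRO (#SSLVD2 OR #ISRO)', 1000)

# all parameters spelled out: dataframe, topic title, exclusions, width, height, dpi
tweet_wc.generate_word_cloud_v2(rawData, topic_title, ['amp', 'eval'], 1920, 1080, 150)
```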

---