Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mratanusarkar/twitter-sentiment-analysis
a demo poc for sentiment analysis of tweets
- Host: GitHub
- URL: https://github.com/mratanusarkar/twitter-sentiment-analysis
- Owner: mratanusarkar
- Created: 2023-02-04T15:25:09.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-06-07T19:44:52.000Z (7 months ago)
- Last Synced: 2024-06-07T20:52:57.539Z (7 months ago)
- Language: Jupyter Notebook
- Size: 3.44 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 7
- Metadata Files:
  - Readme: README.md
README
# twitter-sentiment-analysis
This is a demo PoC (proof of concept) for sentiment analysis of tweets.
The repo is divided into:
- [Notebooks](https://github.com/mratanusarkar/twitter-sentiment-analysis/tree/main/Notebooks)
- [Scripts](https://github.com/mratanusarkar/twitter-sentiment-analysis/tree/main/Runner)

Where you can find:
- various experiments with the Twitter API
- ways to scrape and collect tweet data using various kinds of search parameters
- analysis and visualizations using the collected data
- sentiment analysis and insights using NLP

All of these live in the **Notebooks**, with a script/module format of the same in the **Runner** folder for background running jobs.
# Features
This repo is still a work in progress.
Some of the features currently implemented are as follows:

- Tweet Scraper
- Twitter Word Cloud
- (more features coming soon...)

---
# Tweet Scraper
This is a scraper function used to gather and collect tweets, powered by [snscrape](https://github.com/JustAnotherArchivist/snscrape).
Compared to the Twitter API v2, this lets us fetch unlimited tweets without any restrictions and without the need for API tokens and secrets.

## usage:
Here is a sample usage:
```python
from module.scraper import TweetScraper

# create helper objects
tweet_scraper = TweetScraper()

# set parameters
query = '@isro'
limit = 1000

# scrape tweets
rawData = tweet_scraper.get_tweets(query, limit)
```
This will return a pandas dataframe containing the last 1000 tweets matching `@isro`.
See the function signature below for more details on the function parameters.

## parameters
| Parameter | Data Type | Description | More Details |
|-----------|------------------|-----------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| query | string | twitter search query as per https://twitter.com/search?q= | it can be a user mention like `@user`, a hashtag like `#tag`, a plain word like `text`, or a complex query joined by `AND`/`OR` with statements enclosed in `()`. Explore twitter.com/search-advanced to know more. |
| limit | int | number of tweets you want to scrape | runtime scales with the number of tweets: roughly, 100 tweets are collected in about 1 s, 10,000 might take about 5 min, and 100,000 may take about 1 h |
| return | pandas dataframe | a pandas dataframe with the tweets | as of now, the following data fields are collected: `id`, `date`, `username`, `content`, `view_count`, `like_count`, `reply_count`, `retweet_count`, `quote_Count`, `url` |
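To illustrate the query syntax described above, here is a small sketch that combines a mention with hashtags and then inspects a few of the returned columns. It assumes the same `TweetScraper` interface and `module.scraper` import path as in the usage example earlier; the CSV filename is just an illustration.

```python
from module.scraper import TweetScraper

tweet_scraper = TweetScraper()

# a compound query: tweets mentioning @isro that also carry either hashtag
query = '@isro (#SSLVD2 OR #ISRO)'
limit = 500

rawData = tweet_scraper.get_tweets(query, limit)

# the dataframe exposes the fields listed above, e.g. date, username, content
print(rawData[['date', 'username', 'content', 'like_count']].head())

# optionally persist the raw tweets for later analysis
rawData.to_csv('isro_tweets.csv', index=False)
```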
---

# Twitter Word Cloud
This is a visualization tool powered by [word_cloud](https://github.com/amueller/word_cloud).
Combined with the scraper function above, this tool gives you the ability to visualize what's going on on Twitter at a glance!
In short, it takes all the tweets and counts the most frequently occurring words. It discards common English words and non-English characters, does pre-processing and data cleaning, and in the end you get a word cloud that gives insight into your search query (a rough standalone sketch of this cleaning-and-counting step follows the list below).

For example:
- you can input a @user and see what's going on in the user's timeline in one go!
- you may input a trending #hashtag and take a look at what Twitter has to say on the trend/issue/event in one go!
- it's left to the end user how they use this tool to get powerful visualizations. The possibilities are limitless!

I am sharing a few use-cases below.
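Before diving into the use-cases, here is the rough standalone sketch of the cleaning-and-counting idea mentioned above, using the underlying [word_cloud](https://github.com/amueller/word_cloud) library directly. This is not the repo's own `TwitterWordCloud` implementation; the `content` column name follows the scraper's output fields, and the exact stop-word handling here is an assumption.

```python
import re

import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS


def sketch_word_cloud(rawData, exclude_words=None, width=1000, height=500):
    # join all tweet texts into one big string (the scraper stores tweet text in `content`)
    text = " ".join(rawData["content"].astype(str))

    # drop links, then keep only english letters and whitespace, lowercased
    text = re.sub(r"http\S+", " ", text)
    text = re.sub(r"[^A-Za-z\s]", " ", text).lower()

    # discard common english words plus any user-supplied exclusions
    stopwords = set(STOPWORDS) | set(exclude_words or [])

    # count word frequencies and render the cloud
    wc = WordCloud(width=width, height=height,
                   background_color="white",
                   stopwords=stopwords).generate(text)

    plt.figure(figsize=(15, 8))
    plt.imshow(wc, interpolation="bilinear")
    plt.axis("off")
    plt.show()
```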
## sample use case:
Here is a sample word cloud generated using `limit: 10,000` and `query: ISRO (#SSLVD2 OR #ISRO)` at resolution `width, height: 1080, 720` during the SSLV-D2 launch on 10th Feb, 2023. You can clearly see how Twitter looked that day during the launch, in just one snapshot!

![ISRO SLVD2 Launch](https://user-images.githubusercontent.com/34891206/219942847-9329f7b1-7913-4d23-9222-a0553f50d9ff.png)
## usage:
Here is a sample usage:
```python
from module.scraper import TweetScraper
from module.generator import TwitterWordCloud

# create helper objects
tweet_scraper = TweetScraper()
tweet_wc = TwitterWordCloud()

# set parameters
topic_title = 'ISRO During SSLV-D2 Launch'
query = 'ISRO (#SSLVD2 OR #ISRO)'
limit = 1000
exclude_words = ['amp', 'eval']

# scrape tweets
rawData = tweet_scraper.get_tweets(query, limit)

# generate the word cloud (1080 x 720 px)
tweet_wc.generate_word_cloud_v2(rawData, topic_title, exclude_words, 1080, 720)
```

This will generate a word cloud using the last 1000 tweets made during the ISRO SSLV-D2 launch.
See the function signatures below for more details on the function parameters.

## parameters
### function: generate_word_cloud():
A simple generator function with only one required parameter (the dataframe) for quick and easy word cloud generation.
The output image is (1000 px, 500 px) on a (15, 8) inch canvas.

| Parameter | Data Type | Description | More Details |
|---------------------|------------------|----------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------|
| rawData | pandas dataframe | pandas dataframe from scraper function | |
| force_exclude_words | list of strings | words you wish to exclude from the word cloud | after seeing an output, if there are words in the image you wish to exclude, you can do so using this option |
| return | None | it generates and displays the word cloud, and saves it as a PNG | |
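For completeness, here is a minimal sketch of calling this simpler variant, reusing the `TweetScraper` setup from the earlier examples and assuming the parameter names in the table map directly onto keyword arguments.

```python
from module.scraper import TweetScraper
from module.generator import TwitterWordCloud

tweet_scraper = TweetScraper()
tweet_wc = TwitterWordCloud()

# scrape some tweets and render a quick word cloud at the default 1000x500 px size
rawData = tweet_scraper.get_tweets('#ISRO', 1000)
tweet_wc.generate_word_cloud(rawData, force_exclude_words=['amp'])
```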
### function: generate_word_cloud_v2():
A more customizable and generic function with the following parameters.

| Parameter | Required | Data Type | Description | More Details |
|---------------------|------------------|------------------|----------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------|
| rawData | Yes | pandas dataframe | pandas dataframe from scraper function | |
| topic_title | Yes | string | a short string describing the topic of tweets in the dataframe | the output files will have the same name as the topic |
| force_exclude_words | No, default [] | list of strings | words you wish to exclude from the word cloud | after seeing an output, if there are words in the image you wish to exclude, you can do so using this option |
| width | No, default 1000 | int | width of the output image in pixels | |
| height | No, default 500 | int | height of the output image in pixels | |
| dpi | No, default 100 | int | pixel density (pixels per inch) of the output image | |
| return | NA | None | it generates and displays the word cloud, and saves it as a PNG | |
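The `dpi` parameter is not exercised in the usage example above, so here is a hedged sketch of a higher-resolution export. The argument order follows the positional call shown earlier, the extra `dpi` value is appended on that assumption, and the topic title is just a placeholder.

```python
from module.scraper import TweetScraper
from module.generator import TwitterWordCloud

tweet_scraper = TweetScraper()
tweet_wc = TwitterWordCloud()

rawData = tweet_scraper.get_tweets('#ISRO', 2000)

# full-HD canvas at a higher pixel density for a print-friendly PNG
tweet_wc.generate_word_cloud_v2(rawData, 'ISRO High-Res Demo', ['amp'], 1920, 1080, 150)
```

---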