https://github.com/atharvapathak/twitter_sentiment_analysis_project

Twitter sentiment analysis is the process of analyzing tweets posted on the Twitter platform to determine the overall sentiment expressed within them. It involves using natural language processing (NLP) and machine learning techniques to classify tweets.
api bag-of-words bert cnn data gbm nltk rnn spacy twitter

Last synced: 3 months ago

Host: GitHub
URL: https://github.com/atharvapathak/twitter_sentiment_analysis_project
Owner: atharvapathak
License: mit
Created: 2024-04-09T14:20:14.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2024-04-09T14:34:35.000Z (about 1 year ago)
Last Synced: 2025-04-05T12:09:40.292Z (3 months ago)
Topics: api, bag-of-words, bert, cnn, data, gbm, nltk, rnn, spacy, twitter
Language: Python
Homepage:
Size: 20.8 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

## 1. Technologies Used

1. Tweepy API
2. NLTK
3. BERT Model
4. TensorFlow
5. Seaborn
6. Streamlit

## 2. Project Description
### 2.1 Data Extraction and Preprocessing
We scraped tweets for each category (Loneliness, Stress, and Anxiety) using the Tweepy API, searching on keywords and phrases specific to that category.
Additionally, we scraped tweets that did not contain these keywords; these served as the ‘neutral’ data.
The data was cleaned using regex and NLTK: links, emojis, emoticons, and symbols were removed.
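
The cleaning step can be sketched roughly as follows. This is a minimal illustration using only Python's built-in `re` module, not the contents of `Cleaning Tweets.py`; the function name `clean_tweet` and all of the patterns are assumptions made for the example.

```python
import re

# All patterns below are illustrative assumptions, not the project's actual rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Simple western-style emoticons such as :) ;-( =D, only when they stand alone.
EMOTICON_RE = re.compile(r"(?<!\S)[:;=][\-^']?[)(\]\[DPpOo/\\|](?!\S)")
# A few common emoji ranges plus the variation selector and zero-width joiner.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]")
# Anything that is neither a word character nor whitespace.
SYMBOL_RE = re.compile(r"[^\w\s]")


def clean_tweet(text: str) -> str:
    """Remove links, emoticons, emojis, and symbols, then collapse whitespace."""
    text = URL_RE.sub(" ", text)       # links first, so their punctuation is not left behind
    text = EMOTICON_RE.sub(" ", text)  # emoticons before symbols, so ":D" does not leave "D"
    text = EMOJI_RE.sub(" ", text)
    text = SYMBOL_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(clean_tweet("Feeling so alone today :( 😢 https://t.co/abc123 #lonely!!"))
```

The order of the substitutions matters: stripping symbols first would turn an emoticon such as `:D` into a stray `D`.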

### 2.2 DL Model
We explored Transformer models and found BERT (Bidirectional Encoder Representations from Transformers) better suited to sentiment analysis than the alternatives we tried. We took a pretrained BERT model and fine-tuned it on our training data, training one model per class.

The output of the final layer was not passed through an activation function; instead, it was fed to a custom function that normalizes and standardizes the values.
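
The project's own function is not reproduced in this text. As a purely hypothetical stand-in, the sketch below shows one common way to standardize a raw model output and then normalize it into the range [0, 1]. The names `reference_stats` and `scale_raw_output`, the use of a reference set of raw outputs, and the clipping limit `z_limit` are all assumptions.

```python
from statistics import fmean, pstdev


def reference_stats(raw_outputs):
    """Mean and population standard deviation of a reference set of raw outputs."""
    mean = fmean(raw_outputs)
    std = pstdev(raw_outputs)
    if std == 0:
        raise ValueError("reference outputs must not all be identical")
    return mean, std


def scale_raw_output(raw, ref_mean, ref_std, z_limit=3.0):
    """Hypothetical scaling: z-score the raw output, clip it, map it to [0, 1]."""
    z = (raw - ref_mean) / ref_std          # standardize
    z = max(-z_limit, min(z_limit, z))      # clip extreme values (assumed limit)
    return (z + z_limit) / (2 * z_limit)    # normalize: 0 is LOW, 1 is HIGH


if __name__ == "__main__":
    mean, std = reference_stats([-2.0, 0.0, 2.0])
    print(scale_raw_output(0.0, mean, std))
```

An output equal to the reference mean maps to 0.5, and anything more than `z_limit` standard deviations away saturates at 0 or 1.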

### 2.3 Visualisation and Deployment
We used Seaborn to plot the calculated levels of Loneliness, Stress, and Anxiety for each user over time, showing how the user's mental state varied. We also estimated a weighted average for each category over previous tweets **`[0:LOW,1:HIGH]`**.
Additionally, you can view each individual tweet and its scores.
Deployment was done using Streamlit.
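
The per-category weighted average might look something like the sketch below. The README does not state which weights are used, so the default here (linear recency weights, where newer tweets count more) is an assumption, as are the function name `weighted_average` and the sample scores.

```python
def weighted_average(scores, weights=None):
    """Weighted average of per-tweet scores, ordered oldest to newest.

    If no weights are given, linear recency weights 1..n are assumed,
    so the newest tweet counts the most. This default is hypothetical.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if weights is None:
        weights = range(1, len(scores) + 1)
    if len(weights) != len(scores):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(s * w for s, w in zip(scores, weights)) / total


if __name__ == "__main__":
    # Made-up scores in [0, 1] for one user's last three tweets.
    user_scores = {
        "Loneliness": [0.2, 0.4, 0.9],
        "Stress": [0.5, 0.5, 0.5],
        "Anxiety": [0.8, 0.3, 0.1],
    }
    for category, scores in user_scores.items():
        print(category, round(weighted_average(scores), 3))
```

Because each score lies in [0, 1] and the weights are non-negative, the result stays on the same LOW-to-HIGH scale.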

## 3. Files
* **`Cleaning Tweets.py`** - Script to clean scraped tweets
* **`Extracting Targeted Tweets.py`** - Script to scrape a user's Twitter information
* **`Streamlit Deployment.py`** - Script to deploy the project
* **`Streamlit Deployment.ipynb`** - Jupyter Notebook to deploy the project
* **Extracted Tweets** - Training Data
* **Training Models:**
  * **`Anxiety Model.py`**
  * **`Lonely Model.py`**
  * **`Stress Model.py`**

## 4. References
* [Bidirectional Encoder Representations from Transformers (BERT): A sentiment analysis odyssey](https://arxiv.org/abs/2007.01127)
* [Studying expressions of loneliness in individuals using twitter: an observational study](https://bmjopen.bmj.com/content/bmjopen/9/11/e030355.full.pdf)
* [Understanding and Measuring Psychological Stress Using Social Media](https://static1.squarespace.com/static/53d29678e4b04e06965e9423/t/5ea0bea583b33b7bb006e140/1587592872890/2019UnderstandingStress.pdf)

## 5. License
[MIT](https://choosealicense.com/licenses/mit/)
