Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/atharvapathak/twitter_sentiment_analysis_project
Twitter sentiment analysis is the process of analyzing tweets posted on the Twitter platform to determine the overall sentiment expressed within them. It involves using natural language processing (NLP) and machine learning techniques to classify tweets.
api bag-of-words bert cnn data gbm nltk rnn spacy twitter
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/atharvapathak/twitter_sentiment_analysis_project
- Owner: atharvapathak
- License: mit
- Created: 2024-04-09T14:20:14.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-04-09T14:34:35.000Z (10 months ago)
- Last Synced: 2024-12-18T18:13:06.528Z (about 2 months ago)
- Topics: api, bag-of-words, bert, cnn, data, gbm, nltk, rnn, spacy, twitter
- Language: Python
- Homepage:
- Size: 20.8 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## 1. Technologies Used
1. Tweepy API
2. NLTK
3. BERT Model
4. Tensorflow
5. Seaborn
6. Streamlit
## 2. Project Description
### 2.1 Data Extraction and Preprocessing
We scraped data for each illness using the Tweepy API, based on keywords and phrases for each category.
Additionally, we scraped tweets that didn't contain these keywords. This data acted as the ‘neutral’ data.
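
The keyword-based labelling above, together with the cleaning step in this section, can be sketched roughly as follows. This is a minimal illustration and not the project's actual script: the keyword lists, the regular expressions, and the function names `label_tweet` and `clean_tweet` are all assumptions made for the example.

```python
import re

# Hypothetical keyword lists; the project's real keywords are not given here.
CATEGORY_KEYWORDS = {
    "anxiety": ["anxious", "panic attack"],
    "loneliness": ["lonely", "no one to talk to"],
    "stress": ["stressed", "burnt out"],
}

# Assumed patterns for the items the README says were removed.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMOTICON_RE = re.compile(r"[:;=][\-']?[\)\(\]\[DPp/\\|]")
# Everything except ASCII letters, digits, whitespace, and apostrophes.
# This also strips emojis and punctuation symbols.
SYMBOL_RE = re.compile(r"[^A-Za-z0-9\s']")


def label_tweet(text):
    """Return the first category whose keyword appears, else 'neutral'."""
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "neutral"


def clean_tweet(text):
    """Remove links, emoticons, emojis, and symbols, then collapse spaces."""
    text = URL_RE.sub(" ", text)       # links first, so ':/' in URLs is gone
    text = EMOTICON_RE.sub(" ", text)  # text emoticons such as ':(' or ';-)'
    text = SYMBOL_RE.sub(" ", text)    # emojis, hashtags' '#', punctuation
    return " ".join(text.split())


raw = "So stressed today :( 😩 https://t.co/abc #exams!!"
print(label_tweet(raw), "|", clean_tweet(raw))
```

Labelling runs on the raw text here so that multi-word phrases still match before punctuation is stripped; the real pipeline may order these steps differently.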
The data was cleaned using libraries such as regex and NLTK; links, emojis, emoticons, and symbols were removed.
### 2.2 DL Model
We explored Transformer models and found that BERT (Bidirectional Encoder Representations from Transformers) was better suited for sentiment analysis. We fine-tuned a pretrained BERT model on our training data, training a separate model for each class (Loneliness, Stress, and Anxiety).
The output given by the final layer was not fed to any activation function; it was instead given as input to a custom function to normalize and standardize the data. The function is given below:
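
The original function is not reproduced in this copy of the README. Purely as an illustrative stand-in, the sketch below assumes one plausible reading of "normalize and standardize": z-score the raw final-layer outputs, clip them to three standard deviations, and rescale linearly to the `[0, 1]` range. The function name `normalize_scores`, the clipping threshold, and the whole procedure are assumptions, not the author's code.

```python
import statistics


def normalize_scores(raw_outputs, clip=3.0):
    """Map raw model outputs to [0, 1] (hypothetical stand-in).

    Step 1 (standardize): subtract the mean, divide by the std deviation.
    Step 2 (normalize): clip z-scores to [-clip, clip], rescale to [0, 1].
    """
    mean = statistics.fmean(raw_outputs)
    std = statistics.pstdev(raw_outputs)
    if std == 0:
        # All outputs identical: no spread to scale, so return the midpoint.
        return [0.5 for _ in raw_outputs]
    scores = []
    for value in raw_outputs:
        z = (value - mean) / std
        z = max(-clip, min(clip, z))
        scores.append((z + clip) / (2 * clip))
    return scores


print(normalize_scores([-2.0, 0.0, 2.0]))
```

No activation function is applied in this sketch, in keeping with the description above; a linear rescale is used instead.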
### 2.3 Visualisation and Deployment
We used Seaborn to display the calculated levels of Loneliness, Stress, and Anxiety for each user over time, showing how the user's mental state varied. We also estimate a weighted average for each category over previous tweets, on a scale of **`[0:LOW,1:HIGH]`**.
Additionally, you can view each specific tweet and its scores.
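
The weighted average over previous tweets could be computed along these lines. The README does not state which weights are used, so this sketch assumes linearly increasing weights (more recent tweets count more); the function names `weighted_average` and `running_weighted_average` are hypothetical.

```python
def weighted_average(scores, weights=None):
    """Weighted mean of per-tweet scores, each in [0, 1].

    Assumption: with no weights given, tweet i (oldest first) gets
    weight i + 1, so the most recent tweet has the largest weight.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    weights = list(weights) if weights is not None else list(range(1, len(scores) + 1))
    if len(weights) != len(scores):
        raise ValueError("scores and weights must have the same length")
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def running_weighted_average(scores):
    """Weighted average after each tweet, using all tweets up to that point."""
    return [weighted_average(scores[: i + 1]) for i in range(len(scores))]


stress_scores = [0.2, 0.4, 0.9]  # made-up per-tweet scores, oldest first
print(weighted_average(stress_scores))
print(running_weighted_average(stress_scores))
```

The list returned by `running_weighted_average` has one value per tweet, so it could be passed as the `y` values to a Seaborn line plot against tweet timestamps.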
Deployment was done using Streamlit.
## 3. Files
* **`Cleaning Tweets.py`** - Script to clean scraped tweets
* **`Extracting Targeted Tweets.py`** - Script to scrape a user's Twitter information
* **`Streamlit Deployment.py`** - Script to deploy the project
* **`Streamlit Deployment.ipynb`** - Jupyter Notebook to deploy the project
* **Extracted Tweets** - Training Data
* **Training Models:**
  * **`Anxiety Model.py`**
  * **`Lonely Model.py`**
  * **`Stress Model.py`**
## 4. References
* [Bidirectional Encoder Representations from Transformers (BERT): A sentiment analysis odyssey](https://arxiv.org/abs/2007.01127)
* [Studying expressions of loneliness in individuals using twitter: an observational study](https://bmjopen.bmj.com/content/bmjopen/9/11/e030355.full.pdf)
* [Understanding and Measuring Psychological Stress Using Social Media](https://static1.squarespace.com/static/53d29678e4b04e06965e9423/t/5ea0bea583b33b7bb006e140/1587592872890/2019UnderstandingStress.pdf)
## 5. License
[MIT](https://choosealicense.com/licenses/mit/)