https://github.com/githubasr2001/x_sentimentanalysis

Last synced: over 1 year ago
JSON representation

Host: GitHub
URL: https://github.com/githubasr2001/x_sentimentanalysis
Owner: githubasr2001
Created: 2024-02-16T03:27:23.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-02-16T03:30:17.000Z (over 2 years ago)
Last Synced: 2024-02-16T04:28:31.040Z (over 2 years ago)
Language: Jupyter Notebook
Size: 1.74 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# Twitter Sentiment Analysis on Vaccination Tweets

## Project Overview

This project focuses on analyzing public sentiment towards vaccinations through tweets collected from Twitter. Using Python, NLP techniques, and machine learning, we aim to uncover the general public's opinions, concerns, and attitudes towards vaccination efforts globally.

## Motivation

In the wake of global health challenges, understanding public sentiment towards vaccinations is crucial for policymakers, health organizations, and the general public. This project seeks to provide insights into these sentiments by analyzing tweets related to vaccinations.

## Data Source

The dataset `vaccination_all_tweets.csv` contains tweets related to vaccinations, including user information and tweet content. This data serves as the foundation for our sentiment analysis.

## Requirements

- Python 3.x
- Pandas
- NumPy
- NLTK
- Scikit-learn
- Matplotlib
- Seaborn
- TextBlob

## Installation and Setup

Ensure you have Python installed on your system. You can then install the necessary libraries using pip:

```bash
pip install pandas numpy nltk scikit-learn matplotlib seaborn textblob
```

## Data Preprocessing

The preprocessing steps include:

- Cleaning text data by removing URLs, mentions, and special characters.
- Lowercasing all text for consistency.
- Tokenizing the text and removing stopwords.
- Applying lemmatization to get the root form of words.

## Sentiment Analysis

We used TextBlob to assign sentiment polarity to each tweet, categorizing them into positive, negative, or neutral sentiments.

## Machine Learning Model

A Logistic Regression model was trained on TF-IDF vectorized tweet texts to predict the sentiment of tweets. We used `CountVectorizer` to convert text data into a matrix of token counts and `TfidfTransformer` to compute TF-IDF values.

## Evaluation

The model's performance was evaluated using accuracy, precision, recall, and F1-score metrics.

## Visualization

We visualized the distribution of sentiments and created word clouds to display the most frequent words in positive, negative, and neutral tweets.

## How to Use

1. Clone the repository to your local machine.
2. Load your dataset or use the provided `vaccination_all_tweets.csv`.
3. Run the preprocessing script to clean and prepare the data.
4. Execute the sentiment analysis notebook to analyze and visualize sentiments.
5. Train the machine learning model using the training script.

## Future Work

- Explore more advanced models like neural networks for better accuracy.
- Implement other vectorization techniques like Word2Vec or GloVe.
- Expand the dataset to include tweets in different languages.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/githubasr2001/x_sentimentanalysis

Awesome Lists containing this project

README