https://github.com/jim-by/tweet-sentiment-analysis
Sentiment analysis of tweets using TextBlob for labeling and RandomForest for classification.
- Host: GitHub
- URL: https://github.com/jim-by/tweet-sentiment-analysis
- Owner: Jim-by
- License: mit
- Created: 2025-04-02T11:44:05.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-04-02T11:53:03.000Z (3 months ago)
- Last Synced: 2025-04-02T12:26:35.693Z (3 months ago)
- Topics: nltk, nltk-tokenizer, numpy, pandas, python, random-forest-classifier, sklearn, textblob-sentiment-analysis
- Language: Python
- Homepage:
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Analyzing the sentiment of tweets using TextBlob and RandomForest
This project demonstrates end-to-end sentiment analysis of tweets: preprocessing the text, labeling sentiment with the TextBlob library, and training a RandomForestClassifier model to classify that sentiment.
## Description
The main goal of the project is to demonstrate skills in Natural Language Processing (NLP) and machine learning for text analysis tasks.
**The project includes the following steps** (illustrative sketches follow the list):
1. **Data Loading:** Reading tweets from a CSV file.
2. **Text preprocessing:**
   * Tokenization
   * Stop word removal
   * Lemmatization
3. **Sentiment analysis using TextBlob:** Each tweet is assigned a polarity score, from which a label ('positive', 'negative', 'neutral') is derived.
4. **Preparing data for machine learning:**
   * Converting text labels into numerical labels.
   * Text vectorization using TF-IDF.
5. **Model training:**
   * Splitting the data into training and test sets (this script uses a specific approach, see the note on model evaluation below).
   * Training a `RandomForestClassifier`.
   * Hyperparameter selection using `GridSearchCV`.
6. **Model evaluation:** Measuring model quality with the F1 score.
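As a rough illustration of step 2, the preprocessing might look like the minimal sketch below; function and variable names here are illustrative and not necessarily those used in `src/sentiment_analysis.py` (it assumes the NLTK resources listed under Requirements are already downloaded):

```python
import string

from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

STOP_WORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

def preprocess(text: str) -> str:
    """Tokenize, drop stop words and punctuation, lemmatize, and rejoin."""
    tokens = word_tokenize(text.lower())
    kept = [
        LEMMATIZER.lemmatize(token)
        for token in tokens
        if token not in STOP_WORDS and token not in string.punctuation
    ]
    return " ".join(kept)

print(preprocess("The movies were absolutely amazing!"))  # -> "movie absolutely amazing"
```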
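Steps 4-6 can be sketched roughly as follows; this is again illustrative, and the split strategy, parameter grid, and F1 averaging in the actual script may differ (see the note on model evaluation below):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import LabelEncoder

def train_and_evaluate(texts, labels):
    """texts: preprocessed tweets; labels: 'positive'/'negative'/'neutral' strings."""
    y = LabelEncoder().fit_transform(labels)                       # text labels -> integers
    X = TfidfVectorizer(max_features=5000).fit_transform(texts)    # TF-IDF features

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )

    search = GridSearchCV(
        RandomForestClassifier(random_state=42),
        param_grid={"n_estimators": [100, 200], "max_depth": [None, 20]},
        scoring="f1_weighted",
        cv=3,
    )
    search.fit(X_train, y_train)

    print("Best parameters:", search.best_params_)
    print("Weighted F1:", f1_score(y_test, search.predict(X_test), average="weighted"))
    return search.best_estimator_
```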
## Requirements

The following are required to run the project:
* Python 3.x
* The main dependencies listed in the `requirements.txt` file.

Additionally, the script loads the NLTK resources 'punkt', 'stopwords', and 'wordnet'.
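If these resources are not already present, they can be downloaded once, for example:

```python
import nltk

# One-time download of the NLTK resources used by the script
for resource in ("punkt", "stopwords", "wordnet"):
    nltk.download(resource)
```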
## Installation
1. Clone the repository:
```bash
git clone https://github.com/Jim-by/tweet-sentiment-analysis.git
cd tweet-sentiment-analysis
```
2. (Recommended) Create and activate a virtual environment:
```bash
python -m venv venv
source venv/bin/activate # For Linux/Mac
# venv\Scripts\activate # For Windows
```
3. Install dependencies:
```bash
pip install -r requirements.txt
```

## Usage
1. Make sure the `submission.csv` file is in the `data/` folder and contains a `selected_text` column with the tweet texts (see the sketch after this list for a quick check of the expected layout).
2. Run the script:
```bash
python src/sentiment_analysis.py
```
**Expected output:**
* The console will display progress messages, the best model parameters, and the F1 score.
* A `tweets_sentiments.csv` file will be created in the `data/` folder containing the original tweets, the TextBlob analysis results, and the preprocessed text.
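The input file from step 1 can be sanity-checked with a snippet like the following (only the `selected_text` column is assumed; any other columns are untouched):

```python
import pandas as pd

df = pd.read_csv("data/submission.csv")
assert "selected_text" in df.columns, "submission.csv must contain a 'selected_text' column"
print(df["selected_text"].head())
```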
## Note on model evaluation
In this script, sentiment labels are generated programmatically using TextBlob on the entire dataset. A machine learning model is then trained to predict these labels, and the evaluation is performed on the same dataset on which the labels were generated.
This demonstrates the model's ability to approximate TextBlob's logic from TF-IDF features. Evaluation on fully independent data would require "true" sentiment labels assigned by a human or another reliable source.
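For reference, the programmatic labeling typically amounts to thresholding TextBlob's polarity score, roughly as in the sketch below; the exact thresholds used in the script may differ:

```python
from textblob import TextBlob

def textblob_label(text: str, threshold: float = 0.1) -> str:
    """Map TextBlob polarity (-1.0 .. 1.0) to a coarse sentiment label."""
    polarity = TextBlob(text).sentiment.polarity
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"

print(textblob_label("I love this!"))   # positive
print(textblob_label("This is awful"))  # negative
```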
## Possible Improvements
* Using more advanced word embedding techniques such as Word2Vec, GloVe, or BERT.
* Trying other classification models (see the sketch below for a possible baseline).
* More thorough cleaning and preparation of text data.
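As an example of the second point, any scikit-learn classifier can be swapped in on the same TF-IDF features; a simple baseline (not part of the current script) might be:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical drop-in baseline: TF-IDF features + logistic regression
baseline = make_pipeline(
    TfidfVectorizer(max_features=5000),
    LogisticRegression(max_iter=1000),
)
# baseline.fit(train_texts, train_labels)
# predictions = baseline.predict(test_texts)
```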