https://github.com/pedasoft-consult/sentiment_analysis

This project involves analyzing customer reviews to classify them as positive or negative using Logistic Regression. The workflow includes text preprocessing, feature extraction, training a model, making predictions, and evaluating its performance.
https://github.com/pedasoft-consult/sentiment_analysis

nltk numpy pandas sklearn

Last synced: 7 months ago
JSON representation

Host: GitHub
URL: https://github.com/pedasoft-consult/sentiment_analysis
Owner: Pedasoft-Consult
Created: 2025-02-12T12:37:10.000Z (12 months ago)
Default Branch: main
Last Pushed: 2025-02-12T13:08:09.000Z (12 months ago)
Last Synced: 2025-02-12T13:57:19.697Z (12 months ago)
Topics: nltk, numpy, pandas, sklearn
Language: Python
Homepage:
Size: 9.77 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# Text Classification with Logistic Regression for Sentiment Analysis

## Project Overview
This project involves analyzing customer reviews to classify them as **positive** or **negative** using **Logistic Regression**. The workflow includes text preprocessing, feature extraction, training a model, making predictions, and evaluating its performance.

## Dataset
The dataset consists of customer reviews labeled with sentiment scores:
- **Review**: The text of the customer’s review
- **Sentiment**: The target variable (1 = Positive, 0 = Negative)

**Dataset file**: `customer_reviews_sentiment.csv`

## Requirements
Install the necessary dependencies before running the code:
```bash
pip install pandas numpy scikit-learn nltk
```

## Workflow

### 1. Data Preprocessing
- Load the dataset
- Handle missing values
- Convert text to lowercase
- Remove special characters and punctuation
- Remove stop words (using NLTK or SpaCy)
- Tokenize the text

### 2. Feature Extraction
- Convert text into numerical features using:
- **Bag of Words (BoW)** or
- **TF-IDF (Term Frequency-Inverse Document Frequency)**
- Split the dataset into training (80%) and testing (20%) sets

### 3. Train a Logistic Regression Model
- Train a **Logistic Regression classifier** on extracted features
- Tune hyperparameters (experiment with regularization parameter **C**)

### 4. Make Predictions
- Predict the sentiment for the following reviews:
- `"This product is amazing! I love it."`
- `"It broke after one use, completely disappointed."`

### 5. Model Evaluation
- Compute **accuracy** on the test dataset
- Generate **confusion matrix** and **classification report** (precision, recall, F1-score)

### 6. Model Improvements
- Experiment with other classifiers (e.g., **Naive Bayes, SVM**)
- Compare their performance with Logistic Regression

## Deliverables
1. **Preprocessed Dataset**: Cleaned text data
2. **Feature-Engineered Dataset**: Extracted numerical features
3. **Trained Model**: Logistic Regression with optimized hyperparameters
4. **Model Evaluation**: Accuracy, confusion matrix, classification report
5. **Sample Predictions**: Results for provided test cases
6. **Model Comparison**: Performance of alternative classifiers

## How to Run the Code
1. Ensure the dataset is available as `customer_reviews_sentiment.csv`
2. Run the preprocessing and feature extraction scripts
3. Train the Logistic Regression model
4. Evaluate performance and compare models

## Author
**Pedahel Emmanuel Kojo**
Senior Software Engineer, Machine Learning Engineer at CSP

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/pedasoft-consult/sentiment_analysis

Awesome Lists containing this project

README