https://github.com/samehinttech/sentiment-analysis-customer-reviews
- Host: GitHub
- URL: https://github.com/samehinttech/sentiment-analysis-customer-reviews
- Owner: samehinttech
- License: apache-2.0
- Created: 2025-12-05T09:14:39.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2026-02-20T06:46:58.000Z (4 months ago)
- Last Synced: 2026-03-17T02:32:09.883Z (3 months ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 2.58 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# sentiment-analysis-customer-reviews
## Project Overview
This repository contains the deliverables for a group project completed by BIT students at the **FHNW University of Applied Sciences and Arts Northwestern Switzerland**.
The project builds a BI and data analytics solution on a real-world customer feedback dataset. The primary goal is to apply data science and Natural Language Processing (NLP) techniques to extract actionable business insights.
---
## Implementation
### Pipeline Overview
```
Raw Reviews → Text Preprocessing → Feature Extraction → Sentiment Classification → Feature Analysis → Export
```
### Notebook Structure
1. **Part 1** – Libraries Import
2. **Part 2** – Exploratory Data Analysis (EDA)
3. **Part 3** – Text Preprocessing (cleaning, tokenization, lemmatization)
4. **Part 4** – Feature Extraction (TF-IDF vectorization)
5. **Part 5** – Sentiment Classification Models (VADER, NB, LR, BERT)
6. **Part 6** – Topic Modeling (LDA) & Feature-Based Sentiment Analysis
7. **Part 7** – Export Processed Data
8. **Part 8** – Conclusion
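
The preprocessing step in Part 3 can be pictured roughly as follows. This is a dependency-free sketch, not the notebook's code: the notebook relies on NLTK for tokenization, stopword removal, and lemmatization, whereas the regex cleaning, the tiny `STOPWORDS` set, and the `preprocess` function name below are illustrative assumptions, and lemmatization is left out.

```python
import re

# Hypothetical, deliberately tiny stopword list; NLTK's English list is much larger.
STOPWORDS = {"the", "a", "an", "is", "was", "and", "it", "this", "to", "of", "i"}


def preprocess(text: str) -> list[str]:
    """Lowercase, strip URLs and non-letters, tokenize, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and whitespace only
    tokens = text.split()                      # simple whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]


print(preprocess("The delivery was FAST, and the packaging is great!!! https://example.com"))
# ['delivery', 'fast', 'packaging', 'great']
```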
---
## Technology Stack
### NLP & Text Processing
- **NLTK** – Tokenization, stopword removal, lemmatization
- **TF-IDF** – Feature extraction for ML models
- **WordCloud** – Vocabulary visualization
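
To make the TF-IDF weighting concrete, here is a small hand-rolled sketch. It assumes the smoothed IDF formula that scikit-learn documents as the `TfidfVectorizer` default, `idf = ln((1 + n) / (1 + df)) + 1`, and skips the L2 normalization step. The `tfidf` function and the toy documents are hypothetical and are not taken from the notebook.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return weights


docs = [["fast", "delivery"], ["slow", "delivery"], ["great", "price"]]
for row in tfidf(docs):
    print({term: round(w, 3) for term, w in row.items()})
```

Because "delivery" appears in two of the three documents, it receives a lower weight than "fast" or "slow", which each appear in only one.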
### Sentiment Analysis Models
- **VADER** – Rule-based baseline (79.64% accuracy)
- **Naive Bayes** – Classical ML (100% accuracy)
- **Logistic Regression** – Classical ML (100% accuracy)
- **BERT** – Transformer model (91.76% accuracy)
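
As a rough illustration of how the two classical models can be trained on TF-IDF features, the sketch below uses scikit-learn pipelines. The toy reviews and labels are made up for this example and do not come from the project dataset, and the notebook's actual preprocessing and hyperparameters may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up toy reviews (assumption): 1 = positive, 0 = negative.
texts = [
    "great product fast delivery", "love it works perfectly",
    "excellent quality very happy", "amazing value highly recommend",
    "terrible quality broke quickly", "awful service never again",
    "poor packaging arrived damaged", "bad experience waste of money",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Hold out a stratified test split so accuracy is measured on unseen reviews.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

models = {
    "Naive Bayes": make_pipeline(TfidfVectorizer(), MultinomialNB()),
    "Logistic Regression": make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000)),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)  # the vectorizer is fitted on training text only
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.2f}")
```

Keeping the vectorizer inside the pipeline means the vocabulary and IDF weights are learned from the training split alone, so the reported accuracy reflects genuinely unseen text.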
### Topic Modeling
- **LDA (Latent Dirichlet Allocation)** – Discover topics in reviews
### Libraries
- **pandas, numpy** – Data manipulation
- **matplotlib, seaborn** – Visualization
- **scikit-learn** – ML models, TF-IDF, evaluation
- **transformers, torch** – BERT model
- **vaderSentiment** – VADER baseline
---
## Quick Start
### Prerequisites
- Python 3.12+
- NVIDIA GPU (optional, for faster BERT inference)
### Installation
1. **Clone the repository**
```bash
git clone https://github.com/samehinttech/sentiment-analysis-customer-reviews.git
cd sentiment-analysis-customer-reviews
```
2. **Create virtual environment**
```bash
python -m venv .venv
```
3. **Activate virtual environment** (Windows PowerShell shown; on macOS/Linux run `source .venv/bin/activate` instead)
```powershell
.\.venv\Scripts\Activate.ps1
```
4. **Install dependencies**
```bash
pip install -r requirements.txt
```
5. **GPU Support (Optional)**
```bash
# For NVIDIA RTX 30/40 series (CUDA 12.4)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
# For NVIDIA RTX 50 series (CUDA 13.0)
pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu130
```
6. **Run the notebook**
```bash
jupyter notebook notebooks/sentiment_analysis.ipynb
```
> **Note:** The BERT model downloads automatically on first run (~500 MB).
>
> **Important:** The notebook is designed to run from start to finish without interruption.
> Execute all cells in order for it to work properly.
> Some steps (such as BERT inference) may take a while depending on your hardware.
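
After installing a CUDA build of PyTorch, a quick check like the one below can confirm whether the GPU is visible before starting the long-running BERT cells. The `pick_device` helper is hypothetical and not part of the repository; it falls back to the CPU when PyTorch is missing or no CUDA device is available.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can use an NVIDIA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so there is nothing to accelerate
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Running on: {pick_device()}")
```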
---
## References
### Official Tutorials
- [TensorFlow: Basic Text Classification (Sentiment Analysis)](https://www.tensorflow.org/tutorials/keras/text_classification)
- [TensorFlow: Classify Text with BERT](https://www.tensorflow.org/text/tutorials/classify_text_with_bert)
- [TensorFlow Hub: Text Classification with Movie Reviews](https://www.tensorflow.org/hub/tutorials/tf2_text_classification)
- [Hugging Face: Getting Started with Sentiment Analysis](https://huggingface.co/blog/sentiment-analysis-python)
### Dataset
- [Customer Sentiment Dataset on Kaggle](https://www.kaggle.com/datasets/kundanbedmutha/customer-sentiment-dataset)
### Official Documentation
- [Python Documentation](https://docs.python.org/3.13/contents.html)
- [Pandas Documentation](https://pandas.pydata.org/docs/user_guide/index.html)
- [Seaborn Documentation](https://seaborn.pydata.org/tutorial.html)
- [Matplotlib Documentation](https://matplotlib.org/stable/users/index.html)
- [Scikit-learn Documentation](https://scikit-learn.org/stable/user_guide.html)
- [NLTK Documentation](https://www.nltk.org/)
- [spaCy Documentation](https://spacy.io/usage)
- [Transformers Documentation](https://huggingface.co/docs/transformers/index)
- [Hugging Face Model: nlptown/bert-base-multilingual-uncased-sentiment](https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment)
- [TextBlob Documentation](https://textblob.readthedocs.io/en/dev/quickstart.html#sentiment-analysis)
- [VADER Sentiment Analysis](https://vadersentiment.readthedocs.io/en/latest/pages/features_and_updates.html)
---
## Acknowledgement
We would like to thank our teacher for his guidance and support throughout
this project. The teaching materials and tutorials he provided were instrumental
in completing this work.