https://github.com/lucianoayres/nlp-sentinel
NLP-Sentinel is a Natural Language Processing (NLP) project focused on sentiment analysis of consumer product reviews.
ai bert machine-learning nlp sentiment-analysis svm
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/lucianoayres/nlp-sentinel
- Owner: lucianoayres
- License: mit
- Created: 2024-11-09T20:01:21.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-03-11T08:44:26.000Z (3 months ago)
- Last Synced: 2025-03-11T09:23:54.864Z (3 months ago)
- Topics: ai, bert, machine-learning, nlp, sentiment-analysis, svm
- Language: Jupyter Notebook
- Homepage: https://colab.research.google.com/drive/1WwAcQrNEa5Md1hhqvQt9R4uDtFJ57JS8?usp=sharing
- Size: 8.36 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# NLP-Sentinel

- ▶️ [Presentation Video](https://drive.google.com/file/d/1TIxEj9jhdVLBvoXU1iSPaHxqvGfuRkXf/view?usp=sharing) (15 min)
- 📂 [Presentation Slides](https://docs.google.com/presentation/d/1GoP6OIU4CO6ypWyeoN47Ye6HG4E-Hq_R/edit?usp=sharing&ouid=114780034926001489401&rtpof=true&sd=true) (PPT)
- 💻 [Google Colab Notebook](https://colab.research.google.com/drive/1WwAcQrNEa5Md1hhqvQt9R4uDtFJ57JS8?usp=sharing)
## Table of Contents
1. [Project Overview](#project-overview)
2. [Objectives](#objectives)
3. [Models and Approaches](#models-and-approaches)
4. [Data Description](#data-description)
5. [Sample Data](#sample-data)
6. [Running the Project](#running-the-project)
7. [Repository Structure](#repository-structure)
8. [Results and Performance](#results-and-performance)
   - [Classical & Neural Models (SVM, BERT)](#classical--neural-models-svm-bert)
   - [LLM In-Context Learning (OpenAI and Google Gemini)](#llm-in-context-learning-openai-and-google-gemini)
9. [Contributions](#contributions)
---
## Project Overview
**NLP-Sentinel** is a Natural Language Processing (NLP) project focused on **sentiment analysis** of consumer product reviews. The models implemented here are not tuned for maximum predictive accuracy; the main goal is to explore and apply NLP training techniques and methodologies to a realistic scenario: analyzing consumer sentiment toward a smartphone product.
The name **NLP-Sentinel** conveys the idea of a "sentinel" continuously monitoring opinions to uncover trends and insights that drive informed decision-making.
---
## Objectives
1. **Sentiment Classification:** Identify whether a given product review is positive, neutral, or negative.
2. **Rating Correlation:** Examine the relationship between sentiment polarity and the consumers’ numerical ratings.
3. **Exploratory Analysis:** Identify prominent words, entities, bigrams, trigrams, and patterns in textual data.
4. **Model Comparison:** Compare the performance of various models, including classical machine learning approaches, transformer-based models (BERT), and Large Language Models (LLMs) via In-Context Learning.
---
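The n-gram counting named in the Exploratory Analysis objective above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's code: the regex tokenizer (lowercase, letters only) and the helper names `tokenize`, `ngrams`, and `top_ngrams` are assumptions, and the notebook may well use spaCy or NLTK and remove stopwords first.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed tokenizer: lowercase text, keep runs of letters (accented letters included).
WORD_RE = re.compile(r"[^\W\d_]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return WORD_RE.findall(text.lower())


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return all contiguous n-grams of a token list (n=2 for bigrams, n=3 for trigrams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(reviews: Iterable[str], n: int, k: int = 10) -> list[tuple[tuple[str, ...], int]]:
    """Count n-grams across all reviews and return the k most frequent with their counts."""
    counts: Counter = Counter()
    for review in reviews:
        counts.update(ngrams(tokenize(review), n))
    return counts.most_common(k)


if __name__ == "__main__":
    sample = [
        "Decepcionante, não atendeu às expectativas.",
        "Excelente serviço, estou muito satisfeito porque atendeu todas as minhas expectativas.",
    ]
    print(top_ngrams(sample, n=1, k=3))
    print(top_ngrams(sample, n=2, k=3))
```

Running `top_ngrams` with `n=2` and `n=3` on the `review` column would give the bigram and trigram rankings the objective refers to.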
## Models and Approaches
- **SVM + Bag of Words (BoW):** Classic machine learning approach using simple text vectorization.
- **SVM + Embeddings (spaCy):** Using pre-trained Portuguese embeddings for semantic representation.
- **BERT (Fine-tuned):** Adapting a pre-trained BERT model (Portuguese) for sentiment classification.
- **In-Context Learning with LLMs:** Zero-shot or few-shot classification using:
  - **OpenAI GPT-4o**
  - **Google Gemini**
---
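The first approach listed under Models and Approaches (SVM + Bag of Words) can be sketched as a scikit-learn pipeline. This assumes scikit-learn's `CountVectorizer` and `LinearSVC`; the notebook's actual vectorizer settings, SVM kernel, and hyperparameters are not stated in this README, so `C=1.0` and the default tokenization are placeholders. The toy texts are invented for illustration and do not come from `reviews.csv`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.svm import LinearSVC

# Invented toy data (NOT from reviews.csv): each class uses its own vocabulary.
TOY_TEXTS = [
    "excelente produto adorei",
    "ótimo celular recomendo",
    "péssimo quebrou rápido",
    "horrível decepcionante bateria",
    "razoável nada demais",
    "mediano cumpre básico",
]
TOY_LABELS = ["positive", "positive", "negative", "negative", "neutral", "neutral"]


def build_bow_svm(c: float = 1.0) -> Pipeline:
    """Bag-of-Words token counts feeding a linear SVM (one-vs-rest over the labels)."""
    return make_pipeline(
        CountVectorizer(lowercase=True),  # raw counts; no stopword removal in this sketch
        LinearSVC(C=c),                   # C is an assumed placeholder, not a tuned value
    )


if __name__ == "__main__":
    model = build_bow_svm()
    model.fit(TOY_TEXTS, TOY_LABELS)
    print(model.predict(["adorei este produto"]))
```

In the project itself the pipeline would presumably be fit on a training split of the `review` column with the derived sentiment labels as targets, and scored on a held-out split.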
## Data Description
The dataset consists of:
- **review:** A free-form text evaluation of the smartphone.
- **rating (1 to 5):** A numerical score reflecting consumer satisfaction.
From the rating, we derive sentiment labels as follows:
- Rating ≥ 4 → Positive
- Rating ≤ 2 → Negative
- Rating = 3 → Neutral
---
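The rating-to-label rule in Data Description can be written as a small function. The lowercase English label strings and the name `rating_to_sentiment` are assumptions for this sketch; the notebook may use different label names.

```python
def rating_to_sentiment(rating: int) -> str:
    """Map a 1-5 star rating to a sentiment label (>= 4 positive, <= 2 negative, 3 neutral)."""
    if not 1 <= rating <= 5:
        raise ValueError(f"rating must be between 1 and 5, got {rating!r}")
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Ratings of the five example reviews listed under Sample Data.
    for rating in (3, 1, 2, 4, 5):
        print(rating, rating_to_sentiment(rating))
    # Assumed pandas usage on reviews.csv:
    # df["sentiment"] = df["rating"].map(rating_to_sentiment)
```

Rejecting out-of-range ratings up front makes malformed rows in the CSV fail loudly instead of silently becoming "neutral".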
## Sample Data
Below are some example entries from the dataset (`reviews.csv`):
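| review | English translation | rating |
| ---------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------ | ------ |
| "O produto é ok, nada demais. não se destaca no mercado." | "The product is OK, nothing special. It doesn't stand out in the market." | 3 |
| "Decepcionante, não atendeu às expectativas." | "Disappointing, it did not meet expectations." | 1 |
| "Não tenho muito o que comentar, cumpre o que promete." | "Not much to say, it does what it promises." | 2 |
| "Gostei bastante, mas poderia ter mais funcionalidades." | "I liked it a lot, but it could have more features." | 4 |
| "Excelente serviço, estou muito satisfeito porque atendeu todas as minhas expectativas." | "Excellent service, I'm very satisfied because it met all my expectations." | 5 |
---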
## Running the Project
This project is designed to run in **Google Colab**, not locally. Use the following link to open the notebook in Colab:
**[Run in Google Colab](https://colab.research.google.com/drive/1WwAcQrNEa5Md1hhqvQt9R4uDtFJ57JS8?usp=sharing)**
Before running:
- Ensure that you have access to the necessary data files as described in the notebook.
- All required libraries are installed within the notebook.
- Generate and securely store your OpenAI and Google Gemini API keys as secrets in Google Colab.
---
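Once the keys are stored as Colab secrets (see Running the Project), they can be read inside the notebook with `google.colab.userdata.get`. The secret names `OPENAI_API_KEY` and `GEMINI_API_KEY`, the helper name `get_secret`, and the environment-variable fallback for running outside Colab are assumptions for this sketch; use whatever names the notebook actually expects.

```python
import os


def get_secret(name: str) -> str:
    """Return an API key from Colab Secrets, falling back to an environment variable."""
    try:
        from google.colab import userdata  # only importable inside Google Colab
        value = userdata.get(name)  # raises if the secret is missing or access is not granted
    except ImportError:
        # Outside Colab (assumed fallback): read the key from the environment instead.
        value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Secret {name!r} is not set")
    return value
```

For example, `openai_key = get_secret("OPENAI_API_KEY")` and `gemini_key = get_secret("GEMINI_API_KEY")` (hypothetical names). Reading keys this way keeps them out of the notebook cells and out of the repository.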
## Repository Structure
The repository is organized as follows:
```bash
nlp-sentinel/
├─ data/
│ └─ reviews.csv
├─ images/
│ └─ sentinel_banner.png
├─ notebook/
│ └─ Pos_Deep_Learning_Projeto_NPL_Sentinel_22_Dez_2024.ipynb
├─ python-src/
│ └─ pos_deep_learning_projeto_npl_sentinel_22_dez_2024.py
├─ .gitignore
├─ LICENSE
├─ Project NLP.pdf
└─ README.md
```
**Key Files:**
- **[data/reviews.csv](data/reviews.csv)**: Contains the product reviews and corresponding ratings.
- **[images/sentinel_banner.png](images/sentinel_banner.png)**: Banner image for the project.
- **[notebook/Pos_Deep_Learning_Projeto_NPL_Sentinel_22_Dez_2024.ipynb](notebook/Pos_Deep_Learning_Projeto_NPL_Sentinel_22_Dez_2024.ipynb)**: Main Colab notebook with all code and analyses.
- **[python-src/pos_deep_learning_projeto_npl_sentinel_22_dez_2024.py](python-src/pos_deep_learning_projeto_npl_sentinel_22_dez_2024.py)**: Python script containing the project code.
---
## Results and Performance
### Classical & Neural Models (SVM, BERT)
| Model | Accuracy | F1-Score |
| ------------------ | -------- | -------- |
| SVM + Bag of Words | 0.640476 | 0.630571 |
| SVM + Embeddings | 0.600000 | 0.583039 |
| BERT               | 0.642857 | 0.639727 |

**Key Observation:**
Although maximizing accuracy was not the primary goal, BERT achieved the highest accuracy and F1-Score of the tested models. Its lead over SVM + Bag of Words is small (about 0.2 percentage points in accuracy and 0.9 in F1-Score), and SVM + Embeddings trailed both.
---
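For reference, the two metrics in the table can be computed from gold labels and predictions as below. This sketch uses macro-averaged F1 (the unweighted mean of per-class F1); the README does not say which averaging the reported F1-Score uses, so that choice is an assumption. In practice `sklearn.metrics.accuracy_score` and `sklearn.metrics.f1_score(..., average="macro")` compute the same quantities.

```python
from collections import Counter


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that match the gold label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_counts, true_counts = Counter(y_pred), Counter(y_true)
    scores = []
    for label in labels:
        # Undefined precision/recall (no predictions or no gold items) count as 0.
        precision = true_pos[label] / pred_counts[label] if pred_counts[label] else 0.0
        recall = true_pos[label] / true_counts[label] if true_counts[label] else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Macro averaging gives the minority class (often neutral in review data) the same weight as the others; a weighted average would instead scale each class by its frequency.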
### LLM In-Context Learning (OpenAI and Google Gemini)
**OpenAI GPT-4o Evaluation (Sampled Set):**
- Correct Classifications: 8/10 (80% accuracy for sampled set)
**Google Gemini Evaluation (Sampled Set):**
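- Correct Classifications: 8/10 (80% accuracy for sampled set)

Because these figures come from a sample of only 10 reviews, they are not directly comparable with the SVM and BERT metrics above.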
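In-context classification comes down to building a prompt with a few labeled examples (or none, for zero-shot), sending it to the model, and parsing the reply back into a label. The sketch below covers only the prompt and parsing steps in plain Python; the prompt wording, the English label set, and the helper names `build_prompt`, `parse_label`, and `score_replies` are assumptions, and the OpenAI or Gemini API call itself is omitted because the notebook's client code is not shown here. `score_replies` shows how a figure such as 8/10 on a sampled set is obtained.

```python
import re
from typing import Optional

LABELS = ("positive", "neutral", "negative")  # assumed label vocabulary
LABEL_RE = re.compile(r"\b(positive|neutral|negative)\b")


def build_prompt(examples: list[tuple[str, str]], review: str) -> str:
    """Few-shot prompt: instructions, labeled examples, then the review to classify.

    Pass an empty examples list for a zero-shot prompt.
    """
    lines = [
        "Classify the sentiment of the smartphone review as positive, neutral, or negative.",
        "Answer with a single word.",
        "",
    ]
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    lines.append(f"Review: {review}\nSentiment:")
    return "\n".join(lines)


def parse_label(reply: str) -> Optional[str]:
    """Return the first label word found in a free-form model reply, or None."""
    match = LABEL_RE.search(reply.lower())
    return match.group(1) if match else None


def score_replies(gold: list[str], replies: list[str]) -> tuple[int, float]:
    """Count replies whose parsed label matches the gold label; return (correct, accuracy)."""
    if not gold or len(gold) != len(replies):
        raise ValueError("gold and replies must be non-empty and of equal length")
    correct = sum(parse_label(reply) == label for label, reply in zip(gold, replies))
    return correct, correct / len(gold)
```

Parsing with a word-boundary regex tolerates replies such as "Positive." or "The sentiment is neutral", which models often return even when asked for a single word.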
---
## Contributions
- **Authors:**
  - Paloma Corrêa Alves
  - Luciano Ayres Farias de Carvalho

This project was developed as part of a **post-graduate** Deep Learning course at CIn-UFPE (Centro de Informática, Universidade Federal de Pernambuco), under the supervision of Professor Luciano Barbosa.
**NLP-Sentinel** serves as a foundation for exploring and applying NLP techniques to a product-related sentiment analysis scenario. While high accuracy was not the main goal, these experiments provide a valuable learning experience and can be extended to other domains, languages, and more complex sentiment tasks.