https://github.com/paraglondhe098/sentiment-classification-llm
Implemented and fine-tuned BERT for a custom sequence classification task, leveraging LoRA adapters for efficient parameter updates and 4-bit quantization to optimize performance and resource utilization.
- Host: GitHub
- URL: https://github.com/paraglondhe098/sentiment-classification-llm
- Owner: paraglondhe098
- Created: 2024-12-29T19:09:56.000Z
- Default Branch: master
- Last Pushed: 2024-12-30T03:28:27.000Z
- Last Synced: 2025-01-05T17:14:02.978Z
- Topics: data-augmentation, llm, llm-fine-tuning, llm-quantization, lora, nlp, nlp-augmentation, peft-fine-tuning-llm, qlora, quantization
- Language: Jupyter Notebook
- Homepage:
- Size: 6.66 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
README
# Sentiment Analysis of Video Game Reviews
## Project Overview
This project classifies Steam video game reviews as positive or negative, using models that range from LSTMs to a fine-tuned BERT. The dataset is highly imbalanced (80% positive reviews), so the project combines contextual data augmentation of the minority class with progressively stronger models, reaching 92% accuracy with a LoRA-adapted, 4-bit-quantized BERT classifier.
---
## Dataset
- **Source:** Steam Games Reviews
- **Nature:** Highly imbalanced dataset (80% positive reviews)
---
## Methodology
### Data Augmentation
- Used **NLPaug** for contextual data augmentation.
- Generated additional minority-class (negative) reviews with a **RoBERTa** model to improve class balance.
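The balancing step can be sketched as a loop that keeps producing augmented copies of minority-class reviews until both classes are the same size. This is a minimal sketch, not the project's code: it assumes binary labels, and the hypothetical `toy_augment` helper (which just drops a random word) stands in for the RoBERTa-based NLPaug augmenter, presumably something like `nlpaug.augmenter.word.ContextualWordEmbsAug(model_path="roberta-base")`, so the example runs without downloading model weights.

```python
import random
from typing import Callable, List, Tuple

Sample = Tuple[str, int]  # (review text, label)


def toy_augment(text: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a contextual augmenter: drops one random word.

    A RoBERTa-based NLPaug augmenter would instead substitute or insert words
    predicted by the language model; this toy version only keeps the sketch runnable.
    """
    words = text.split()
    if len(words) > 1:
        del words[rng.randrange(len(words))]
    return " ".join(words)


def oversample_minority(
    samples: List[Sample],
    minority_label: int,
    augment: Callable[[str, random.Random], str] = toy_augment,
    seed: int = 0,
) -> List[Sample]:
    """Append augmented minority-class copies until both classes match in size.

    Assumes binary labels: every sample not labelled `minority_label` is majority.
    """
    rng = random.Random(seed)
    minority = [s for s in samples if s[1] == minority_label]
    if not minority:
        return list(samples)
    deficit = (len(samples) - len(minority)) - len(minority)
    augmented = []
    for i in range(max(deficit, 0)):
        text, label = minority[i % len(minority)]  # cycle through the original minority reviews
        augmented.append((augment(text, rng), label))
    return list(samples) + augmented


data = [("great game", 1)] * 8 + [("buggy and boring mess", 0)] * 2
balanced = oversample_minority(data, minority_label=0)
print(sum(y == 0 for _, y in balanced), sum(y == 1 for _, y in balanced))  # 8 8
```

Passing the augmenter in as a callable keeps the balancing logic independent of the augmentation library, so the toy function can be swapped for a real contextual augmenter.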
### Models Implemented
1. **LSTM (4 layers)**
   - Accuracy: **86%**
2. **Bi-LSTM (3 layers)**
   - Accuracy: **85%**
3. **BERT Sequence Classifier**
   - Only the classifier layers trained (BERT encoder frozen).
   - Accuracy: **82%**
4. **BERT Sequence Classifier with LoRA**
   - Classifier layers trained together with **LoRA (Low-Rank Adaptation)** adapters on additional layers, with the base model loaded using **4-bit quantization**.
   - Accuracy: **92%**
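LoRA keeps a pretrained weight matrix W frozen and learns a low-rank update, so a layer computes W x + (alpha / r) · B (A x), where only A and B are trained. The sketch below illustrates that idea in plain Python under stated assumptions: it is not the project's implementation, and the rank `r = 8` in the arithmetic is a hypothetical choice, since the README does not state the rank, alpha, or which layers were adapted. For BERT-base (hidden size 768), adapting one 768 × 768 projection at rank 8 trains 12,288 parameters instead of 589,824, about 2%.

```python
import random
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A.

    forward(x) = W x + (alpha / r) * B (A x)
    Only A (r x d_in) and B (d_out x r) would be trained; W stays frozen.
    B starts at zero, so the adapted layer initially matches the frozen layer.
    """

    def __init__(self, weight: Matrix, r: int, alpha: float, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weight
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]  # small random init
        self.B = [[0.0] * r for _ in range(d_out)]  # zero init
        self.scale = alpha / r

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self) -> int:
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


def lora_param_count(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return r * (d_in + d_out)


layer = LoRALinear([[1.0, 0.0], [0.0, 1.0]], r=1, alpha=2.0)
print(layer.forward([3.0, 4.0]))  # [3.0, 4.0]: B is zero, so the output equals W x
print(768 * 768, lora_param_count(768, 768, r=8))  # 589824 12288
```

In Hugging Face PEFT these knobs correspond to the `r` and `lora_alpha` fields of `LoraConfig`; the zero-initialised B means training starts from exactly the pretrained model's behaviour.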
---
## Highlights
- **Data Augmentation:** Improved class balance with contextual augmentation using **RoBERTa**.
- **Progressive Model Development:**
  - Transitioned from LSTM baselines to transformer-based architectures.
  - Implemented **LoRA** for parameter-efficient fine-tuning.
  - Used **4-bit quantization** to reduce the memory footprint of the base model.
  - Reached **92%** accuracy with the LoRA + 4-bit BERT model, 6 points above the best LSTM (86%) and 10 points above the classifier-only BERT (82%).
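To make the memory saving concrete: BERT-base has roughly 110M parameters, about 440 MB in 32-bit floats versus roughly 55 MB at 4 bits, plus a small overhead for scale factors. The sketch below shows blockwise absmax quantization to signed 4-bit integers. It is a simplified illustration, not what the project necessarily ran: 4-bit loading through bitsandbytes (as in QLoRA) typically uses the NF4 data type rather than the plain symmetric integer grid used here, and the block size of 64 is an assumption.

```python
from typing import List, Tuple


def quantize_int4(weights: List[float], block_size: int = 64) -> Tuple[List[int], List[float]]:
    """Blockwise absmax quantization to signed 4-bit codes in [-7, 7].

    Each block of `block_size` weights shares one float scale = max|w| / 7.
    """
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        absmax = max(abs(w) for w in block)
        scale = absmax / 7 if absmax > 0 else 1.0  # avoid division by zero for all-zero blocks
        scales.append(scale)
        codes.extend(max(-7, min(7, round(w / scale))) for w in block)
    return codes, scales


def dequantize_int4(codes: List[int], scales: List[float], block_size: int = 64) -> List[float]:
    """Recover approximate weights: code * scale of the block the code belongs to."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


weights = [0.7, -0.3, 0.1, 0.0]
codes, scales = quantize_int4(weights, block_size=4)
print(codes)  # [7, -3, 1, 0]
print(dequantize_int4(codes, scales, block_size=4))
```

Per-block scales limit the damage from outliers: one large weight only coarsens the grid for its own block, and each reconstructed weight is within half a scale step of the original.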
---
## Dependencies
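- **Python 3.8+**
- **PyTorch 1.12+**
- **Hugging Face Transformers**
- **PEFT** (LoRA adapters)
- **bitsandbytes** (4-bit quantization)
- **NLPaug**
- **RoBERTa pretrained weights** (downloaded through Hugging Face Transformers)
Install dependencies using:
```bash
pip install torch transformers peft bitsandbytes nlpaug
```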
---
## Results
| Model | Accuracy |
|---------------------------------------------------|----------|
| LSTM (4 layers) | 86% |
| Bi-LSTM (3 layers) | 85% |
| BERT Sequence Classifier (classifier layers only) | 82% |
| BERT + LoRA + 4-bit Quantization | 92% |
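Because the dataset is 80% positive, a model that always predicts "positive" would already score about 80% accuracy on a test split that keeps that distribution, which is worth keeping in mind when reading the table. The sketch below computes that majority-class baseline alongside macro-averaged F1, which weights the negative class as heavily as the positive one. The labels are toy values chosen to mirror the 80/20 split, not project data, and the README does not say whether its accuracies were measured on the original or the augmented distribution.

```python
from typing import List


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true: List[int]) -> float:
    """Accuracy of a model that always predicts the most frequent label."""
    most_common = max(set(y_true), key=y_true.count)
    return accuracy(y_true, [most_common] * len(y_true))


def macro_f1(y_true: List[int], y_pred: List[int]) -> float:
    """Unweighted mean of per-class F1, so the minority class counts as much as the majority."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(scores) / len(scores)


# Toy labels with the README's 80/20 split (1 = positive, 0 = negative); not project data.
y_true = [1] * 8 + [0] * 2
always_positive = [1] * 10
print(majority_baseline(y_true))  # 0.8
print(accuracy(y_true, always_positive), round(macro_f1(y_true, always_positive), 2))  # 0.8 0.44
```

The constant predictor matches the 80% baseline on accuracy but scores only about 0.44 macro-F1, because it never identifies a negative review.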
---
## Future Work
- Explore other transformer architectures like **DeBERTa** or **DistilBERT**.
- Fine-tune models on a broader set of game reviews from other platforms.
- Implement more robust augmentation strategies.
---
## Acknowledgments
- Hugging Face for the Transformers library.
- Steam community for the dataset.
- NLPaug library for augmentation techniques.