https://github.com/paraglondhe098/sentiment-classification-llm

Implemented and fine-tuned BERT for a custom sequence classification task, leveraging LoRA adapters for efficient parameter updates and 4-bit quantization to optimize performance and resource utilization.

data-augmentation llm llm-fine-tuning llm-quantization lora nlp nlp-augmentation peft-fine-tuning-llm qlora quantization


Host: GitHub
URL: https://github.com/paraglondhe098/sentiment-classification-llm
Owner: paraglondhe098
Created: 2024-12-29T19:09:56.000Z (over 1 year ago)
Default Branch: master
Last Pushed: 2024-12-30T03:28:27.000Z (over 1 year ago)
Last Synced: 2025-02-23T22:27:51.746Z (over 1 year ago)
Topics: data-augmentation, llm, llm-fine-tuning, llm-quantization, lora, nlp, nlp-augmentation, peft-fine-tuning-llm, qlora, quantization
Language: Jupyter Notebook
Homepage:
Size: 6.66 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: readme.md


README

# Sentiment Analysis of Video Game Reviews

## Project Overview

This project classifies Steam video game reviews as positive or negative. The dataset is heavily imbalanced (80% positive reviews), so the minority class was augmented to improve class balance. Fine-tuning BERT with LoRA and 4-bit quantization reached 92% accuracy, compared with 82% when only the classifier layers were trained.

---

## Dataset

- **Source:** Steam Games Reviews

- **Nature:** Highly imbalanced dataset (80% positive reviews)
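
With an 80/20 split, a classifier that always predicts "positive" already scores 80% accuracy, so per-class metrics are a useful companion to the accuracy figures. The sketch below illustrates this on toy labels; the label encoding (1 = positive) and the helper names are assumptions for illustration, not code from the repository.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a classifier that always predicts the most common label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def per_class_recall(y_true, y_pred):
    """Recall per label: correct predictions divided by true occurrences."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical toy labels mirroring the stated 80/20 split (1 = positive).
labels = [1] * 80 + [0] * 20
always_positive = [1] * len(labels)

baseline = majority_baseline_accuracy(labels)
recalls = per_class_recall(labels, always_positive)
balanced_accuracy = sum(recalls.values()) / len(recalls)
print(baseline, recalls, balanced_accuracy)
```

The always-positive baseline never finds a negative review, which plain accuracy hides and balanced accuracy exposes.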

---

## Methodology

### Data Augmentation

- Utilized **NLPaug** for contextual data augmentation.

- Augmented minority class samples using **RoBERTa** to enhance class balance.
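
The augmentation step could look roughly like the sketch below. It assumes NLPaug's `ContextualWordEmbsAug` with the `roberta-base` checkpoint and word substitution, and it oversamples the minority class up to a 50/50 balance. The checkpoint, the action, the target ratio, and the helper names are all assumptions, since the README does not state them.

```python
import random


def make_roberta_augmenter(model_path="roberta-base"):
    """Build a contextual word-substitution augmenter with NLPaug.

    Imported lazily because the model weights are downloaded on first use.
    """
    import nlpaug.augmenter.word as naw

    aug = naw.ContextualWordEmbsAug(model_path=model_path, action="substitute")

    def augment(text):
        out = aug.augment(text)
        # Recent NLPaug versions return a list; older ones return a string.
        return out[0] if isinstance(out, list) else out

    return augment


def oversample_minority(texts, labels, augment_fn, minority_label, seed=0):
    """Append augmented minority samples until both classes have equal size."""
    rng = random.Random(seed)
    minority = [t for t, y in zip(texts, labels) if y == minority_label]
    if not minority:
        raise ValueError("no samples with the minority label")
    deficit = (len(labels) - len(minority)) - len(minority)
    new_texts, new_labels = list(texts), list(labels)
    for _ in range(max(deficit, 0)):
        new_texts.append(augment_fn(rng.choice(minority)))
        new_labels.append(minority_label)
    return new_texts, new_labels


# Stand-in augmenter so the sketch runs offline;
# pass make_roberta_augmenter() instead to use RoBERTa.
texts = ["great game", "loved it", "fun", "superb", "refund please"]
labels = [1, 1, 1, 1, 0]
balanced_texts, balanced_labels = oversample_minority(
    texts, labels, lambda t: t + " (augmented)", minority_label=0
)
```

Augmenting only the training split keeps rewritten copies of a review out of the evaluation data.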

### Models Implemented

1. **LSTM (4 layers)**

   - Accuracy: **86%**

2. **Bi-LSTM (3 layers)**

   - Accuracy: **85%**

3. **BERT Sequence Classifier**

   - Trained classifier layers only: **82% accuracy**

4. **BERT Sequence Classifier with LoRA**

   - Trained the classifier layers together with **LoRA (Low-Rank Adaptation)** adapter layers, using **4-bit quantization**: **92% accuracy**
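
A setup like model 4 can be sketched with the `peft` and `bitsandbytes` integrations in Hugging Face Transformers. Everything specific here is an assumption rather than the repository's actual configuration: the `bert-base-uncased` checkpoint, the rank, alpha, and dropout values, the targeted modules, and the NF4 quantization type. The sketch also needs `peft` and `bitsandbytes` installed, which the dependency list does not mention.

```python
# Hypothetical hyperparameters; the README does not state the values used.
LORA_SETTINGS = {
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.1,
    "target_modules": ["query", "value"],  # BERT attention projections
}


def build_qlora_bert(model_name="bert-base-uncased", num_labels=2):
    """Load BERT in 4-bit and wrap it with LoRA adapters.

    Imports are lazy because bitsandbytes typically needs a CUDA GPU.
    """
    import torch
    from peft import (
        LoraConfig,
        TaskType,
        get_peft_model,
        prepare_model_for_kbit_training,
    )
    from transformers import (
        AutoModelForSequenceClassification,
        BitsAndBytesConfig,
    )

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=num_labels, quantization_config=bnb_config
    )
    model = prepare_model_for_kbit_training(model)
    # SEQ_CLS keeps the classification head trainable alongside the adapters.
    lora_config = LoraConfig(task_type=TaskType.SEQ_CLS, **LORA_SETTINGS)
    return get_peft_model(model, lora_config)
```

Calling `build_qlora_bert().print_trainable_parameters()` reports how small the trainable share is relative to the frozen base model.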

---

## Highlights

- **Data Augmentation:** Improved class balance with contextual augmentation using **RoBERTa**.

- **Progressive Model Development:**

  - Transitioned from basic LSTM models to transformer-based architectures.

  - Implemented **LoRA** for parameter-efficient fine-tuning.

  - Optimized performance and resource utilization using **4-bit quantization**.

- **Result:** Reached 92% accuracy with BERT + LoRA + 4-bit quantization, up from 82% with classifier-only training.
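
To give a sense of why LoRA counts as parameter-efficient, the arithmetic below compares one full attention projection with its low-rank update. It assumes a BERT-base hidden size of 768 and a rank of 8; the README does not state which BERT variant or rank was used.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Trainable parameters LoRA adds for one d_out x d_in weight matrix.

    LoRA freezes W and learns the update B @ A,
    with B of shape d_out x rank and A of shape rank x d_in.
    """
    return rank * (d_out + d_in)


hidden = 768            # hidden size of BERT-base (assumed variant)
rank = 8                # hypothetical LoRA rank
full = hidden * hidden  # weights in one attention projection, bias ignored
lora = lora_trainable_params(hidden, hidden, rank)
print(full, lora, f"{lora / full:.2%}")
```

Under these assumptions each adapted matrix trains 12,288 parameters instead of 589,824.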

---

## Dependencies

- **Python 3.8+**

- **PyTorch 1.12+**

- **Hugging Face Transformers**

- **NLPaug**

- **RoBERTa Pretrained Model**

Install dependencies using:

```bash
pip install torch transformers nlpaug
```

---

[//]: # (## How to Run)

[//]: # (1. Clone the repository:)

[//]: # (   ```bash)

[//]: # (   git clone )

[//]: # (   cd )

[//]: # (   ```)

[//]: # (2. Install dependencies (see above).)

[//]: # (3. Prepare the dataset:)

[//]: # (   - Place the Steam reviews dataset in the `data/` directory.)

[//]: # (   - Ensure the file structure matches the preprocessing script requirements.)

[//]: # (4. Run the training script:)

[//]: # (   ```bash)

[//]: # (   python train.py --model  --augment )

[//]: # (   ```)

[//]: # (   Replace `` with `lstm`, `bi-lstm`, or `bert`.)

[//]: # ()

[//]: # (---)

## Results

| Model                            | Accuracy |
|----------------------------------|----------|
| LSTM (4 layers)                  | 86%      |
| Bi-LSTM (3 layers)               | 85%      |
| BERT Sequence Classifier         | 82%      |
| BERT + LoRA + 4-bit Quantization | 92%      |
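
Reading the table as error rates makes the gap easier to compare: the error drops from 18% for classifier-only BERT to 8% with LoRA. The short sketch below derives these figures from the accuracies above; the choice of classifier-only BERT as the reference point is an assumption made for illustration.

```python
# Accuracies (in percent) copied from the results table.
accuracy = {
    "LSTM (4 layers)": 86,
    "Bi-LSTM (3 layers)": 85,
    "BERT Sequence Classifier": 82,
    "BERT + LoRA + 4-bit Quantization": 92,
}

error = {name: 100 - acc for name, acc in accuracy.items()}
base = error["BERT Sequence Classifier"]
best = error["BERT + LoRA + 4-bit Quantization"]
relative_reduction = (base - best) / base

for name, err in error.items():
    print(f"{name}: {err}% error")
print(f"Relative error reduction vs. classifier-only BERT: {relative_reduction:.1%}")
```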

---

## Future Work

- Explore other transformer architectures like **DeBERTa** or **DistilBERT**.

- Fine-tune models on a broader set of game reviews from other platforms.

- Implement more robust augmentation strategies.

---

[//]: # (## License)

[//]: # (This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.)

[//]: # ()

[//]: # (---)

## Acknowledgments

- Hugging Face for the Transformers library.

- Steam community for the dataset.

- NLPaug library for augmentation techniques.
