https://github.com/eliask93/debertav3-for-aspect-based-sentiment-analysis
Application for training the pretrained transformer model DeBERTaV3 on an Aspect Based Sentiment Analysis task
amazon-reviews aspect-based-sentiment-analysis deberta deberta-v3 nlp simpletransformers spacy
Last synced: 10 months ago
- Host: GitHub
- URL: https://github.com/eliask93/debertav3-for-aspect-based-sentiment-analysis
- Owner: EliasK93
- Created: 2022-02-18T18:15:14.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2024-12-25T23:18:22.000Z (over 1 year ago)
- Last Synced: 2025-04-06T15:39:33.070Z (about 1 year ago)
- Topics: amazon-reviews, aspect-based-sentiment-analysis, deberta, deberta-v3, nlp, simpletransformers, spacy
- Language: Python
- Homepage:
- Size: 1.46 MB
- Stars: 4
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## DeBERTaV3 for Aspect Based Sentiment Analysis
Application for training the pretrained transformer model DeBERTaV3 (see paper [DeBERTaV3: Improving DeBERTa
using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing](https://arxiv.org/abs/2111.09543)) on an *Aspect Based Sentiment Analysis* task.
Aspect Based Sentiment Analysis is a Sequence Labeling task where product reviews are labeled with their
*aspects* as well as the detected *sentiments* towards each of these aspects.
Aspects in the context of product reviews are N-Grams explicitly mentioning specific functionalities, parts and related
services around the product, with the part of speech being limited to nouns, noun phrases or verbs.
|  |
|:-----------------------------------------:|
| *Example of an annotated product review* |
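To make the sequence labeling framing concrete, the sketch below turns annotated aspect spans into one label per token. The BIO-style scheme, the label names (`B-POS`, `I-POS`, `B-NEG`), the example sentence and the helper `spans_to_bio` are illustrative assumptions and are not taken from this repository; the actual label set used for training may differ.

```python
def spans_to_bio(tokens, aspects):
    """Assign one BIO-style label per token (hypothetical label scheme).

    tokens  : list of word strings for one sentence
    aspects : list of (start, end, sentiment) tuples indexing into `tokens`,
              with `end` exclusive
    """
    labels = ["O"] * len(tokens)  # "O" marks tokens outside any aspect
    for start, end, sentiment in aspects:
        labels[start] = f"B-{sentiment}"  # first token of the aspect
        for i in range(start + 1, end):
            labels[i] = f"I-{sentiment}"  # remaining tokens of the aspect
    return labels


# Invented example sentence with two aspects of opposite sentiment.
tokens = ["The", "battery", "life", "is", "great", "but",
          "the", "keyboard", "feels", "cheap", "."]
aspects = [(1, 3, "POS"), (7, 8, "NEG")]

labels = spans_to_bio(tokens, aspects)
for token, label in zip(tokens, labels):
    print(f"{token:10s} {label}")
```

Here the noun phrase "battery life" is a two-token aspect with positive sentiment and "keyboard" is a single-token aspect with negative sentiment; every other token is labeled `O`.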
The training data for the model consisted of 1,570 sampled product reviews (5,872 sentences) from the [Amazon Review Dataset](https://nijianmo.github.io/amazon/index.html),
specifically from the five product categories `Laptops`, `Cell Phones`, `Mens Running Shoes`, `Vacuums` and `Plush Figures`.
I manually annotated these reviews for my bachelor's thesis with the annotation tool [Universal Data Tool](https://udt.dev), following a modified version of the [SemEval2014 Aspect Based Sentiment Analysis guidelines](http://alt.qcri.org/semeval2014/task4/data/uploads/semeval14_absa_annotationguidelines.pdf).
The model was trained for 10 epochs on the combined dataset from all five categories (training time: 02h:05m:03s on an NVIDIA GeForce GTX 1660 Ti).
Model training, evaluation and inference are implemented using the wrapper [simpletransformers](https://simpletransformers.ai/), which builds on [Hugging Face](https://huggingface.co/).
Since the wrapper requires sentence-tokenized and word-tokenized inputs, the raw text is first pre-processed using [SpaCy](https://spacy.io/).
The frontend and routing are implemented in [Flask](https://flask.palletsprojects.com), using [Jinja](https://jinja.palletsprojects.com) as the template engine for rendering the HTML and [Bootstrap](https://getbootstrap.com/) for the frontend design.
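The pre-processing step mentioned above (splitting raw review text into sentences and words with SpaCy) might look roughly like the following. This is a minimal sketch, not the code from this repository: the helper names `doc_to_sentences`, `to_rows` and `tokenize_review` are hypothetical, and loading `en_core_web_lg` assumes the model listed under Requirements is installed.

```python
def doc_to_sentences(doc):
    """Convert a spaCy Doc into a list of sentences, each a list of words."""
    return [
        [token.text for token in sent if not token.is_space]
        for sent in doc.sents
    ]


def to_rows(sentences):
    """Flatten tokenized sentences into (sentence_id, word) rows.

    This mirrors the long format (one word per row, grouped by a sentence
    id) that simpletransformers uses for sequence labeling data.
    """
    return [
        (sentence_id, word)
        for sentence_id, words in enumerate(sentences)
        for word in words
    ]


def tokenize_review(text, model_name="en_core_web_lg"):
    """Sentence- and word-tokenize raw review text with spaCy."""
    import spacy  # imported lazily so the helpers above work without spaCy

    nlp = spacy.load(model_name)
    return doc_to_sentences(nlp(text))
```

Calling `tokenize_review("Great laptop. The fan is loud though.")` should yield one word list per sentence. As far as I know, such pre-tokenized lists can be handed to the `predict` method of a simpletransformers `NERModel` with `split_on_space=False`, but treat that as an assumption and check it against the simpletransformers documentation.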
### Model Evaluation on Test Set
| Metric | microsoft/deberta-v3-base |
|:---------------------------------:|:---------------------------:|
| Precision | 0.659 |
| Recall | 0.691 |
| Micro F1-Score | 0.675 |
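The F1-Score is the harmonic mean of precision and recall, so the three values in the table can be cross-checked against each other. The short sketch below does that; the function name `f1_score` is chosen here for illustration only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the evaluation table above.
f1 = f1_score(0.659, 0.691)
print(round(f1, 3))  # 0.675
```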
### Examples of product reviews labeled by the model
##### Trained category (Laptops), 5 stars:

##### Non-trained category (Power Drills), 4 stars:

##### Non-trained category (Backpacks), 1 star:

### Requirements
##### - Python >= 3.10
##### - Conda
- `pytorch==2.6.0`
- `cudatoolkit=12.6`
##### - pip
- `simpletransformers`
- `spacy`
- `pandas`
- `openpyxl`
- `tqdm`
- `flask`
##### - SpaCy models
- `en_core_web_lg`
### Notes
The versions of the training data uploaded to this repository are cut off after the first 1,000 rows of each file; the
full training data contains roughly 90,000 rows in total. The trained model file `pytorch_model.bin` is omitted from this repository.