Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/eliask93/debertav3-for-aspect-based-sentiment-analysis
Application for training the pretrained transformer model DeBERTaV3 on an Aspect Based Sentiment Analysis task
https://github.com/eliask93/debertav3-for-aspect-based-sentiment-analysis
aspect-based-sentiment-analysis deberta nlp simpletransformers spacy
Last synced: 2 months ago
JSON representation
Application for training the pretrained transformer model DeBERTaV3 on an Aspect Based Sentiment Analysis task
- Host: GitHub
- URL: https://github.com/eliask93/debertav3-for-aspect-based-sentiment-analysis
- Owner: EliasK93
- Created: 2022-02-18T18:15:14.000Z (almost 3 years ago)
- Default Branch: master
- Last Pushed: 2022-02-18T22:14:31.000Z (almost 3 years ago)
- Last Synced: 2024-08-23T13:35:41.981Z (5 months ago)
- Topics: aspect-based-sentiment-analysis, deberta, nlp, simpletransformers, spacy
- Language: Python
- Homepage:
- Size: 1.46 MB
- Stars: 3
- Watchers: 2
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
## DeBERTaV3 for Aspect Based Sentiment Analysis
Application for training the pretrained transformer model DeBERTaV3 (see paper [DeBERTaV3: Improving DeBERTa
using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing](https://arxiv.org/abs/2111.09543)) on an *Aspect Based Sentiment Analysis* task.Aspect Based Sentiment Analysis is a Sequence Labeling task where product reviews are labeled with their
*aspects* as well as the detected *sentiments* towards each of these aspects.
Aspects in the context of product reviews are N-Grams explicitly mentioning specific functionalities, parts and related
services around the product, with the part of speech being limited to nouns, noun phrases or verbs.| ![](imgs/absa_example.JPG) |
|:-----------------------------------------:|
| *Example of an annotated product review* |Training data source for the model were 1.570 sampled product reviews (5.872 sentences) from the [Amazon Review Dataset](https://nijianmo.github.io/amazon/index.html) -
specifically from the five product categories `Laptops`, `Cell Phones`, `Mens Running Shoes`, `Vacuums`, `Plush Figures` -
which I manually annotated for my bachelor's thesis following a modified version of the [SemEval2014 Aspect Based Sentiment Analysis guidelines](http://alt.qcri.org/semeval2014/task4/data/uploads/semeval14_absa_annotationguidelines.pdf) and the annotation tool [Universal Data Tool](https://udt.dev).The model was trained for 10 epochs on the combined dataset from all five categories (training time: 02h:05m:03s on NVIDIA GeForce GTX 1660 Ti).
Model training, evaluation and inference is implemented using the wrapper [simpletransformers](https://simpletransformers.ai/) which uses [huggingface](https://huggingface.co/).
Since it requires word tokenized and sentence tokenized inputs, the raw text is first pre-processed using [SpaCy](https://spacy.io/).The frontend and routing is implemented in [Flask](https://flask.palletsprojects.com), using [Jinja](https://jinja.palletsprojects.com) as Template Engine for rendering the HTML and [Bootstrap](https://getbootstrap.com/) for the frontend design.
### Model Evaluation on Test Set
| Metric | microsoft/deberta-v3-base |
|:---------------------------------:|:---------------------------:|
| Precision | 0.659 |
| Recall | 0.691 |
| Micro F1-Score | 0.675 |
### Examples of product reviews labeled by the model
##### Trained category (Laptops), 5 stars:
![](imgs/laptops_5.png)
##### Non-trained category (Power Drills), 4 stars:
![](imgs/power_drill_4.png)
##### Non-trained category (Backpacks), 1 star:
![](imgs/backpack_1.png)
### Requirements
##### - Python >= 3.8
##### - Conda
- `pytorch==1.7.1`
- `cudatoolkit=10.1`##### - pip
- `simpletransformers`
- `spacy`
- `pandas`
- `openpyxl`
- `tqdm`
- `flask`##### - SpaCy models
- `en_core_web_lg`
### Notes
The uploaded versions of the training data in this repository are cut off after the first 1000 rows of each file, the
real training data contains a combined ~90.000 rows. The trained model file `pytorch_model.bin` is omitted in this repository.