Sentiment Analysis for Fine-Grained Classification
- Host: GitHub
- URL: https://github.com/farid-karimi/fine-grained-classification
- Owner: Farid-Karimi
- License: MIT
- Created: 2024-07-28T07:53:12.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2024-07-28T08:45:20.000Z (about 1 year ago)
- Last Synced: 2025-03-29T06:51:37.708Z (6 months ago)
- Topics: bert, distilbert, fine-grained-classification, logistic-regression, neuromatch-academy, pytorch, rnn, sentiment-analysis, svm
- Language: Jupyter Notebook
- Size: 551 KB
- Stars: 6
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Sentiment Analysis for Fine-Grained Classification
## Abstract
Sentiment analysis on movie reviews is crucial for understanding audience preferences. This study compares traditional machine learning (ML) and modern deep learning (DL) models for fine-grained sentiment classification using the SST-5 dataset of movie reviews, which has five sentiment classes. Logistic Regression, Random Forest, Support Vector Machine, a simple neural network, and an LSTM achieved 36-40% accuracy. In contrast, the pretrained transformer models DistilBERT and BERT outperformed them with 51% and 53% accuracy, respectively. Oversampling decreased performance by encouraging overfitting and amplifying annotation noise. The results highlight the effectiveness of pretrained DL models, but also the challenges posed by frequent neutral texts and annotator bias in SST-5.
## Introduction
Fine-grained sentiment analysis allows for more precision and nuance in understanding human language than binary classification, but it is also harder, owing to factors like sentence structure and class imbalance. The Stanford Sentiment Treebank (SST-5) dataset was developed to address shortcomings of earlier datasets by imposing a parse-tree structure on sentences and labeling phrases on a five-point sentiment scale, from very negative to very positive.
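For concreteness, here is a minimal sketch of loading SST-5 with the Hugging Face `datasets` library. It assumes the community `SetFit/sst5` mirror (not necessarily the source used in this project), which flattens the treebank to sentence-level labels 0-4:

```python
from collections import Counter

from datasets import load_dataset  # pip install datasets

# SetFit/sst5 is a community mirror of SST-5 with sentence-level labels:
# 0 = very negative, 1 = negative, 2 = neutral, 3 = positive, 4 = very positive.
sst5 = load_dataset("SetFit/sst5")

print(sst5)                             # train / validation / test splits
print(Counter(sst5["train"]["label"]))  # class balance; neutral texts are frequent
```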
## Methodology
We evaluated a range of machine learning and deep learning models on the SST-5 dataset: Logistic Regression, Random Forest, Support Vector Machine, a simple neural network, an LSTM, DistilBERT, and BERT. Preprocessing involved data splitting, tokenization, lemmatization, stop-word removal, and TF-IDF vectorization for the traditional models; the transformer models consume raw text through their own subword tokenizers.
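As a rough illustration of the traditional pipeline (not the exact notebook code), the following NLTK/scikit-learn sketch lemmatizes, removes stop words, vectorizes with TF-IDF, and fits one of the classifiers, reusing the `sst5` splits loaded above:

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

nltk.download("stopwords")
nltk.download("wordnet")

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def preprocess(text: str) -> str:
    # Lowercase, keep alphabetic tokens, drop stop words, lemmatize the rest.
    tokens = text.lower().split()
    return " ".join(lemmatizer.lemmatize(t) for t in tokens
                    if t.isalpha() and t not in stop_words)

train_texts, train_labels = sst5["train"]["text"], sst5["train"]["label"]
test_texts, test_labels = sst5["test"]["text"], sst5["test"]["label"]

model = Pipeline([
    ("tfidf", TfidfVectorizer(preprocessor=preprocess)),
    ("clf", LogisticRegression(max_iter=1000)),  # swap in SVC, RandomForest, etc.
])
model.fit(train_texts, train_labels)
print("test accuracy:", model.score(test_texts, test_labels))
```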
## Results
Test-set accuracy for each model, as summarized in the abstract:

| Model | Accuracy |
| --- | --- |
| Logistic Regression, Random Forest, SVM, simple neural network, LSTM | 36-40% |
| DistilBERT | 51% |
| BERT | 53% |

Oversampling the minority classes reduced performance rather than improving it.
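For context on the transformer rows, here is a minimal fine-tuning sketch using the Hugging Face `transformers` Trainer; the hyperparameters are illustrative assumptions, not the ones used in the notebooks:

```python
import numpy as np
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=5)  # five SST-5 sentiment classes

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128,
                     padding="max_length")

encoded = sst5.map(tokenize, batched=True)  # sst5 from the loading sketch above

def accuracy(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": float((np.argmax(logits, axis=-1) == labels).mean())}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sst5-distilbert",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    compute_metrics=accuracy,
)
trainer.train()
print(trainer.evaluate(encoded["test"]))
```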
## Discussion
Pretrained transformer models like BERT and DistilBERT performed significantly better than the traditional models, which we attribute to their ability to capture contextual and semantic information. However, SST-5 poses challenges of its own, such as frequent short neutral texts and potential annotator bias, which limit model efficacy. Oversampling, meanwhile, led to overfitting and amplified noise in the minority classes.
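The oversampling effect can be reproduced in spirit with imbalanced-learn's `RandomOverSampler` (an assumption about the technique; the notebooks may have oversampled differently), applied only to the training split so the evaluation data stays untouched:

```python
from imblearn.over_sampling import RandomOverSampler  # pip install imbalanced-learn
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)  # splits from the sketch above

# Duplicate minority-class rows until every class matches the majority count.
# Each duplicated row also duplicates its annotation noise, one plausible
# reason oversampling hurt rather than helped on SST-5.
X_res, y_res = RandomOverSampler(random_state=0).fit_resample(X_train, train_labels)
```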
## Conclusion
Modern deep learning models, especially pretrained transformer architectures like BERT, outperformed traditional approaches for fine-grained sentiment classification on SST-5. However, dataset-specific challenges highlight the need for further optimization and research to address complexities in fine-grained sentiment analysis.
## Teammate and Project Information
Teammate: [Ali KhosraviPour](https://www.linkedin.com/in/alikhosravipour)
The project was part of the Neuromatch Academy in Deep Learning course, supervised by our Project TA [Joseph AKINYEMI](https://www.linkedin.com/in/joseph-akinyemi-66ab6481/). The goal was to compare the performance of traditional and modern sentiment analysis models on a fine-grained dataset and analyze the impact of various challenges inherent in such tasks.
## References
1. [**Cheang, B., Wei, B., Kogan, D., Qiu, H., & Ahmed, M. (2020).** "Language Representation Models for Fine-Grained Sentiment Classification." *Cornell Tech, New York, NY.*](https://arxiv.org/pdf/2005.13619)
2. [**Socher, R., Perelygin, A., Wu, J. Y., Chuang, J., Manning, C. D., Ng, A. Y., & Potts, C. (2013).** "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank." *Stanford University, Stanford, CA.*](https://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf)