https://github.com/swamikannan/sentiment-analysis-using-huggingface-transformers

Building a Sentiment Analysis Machine using Transformers

bert-embeddings bert-model eda exploratory-data-analysis kaggle keras rotten-tomatoes sentiment-analysis tensorflow2 tokenizer



Host: GitHub
URL: https://github.com/swamikannan/sentiment-analysis-using-huggingface-transformers
Owner: SwamiKannan
License: mit
Created: 2022-08-31T05:49:37.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2022-10-25T14:57:22.000Z (over 2 years ago)
Last Synced: 2025-03-03T16:48:46.668Z (4 months ago)
Topics: bert-embeddings, bert-model, eda, exploratory-data-analysis, kaggle, keras, rotten-tomatoes, sentiment-analysis, tensorflow2, tokenizer
Language: Jupyter Notebook
Homepage:
Size: 1.43 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: Readme.md
- License: license


README

# Transformers
Transformers have been an exciting development in deep learning since the "Attention Is All You Need" paper by Ashish Vaswani et al. The architecture exploits correlations between data points within an input, which makes it well suited to sequence data such as text as well as to vision tasks.

# Sentiment Analysis on Movie Reviews

## Data:
> "There's a thin line between likably old-fashioned and fuddy-duddy, and The Count of Monte Cristo ... never quite settles on either side."

The Rotten Tomatoes movie review dataset is a corpus of movie reviews used for sentiment analysis, originally collected by Pang and Lee. In their work on sentiment treebanks, Socher et al. used Amazon's Mechanical Turk to create fine-grained labels for all parsed phrases in the corpus. This competition presents a chance to benchmark your sentiment-analysis ideas on the Rotten Tomatoes dataset. You are asked to label phrases on a scale of five values: negative, somewhat negative, neutral, somewhat positive, positive. Obstacles like sentence negation, sarcasm, terseness, language ambiguity, and many others make this task very challenging.

## Data reference:
https://www.kaggle.com/c/sentiment-analysis-on-movie-reviews/
## Attributes:
The dataset is comprised of tab-separated files with phrases from the Rotten Tomatoes dataset. The train/test split has been preserved for the purposes of benchmarking, but the sentences have been shuffled from their original order. Each sentence has been parsed into many phrases by the Stanford parser. Each phrase has a PhraseId. Each sentence has a SentenceId. Phrases that are repeated (such as short/common words) are only included once in the data.

- train.tsv contains the phrases and their associated sentiment labels. We have additionally provided a SentenceId so that you can track which phrases belong to a single sentence.
- test.tsv contains just phrases. You must assign a sentiment label to each phrase.

The sentiment labels are:
- 0 - negative
- 1 - somewhat negative
- 2 - neutral
- 3 - somewhat positive
- 4 - positive
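
The file layout described above can be sketched with Python's standard `csv` module. This is a minimal illustration, not the notebook's actual loading code: the column names `Phrase` and `Sentiment` are assumptions (only `PhraseId` and `SentenceId` are named in the description), and the sample rows are made up so that the block runs without the Kaggle files.

```python
import csv
import io
from collections import defaultdict

# Label names as listed in the dataset description.
SENTIMENT_LABELS = {
    0: "negative",
    1: "somewhat negative",
    2: "neutral",
    3: "somewhat positive",
    4: "positive",
}

# Made-up rows in the assumed train.tsv layout (not real dataset rows).
SAMPLE_TSV = (
    "PhraseId\tSentenceId\tPhrase\tSentiment\n"
    "1\t1\ta thin line between old-fashioned and fuddy-duddy\t2\n"
    "2\t1\ta thin line\t2\n"
    "3\t2\tgood fun\t3\n"
)


def read_phrases(handle):
    """Parse tab-separated rows into dicts with integer ids and labels."""
    # QUOTE_NONE keeps quote characters inside phrases as literal text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        rows.append({
            "phrase_id": int(row["PhraseId"]),
            "sentence_id": int(row["SentenceId"]),
            "phrase": row["Phrase"],
            "sentiment": int(row["Sentiment"]),
        })
    return rows


def group_by_sentence(rows):
    """Collect the phrases that belong to each SentenceId."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["sentence_id"]].append(row["phrase"])
    return dict(grouped)


rows = read_phrases(io.StringIO(SAMPLE_TSV))
by_sentence = group_by_sentence(rows)
for row in rows:
    print(row["phrase_id"], SENTIMENT_LABELS[row["sentiment"]], row["phrase"])
```

To read the real file instead of the sample, pass an open file handle, for example `open("train.tsv", newline="", encoding="utf-8")`, assuming the file sits in the working directory.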

## Key asks:
• Assign a sentiment label to each phrase in the test.tsv file
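
As a rough sketch of what this ask involves, the snippet below reads phrases in the test.tsv layout, assigns each one a label from 0 to 4, and writes `PhraseId,Sentiment` pairs. Several things here are assumptions rather than the repository's method: the `Phrase` column name, the two-column output layout, the made-up sample rows, and above all `predict_sentiment`, a hypothetical placeholder that always answers "neutral" and would be replaced by the trained model.

```python
import csv
import io

# Made-up rows in the assumed test.tsv layout (no Sentiment column).
SAMPLE_TEST_TSV = (
    "PhraseId\tSentenceId\tPhrase\n"
    "101\t50\tan intermittently pleasing effort\n"
    "102\t50\tpleasing\n"
)


def predict_sentiment(phrase):
    """Hypothetical placeholder: always predicts 2 (neutral)."""
    return 2


def label_phrases(test_handle, out_handle, predict=predict_sentiment):
    """Write one PhraseId,Sentiment row per test phrase; return the row count."""
    reader = csv.DictReader(test_handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    writer = csv.writer(out_handle, lineterminator="\n")
    writer.writerow(["PhraseId", "Sentiment"])
    count = 0
    for row in reader:
        label = predict(row["Phrase"])
        # Guard against labels outside the five-value scale.
        if label not in range(5):
            raise ValueError(f"label {label!r} is outside 0-4")
        writer.writerow([row["PhraseId"], label])
        count += 1
    return count


out = io.StringIO()
n_labeled = label_phrases(io.StringIO(SAMPLE_TEST_TSV), out)
print(out.getvalue())
```

Passing the predictor as an argument keeps the file handling separate from the model, so the same loop works with any function that maps a phrase to an integer label.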

## Model