{"id":23671632,"url":"https://github.com/mohammad95labbaf/transformersentimentanalysis","last_synced_at":"2025-12-17T14:30:16.527Z","repository":{"id":269820078,"uuid":"907793072","full_name":"mohammad95labbaf/TransformerSentimentAnalysis","owner":"mohammad95labbaf","description":"Sentiment analysis on tweets using transformer models like BERT, RoBERTa, DistilBERT, ALBERT, and XLNet with evaluation metrics.","archived":false,"fork":false,"pushed_at":"2024-12-26T11:18:39.000Z","size":3290,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-12-26T12:20:21.354Z","etag":null,"topics":["albert","bert","bert-model","distilbert","large-language-model","large-language-models","llm","roberta","roberta-model","sentiment","sentiment-analysis","sentiment-classification","transformer","transformers","xlnet"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mohammad95labbaf.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-24T11:59:37.000Z","updated_at":"2024-12-26T11:24:29.000Z","dependencies_parsed_at":"2024-12-26T12:30:50.915Z","dependency_job_id":null,"html_url":"https://github.com/mohammad95labbaf/TransformerSentimentAnalysis","commit_stats":null,"previous_names":["mohammad95labbaf/transformersentimentanalysis"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mohammad95labbaf%2FTransformerSentimentAnalysis","tags_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mohammad95labbaf%2FTransformerSentimentAnalysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mohammad95labbaf%2FTransformerSentimentAnalysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mohammad95labbaf%2FTransformerSentimentAnalysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mohammad95labbaf","download_url":"https://codeload.github.com/mohammad95labbaf/TransformerSentimentAnalysis/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239690074,"owners_count":19681035,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["albert","bert","bert-model","distilbert","large-language-model","large-language-models","llm","roberta","roberta-model","sentiment","sentiment-analysis","sentiment-classification","transformer","transformers","xlnet"],"created_at":"2024-12-29T10:19:25.056Z","updated_at":"2025-12-17T14:30:16.490Z","avatar_url":"https://github.com/mohammad95labbaf.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# Twitter Sentiment Analysis with Transformer Models\n\nThis repository contains a Jupyter Notebook for performing sentiment analysis on a Twitter dataset using various transformer models, including BERT, RoBERTa, DistilBERT, ALBERT, and XLNet. 
The notebook provides an end-to-end pipeline: loading the data, preprocessing it, fine-tuning the models, and evaluating their performance.\n\n## Dataset Overview\n\nThe dataset consists of tweets about 32 unique entities; each tweet is labeled with one of four sentiment categories:\n\n- **Negative**: Indicates unfavorable sentiment.\n- **Positive**: Indicates favorable sentiment.\n- **Neutral**: Indicates no strong sentiment.\n- **Irrelevant**: Indicates content unrelated to the target entity.\n\n### Data Columns\n- **Tweet ID**: Unique identifier for each tweet.\n- **Entity**: The subject or topic discussed in the tweet (e.g., Overwatch, PlayStation5).\n- **Sentiment**: The sentiment expressed in the tweet (Negative, Positive, Neutral, Irrelevant).\n- **Tweet Content**: The raw text of the tweet.\n\n### Data Splits\n- **Training Set**: 59,745 tweets used to fine-tune the model.\n- **Validation Set**: 14,937 tweets held out to evaluate model performance.\n\n### Sentiment Distribution\n- **Negative**: 30.3% of the dataset.\n- **Positive**: 27.5% of the dataset.\n- **Neutral**: 24.6% of the dataset.\n- **Irrelevant**: 17.5% of the dataset.\n\n## Models Implemented\n\nThe notebook uses the following transformer models for sentiment analysis:\n\n- **BERT**: Bidirectional Encoder Representations from Transformers.\n- **RoBERTa**: A robustly optimized BERT pretraining approach.\n- **DistilBERT**: A smaller, faster version of BERT trained with knowledge distillation.\n- **ALBERT**: A lite version of BERT with fewer parameters.\n- **XLNet**: A generalized autoregressive pretraining model.\n\n## How to Use\n\n### Step 1: Load the Dataset\nThe notebook loads the dataset, which contains tweet IDs, entities, sentiment labels, and tweet text. 
The dataset is pre-split into training and validation sets.\n\n### Step 2: Preprocess the Data\nThe preprocessing step:\n- Tokenizes the tweet text with the tokenizer that matches the chosen transformer model.\n- Maps the four sentiment labels to integer class IDs for training.\n- Pads and truncates every sequence to a fixed length so that all inputs in a batch have the same shape.\n\n### Step 3: Train the Model\nThe notebook lets you choose one of the following transformer models:\n- **BERT**\n- **RoBERTa**\n- **DistilBERT**\n- **ALBERT**\n- **XLNet**\n\nThe selected model is fine-tuned on the training set. Hyperparameters such as batch size, learning rate, and number of epochs can be adjusted in the notebook.\n\n### Step 4: Evaluate the Model\nOnce trained, the model is evaluated on the validation set using the following metrics:\n- **Accuracy**: Proportion of correct predictions.\n- **Precision, Recall, F1-Score**: Computed separately for each sentiment class (Negative, Positive, Neutral, Irrelevant).\n- **Confusion Matrix**: Compares true and predicted labels to show which classes the model confuses.\n\n### Step 5: Visualize Results\nThe notebook produces:\n- A confusion matrix plot for inspecting misclassifications.\n- A classification report with per-class precision, recall, and F1-score.\n\n### Step 6: Save the Model\nAfter training, the fine-tuned model and its tokenizer can be saved to disk with `save_pretrained` and reloaded later for inference:\n\n```python\n# Write the fine-tuned model (weights and config) and its tokenizer files to the same directory\nmodel.save_pretrained('path_to_save_model')\ntokenizer.save_pretrained('path_to_save_model')\n```\n\n## Example Usage\n\nA typical run of the notebook follows these steps:\n\n1. Load and preprocess the dataset.\n2. Fine-tune a transformer model (e.g., BERT).\n3. Evaluate the model and visualize performance metrics.\n4. 
Save the trained model for future use.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmohammad95labbaf%2Ftransformersentimentanalysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmohammad95labbaf%2Ftransformersentimentanalysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmohammad95labbaf%2Ftransformersentimentanalysis/lists"}