{"id":18058240,"url":"https://github.com/ahmedabdalkreem/sentiment-analysis","last_synced_at":"2026-04-08T18:31:28.202Z","repository":{"id":259237897,"uuid":"877293739","full_name":"AhmedAbdAlkreem/Sentiment-Analysis","owner":"AhmedAbdAlkreem","description":"This project performs sentiment analysis on a Twitter dataset, aiming to classify tweets into positive, negative, or neutral sentiments. Sentiment analysis is crucial for understanding public opinion on various topics, brands, or events based on social media data.","archived":false,"fork":false,"pushed_at":"2024-10-24T13:42:22.000Z","size":1772,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-05T11:14:30.024Z","etag":null,"topics":["bert-model","lematization","matplotlib","nlp","nltk","numpy","pandas","python3","sentiment-analysis","stopwords","streamlit"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AhmedAbdAlkreem.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-23T12:19:32.000Z","updated_at":"2024-10-24T13:42:26.000Z","dependencies_parsed_at":"2024-10-23T19:40:16.070Z","dependency_job_id":"a6db290b-f189-4853-843b-2327387bfd67","html_url":"https://github.com/AhmedAbdAlkreem/Sentiment-Analysis","commit_stats":null,"previous_names":["ahmedabdalkreem/sentiment-analysis"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AhmedAbdAlkreem%2FSenti
ment-Analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AhmedAbdAlkreem%2FSentiment-Analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AhmedAbdAlkreem%2FSentiment-Analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AhmedAbdAlkreem%2FSentiment-Analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AhmedAbdAlkreem","download_url":"https://codeload.github.com/AhmedAbdAlkreem/Sentiment-Analysis/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247325701,"owners_count":20920714,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert-model","lematization","matplotlib","nlp","nltk","numpy","pandas","python3","sentiment-analysis","stopwords","streamlit"],"created_at":"2024-10-31T03:05:43.367Z","updated_at":"2025-12-30T23:07:14.256Z","avatar_url":"https://github.com/AhmedAbdAlkreem.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cb\u003e\u003ch1\u003eTwitter Sentiment Analysis using BERT\u003c/h1\u003e\u003c/b\u003e\n\n\u003cb\u003eProject Overview\u003c/b\u003e\n\nThis project is aimed at performing sentiment analysis on a Twitter dataset from Kaggle. The main objective is to classify tweets into positive, negative, or neutral sentiments. 
The project uses BERT (Bidirectional Encoder Representations from Transformers), a state-of-the-art model for NLP tasks, to achieve high accuracy.\n\nThe project also includes the preprocessing of raw text data, such as removing special characters, tokenization, lemmatization, removing stopwords, and converting text to lowercase. The model is deployed using Streamlit, enabling a user-friendly web interface for real-time tweet sentiment analysis.\n\n\u003cb\u003eProject Features\u003c/b\u003e\n\n\u003cb\u003eData Preprocessing:\u003c/b\u003e\u003cbr\u003e\n\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;Removing Special Characters: Cleaning the dataset by removing unnecessary characters (e.g., hashtags, mentions, punctuation).\u003cbr\u003e\n\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;Tokenization: Splitting text into tokens (individual words or phrases).\u003cbr\u003e\n\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;Removing Stopwords: Eliminating common words that don’t contribute to sentiment (e.g., \"the\", \"is\", \"in\").\u003cbr\u003e\n\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;Lemmatization: Converting words to their base or root form.\u003cbr\u003e\n\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;Lowercasing: Converting all text to lowercase for uniformity.\n\n\n\n\u003cb\u003eSentiment Classification using BERT:\u003c/b\u003e\u003cbr\u003e\n\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;Fine-tuning the BERT model to predict tweet sentiment (positive, negative, or neutral).\u003cbr\u003e\n\n**Deployment:**  \n\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;A Streamlit web application is created to provide a simple interface for users to input tweets and receive sentiment predictions in real time.\n\n\u003cb\u003eDataset\u003c/b\u003e\n\nThe dataset used in this project is obtained from Kaggle. 
It consists of labeled tweets for sentiment analysis, with three sentiment categories: positive, negative, and neutral.\n\n    Kaggle Dataset Link: https://www.kaggle.com/datasets/jp797498e/twitter-entity-sentiment-analysis\n\n\u003cb\u003eDataset Structure:\u003c/b\u003e\n\n    Text: The tweet itself.\n    Sentiment: The label indicating the sentiment (positive, negative, neutral).\n\n\u003cb\u003eTechnology Stack\u003c/b\u003e\n\n    Programming Language: Python\n    Libraries:\n        Pandas: For data manipulation and preprocessing.\n        NLTK: For text processing tasks such as stopword removal, tokenization, and lemmatization.\n        Transformers (Hugging Face): For implementing the BERT model.\n        Streamlit: For deploying the application and creating an interactive web interface.\n        Matplotlib, Seaborn: For data visualization.\n\n\u003cb\u003eSetup and Installation\u003c/b\u003e\n\nTo run this project locally, follow these steps:\n\n1. Clone the repository:\n\n    bash\n   \n    git clone https://github.com/your-username/twitter-sentiment-analysis-bert.git\n   \n    cd twitter-sentiment-analysis-bert\n\n2. Download the Dataset:\n\nDownload the dataset from Kaggle and place it in the project's data folder. You can use the Kaggle API to download it directly.\n\n    bash\n\n    kaggle datasets download -d \u003cdataset-name\u003e\n\n3. Preprocess the Dataset:\n\nRun the preprocessing script to clean and prepare the data for training:\n\n    bash\n\n    python preprocess.py\n\nThis script performs the following tasks:\n\n    Remove special characters, URLs, and mentions.\n    Tokenize text.\n    Remove stopwords.\n    Apply lemmatization.\n    Convert text to lowercase.\n\n4. Train the BERT Model:\n\nOnce the data is preprocessed, train the BERT model on the dataset:\n\n    bash\n\n    python train.py\n\nThis script will fine-tune the pre-trained BERT model on the Twitter dataset.\n\n5. 
Run the Streamlit Web Application:\n\nAfter the model is trained, you can run the Streamlit app for real-time sentiment analysis:\n\n    bash\n\n    streamlit run app.py\n\nThis will launch a web interface where you can input tweets and receive sentiment predictions.\n\n\u003cb\u003eResults\u003c/b\u003e\n\nThe BERT model achieves an accuracy of 0.76 (76%) on the test set, outperforming traditional machine learning approaches due to its ability to capture the contextual meaning of words in tweets. Detailed evaluation metrics such as precision, recall, and F1-score are available in the training logs.\n\n\u003cb\u003eConclusion\u003c/b\u003e\n\nThis project demonstrates the application of state-of-the-art NLP models (BERT) for sentiment analysis on social media data, leveraging both text preprocessing and deep learning. The deployment using Streamlit provides a simple interface for real-time sentiment predictions, making it a useful tool for businesses or researchers to gauge public sentiment.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fahmedabdalkreem%2Fsentiment-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fahmedabdalkreem%2Fsentiment-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fahmedabdalkreem%2Fsentiment-analysis/lists"}