{"id":23097166,"url":"https://github.com/ihabbendidi/sentiment_embeddings","last_synced_at":"2025-10-05T03:22:35.671Z","repository":{"id":119867566,"uuid":"319336755","full_name":"IhabBendidi/sentiment_embeddings","owner":"IhabBendidi","description":"A scientific benchmark and comparison of the performance of sentiment analysis models in NLP on small to medium datasets","archived":false,"fork":false,"pushed_at":"2020-12-14T05:31:32.000Z","size":56615,"stargazers_count":13,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-08-16T13:01:15.566Z","etag":null,"topics":["3d-visualization","benchmark","bert","colab","doc2vec","embedding-evaluation","keras","logistic-regression","lstm","nlp","notebook","python","pytorch","sentiment-analysis","sentiment-embeddings","textblob","twitter-data","visualization","word2vec"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/IhabBendidi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-12-07T13:59:09.000Z","updated_at":"2024-10-10T13:16:05.000Z","dependencies_parsed_at":null,"dependency_job_id":"a21ba830-d318-4c12-858a-4f7472ff4836","html_url":"https://github.com/IhabBendidi/sentiment_embeddings","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/IhabBendidi/sentiment_embeddings","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IhabBendidi%2Fsentiment_embeddings","tags
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IhabBendidi%2Fsentiment_embeddings/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IhabBendidi%2Fsentiment_embeddings/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IhabBendidi%2Fsentiment_embeddings/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/IhabBendidi","download_url":"https://codeload.github.com/IhabBendidi/sentiment_embeddings/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IhabBendidi%2Fsentiment_embeddings/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278403544,"owners_count":25981076,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-05T02:00:06.059Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-visualization","benchmark","bert","colab","doc2vec","embedding-evaluation","keras","logistic-regression","lstm","nlp","notebook","python","pytorch","sentiment-analysis","sentiment-embeddings","textblob","twitter-data","visualization","word2vec"],"created_at":"2024-12-16T22:56:47.137Z","updated_at":"2025-10-05T03:22:35.665Z","avatar_url":"https://github.com/IhabBendidi.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Sentiment Analysis 
Benchmark\n## A scientific benchmark and comparison of the performance of sentiment analysis models in NLP on small to medium datasets\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/IhabBendidi/sentiment_embeddings/blob/main/sentiment_embeddings.ipynb)\n[![GitHub license](https://img.shields.io/github/license/Naereen/StrapDown.js.svg)](https://github.com/IhabBendidi/sentiment_embeddings/blob/master/LICENSE)\n\n**Authors:** *Ihab Bendidi*, *Yousra Bourkiche*, *Clément Siegrist*, *Kaouter Berrahal*\n\nIn general, documents with similar sentiments should lie close to each other in the embedding feature space. This proximity can serve as another way to judge the performance of sentiment analysis models.\n\nIn this work, we benchmark recent sentiment analysis models, reproduce their results, and compare their performance against baseline methods.\n\n## Outline\n\nThis work is done in a Jupyter notebook, which you can find [here](https://github.com/IhabBendidi/sentiment_embeddings/blob/main/sentiment_embeddings.ipynb) or open in Colab [here](https://colab.research.google.com/github/IhabBendidi/sentiment_embeddings/blob/main/sentiment_embeddings.ipynb).\n\n**I - Processing \u0026 Exploratory Data Analysis**\n- *Understanding the data*\n- *Text preprocessing*\n\n**II - Sentiment classification models**\n- *BERT model*\n- *LSTM recurrent model*\n- *Baseline method: TextBlob*\n\n**III - Document Embeddings**\n- *Training doc2vec*\n- *Doc2vec sentiment classifier*\n\n**IV - Model performance visualisation**\n- *BERT model*\n- *LSTM model*\n- *Logreg model*\n- *TextBlob*\n\nYou can also find a `.pdf` report with code [here](https://github.com/IhabBendidi/sentiment_embeddings/blob/main/sentiment_embeddings.pdf).\n\n### Installation\n\nThis was tested on Ubuntu 20.04 with Python 3.7, but should run on any system with Python 3.\n\nBefore running it, make sure 
to install the dependencies by running in a terminal:\n\n```\npip install -r requirements.txt\n```\n\nOn Google Colab, you need to upload the `requirements.txt` file and the `tweets.csv` dataset to your Colab session.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fihabbendidi%2Fsentiment_embeddings","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fihabbendidi%2Fsentiment_embeddings","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fihabbendidi%2Fsentiment_embeddings/lists"}