{"id":29625326,"url":"https://github.com/leo-capvano/sentiment_analysis_benchmarking","last_synced_at":"2026-05-15T21:37:54.208Z","repository":{"id":222537695,"uuid":"434505774","full_name":"leo-capvano/sentiment_analysis_benchmarking","owner":"leo-capvano","description":"A benchmark comparison project among the most popular sentiment analysis engines: VaderSentiment, TextBlob, Azure Text Analysis and Amazon Comprehend. The benchmarker is a python module that supports 3 datasets: IMDb, Sentiment140 and Twitter.","archived":false,"fork":false,"pushed_at":"2022-01-10T19:49:43.000Z","size":3488,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-04-23T11:16:09.636Z","etag":null,"topics":["amazon-comprehend","aws","azurecognitiveservices","dataset","machine-learning","natural-language-processing","nlp","sentiment-analysis","textblob-sentiment-analysis","vader-sentiment-analysis"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/leo-capvano.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2021-12-03T07:32:59.000Z","updated_at":"2022-01-13T21:05:24.000Z","dependencies_parsed_at":"2024-02-14T20:40:02.139Z","dependency_job_id":null,"html_url":"https://github.com/leo-capvano/sentiment_analysis_benchmarking","commit_stats":null,"previous_names":["leo-capvano/sentiment_analysis_benchmarking"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/leo-capvano/sentiment_analysis_benchmarking","repository_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/repositories/leo-capvano%2Fsentiment_analysis_benchmarking","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leo-capvano%2Fsentiment_analysis_benchmarking/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leo-capvano%2Fsentiment_analysis_benchmarking/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leo-capvano%2Fsentiment_analysis_benchmarking/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/leo-capvano","download_url":"https://codeload.github.com/leo-capvano/sentiment_analysis_benchmarking/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leo-capvano%2Fsentiment_analysis_benchmarking/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266253846,"owners_count":23900056,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["amazon-comprehend","aws","azurecognitiveservices","dataset","machine-learning","natural-language-processing","nlp","sentiment-analysis","textblob-sentiment-analysis","vader-sentiment-analysis"],"created_at":"2025-07-21T06:35:43.812Z","updated_at":"2026-05-15T21:37:54.140Z","avatar_url":"https://github.com/leo-capvano.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Sentiment analysis tools benchmarking\n\n## Description\n\n```\n The sentiment analysis is the use of natural language processing, text analysis, \n computational linguistics, and biometrics to systematically identify, 
extract,\n quantify, and study affective states and subjective information.\n \n The aim of this project is to measure the accuracy of the most popular sentiment analysis tools.\n```\n\nThe benchmark was carried out using the datasets described below:\n\n- IMDb Dataset: a list of movie reviews divided into negative and positive statements. It contains the 25k most\n  popular reviews. [source](http://ai.stanford.edu/~amaas/data/sentiment/)\n- Twitter US Airlines Dataset: tweets about each of the major US airlines since Feb 2015. Each tweet is\n  classified as positive, negative or\n  neutral. [source](https://www.kaggle.com/crowdflower/twitter-airline-sentiment/version/4)\n- Sentiment140 dataset: tweet records that include polarity, date, and the tweet\n  text. [source](http://help.sentiment140.com/for-students)\n\nEach record of these datasets contains at least:\n\n- the text to analyze\n- the related correct sentiment\n\n## Overview:\n\nThe project has one entrypoint file: **make_benchmark.py**. It has dedicated commands to execute a benchmark with a\nspecific dataset and a specific sentiment analysis tool. The entrypoint then calls two underlying layers:\n\n1. **dataset reader**: a module that reads the dataset and pre-processes/standardizes the data to adapt it to the\n   sentiment analysis core procedure\n2. **sentiment matcher**: the sentiment analysis core procedure, which also post-processes the results; it interprets the\n   results and standardizes them for the caller in order to calculate the sentiment hit or sentiment miss. Each sentiment\n   matcher knows the correct sentiment for each record; this allows it to calculate a sentiment hit or sentiment miss.\n\n### Commands\n\n**-dataset**: allows you to select one of the supported datasets to execute the benchmark. 
The possible values are:\n\n- _-imdb_: to select the imdb dataset\n- _-twitter_: to select the twitter dataset\n- _-sentiment140_: to select the sentiment140 dataset\n\n**-tool**: allows you to specify the tool used to execute the sentiment analysis prediction/benchmark\n\n- _-vader_: to select vader as sentiment analysis executor tool\n- _-textblob_: to select textblob as sentiment analysis executor tool\n- _-azure_: to select azure text-language-engine as sentiment analysis executor tool\n- _-aws_: to select amazon comprehend as sentiment analysis executor tool\n\nAn example of execution with azure text-language-engine as sentiment analysis engine and the sentiment140 dataset as\ninput of the benchmark:\n\n```\npython make_benchmark.py -dataset sentiment140_dataset.csv -tool azure\n```\n\nAn example of the output, showing the number of sentiment hits over the analyzed rows:\n\n```\n.\n...\n.....\npositives 739 ### negatives 574 ### neutrals 1039\npositives 739 ### negatives 574 ### neutrals 1040\npositives 739 ### negatives 574 ### neutrals 1041\npositives 739 ### negatives 574 ### neutrals 1041\nhits: 523, analyzed rows: 2470\n```\n\n# Usage Examples\nTo run a benchmark of the **vader** tool given **imdb sentences dataset** as input:\n```\npython .\\make_benchmark.py -dataset imdb -tool vader\n```\nTo run a benchmark of the **textblob** tool given **imdb sentences dataset** as input:\n```\npython .\\make_benchmark.py -dataset imdb -tool textblob\n```\nTo run a benchmark of the **textblob** tool given **sentiment140 dataset** as input:\n```\npython .\\make_benchmark.py -dataset sentiment140 -tool textblob\n```\nTo run a benchmark of the **aws** tool given **sentiment140 dataset** as input:\n```\npython .\\make_benchmark.py -dataset sentiment140 -tool aws\n```\nTo run a benchmark of the **azure** tool given **twitter dataset** as input:\n```\npython .\\make_benchmark.py -dataset twitter -tool azure\n```\n\n## How to configure credentials for AWS and Azure 
services\n### AWS\n1. Create _.aws_ folder under C:/Users/_user_/\n2. Create the file .aws/credentials\n3. The credentials file will contain:\n```\n[default]\naws_access_key_id = ...\naws_secret_access_key = ...\n```\n4. The AWS Comprehend client automatically reads the credentials file to open the connection\n\n### Azure\nThe azure_sentiment_matcher.py module reads the Azure endpoint key from an environment\nvariable called _AZURE_KEY_, so:\n1. Get the endpoint key from the Azure resource manager portal\n2. Create the _AZURE_KEY_ environment variable and assign it the key value\n3. You are ready to run azure_sentiment_matcher.py\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fleo-capvano%2Fsentiment_analysis_benchmarking","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fleo-capvano%2Fsentiment_analysis_benchmarking","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fleo-capvano%2Fsentiment_analysis_benchmarking/lists"}