Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

awesome-sentiment-analysis

Repository with everything necessary for sentiment analysis and related areas
https://github.com/laugustyniak/awesome-sentiment-analysis

  • Libraries

      • Python, spaCy - Industrial-Strength Natural Language Processing in Python, one of the best and fastest libraries for NLP. spaCy excels at large-scale information extraction tasks. It is written from the ground up in carefully memory-managed Cython. According to the project, independent research has found spaCy to be the fastest library of its kind. If your application needs to process entire web dumps, spaCy is the library you want to be using.
      • Python, TextBlob - TextBlob allows you to specify which algorithms you want to use under the hood of its simple API.
      • Python, pattern - The pattern.en module contains a fast part-of-speech tagger for English (identifies nouns, adjectives, verbs, etc. in a sentence), sentiment analysis, tools for English verb conjugation and noun singularization & pluralization, and a WordNet interface.
      • R, TM - R text mining module including tm.plugin.sentiment.
      • Software, GATE - GATE is open source software capable of solving almost any text processing problem.
      • Java, OpenNLP - The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.
      • sentiment: Tools for Sentiment Analysis in R - sentiment is an R package with tools for sentiment analysis, including Bayesian classifiers for positivity/negativity and emotion classification.
      • ASUM Java - Aspect and Sentiment Unification Model for Online Review Analysis.
      • Python, Textlytics - set of sentiment analysis examples based on Amazon Data, SemEval, IMDB etc.
      • Java, Polish Sentiment Model - Sentiment analysis for the Polish language using SVM and BoW, packaged in Docker.
      • Software, RapidMiner - software capable of solving almost any text processing problem by processing text using computational linguistics.
      • Software, KNIME - KNIME® Analytics Platform is the leading open solution for data-driven innovation, helping you discover the potential hidden in your data, mine for fresh insights, or predict new futures. Our enterprise-grade, open source platform is fast to deploy, easy to scale and intuitive to learn. With more than 1000 modules, hundreds of ready-to-run examples, a comprehensive range of integrated tools, and the widest choice of advanced algorithms available, KNIME Analytics Platform is the perfect toolbox for any data scientist. Our steady course on unrestricted open source is your passport to a global community of data scientists, their expertise, and their active contributions.
    • Aspect-based Sentiment Analysis

  • Resources

    • Lexicons

      • Multidomain Sentiment Lexicons - lexicons from 10 domains based on Amazon Product Dataset extracted using method described in
      • SentiWordNet (paper) - Lexical resource based on WordNet.
      • SentiWords - Collection of 155,000 English words, each with a sentiment score between -1 and 1. Words are in the form lemma#PoS and are aligned with WordNet lists that include adjectives, nouns, verbs and adverbs.
      • SenticNet (API) - Words with a sentiment score between -1 and 1.
      • WordStat - Context-specific sentiment analysis dictionary with categories Negative, Positive, Uncertainty, Litigiousness and Modal. This dataset is inspired by two papers: Loughran and McDonald (2011) and Young and Soroka (2011).
      • MPQA (Multi-Perspective Question Answering) Subjectivity Lexicon - A list of subjectivity clues that is part of [OpinionFinder](http://mpqa.cs.pitt.edu/opinionfinder/opinionfinder_2/) and also helps to determine text polarity.
      • NRC-Canada Lexicons - the web page lists various word association lexicons that capture word-sentiment, word-emotion, and word-colour associations.
      • Sentiment140 - One of the NRC-Canada team's lexicons. The Sentiment140 Lexicon is a list of words and their associations with positive and negative sentiment. The lexicon provides sentiment scores for unigrams, bigrams and unigram-bigram pairs.
      • MSOL - Macquarie Semantic Orientation Lexicon.
      • SemEval-2015 English Twitter Sentiment Lexicon - The lexicon was used as an official test set in the [SemEval-2015 shared Task #10: Subtask E](http://alt.qcri.org/semeval2015/task10/). The phrases in this lexicon include at least one of these [negators](http://saifmohammad.com/WebDocs/lexiconstoreleaseonsclpage/SemEval2015-English-negators.txt).
      • SemEval-2016 Arabic Twitter Sentiment Lexicon - The lexicon was used as an official test set in the [SemEval-2016 shared Task #7: Detecting Sentiment Intensity of English and Arabic Phrases](http://alt.qcri.org/semeval2016/task7/). The phrases in this lexicon include at least one of these [negators](http://saifmohammad.com/WebDocs/list-Arabic-negators.txt).
      • SemEval-2016 English Twitter Mixed Polarity Lexicon - This SCL, referred to as the Sentiment Composition Lexicon of Opposing Polarity Phrases (SCL-OPP), includes phrases that have at least one positive and at least one negative word—for example, phrases such as happy accident, best winter break, couldn’t stop smiling, and lazy sundays. We refer to such phrases as opposing polarity phrases. SCL-OPP has 265 trigrams, 311 bigrams, and 602 unigrams annotated with real-valued sentiment association scores through Best-Worst scaling (aka MaxDiff).
      • The NRC Valence, Arousal, and Dominance Lexicon - The NRC Valence, Arousal, and Dominance (VAD) Lexicon includes a list of more than 20,000 English words and their valence, arousal, and dominance scores. For a given word and a dimension (V/A/D), the scores range from 0 (lowest V/A/D) to 1 (highest V/A/D). The lexicon with its fine-grained real-valued scores was created by manual annotation using Best-Worst Scaling.
      • EmoLex NRC Word-Emotion Association Lexicon - the NRC Emotion Lexicon is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). The annotations were manually done by crowdsourcing.
      • WN-Affect emotion Lexicon - WordNet-Affect is an extension of WordNet Domains, including a subset of synsets suitable to represent affective concepts correlated with affective words. Similarly to our method for domain labels, we assigned to a number of WordNet synsets one or more affective labels (a-labels). In particular, the affective concepts representing emotional state are individuated by synsets marked with the a-label emotion. There are also other a-labels for those concepts representing moods, situations eliciting emotions, or emotional responses.
      • SemEval-2016 General English Sentiment Modifiers Lexicon - Sentiment Composition Lexicon of Negators, Modals, and Adverbs (SCL-NMA). Negators, modals, and degree adverbs can significantly affect the sentiment of the words they modify. We manually annotate a set of phrases that include negators (such as no and cannot), modals (such as would have been and could), degree adverbs (such as quite and less), and their combinations. Both the phrases and their constituent content words are annotated with real-valued scores of sentiment intensity using the technique Best–Worst Scaling (aka MaxDiff), which provides reliable annotations. We refer to the resulting lexicon as Sentiment Composition Lexicon of Negators, Modals, and Adverbs (SCL-NMA). The lexicon was used as an official test set in the [SemEval-2016 shared Task #7: Detecting Sentiment Intensity of English and Arabic Phrases](http://alt.qcri.org/semeval2016/task7/). The objective of that task was to automatically predict sentiment intensity scores for multi-word phrases.
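Several of the lexicons above pair a word with a real-valued sentiment score; SentiWords, for example, uses lemma#PoS entries scored between -1 and 1. The following is a minimal sketch of how such a lexicon might be loaded and used for simple averaging-based scoring. The tab-separated file layout, the sample entries and their scores, and the helper names `parse_lexicon` and `score_tokens` are all assumptions made for illustration; they are not taken from any of the listed resources.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical sample in an ASSUMED "lemma#PoS<TAB>score" layout.
# The entries and scores below are invented for illustration only.
SAMPLE_LEXICON = """\
good#a\t0.7
bad#a\t-0.6
accident#n\t-0.4
happy#a\t0.8
"""

Entry = Tuple[str, str]  # (lemma, part-of-speech tag)


def parse_lexicon(text: str) -> Dict[Entry, float]:
    """Parse 'lemma#PoS<TAB>score' lines into a {(lemma, pos): score} dict."""
    entries: Dict[Entry, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, score = line.split("\t")
        # Split on the last '#' so lemmas containing '#' stay intact.
        lemma, pos = key.rsplit("#", 1)
        entries[(lemma, pos)] = float(score)
    return entries


def score_tokens(tokens: Iterable[Entry], lexicon: Dict[Entry, float]) -> float:
    """Average the scores of the tokens found in the lexicon (0.0 if none match)."""
    scores = [lexicon[token] for token in tokens if token in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


lexicon = parse_lexicon(SAMPLE_LEXICON)
phrase = [("happy", "a"), ("accident", "n")]  # tokens already lemmatized and tagged
phrase_score = score_tokens(phrase, lexicon)
```

Plain averaging ignores negators, modals, and degree adverbs, which is the gap that composition lexicons such as SCL-NMA and SCL-OPP are meant to address.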
    • Datasets

      • Amazon Product Dataset - This dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 - July 2014. This dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). An updated version of the dataset (as of 2018) is available at [https://nijianmo.github.io/amazon/index.html](https://nijianmo.github.io/amazon/index.html).
      • IMDB Movies Reviews Dataset (paper) - This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. The authors provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing.
    • Word Embeddings

      • WordNet2Vec - Corpora Agnostic Word Vectorization Method based on WordNet.
    • Pretrained Language Models

    • International Workshops

  • Papers

  • Tutorials

  • Books

    • Lexicon-based Ensembles

      • Sentiment Analysis: mining sentiments, opinions, and emotions - This book is suitable for students, researchers, and practitioners interested in natural language processing in general, and sentiment analysis, opinion mining, emotion analysis, debate analysis, and intention mining in specific. Lecturers can use the book in class.
  • Demos

  • API