{"id":29100698,"url":"https://github.com/bagustris/text-vad","last_synced_at":"2025-07-17T23:35:59.585Z","repository":{"id":50624785,"uuid":"143022884","full_name":"bagustris/text-vad","owner":"bagustris","description":"VAD analysis of text using some affective lexicon (ANEW, SENTIWORDNET, and VADER)","archived":false,"fork":false,"pushed_at":"2022-03-17T00:30:40.000Z","size":2646,"stargazers_count":25,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-06-28T18:52:55.119Z","etag":null,"topics":["affective-computing","natural-language-processing","nlp","sentiment-analysis","text-emotion-recognition"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bagustris.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-07-31T14:07:33.000Z","updated_at":"2025-03-26T08:06:37.000Z","dependencies_parsed_at":"2022-09-06T01:31:54.616Z","dependency_job_id":null,"html_url":"https://github.com/bagustris/text-vad","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/bagustris/text-vad","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bagustris%2Ftext-vad","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bagustris%2Ftext-vad/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bagustris%2Ftext-vad/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bagustris%2Ftext-vad/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bagustri
s","download_url":"https://codeload.github.com/bagustris/text-vad/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bagustris%2Ftext-vad/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265678831,"owners_count":23810120,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["affective-computing","natural-language-processing","nlp","sentiment-analysis","text-emotion-recognition"],"created_at":"2025-06-28T18:38:07.500Z","updated_at":"2025-07-17T23:35:59.557Z","avatar_url":"https://github.com/bagustris.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Text Emotion Recognition and Sentiment Analysis based on Affective Dictionary/Lexicon\n\n## Technical details\n### Requirement\nThis script requires the following packages:\n- nltk\n- stanfordcorenlp\n\nYou also need to download the NLTK `stopwords`, `punkt`, and `wordnet` resources as follows:\n\n    \u003e\u003e\u003e import nltk\n    \u003e\u003e\u003e nltk.download(['stopwords', 'punkt', 'wordnet'])\n\n### Example Usage\n    $ python3.6 anew_vad_analysis.py\n    $ python3.6 sentiwordnet_analysis3.py\n    $ python3.6 vader_analysis.py\n    \nSpecify the input text file (the input can also be a directory of text files), the mode, and the output directory inside those Python files. Available modes are: 'mean', 'median', and 'mika' (Mäntylä et al., 2016).\n\nThe working directory is `./src`, input files are read from `data`, and output files are written to the `out` directory. 
An example `input.txt` file is included in the `src` directory.\n\n### Directory structure\n```\n.\n├── stanford-corenlp-full-XXXX-XX-XX (downloaded separately)\n└── VADanalysis\n    ├── data\n    ├── lib\n    ├── out\n    └── src\n```\n### Database and VAD scale\n - The IEMOCAP and Emobank databases use a 1-5 scale (negative to positive) for each of valence, arousal, and dominance.\n - ANEW uses a 1-9 scale (negative to positive) for each of the valence, arousal, and dominance scores.\n - SENTIWORDNET uses a (-1, 1) valence/sentiment scale.\n - VADER uses a (-1, 1) valence/sentiment scale (compound).\n\n## Theoretical details\nSentiment analysis is the task of automatically evaluating the overall sentiment evoked by a text, whether positive or negative. The value determining this sentiment is called valence.\n\nThere are many existing tools and models for sentiment analysis; we have implemented and summarized some of them below. We focus only on determining valence, arousal, and dominance (VAD) scores rather than on a categorical model, as the dimensional representation is closer to human behaviour.\n\n### ANEW\n\nANEW, short for Affective Norms for English Words, is a database of 1,034 English words that have been manually rated by many human volunteers on three affective measures: pleasure (valence), arousal (excitement), and dominance (level of control), as elicited by a particular word (Bradley \u0026 Lang 1999). In 2013, Warriner et al. expanded the database to nearly 14,000 English lemmas, and also split data by gender, age, and educational differences in raters (Warriner et al. 2013). In both databases, affective ratings are on a scale from 1 to 9, where 1 is the least pleasurable/exciting/controlling, and 9 is the most.\n\nIn our implementation, we used Warriner et al.’s expanded database, extracting the average valence, arousal, and dominance for each word from the larger database, for word-by-word sentiment analysis.\n\nWe wrote a Python script to perform sentiment analysis with the resulting data. 
Given a body of text in .txt format, we first tokenized the text into sentences using NLTK’s sentence tokenizer, and then tokenized each sentence into individual words with NLTK’s word tokenizer, stripping out all stop words found in NLTK’s English stop word list. For each non-stop word in each sentence, we searched for the word in the database and stored its individual valence, arousal, and dominance values.\n\nAs ANEW uses ratings on a scale from 1 (most negative) to 9 (most positive), valence values of 5 are considered neutral; values less than 5 are considered negative, and values greater than 5 are considered positive. Following Hutto \u0026 Gilbert’s (2014) method of accounting for negation, if any of the three words preceding a word indicated negation (“not” or “no”), we reversed the polarity of that word. We did this by computing (5 - (valence - 5)), that is, 10 - valence, as the new valence value.\n\nAfter finding sentiment ratings for each non-stop word in each sentence, we found overall sentiment ratings for the sentence by taking either the median or the mean of the sentiment ratings for each word in that sentence, according to the method selected by the user. For each sentence, we labeled the sentence’s valence as negative if less than 5, neutral if equal to 5, and positive if greater than 5.\n\nWeaknesses of this approach: As this is a word-for-word approach to analyzing the sentiment of an entire sentence, the results are limited by the number of words available in ANEW.\n\n### SENTIWORDNET\nSENTIWORDNET (Baccianella et al., 2009) is a lexical resource explicitly devised for supporting sentiment classification and opinion mining applications. Version 3.0 is an improvement on the previous version 1.0. SENTIWORDNET provides three scores: positive, negative, and objective (neutral), while the dictionary itself lists only the positive and negative scores. 
The objective score is then calculated as\n\n**Obj = 1 - (Pos + Neg)**\n\nIn this research, the valence score for each word is the positive score (Pos) minus the negative score (Neg):\n\n**V = Pos - Neg**\n\nThe valence score for each utterance is then calculated using either the mean, the median, or the Mika method. Because SENTIWORDNET uses WordNet as its dictionary, sets of synonyms (synsets) are used to look up words with similar meanings. We used the first synset (index \"0\") to take the most common sense of a word given its part of speech (POS) in WordNet. A sentence with no words found in SENTIWORDNET is categorized as neutral.\n\n### NLTK VADER\nNLTK, an abbreviation for the Natural Language Toolkit, is a robust library of Python functions for various natural language processing tasks (Bird 2006). Among its many functions is an implementation of the VADER (Valence Aware Dictionary for sEntiment Reasoning) sentiment analysis tool.\n\nVADER is a simple rule-based model for general sentiment analysis. It is most accurate for social media data, but is generalizable to other domains as well. To create the model, Hutto \u0026 Gilbert first constructed a gold-standard list of features using features from several widely used sentiment lexicons and some of their own (such as emoticons), using a wisdom-of-the-crowd approach to acquire a valid point estimate for the valence of each feature, as well as intensity ratings from Amazon Mechanical Turk workers. They also created five general heuristics for sentiment analysis of texts: punctuation, capitalization, intensifiers, the use of the contrastive conjunction \"but\", and negation.\n\nWe implemented VADER in Python using NLTK’s VADER library. As in our implementation of ANEW, we tokenized texts into sentences; for each sentence, we used NLTK’s SentimentIntensityAnalyzer to obtain polarity scores for that sentence. 
Scores are normalized on a scale from -1 to 1, where positive values have a positive valence, 0 is neutral, and negative values have a negative valence. A sentence with no words found in the VADER lexicon is also categorized as neutral.\n\n### Result\nBest concordance correlation coefficient (CCC) scores obtained by the three approaches (arousal and dominance scores are available for ANEW only):\n\nDatabase : IEMOCAP\n\n|   Method  |   Valence |   Arousal |   Dominance   |\n|-----------|-----------|-----------|---------------|\n|ANEW       | 0.1632    | 0.1953    | 0.1537        |\n|SWN        | 0.0882    | -         | -             |\n|VADER      | 0.212     | -         | -             |\n\n\nDatabase : EMOBANK\n\n|   Method  |   Valence |   Arousal |   Dominance   |\n|-----------|-----------|-----------|---------------|\n|ANEW       | 0.3804    | 0.1372    | 0.1555        |\n|SWN        | 0.2120    |  -        | -             |\n|VADER      | 0.3877    |  -        | -             |\n\n\n*) ANEW and SWN scores are the highest among the 'mean', 'median', and 'mika' methods. See the accompanying paper for details.\n\n### References\n- Bird, S. (2006, July). NLTK: the natural language toolkit. In Proceedings of the COLING/ACL\non Interactive presentation sessions (pp. 69-72). Association for Computational\nLinguistics.\n\n- Bradley, M. M., \u0026 Lang, P. J. (1999). Affective norms for English words (ANEW): Instruction\nmanual and affective ratings (pp. 1-45). Technical report C-1, the center for research in\npsychophysiology, University of Florida.\n\n- Hutto, C. J., \u0026 Gilbert, E. (2014, May). Vader: A parsimonious rule-based model for sentiment\nanalysis of social media text. In Eighth international AAAI conference on weblogs and\nsocial media.\n\n- Warriner, A. B., Kuperman, V., \u0026 Brysbaert, M. (2013). Norms of valence, arousal, and\ndominance for 13,915 English lemmas. Behavior research methods, 45(4), 1191-1207.\n\n- Mäntylä, M., Adams, B., Destefanis, G., Graziotin, D., \u0026 Ortu, M. (2016). 
Mining Valence, Arousal, and Dominance - Possibilities for Detecting Burnout and Productivity? https://doi.org/10.1145/2901739.2901752\n\n- https://github.com/dwzhou/SentimentAnalysis\n\n### Citation\n```\nB.T. Atmaja, K. Shirai, M. Akagi, Deep Learning-based Categorical and\nDimensional Emotion Recognition for Written\nand Spoken Text, International Seminar on Science and Technology, Surabaya - Indonesia, \n2019.\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbagustris%2Ftext-vad","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbagustris%2Ftext-vad","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbagustris%2Ftext-vad/lists"}