Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mkearney/dict
Word-Based Dictionaries for Natural Language
https://github.com/mkearney/dict
dictionaries natural-language-processing natural-language-understanding r r-package rstats rstats-package sentiment-analysis text-analysis text-mining topic-dictionaries word-dictionaries
Last synced: about 2 months ago
JSON representation
Word-Based Dictionaries for Natural Language
- Host: GitHub
- URL: https://github.com/mkearney/dict
- Owner: mkearney
- License: other
- Created: 2019-09-08T16:28:20.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2019-09-12T16:01:52.000Z (over 5 years ago)
- Last Synced: 2024-08-06T03:04:14.581Z (5 months ago)
- Topics: dictionaries, natural-language-processing, natural-language-understanding, r, r-package, rstats, rstats-package, sentiment-analysis, text-analysis, text-mining, topic-dictionaries, word-dictionaries
- Language: R
- Size: 239 KB
- Stars: 10
- Watchers: 4
- Forks: 0
- Open Issues: 2
-
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
Awesome Lists containing this project
README
---
output: github_document
---```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```# dict
The goal of dict is to make it easy to create, modify, and use word-based
dictionaries for analyzing natural language texts. In other words, this package
is designed to create sentiment-analysis like tools for measuring the extent to
which natural language reflects (positively and/or negatively) any user-defined
dimension or topic (e.g., politics, sports, emotions, active, pass,
medical, philosophical, etc.).## Installation
You can install the released version of dict from [CRAN](https://CRAN.R-project.org) with:
``` r
install.packages("dict")
```You can install the developmernt version of dict from [Github](https://github.com) with:
``` r
remotes::install_github("mkearney/dict")
```## Example
Create vectors of positively and negatively identifying words for a simple
"valence" dimension.```{r}
## load package
library(dict)## vector of positive words
pos <- c("like", "love", "amazing", "excellent", "great", "fantastic",
"incredible", "awesome", "best", "favorite", "fan", "fun", "enjoyed",
"good", "solid", "better", "soooo", "happy")## vetor of negative words
neg <- c("hate", "loathe", "dislike", "awful", "horrible", "worst",
"miserable", "ruin", "ruining", "destroy", "destroyed", "destroying",
"pathetic", "hated", "unhappy", "terrible")
```## Create dictionaries
Create a dictionary using only positively-coded words or both positively and negatively coded words.
```{r}
## create dictionary using only positive words
only_pos <- dict(pos)## create dictionary using both positive and negative words
d <- dict(list(pos = pos, neg = neg))## view up to n entries of dictionary
print(d, n = 15)
```## Use word dictionary
Apply a dictionary to some example text:
```{r}
## example text
txt <- c("love amazing excellent good",
"hate loathe horrifies unhappy terrible",
"awesome best hateful hated worst")## get estimates for each element of txt using pos/neg dictionary
d$score(txt)## store only the overall score in a tibble
tibble::tibble(
text = txt,
score = d$score_score(txt)
)
```## Export dictionary via R package
Export word dictionaries as super fast packages using this wrapper around `usethis::create_package()`
```{r}
## create package path via temp directory
path_pkg <- file.path(tempdir(), "simpleexample")## create R package featuring d
create_dict_pkg(d, path_pkg)## test new package's score function on txt vector
simpleexample::score(txt)
```Compare the speed of the default returned function (written in R) versus the optimized version in the standalone package (written in C)
```{r}
## analyzeSentiment() won't work unless you load the whole library
library(SentimentAnalysis)## compare speed
bm <- bench::mark(
SentimentAnalysis = analyzeSentiment(txt),
syuzhet = syuzhet::get_sentiment(txt),
dict_fun = d$score(txt),
dict_pkg = simpleexample::score(txt),
relative = TRUE,
check = FALSE,
iterations = 30
)## view results
bm## view plot
ggplot2::autoplot(bm)
```## TO DO
See [issues labelled enhancement](https://github.com/mkearney/dict/labels/enhancement).