https://github.com/mkearney/dict

Word-Based Dictionaries for Natural Language
dictionaries natural-language-processing natural-language-understanding r r-package rstats rstats-package sentiment-analysis text-analysis text-mining topic-dictionaries word-dictionaries


---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# dict

The goal of dict is to make it easy to create, modify, and use word-based
dictionaries for analyzing natural language texts. In other words, this package
is designed to create sentiment-analysis-like tools for measuring the extent to
which natural language reflects (positively and/or negatively) any user-defined
dimension or topic (e.g., politics, sports, emotions, active, passive,
medical, philosophical).

## Installation

You can install the released version of dict from [CRAN](https://CRAN.R-project.org) with:

``` r
install.packages("dict")
```

You can install the development version of dict from [GitHub](https://github.com) with:

``` r
remotes::install_github("mkearney/dict")
```

## Example

Create vectors of positively and negatively identifying words for a simple
"valence" dimension.

```{r}
## load package
library(dict)

## vector of positive words
pos <- c("like", "love", "amazing", "excellent", "great", "fantastic",
  "incredible", "awesome", "best", "favorite", "fan", "fun", "enjoyed",
  "good", "solid", "better", "soooo", "happy")

## vector of negative words
neg <- c("hate", "loathe", "dislike", "awful", "horrible", "worst",
  "miserable", "ruin", "ruining", "destroy", "destroyed", "destroying",
  "pathetic", "hated", "unhappy", "terrible")
```

## Create dictionaries

Create a dictionary using only positively coded words, or using both positively and negatively coded words.

```{r}
## create dictionary using only positive words
only_pos <- dict(pos)

## create dictionary using both positive and negative words
d <- dict(list(pos = pos, neg = neg))

## view up to n entries of dictionary
print(d, n = 15)
```
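To make the idea concrete, a word dictionary can be pictured as a lookup table from words to weights. The base R sketch below illustrates that picture only; it is not dict's internal representation. `make_weights()` is a hypothetical helper, and the +1/-1 weights are an assumption.

```r
## hypothetical helper (not part of dict): map words to +1/-1 weights
make_weights <- function(pos, neg = character()) {
  words <- c(pos, neg)
  weights <- c(rep(1, length(pos)), rep(-1, length(neg)))
  ## drop duplicated words, keeping the first occurrence
  keep <- !duplicated(words)
  stats::setNames(weights[keep], words[keep])
}

## small illustrative word lists
w <- make_weights(
  pos = c("like", "love", "great"),
  neg = c("hate", "awful")
)

## look up a few words; words missing from the table come back as NA
unname(w[c("love", "hate", "table")])
```

A named vector keeps lookups simple, but a real dictionary could equally be stored as a data frame with one row per word.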

## Use word dictionary

Apply a dictionary to some example text:

```{r}
## example text
txt <- c("love amazing excellent good",
  "hate loathe horrifies unhappy terrible",
  "awesome best hateful hated worst")

## get estimates for each element of txt using pos/neg dictionary
d$score(txt)

## store only the overall score in a tibble
tibble::tibble(
  text = txt,
  score = d$score_score(txt)
)
```
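One way to picture what scoring does is to split each text into words and sum the weights of the words found in the dictionary. The base R sketch below assumes exact, lowercase word matching and +1/-1 weights; dict's own `score()` method may tokenize, weight, and normalize differently. `score_text()` is a hypothetical helper, and the word lists are shortened for illustration.

```r
## assumed weights: +1 for positive words, -1 for negative words
pos <- c("love", "amazing", "excellent", "good", "awesome", "best")
neg <- c("hate", "loathe", "unhappy", "terrible", "hated", "worst")
weights <- c(
  stats::setNames(rep(1, length(pos)), pos),
  stats::setNames(rep(-1, length(neg)), neg)
)

## hypothetical helper (not part of dict): sum the weights of matched words
score_text <- function(x, weights) {
  ## lowercase, then split on anything that is not a letter or apostrophe
  tokens <- strsplit(tolower(x), "[^a-z']+")
  vapply(tokens, function(tok) {
    ## unmatched words index to NA and are dropped from the sum
    sum(weights[tok], na.rm = TRUE)
  }, numeric(1))
}

txt <- c("love amazing excellent good",
         "hate loathe horrifies unhappy terrible",
         "awesome best hateful hated worst")

score_text(txt, weights)
```

Under these assumptions the three texts score 4, -4, and 0. The words "horrifies" and "hateful" contribute nothing because only exact matches count, which is one reason a dictionary may need inflected forms listed explicitly.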

## Export dictionary via R package

Export word dictionaries as super fast packages using `create_dict_pkg()`, a wrapper around `usethis::create_package()`:

```{r}
## create package path via temp directory
path_pkg <- file.path(tempdir(), "simpleexample")

## create R package featuring d
create_dict_pkg(d, path_pkg)

## test new package's score function on txt vector
simpleexample::score(txt)
```

Compare the speed of the default returned function (written in R) with the optimized version in the standalone package (written in C):

```{r}
## analyzeSentiment() won't work unless you load the whole library
library(SentimentAnalysis)

## compare speed
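bm <- bench::mark(
  SentimentAnalysis = analyzeSentiment(txt),
  syuzhet = syuzhet::get_sentiment(txt),
  dict_fun = d$score(txt),
  dict_pkg = simpleexample::score(txt),
  ## report timings relative to the fastest expression
  relative = TRUE,
  ## skip the equality check; the functions return different outputs
  check = FALSE,
  iterations = 30
)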

## view results
bm

## view plot
ggplot2::autoplot(bm)
```

## TO DO

See [issues labelled enhancement](https://github.com/mkearney/dict/labels/enhancement).