Word-Based Dictionaries for Natural Language
https://github.com/mkearney/dict

Host: GitHub
URL: https://github.com/mkearney/dict
Owner: mkearney
License: other
Created: 2019-09-08T16:28:20.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2019-09-12T16:01:52.000Z (over 5 years ago)
Last Synced: 2024-08-06T03:04:14.581Z (7 months ago)
Topics: dictionaries, natural-language-processing, natural-language-understanding, r, r-package, rstats, rstats-package, sentiment-analysis, text-analysis, text-mining, topic-dictionaries, word-dictionaries
Language: R
Size: 239 KB
Stars: 10
Watchers: 4
Forks: 0
Open Issues: 2
Metadata Files:
- Readme: README.Rmd
- License: LICENSE

README

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# dict

The goal of dict is to make it easy to create, modify, and use word-based
dictionaries for analyzing natural language texts. In other words, this package
is designed to create sentiment-analysis-like tools for measuring the extent to
which natural language reflects (positively and/or negatively) any user-defined
dimension or topic (e.g., politics, sports, emotions, active, passive,
medical, philosophical, etc.).

## Installation

You can install the released version of dict from [CRAN](https://CRAN.R-project.org) with:

``` r
install.packages("dict")
```

You can install the development version of dict from [GitHub](https://github.com) with:

``` r
## requires the remotes package
remotes::install_github("mkearney/dict")
```

## Example

Create vectors of positively and negatively identifying words for a simple
"valence" dimension.

```{r}
## load package
library(dict)

## vector of positive words
pos <- c("like", "love", "amazing", "excellent", "great", "fantastic",
  "incredible", "awesome", "best", "favorite", "fan", "fun", "enjoyed",
  "good", "solid", "better", "soooo", "happy")

## vector of negative words
neg <- c("hate", "loathe", "dislike", "awful", "horrible", "worst",
  "miserable", "ruin", "ruining", "destroy", "destroyed", "destroying",
  "pathetic", "hated", "unhappy", "terrible")
```

## Create dictionaries

Create a dictionary using only positively coded words, or using both positively and negatively coded words.

```{r}
## create dictionary using only positive words
only_pos <- dict(pos)

## create dictionary using both positive and negative words
d <- dict(list(pos = pos, neg = neg))

## view up to n entries of dictionary
print(d, n = 15)
```
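Conceptually, a word-based dictionary can be thought of as a lookup table that maps each word to a weight. The base R sketch below is only an illustration of that idea; it does not reflect how `dict()` stores its entries internally. The helper `make_lookup()` and the +1/-1 weights are assumptions made for this example.

```r
## hypothetical helper (not part of dict): build a word-to-weight lookup table
make_lookup <- function(pos, neg = character()) {
  data.frame(
    word = c(pos, neg),
    ## assumed weights: +1 for positive words, -1 for negative words
    weight = c(rep(1, length(pos)), rep(-1, length(neg))),
    stringsAsFactors = FALSE
  )
}

## small example vectors
lookup <- make_lookup(pos = c("love", "great"), neg = c("hate", "awful"))
lookup
```

Passing only `pos` yields a table in which every weight is +1, which mirrors the positive-only case above.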

## Use word dictionary

Apply a dictionary to some example text:

```{r}
## example text
txt <- c("love amazing excellent good",
  "hate loathe horrifies unhappy terrible",
  "awesome best hateful hated worst")

## get estimates for each element of txt using pos/neg dictionary
d$score(txt)

## store only the overall score in a tibble
tibble::tibble(
  text  = txt,
  score = d$score_score(txt)
)
```
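To make the idea of dictionary-based scoring concrete, here is a minimal base R sketch that counts positive matches minus negative matches for each text. It is not the scoring rule used by `d$score()`, which may weight or normalize differently; the helper `score_texts()`, the tokenization, and the shortened word lists are all assumptions for illustration.

```r
## hypothetical helper (not part of dict): score texts against word lists
score_texts <- function(txt, pos, neg) {
  ## lowercase each text and split on runs of non-letter characters
  tokens <- strsplit(tolower(txt), "[^a-z']+")
  vapply(tokens, function(w) {
    ## drop empty tokens produced by leading separators
    w <- w[nzchar(w)]
    ## assumed scoring rule: positive matches minus negative matches
    sum(w %in% pos) - sum(w %in% neg)
  }, integer(1))
}

## shortened word lists for the example
pos_words <- c("love", "amazing", "excellent", "good", "awesome", "best")
neg_words <- c("hate", "loathe", "unhappy", "terrible", "hated", "worst")

example_txt <- c("love amazing excellent good",
  "hate loathe horrifies unhappy terrible",
  "awesome best hateful hated worst")

scores <- score_texts(example_txt, pos_words, neg_words)
scores
```

Because matching is on whole words, "horrifies" and "hateful" contribute nothing here: neither appears in the lists, even though "hate" does. Under this rule the three texts score 4, -4, and 0.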

## Export dictionary via R package

Export word dictionaries as super fast packages using `create_dict_pkg()`, a wrapper around `usethis::create_package()`:

```{r}
## create package path via temp directory
path_pkg <- file.path(tempdir(), "simpleexample")

## create R package featuring d
create_dict_pkg(d, path_pkg)

## test new package's score function on txt vector
simpleexample::score(txt)
```

Compare the speed of the default returned function (written in R) with the optimized version in the standalone package (written in C). The benchmark also includes SentimentAnalysis and syuzhet for reference:

```{r}
## analyzeSentiment() won't work unless you load the whole library
library(SentimentAnalysis)

## compare speed
bm <- bench::mark(
  SentimentAnalysis = analyzeSentiment(txt),
  syuzhet = syuzhet::get_sentiment(txt),
  dict_fun = d$score(txt),
  dict_pkg = simpleexample::score(txt),
  relative = TRUE,
  check = FALSE,
  iterations = 30
)

## view results
bm

## view plot
ggplot2::autoplot(bm)
```

## TO DO

See [issues labelled enhancement](https://github.com/mkearney/dict/labels/enhancement).