https://github.com/wrathematics/meanr

A sentiment analysis package for R.

Host: GitHub
URL: https://github.com/wrathematics/meanr
Owner: wrathematics
License: other
Created: 2016-12-01T20:47:13.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2023-12-10T18:49:04.000Z (over 1 year ago)
Last Synced: 2024-08-10T10:38:07.580Z (10 months ago)
Language: C
Size: 513 KB
Stars: 22
Watchers: 2
Forks: 5
Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: ChangeLog
- License: LICENSE

# meanr

* **Version:** 0.1-5

* **URL:** https://github.com/wrathematics/meanr

* **License:** [BSD 2-Clause](https://opensource.org/license/bsd-2-clause/)

* **Author:** Drew Schmidt

**meanr** is an R package for sentiment analysis.  Its main function, `score()`, computes sentiment as a simple sum of word values: +1 for each positive sentiment word and -1 for each negative one in a piece of text.  More sophisticated techniques are available in R, for example in the **qdap** package's `polarity()` function.  This package uses [the Hu and Liu sentiment dictionary](https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html), same as everybody else.

**meanr** is significantly faster than everything else I tried (which was actually the motivation for its creation), but I don't claim to have tried everything.  I believe the package is quite fast.  However, the method is merely a dictionary lookup, so, unlike more sophisticated methods, it ignores word context.  On the other hand, the more sophisticated tools are very slow.  If you have a large volume of text, I believe there is value in getting a "first glance" at the data, and **meanr** allows you to do this very quickly.

## Installation

The stable version is available on CRAN:

```r
install.packages("meanr")
```

The development version is maintained on GitHub:

```r
remotes::install_github("wrathematics/meanr")
```

## Example Usage

I have a dataset that, for legal reasons, I cannot describe, much less provide.  You can think of it like a collection of tweets (they are not tweets).  But take my word for it that it's real, English-language text.  The data is in the form of a vector of strings, which we'll call `x`.

```r
x = readRDS("x.rds")

length(x)
## [1] 655760

sum(nchar(x))
## [1] 162663972

library(meanr)
system.time(s <- score(x))
##  user  system elapsed 
## 1.072   0.000   0.285 

head(s)
##   positive negative score  wc
## 1        2        0     2  32
## 2        5        0     5  29
## 3        4        2     2  67
## 4       12        3     9 203
## 5        8        2     6 101
## 6        4        3     1  99
```
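As a quick sanity check on the columns, the rows printed above are consistent with `score` being `positive` minus `negative`.  The sketch below retypes those six rows by hand (`s_head` is a name made up for illustration, not something the package provides) and also computes a per-word score.  The per-word score is only one possible way to compare texts of different lengths; it is an assumption of this example, not something **meanr** does for you.

```r
# The six rows shown by head(s) above, retyped by hand for illustration.
s_head <- data.frame(
  positive = c(2L, 5L, 4L, 12L, 8L, 4L),
  negative = c(0L, 0L, 2L, 3L, 2L, 3L),
  score    = c(2L, 5L, 2L, 9L, 6L, 1L),
  wc       = c(32L, 29L, 67L, 203L, 101L, 99L)
)

# In every row shown, score equals positive minus negative.
stopifnot(all(s_head$score == s_head$positive - s_head$negative))

# A per-word score (an assumption of this sketch, not a package feature).
per_word <- s_head$score / s_head$wc
```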

## How It Works

The `score()` function receives a vector of strings, and operates on each one as follows:

1. The maximum string length is found, and a buffer of that size is allocated.

2. The string is copied to the buffer.

3. All punctuation is removed. All characters are converted to lowercase.

4. Score sentiment:

    - Tokenize words as collections of chars separated by a space.

    - Check if the word is positive; if not, check if it is negative; if not, then it's assumed to be neutral.  Each check is a lookup in one of two tables built from Hu and Liu's dictionaries.

    - If the word is in the table, get its value from the hash table (positive words have value 1, negative words -1) and update the various counts.  Otherwise, the word is "neutral" (score of 0).

This is all done in four passes of each string; each pass corresponds to one of the enumerated items above.  The hash tables use perfect hash functions generated by gperf.
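To make the steps above concrete, here is a minimal sketch of the same idea in plain R.  It is not the package's implementation: the real work is done in C with gperf-generated tables, while this sketch uses an R environment as a stand-in hash table and skips the buffer handling of steps 1 and 2, since R manages string memory itself.  The function name `score_sketch()` and the six-word lexicon are made up for illustration; the package itself uses the Hu and Liu dictionaries.

```r
# Hypothetical miniature word lists; the package uses the Hu and Liu dictionaries.
positive_words <- c("good", "great", "love")
negative_words <- c("bad", "awful", "hate")

# An environment is R's built-in hash table.  It stands in for the
# gperf-generated tables: positive words map to 1, negative words to -1.
lexicon <- new.env(hash = TRUE)
for (w in positive_words) assign(w, 1L, envir = lexicon)
for (w in negative_words) assign(w, -1L, envir = lexicon)

score_sketch <- function(x) {
  rows <- lapply(x, function(s) {
    # Step 3: remove punctuation, then convert to lowercase.
    s <- tolower(gsub("[[:punct:]]", "", s))

    # Step 4a: tokenize on single spaces, dropping empty tokens.
    words <- strsplit(s, " ", fixed = TRUE)[[1]]
    words <- words[nzchar(words)]

    # Steps 4b and 4c: look each word up; words not in the table are neutral (0).
    vals <- vapply(words, function(w) {
      if (exists(w, envir = lexicon, inherits = FALSE)) {
        get(w, envir = lexicon, inherits = FALSE)
      } else {
        0L
      }
    }, integer(1))

    positive <- sum(vals == 1L)
    negative <- sum(vals == -1L)
    data.frame(positive = positive, negative = negative,
               score = positive - negative, wc = length(words))
  })
  do.call(rbind, rows)
}

res <- score_sketch(c("I love this, it is GREAT!",
                      "Awful. Just bad... but good effort"))
```

With this toy lexicon, the first string has two positive words and no negative ones (score 2), and the second has one positive and two negative words (score -1); both contain six words after punctuation is stripped.  Expect this version to be far slower than `score()`, so treat it only as a way to read the algorithm.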
