Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/BrianWeinstein/googlenlp

An Interface to Google's Cloud Natural Language API
https://github.com/BrianWeinstein/googlenlp

api cran google-cloud-platform nlp r

Last synced: about 1 month ago
JSON representation

An Interface to Google's Cloud Natural Language API

Awesome Lists containing this project

README

        

---
output: github_document
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```

googlenlp
---

[![Travis-CI Build Status](https://travis-ci.org/BrianWeinstein/googlenlp.svg?branch=master)](https://travis-ci.org/BrianWeinstein/googlenlp) [![CRAN status](http://www.r-pkg.org/badges/version/googlenlp)](https://cran.r-project.org/package=googlenlp) [![Download count](https://cranlogs.r-pkg.org/badges/googlenlp)](https://cran.r-project.org/package=googlenlp)

---

The googlenlp package provides an R interface to Google's [Cloud Natural Language API](https://cloud.google.com/natural-language/).

"Google Cloud Natural Language API reveals the structure and meaning of text by offering powerful machine learning models in an easy to use REST API. You can use it to **extract information** about people, places, events and much more, mentioned in text documents, news articles or blog posts. You can use it to **understand sentiment** about your product on social media or **parse intent** from customer conversations happening in a call center or a messaging app." [[source](https://cloud.google.com/natural-language/)]

There are four main features of the API, all of which are available through this R package [[source](https://cloud.google.com/natural-language/)]:

* **Syntax Analysis:** "Extract tokens and sentences, identify parts of speech (PoS) and create dependency parse trees for each sentence."
* **Entity Analysis:** "Identify entities and label by types such as person, organization, location, events, products and media."
* **Sentiment Analysis:** "Understand the overall sentiment expressed in a block of text."
* **Multi-Language:** "Enables you to easily analyze text in multiple languages including English, Spanish and Japanese."

### Resources

* [API Documentation](https://cloud.google.com/natural-language/docs/)
* [Natural Language API Basics](https://cloud.google.com/natural-language/docs/basics)
* [Morphology & Dependency Trees](https://cloud.google.com/natural-language/docs/morphology)

### Installation

The current googlenlp release can be installed from CRAN:

```{r eval = FALSE}
install.packages("googlenlp")
```

The newest development release can be installed from GitHub:

```{r eval = FALSE}
# install.packages('devtools')
devtools::install_github("BrianWeinstein/googlenlp")
```

### Authentication

To use the API, you'll first need to [create a Google Cloud project and enable billing](https://cloud.google.com/natural-language/docs/getting-started), and get an [API key](https://cloud.google.com/natural-language/docs/common/auth).

### Configuration

Load the package and set your API key. There are two ways to do this.

#### Method A (preferred)
Method A (preferred method) adds your API key as a variable to your `.Renviron` file. Under this method, you only need to do this setup process one time.
```{r eval = FALSE}
library(googlenlp)

configure_googlenlp() # follow the instructions printed to the console
```
```
googlenlp setup instructions:
1. Your ~/.Renviron file will now open in a new window/tab.
*** If it doesn't open, run: file.edit("~/.Renviron") ***
2. To use the API, you'll first need to create a Google Cloud project and enable billing (https://cloud.google.com/natural-language/docs/getting-started).
3. Next you'll need to get an API key (https://cloud.google.com/natural-language/docs/common/auth).
4. In your ~/.Renviron file, replace the ENTER_YOUR_API_KEY_HERE with your Google Cloud API key.
5. Save your ~/.Renviron file.
6. *** Restart your R session for changes to take effect. ***
```

#### Method B
Method B defines your API key as a session-level variable. Under this method, you'll need to set your API key at the beginning of each R session.
```{r eval = FALSE}
library(googlenlp)

set_api_key("MY_API_KEY") # replace this with your API key
```

### Getting started

```{r eval = TRUE, include = FALSE}
devtools::load_all("~/Documents/googlenlp")
library(dplyr)
```

Define the text you'd like to analyze.
```{r eval = TRUE}
text <- "Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show.
Sundar Pichai said in his keynote that users love their new Android phones."
```

The `annotate_text` function analyzes the text's syntax (sentences and tokens), entities, sentiment, and language; and returns the result as a five-element list.

```{r eval = TRUE}
analyzed <- annotate_text(text_body = text)

str(analyzed, max.level = 1)
```

#### Sentences

"Sentence extraction breaks up the stream of text into a series of sentences." [[API Documentation](https://cloud.google.com/natural-language/docs/basics#sentence-extraction)]

* `beginOffset` indicates the (zero-based) character index of where the sentence begins (wtih UTF-8 encoding).
* The `magnitude` and `score` fields quantify each sentence's sentiment — see the [Document Sentiment](#document-sentiment) section for more details.

```{r eval = FALSE}
analyzed$sentences
```
```{r eval = TRUE, echo = FALSE}
knitr::kable(analyzed$sentences, format = "markdown")
```

#### Tokens

"Tokenization breaks the stream of text up into a series of tokens, with each token usually corresponding to a single word.
The Natural Language API then processes the tokens and, using their locations within sentences, adds syntactic information to the tokens." [[API Documentation](https://cloud.google.com/natural-language/docs/basics#tokenization)]

* `lemma` indicates the token's "root" word, and can be useful in standardizing the word within the text.
* `tag` indicates the token's part of speech.
* Additional column definitions are outlined [here](https://cloud.google.com/natural-language/docs/basics#tokenization) and [here](https://cloud.google.com/natural-language/docs/morphology#parts_of_speech).

```{r eval = FALSE}
analyzed$tokens
```
```{r eval = TRUE, echo = FALSE}
knitr::kable(analyzed$tokens, format = "markdown")
```

#### Entities

"Entity Analysis provides information about entities in the text, which generally refer to named 'things' such as famous individuals, landmarks, common objects, etc... A good general practice to follow is that if something is a noun, it qualifies as an 'entity.'" [[API Documentation](https://cloud.google.com/natural-language/docs/basics#entity_analysis)]

* `entity_type` indicates the type of entity (i.e., it classifies the entity as a person, location, consumer good, etc.).
* `mid` provides a "machine-generated identifier" correspoding to the entity's [Google Knowledge Graph](https://www.google.com/intl/bn/insidesearch/features/search/knowledge.html) entry.
* `wikipedia_url` provides the entity's [Wikipedia](https://www.wikipedia.org/) URL.
* `salience` indicates the entity's importance to the entire text. Scores range from 0.0 (less important) to 1.0 (highly important).
* Additional column definitions are outlined [here](https://cloud.google.com/natural-language/docs/basics#entity_analysis_response_fields).

```{r eval = FALSE}
analyzed$entities
```
```{r eval = TRUE, echo = FALSE}
knitr::kable(analyzed$entities, format = "markdown")
```

#### Document sentiment {#document-sentiment}

"Sentiment analysis attempts to determine the overall attitude (positive or negative) expressed within the text. Sentiment is represented by numerical `score` and `magnitude` values." [[API Documentation](https://cloud.google.com/natural-language/docs/basics#sentiment_analysis)]

* `score` ranges from -1.0 (negative) to 1.0 (positive), and indicates to the "overall emotional leaning of the text".
* `magnitude` "indicates the overall strength of emotion (both positive and negative) within the given text, between 0.0 and +inf. Unlike score, magnitude is not normalized; each expression of emotion within the text (both positive and negative) contributes to the text's magnitude (so longer text blocks may have greater magnitudes)."

A note on how to interpret these sentiment values is posted [here](https://cloud.google.com/natural-language/docs/basics#interpreting_sentiment_analysis_values).

```{r eval = FALSE}
analyzed$documentSentiment
```
```{r eval = TRUE, echo = FALSE}
knitr::kable(analyzed$documentSentiment, format = "markdown")
```

#### Language

`language` indicates the detected language of the document. Only English ("en"), Spanish ("es") and Japanese ("ja") are currently supported by the API.

```{r eval = TRUE}
analyzed$language
```