https://github.com/BrianWeinstein/googlenlp

An Interface to Google's Cloud Natural Language API

api cran google-cloud-platform nlp r



Host: GitHub
URL: https://github.com/BrianWeinstein/googlenlp
Owner: BrianWeinstein
License: other
Created: 2016-12-11T19:20:54.000Z (almost 9 years ago)
Default Branch: master
Last Pushed: 2018-07-14T21:21:59.000Z (over 7 years ago)
Last Synced: 2024-11-29T00:10:29.582Z (about 1 year ago)
Topics: api, cran, google-cloud-platform, nlp, r
Language: R
Size: 333 KB
Stars: 8
Watchers: 2
Forks: 1
Open Issues: 4
Metadata Files:
- Readme: README.Rmd
- License: LICENSE


---
output: github_document
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
```

googlenlp
---

[![Travis-CI Build Status](https://travis-ci.org/BrianWeinstein/googlenlp.svg?branch=master)](https://travis-ci.org/BrianWeinstein/googlenlp) [![CRAN status](http://www.r-pkg.org/badges/version/googlenlp)](https://cran.r-project.org/package=googlenlp) [![Download count](https://cranlogs.r-pkg.org/badges/googlenlp)](https://cran.r-project.org/package=googlenlp)

---

The googlenlp package provides an R interface to Google's [Cloud Natural Language API](https://cloud.google.com/natural-language/).

"Google Cloud Natural Language API reveals the structure and meaning of text by offering powerful machine learning models in an easy to use REST API. You can use it to **extract information** about people, places, events and much more, mentioned in text documents, news articles or blog posts. You can use it to **understand sentiment** about your product on social media or **parse intent** from customer conversations happening in a call center or a messaging app." [[source](https://cloud.google.com/natural-language/)]

There are four main features of the API, all of which are available through this R package [[source](https://cloud.google.com/natural-language/)]:

* **Syntax Analysis:** "Extract tokens and sentences, identify parts of speech (PoS) and create dependency parse trees for each sentence."

* **Entity Analysis:** "Identify entities and label by types such as person, organization, location, events, products and media."

* **Sentiment Analysis:** "Understand the overall sentiment expressed in a block of text."

* **Multi-Language:** "Enables you to easily analyze text in multiple languages including English, Spanish and Japanese."

### Resources

* [API Documentation](https://cloud.google.com/natural-language/docs/)

* [Natural Language API Basics](https://cloud.google.com/natural-language/docs/basics)

* [Morphology & Dependency Trees](https://cloud.google.com/natural-language/docs/morphology)

### Installation

The current googlenlp release can be installed from CRAN: 

```{r eval = FALSE}

install.packages("googlenlp")

```

The newest development release can be installed from GitHub:

```{r eval = FALSE}

# install.packages('devtools')

devtools::install_github("BrianWeinstein/googlenlp")

```

### Authentication

To use the API, you'll first need to [create a Google Cloud project and enable billing](https://cloud.google.com/natural-language/docs/getting-started), and get an [API key](https://cloud.google.com/natural-language/docs/common/auth).

### Configuration

Load the package and set your API key. There are two ways to do this.

#### Method A (preferred)

Method A stores your API key as a variable in your `.Renviron` file, so you only need to complete this setup once.

```{r eval = FALSE}

library(googlenlp)

configure_googlenlp() # follow the instructions printed to the console

```

```

googlenlp setup instructions:

 1. Your ~/.Renviron file will now open in a new window/tab.

    *** If it doesn't open, run:  file.edit("~/.Renviron") ***

 2. To use the API, you'll first need to create a Google Cloud project and enable billing (https://cloud.google.com/natural-language/docs/getting-started).

 3. Next you'll need to get an API key (https://cloud.google.com/natural-language/docs/common/auth).

 4. In your  ~/.Renviron  file, replace the ENTER_YOUR_API_KEY_HERE with your Google Cloud API key.

 5. Save your ~/.Renviron file.

 6. *** Restart your R session for changes to take effect. ***

```
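After restarting R, it can help to confirm that the key is actually visible to the session. The following is a minimal sketch and not part of googlenlp: the helper `api_key_is_set` is hypothetical, and the variable name `GOOGLE_NLP_API_KEY` is an assumption, so substitute whatever name appears in your own `~/.Renviron` file.

```{r eval = FALSE}
# Hypothetical helper (not part of googlenlp): returns TRUE when the named
# environment variable holds something other than the setup placeholder.
# The default variable name is an assumption; check your ~/.Renviron file.
api_key_is_set <- function(var_name = "GOOGLE_NLP_API_KEY",
                           placeholder = "ENTER_YOUR_API_KEY_HERE") {
  value <- Sys.getenv(var_name, unset = "")
  nzchar(value) && !identical(value, placeholder)
}

api_key_is_set()
```

A `FALSE` result suggests that the file was not saved, the placeholder was not replaced, or the R session was not restarted.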

#### Method B

Method B defines your API key as a session-level variable. Under this method, you'll need to set your API key at the beginning of each R session.

```{r eval = FALSE}

library(googlenlp)

set_api_key("MY_API_KEY") # replace this with your API key

```

### Getting started

```{r eval = TRUE, include = FALSE}

devtools::load_all("~/Documents/googlenlp")

library(dplyr)

```

Define the text you'd like to analyze.

```{r eval = TRUE}

text <- "Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show.
         Sundar Pichai said in his keynote that users love their new Android phones."

```

The `annotate_text` function analyzes the text's syntax (sentences and tokens), entities, sentiment, and language, and returns the result as a five-element list.

```{r eval = TRUE}

analyzed <- annotate_text(text_body = text)

str(analyzed, max.level = 1)

```
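Before working with the result, a quick structural check can catch a failed or partial response. This is a sketch under the assumption that the five elements carry the names used in the sections of this README; `missing_elements` is a hypothetical helper, not a googlenlp function.

```{r eval = FALSE}
# Element names as they are used in the sections of this README.
expected_elements <- c("sentences", "tokens", "entities",
                       "documentSentiment", "language")

# Hypothetical helper: names of the expected elements absent from `result`.
missing_elements <- function(result, expected = expected_elements) {
  setdiff(expected, names(result))
}

# With a result from annotate_text():
# missing_elements(analyzed)
```

An empty character vector means every expected element is present.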

#### Sentences

"Sentence extraction breaks up the stream of text into a series of sentences." [[API Documentation](https://cloud.google.com/natural-language/docs/basics#sentence-extraction)]

* `beginOffset` indicates the (zero-based) character index of where the sentence begins (with UTF-8 encoding).

* The `magnitude` and `score` fields quantify each sentence's sentiment — see the [Document Sentiment](#document-sentiment) section for more details.

```{r eval = FALSE}

analyzed$sentences

```

```{r eval = TRUE, echo = FALSE}

knitr::kable(analyzed$sentences, format = "markdown")

```
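Because `beginOffset` is zero-based and R strings are indexed from 1, locating a sentence in the original text requires adding one to the offset. The helper below is a hypothetical sketch that assumes the offset counts characters, which matches byte offsets only for plain ASCII text.

```{r eval = FALSE}
# Hypothetical helper: extract `n_chars` characters starting at a zero-based
# offset. Assumes the offset counts characters (true for plain ASCII text).
text_at_offset <- function(text, begin_offset, n_chars) {
  start <- begin_offset + 1L  # shift from zero-based to R's one-based indexing
  substr(text, start, start + n_chars - 1L)
}

text_at_offset("Hello world. Bye.", begin_offset = 13, n_chars = 3)
#> [1] "Bye"
```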

#### Tokens

"Tokenization breaks the stream of text up into a series of tokens, with each token usually corresponding to a single word.

The Natural Language API then processes the tokens and, using their locations within sentences, adds syntactic information to the tokens." [[API Documentation](https://cloud.google.com/natural-language/docs/basics#tokenization)]

* `lemma` indicates the token's "root" word, and can be useful in standardizing the word within the text.

* `tag` indicates the token's part of speech.

* Additional column definitions are outlined [here](https://cloud.google.com/natural-language/docs/basics#tokenization) and [here](https://cloud.google.com/natural-language/docs/morphology#parts_of_speech).

```{r eval = FALSE}

analyzed$tokens

```

```{r eval = TRUE, echo = FALSE}

knitr::kable(analyzed$tokens, format = "markdown")

```
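One use of the `lemma` and `tag` columns is to count how often each root word appears for a given part of speech. The following base R sketch assumes the tokens table is a data frame with those two columns; `count_lemmas` is a hypothetical helper, and the default tag `"NOUN"` is one of the part-of-speech labels the API documents.

```{r eval = FALSE}
# Hypothetical helper: frequency of each lemma among tokens with a given tag.
count_lemmas <- function(tokens, pos_tag = "NOUN") {
  # which() drops rows where the tag is NA
  kept <- tokens[which(tokens$tag == pos_tag), , drop = FALSE]
  counts <- sort(table(as.character(kept$lemma)), decreasing = TRUE)
  data.frame(lemma = names(counts),
             n = as.integer(counts),
             stringsAsFactors = FALSE)
}

# With a result from annotate_text():
# count_lemmas(analyzed$tokens, pos_tag = "NOUN")
```

Counting lemmas rather than raw words groups variants such as "phone" and "phones" together.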

#### Entities

"Entity Analysis provides information about entities in the text, which generally refer to named 'things' such as famous individuals, landmarks, common objects, etc... A good general practice to follow is that if something is a noun, it qualifies as an 'entity.'" [[API Documentation](https://cloud.google.com/natural-language/docs/basics#entity_analysis)]

* `entity_type` indicates the type of entity (i.e., it classifies the entity as a person, location, consumer good, etc.).

* `mid` provides a "machine-generated identifier" corresponding to the entity's [Google Knowledge Graph](https://www.google.com/intl/bn/insidesearch/features/search/knowledge.html) entry.

* `wikipedia_url` provides the entity's [Wikipedia](https://www.wikipedia.org/) URL.

* `salience` indicates the entity's importance to the entire text. Scores range from 0.0 (less important) to 1.0 (highly important).

* Additional column definitions are outlined [here](https://cloud.google.com/natural-language/docs/basics#entity_analysis_response_fields).

```{r eval = FALSE}

analyzed$entities

```

```{r eval = TRUE, echo = FALSE}

knitr::kable(analyzed$entities, format = "markdown")

```
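Since `salience` ranks entities by their importance to the text, sorting on it is a simple way to surface the main subjects. This is a minimal sketch that assumes the entities table is a data frame with a numeric `salience` column; `top_entities` is a hypothetical helper.

```{r eval = FALSE}
# Hypothetical helper: the `n` rows with the highest salience, most salient first.
top_entities <- function(entities, n = 3) {
  ranked <- entities[order(entities$salience, decreasing = TRUE), , drop = FALSE]
  head(ranked, n)
}

# With a result from annotate_text():
# top_entities(analyzed$entities, n = 3)
```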

#### Document sentiment {#document-sentiment}

"Sentiment analysis attempts to determine the overall attitude (positive or negative) expressed within the text. Sentiment is represented by numerical `score` and `magnitude` values." [[API Documentation](https://cloud.google.com/natural-language/docs/basics#sentiment_analysis)]

* `score` ranges from -1.0 (negative) to 1.0 (positive), and indicates the "overall emotional leaning of the text".

* `magnitude` "indicates the overall strength of emotion (both positive and negative) within the given text, between 0.0 and +inf. Unlike score, magnitude is not normalized; each expression of emotion within the text (both positive and negative) contributes to the text's magnitude (so longer text blocks may have greater magnitudes)."

A note on how to interpret these sentiment values is posted [here](https://cloud.google.com/natural-language/docs/basics#interpreting_sentiment_analysis_values).

```{r eval = FALSE}

analyzed$documentSentiment

```

```{r eval = TRUE, echo = FALSE}

knitr::kable(analyzed$documentSentiment, format = "markdown")

```
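One way to combine the two values is to turn them into a coarse label: a score far from zero reads as positive or negative, while a score near zero reads as mixed when the magnitude is high and neutral when it is low. The sketch below illustrates that idea only. The helper `label_sentiment` is hypothetical, and both cutoffs are arbitrary assumptions that should be tuned to your own data.

```{r eval = FALSE}
# Hypothetical helper: coarse sentiment label from score and magnitude.
# The default cutoffs are illustrative assumptions, not values from the API.
label_sentiment <- function(score, magnitude,
                            score_cutoff = 0.25, magnitude_cutoff = 1) {
  ifelse(score >= score_cutoff, "positive",
         ifelse(score <= -score_cutoff, "negative",
                ifelse(magnitude >= magnitude_cutoff, "mixed", "neutral")))
}

label_sentiment(score = c(0.8, -0.6, 0.0, 0.1),
                magnitude = c(3.0, 4.0, 4.0, 0.0))
#> [1] "positive" "negative" "mixed"    "neutral"
```

Assuming `analyzed$documentSentiment` has `score` and `magnitude` columns, the same call would be `label_sentiment(analyzed$documentSentiment$score, analyzed$documentSentiment$magnitude)`.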

#### Language

`language` indicates the detected language of the document. Only English ("en"), Spanish ("es") and Japanese ("ja") are currently supported by the API.

```{r eval = TRUE}

analyzed$language

```
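When processing many documents, the detected language can serve as a guard before further analysis. The sketch below uses the three language codes listed above as its default, and that set may have changed since this README was written. `is_supported_language` is a hypothetical helper, not part of googlenlp.

```{r eval = FALSE}
# Hypothetical helper: TRUE when a language code is in the supported set.
# The default set mirrors the codes listed in this README.
is_supported_language <- function(code, supported = c("en", "es", "ja")) {
  code %in% supported
}

is_supported_language(c("en", "fr"))
#> [1]  TRUE FALSE
```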
