https://github.com/trinker/parsent

Sentence parsing tools; create sentence parse trees & extract portions of sentences

corenlp opennlp r sentence-parser

Host: GitHub
URL: https://github.com/trinker/parsent
Owner: trinker
Created: 2015-11-21T03:35:45.000Z (almost 10 years ago)
Default Branch: master
Last Pushed: 2017-11-22T23:39:38.000Z (almost 8 years ago)
Last Synced: 2025-02-14T12:45:15.736Z (8 months ago)
Topics: corenlp, opennlp, r, sentence-parser
Language: R
Homepage:
Size: 128 KB
Stars: 5
Watchers: 4
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS

README

---

title: "parsent"

date: "`r format(Sys.time(), '%d %B, %Y')`"

output:

  md_document:

    toc: true      

---

```{r, echo=FALSE}

desc <- suppressWarnings(readLines("DESCRIPTION"))

regex <- "(^Version:\\s+)(\\d+\\.\\d+\\.\\d+)"

loc <- grep(regex, desc)

ver <- gsub(regex, "\\2", desc[loc])

verbadge <- sprintf('', ver, ver)

verbadge <- ''

```

[![Build Status](https://travis-ci.org/trinker/parsent.svg?branch=master)](https://travis-ci.org/trinker/parsent)

[![Coverage Status](https://coveralls.io/repos/trinker/parsent/badge.svg?branch=master)](https://coveralls.io/r/trinker/parsent?branch=master)

`r verbadge`

```{r, echo=FALSE, message=FALSE}

library(knitr)

knit_hooks$set(htmlcap = function(before, options, envir) {

  if(!before) {

    paste('
',options$htmlcap,"",sep="")

    }

    })

knitr::opts_knit$set(self.contained = TRUE, cache = FALSE)

knitr::opts_chunk$set(fig.path = "tools/figure/")

```

![](tools/parsent_logo/r_parsent.png)

**parsent** is a collection of tools used to parse sentences.  The package is a wrapper for the **NLP**/**openNLP** packages that simplifies and extends the user experience.

# Function Usage

Functions fall into three task categories: (1) parsing, (2) converting, and (3) extracting.  The main functions, their task categories, and descriptions are summarized in the table below:

| Function                  | Task       | Description                                                       |
|---------------------------|------------|-------------------------------------------------------------------|
| `parser`                  | parsing    | Parse sentences into phrases                                      |
| `parse_annotator`         | parsing    | Generate the **OpenNLP** parser required by the `parser` function |
| `as_tree`                 | converting | Convert `parser` output into tree form                            |
| `as_square_brace`         | converting | Convert `parser` output into square brace form (vs. round)        |
| `as_square_brace_latex`   | converting | Convert `parser` output into LaTeX-ready form                     |
| `get_phrases`             | extracting | Extract [phrases](https://en.wikipedia.org/wiki/Phrase_structure_grammar) from `parser` output |
| `get_phrase_type`         | extracting | Extract phrases one step down the tree                            |
| `get_phrase_type_regex`   | extracting | Extract phrases at any level in the tree (uses regex)             |
| `get_leaves`              | extracting | Extract the leaves (tokens or words) from a phrase                |
| `take`                    | extracting | Select indexed elements from a vector                             |
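
To make the "square brace form (vs. round)" distinction concrete, here is a minimal base R sketch. The parse string is hand-written in Penn Treebank style rather than taken from `parser` output, and `to_square` is a hypothetical helper, not the package's `as_square_brace` implementation; it only illustrates the notation change.

```r
# Hand-written bracketed parse (illustrative; not actual parser() output)
parse_str <- "(S (NP (PRP He)) (VP (VBZ is) (NP (PRP$ my) (NN friend))) (. .))"

# Hypothetical helper: swap each round bracket for its square counterpart
to_square <- function(s) chartr("()", "[]", s)

square <- to_square(parse_str)
print(square)
```

Square-brace notation is the form many tree-drawing tools expect, which is presumably why a separate LaTeX-ready converter is also provided.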

# Installation

To install the development version of **parsent**, download the [zip ball](https://github.com/trinker/parsent/zipball/master) or [tar ball](https://github.com/trinker/parsent/tarball/master), decompress it, and run `R CMD INSTALL` on it. Alternatively, use the **pacman** package to install the development version directly from GitHub:

```r

if (!require("pacman")) install.packages("pacman")

pacman::p_load_gh(c(

    "trinker/textshape", 

    "trinker/coreNLPsetup",          

    "trinker/parsent"

))

```

# Contact

You are welcome to:

* submit suggestions and bug-reports at: 

* send a pull request on: 

* compose a friendly e-mail to: 

# Demonstration

## Load the Packages/Data

```{r, message=FALSE}

if (!require("pacman")) install.packages("pacman")

pacman::p_load(parsent, magrittr)

txt <- c(

    "Really, I like chocolate because it is good. It smells great.",

    "Robots are rather evil and most are devoid of decency.",

    "He is my friend.",

    "Clifford the big red dog ate my lunch.",

    "Professor Johns can not teach",

    "",

    NA

)

```

## Create Annotator 

```{r}

if(!exists('parse_ann')) {

    parse_ann <- parse_annotator()

}

```

## Parsing

```{r}

(x <- parser(txt, parse.annotator = parse_ann))

```

Note that the user may choose to use CoreNLP as a backend by setting `engine = "coreNLP"`.  To ensure that CoreNLP is set up properly, use `check_setup`.

## Plotting

```{r}

par(mar = c(0,0,0,.7) + 0.2)

plot(x[[2]])

```

```{r}

par(

    mfrow = c(3, 2),

    mar = c(0,0,1,1) + 0.1

)

invisible(lapply(x[1:5], plot))

```

## Get Subject, Verb, and Direct Object

### Subject

```{r}

get_phrase_type(x, "NP") %>%

    take() %>%

    get_leaves()

```

### Predicate Verb

```{r}

get_phrase_type_regex(x, "VP") %>%

    take() %>%

    get_phrase_type_regex("(VB|MD)") %>%

    take() %>%

    get_leaves()

```

### Direct Object

```{r}

get_phrase_type_regex(x, "VP") %>%

    take() %>%

    get_phrase_type_regex("NP") %>%

    take() %>%

    get_leaves()

```
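
Each chain above ends in `get_leaves`, which reduces a phrase to its words. As a rough, package-free illustration of that idea, the sketch below pulls the tokens out of a hand-written bracketed phrase using base R regular expressions. The phrase string and the `leaves_sketch` helper are assumptions made for this example; they are not part of **parsent** and do not reflect how `get_leaves` is implemented.

```r
# Hand-written bracketed phrase (illustrative; not produced by parser())
phrase <- "(NP (NNP Clifford) (DT the) (JJ big) (JJ red) (NN dog))"

# Hypothetical helper: find every innermost "(TAG token)" pair,
# then strip the brackets and the tag to leave only the token
leaves_sketch <- function(s) {
    pairs <- regmatches(s, gregexpr("\\([^() ]+ [^() ]+\\)", s))[[1]]
    sub("^\\([^ ]+ (.*)\\)$", "\\1", pairs)
}

leaves_sketch(phrase)
```

Only innermost pairs are matched because the pattern forbids brackets inside a pair, so phrase-level labels such as `NP` are skipped and just the word-level tokens remain.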
