https://github.com/trinker/parsent
Sentence parsing tools; create sentence parse trees & extract portions of sentences
https://github.com/trinker/parsent
corenlp opennlp r sentence-parser
Last synced: 6 months ago
JSON representation
Sentence parsing tools; create sentence parse trees & extract portions of sentences
- Host: GitHub
- URL: https://github.com/trinker/parsent
- Owner: trinker
- Created: 2015-11-21T03:35:45.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2017-11-22T23:39:38.000Z (almost 8 years ago)
- Last Synced: 2025-02-14T12:45:15.736Z (8 months ago)
- Topics: corenlp, opennlp, r, sentence-parser
- Language: R
- Homepage:
- Size: 128 KB
- Stars: 5
- Watchers: 4
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS
Awesome Lists containing this project
README
---
title: "parsent"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
md_document:
toc: true
---```{r, echo=FALSE}
desc <- suppressWarnings(readLines("DESCRIPTION"))
regex <- "(^Version:\\s+)(\\d+\\.\\d+\\.\\d+)"
loc <- grep(regex, desc)
ver <- gsub(regex, "\\2", desc[loc])
verbadge <- sprintf('', ver, ver)
verbadge <- ''
````[](https://travis-ci.org/trinker/parsent)
[](https://coveralls.io/r/trinker/parsent?branch=master)
`r verbadge````{r, echo=FALSE, message=FALSE}
library(knitr)
knit_hooks$set(htmlcap = function(before, options, envir) {
if(!before) {
paste('',options$htmlcap,"
",sep="")
}
})
knitr::opts_knit$set(self.contained = TRUE, cache = FALSE)
knitr::opts_chunk$set(fig.path = "tools/figure/")
```
**parsent** is a collection of tools used to parse sentences. The package is a wrapper for the **NLP**/**openNLP** packages that simplifies and extends the user experience.
# Function Usage
Functions typically fall into the task category of (1) parsing, (2) converting, & (3) extracting. The main functions, task category, & descriptions are summarized in the table below:
| Function | Task | Description |
|---------------------------|------------|-----------------------------------------------------------|
| `parser` | parsing | Parse sentences into phrases |
| `parse_annotator` | parsing | Generate **OpenNLP** parser required by `parser` function |
| `as_tree` | converting | Convert `parser` output into tree form |
| `as_square_brace` | converting | Convert `parser` output in square brace form (vs. round) |
| `as_square_brace_latex` | converting | Convert `parser` output LaTeX ready form |
| `get_phrases` | extracting | Extract [phrases](https://en.wikipedia.org/wiki/Phrase_structure_grammar) from `parser` output |
| `get_phrase_type` | extracting | Extract phrases one step down the tree |
| `get_phrase_type_regex` | extracting | Extract phrases at any level in the tree (uses regex) |
| `get_leaves` | extracting | Extract the leaves (tokens or words) from a phrase |
| `take` | extracting | Select indexed elements from a vector |# Installation
To download the development version of **parsent**:
Download the [zip ball](https://github.com/trinker/parsent/zipball/master) or [tar ball](https://github.com/trinker/parsent/tarball/master), decompress and run `R CMD INSTALL` on it, or use the **pacman** package to install the development version:
```r
if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh(c(
"trinker/textshape",
"trinker/coreNLPsetup",
"trinker/parsent"
))
```# Contact
You are welcome to:
* submit suggestions and bug-reports at:
* send a pull request on:
* compose a friendly e-mail to:# Demonstration
## Load the Packages/Data
```{r, message=FALSE}
if (!require("pacman")) install.packages("pacman")pacman::p_load(parsent, magrittr)
txt <- c(
"Really, I like chocolate because it is good. It smells great.",
"Robots are rather evil and most are devoid of decency.",
"He is my friend.",
"Clifford the big red dog ate my lunch.",
"Professor Johns can not teach",
"",
NA
)
```## Create Annotator
```{r}
if(!exists('parse_ann')) {
parse_ann <- parse_annotator()
}
```## Parsing
```{r}
(x <- parser(txt, parse.annotator = parse_ann))
```Note that the user may choose to use CoreNLP as a backend by setting `engine = "coreNLP"`. To ensure that coreNLP is setup properly use `check_setup`.
## Plotting
```{r}
par(mar = c(0,0,0,.7) + 0.2)
plot(x[[2]])
``````{r}
par(
mfrow = c(3, 2),
mar = c(0,0,1,1) + 0.1
)
invisible(lapply(x[1:5], plot))
```## Get Subject, Verb, and Direct Object
### Subject```{r}
get_phrase_type(x, "NP") %>%
take() %>%
get_leaves()
```### Predicate Verb
```{r}
get_phrase_type_regex(x, "VP") %>%
take() %>%
get_phrase_type_regex("(VB|MD)") %>%
take() %>%
get_leaves()
```### Direct Object
```{r}
get_phrase_type_regex(x, "VP") %>%
take() %>%
get_phrase_type_regex("NP") %>%
take() %>%
get_leaves()
```