An open API service indexing awesome lists of open source software.

https://github.com/chainsawriot/sehrnett

📖 A Very Nice Interface To Princeton's WordNet
https://github.com/chainsawriot/sehrnett

r r-package rstats wordnet

Last synced: 8 months ago
JSON representation

📖 A Very Nice Interface To Princeton's WordNet

Awesome Lists containing this project

README

          

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
devtools::load_all()
```

# sehrnett

[![R-CMD-check](https://github.com/chainsawriot/sehrnett/workflows/R-CMD-check/badge.svg)](https://github.com/chainsawriot/sehrnett/actions)
[![Codecov test coverage](https://codecov.io/gh/chainsawriot/sehrnett/branch/master/graph/badge.svg)](https://app.codecov.io/gh/chainsawriot/sehrnett?branch=master)
[![CRAN status](https://www.r-pkg.org/badges/version/sehrnett)](https://CRAN.R-project.org/package=sehrnett)
[![R-CMD-check](https://github.com/chainsawriot/sehrnett/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/chainsawriot/sehrnett/actions/workflows/R-CMD-check.yaml)

The goal of sehrnett is to provide a nice (and fast) interface to [Princeton's WordNet](https://wordnet.princeton.edu/). Unlike the original [wordnet package](https://cran.r-project.org/package=wordnet) (Feinerer et al., 2020), you don't need to install WordNet and / or setup rJava.

The data is not included in the package. Please run `download_wordnet()` to download the data (~100M Zipped, ~400M Unzipped) from the Internet, if such data is not available. Please make sure you agree with the [WordNet License](https://wordnet.princeton.edu/license-and-commercial-use).

## Installation

``` r
devtools::install_github("chainsawriot/sehrnett")
```

## `get_lemmas`

The most basic function is `get_lemmas`. It generates basic information about the lemmas [^1] you provided.

```r
library(sehrnett)
```

```{r}
get_lemmas(c("very", "nice"))
```

```{r}
get_lemmas("nice")
```

```{r}
get_lemmas("nice", pos = "n")
```

Please note that some definitions in WordNet are considered pejorative or offensive, e.g.

```{r}
get_lemmas("dog")
```

### Dot notation

The dot notation ("lemma.pos.sensenum") can be used to quick search for a particular word sense. For example, one can search for "king.n.10" to quickly pin down the word sense of "king" as a chess piece.

```{r}
get_lemmas("king.n.10")
```

### Lemmatization

The [morphological processing](https://wordnet.princeton.edu/documentation/morphy7wn) of the original Wordnet is partially implemented in `sehrnett` [^2]. As the Wordnet's database contains only information about lemmas (e.g. *eat*), you need to convert inflected variants (e.g. *ate*, *eaten*, *eating*) back to their lemmas to query them. The process is otherwise known as [lemmatization](https://en.wikipedia.org/wiki/Lemmatisation).

`sehrnett` provides such lemmatization. But you need to provide exactly one `pos` and set `lemmatize` to `TRUE` (default).

```{r}
get_lemmas(c("ate", "ducking"), pos = "v")
```

```{r}
get_lemmas(c("loci", "lemmata", "boxesful"), pos = "n")
```

```{r}
get_lemmas(c("nicest", "stronger"), pos = "a")
```

## A practical example

For example, you want to know the synonyms of the word "nuance" (very important for academic writing). You can first search using the lemma "nuance" with `get_lemmas`.

```{r}
res <- get_lemmas("nuance")
res
```

There could be multiple word senses and you need to choose which word sense you want to convey. But in this case, there is only one. You can then search for the `synsetid` (cognitive synonym identifier) of that word sense.

```{r}
# get_synonyms() is a wrapper to get_synsetids
get_synsetids(res$synsetid[1])
```

## Chainablilty

All `get_` functions are chainable by using the magrittr pipe operator.

```{r}
c("switch off") %>% get_lemmas(pos = "v") %>% get_synonyms
```

## `get_outdegrees`

WordNet is indeed a network. synsetids are connected to each other in a directed graph. An node (a synsetid) is linked to another with different link (edge) types labelling with different `linkid`s. You can list out all available `linkid`s with the function `list_linktypes`.

```{r}
list_linktypes()
```

```{r}
## all hypernyms
get_lemmas("dog", pos = "n", sensenum = 1) %>% get_outdegrees(linkid = 1)
```

```{r}
## all hyponymes
get_lemmas("dog", pos = "n", sensenum = 1) %>% get_outdegrees(linkid = 2)
```

```{r}
## all antonyms
get_lemmas("nice", pos = "a", sensenum = 1) %>% get_outdegrees(linkid = 30)
```

### Sugars

`sehrnett` provides several syntactic sugars as `get_` functions. For example:

```{r}
## all hyponymes
get_lemmas("dog", pos = "n", sensenum = 1) %>% get_hyponyms()
```

```{r}
get_lemmas("nice", pos = "a", sensenum = 1) %>% get_antonyms()
```

```{r}
get_lemmas("nice", pos = "a", sensenum = 1) %>% get_derivatives()
```

---

[^1]: Yes, the plural of *lemma* can also be *lemmata*, you Latin-speaking people.

[^2]: Like many implementations (e.g. NLTK, Ruby's rwordnet and node-wordnet-magic), the morpological processing is only partial. Collocations and hyphenation are not supported. Therefore, please don't expect that lemmatizing *asking for it* would obtain *ask for it* (as documented in Wordnet's website).