https://github.com/chainsawriot/sehrnett
📖 A Very Nice Interface To Princeton's WordNet
https://github.com/chainsawriot/sehrnett
r r-package rstats wordnet
Last synced: 8 months ago
JSON representation
📖 A Very Nice Interface To Princeton's WordNet
- Host: GitHub
- URL: https://github.com/chainsawriot/sehrnett
- Owner: chainsawriot
- License: gpl-3.0
- Created: 2021-11-30T21:42:16.000Z (almost 4 years ago)
- Default Branch: master
- Last Pushed: 2023-03-02T21:08:50.000Z (over 2 years ago)
- Last Synced: 2025-02-07T00:09:14.025Z (8 months ago)
- Topics: r, r-package, rstats, wordnet
- Language: R
- Homepage:
- Size: 188 KB
- Stars: 6
- Watchers: 3
- Forks: 1
- Open Issues: 1
-
Metadata Files:
- Readme: README.Rmd
- License: LICENSE.md
Awesome Lists containing this project
README
---
output: github_document
---```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
devtools::load_all()
```# sehrnett
[](https://github.com/chainsawriot/sehrnett/actions)
[](https://app.codecov.io/gh/chainsawriot/sehrnett?branch=master)
[](https://CRAN.R-project.org/package=sehrnett)
[](https://github.com/chainsawriot/sehrnett/actions/workflows/R-CMD-check.yaml)The goal of sehrnett is to provide a nice (and fast) interface to [Princeton's WordNet](https://wordnet.princeton.edu/). Unlike the original [wordnet package](https://cran.r-project.org/package=wordnet) (Feinerer et al., 2020), you don't need to install WordNet and / or setup rJava.
The data is not included in the package. Please run `download_wordnet()` to download the data (~100M Zipped, ~400M Unzipped) from the Internet, if such data is not available. Please make sure you agree with the [WordNet License](https://wordnet.princeton.edu/license-and-commercial-use).
## Installation
``` r
devtools::install_github("chainsawriot/sehrnett")
```## `get_lemmas`
The most basic function is `get_lemmas`. It generates basic information about the lemmas [^1] you provided.
```r
library(sehrnett)
``````{r}
get_lemmas(c("very", "nice"))
``````{r}
get_lemmas("nice")
``````{r}
get_lemmas("nice", pos = "n")
```Please note that some definitions in WordNet are considered pejorative or offensive, e.g.
```{r}
get_lemmas("dog")
```### Dot notation
The dot notation ("lemma.pos.sensenum") can be used to quick search for a particular word sense. For example, one can search for "king.n.10" to quickly pin down the word sense of "king" as a chess piece.
```{r}
get_lemmas("king.n.10")
```### Lemmatization
The [morphological processing](https://wordnet.princeton.edu/documentation/morphy7wn) of the original Wordnet is partially implemented in `sehrnett` [^2]. As the Wordnet's database contains only information about lemmas (e.g. *eat*), you need to convert inflected variants (e.g. *ate*, *eaten*, *eating*) back to their lemmas to query them. The process is otherwise known as [lemmatization](https://en.wikipedia.org/wiki/Lemmatisation).
`sehrnett` provides such lemmatization. But you need to provide exactly one `pos` and set `lemmatize` to `TRUE` (default).
```{r}
get_lemmas(c("ate", "ducking"), pos = "v")
``````{r}
get_lemmas(c("loci", "lemmata", "boxesful"), pos = "n")
``````{r}
get_lemmas(c("nicest", "stronger"), pos = "a")
```## A practical example
For example, you want to know the synonyms of the word "nuance" (very important for academic writing). You can first search using the lemma "nuance" with `get_lemmas`.
```{r}
res <- get_lemmas("nuance")
res
```There could be multiple word senses and you need to choose which word sense you want to convey. But in this case, there is only one. You can then search for the `synsetid` (cognitive synonym identifier) of that word sense.
```{r}
# get_synonyms() is a wrapper to get_synsetids
get_synsetids(res$synsetid[1])
```## Chainablilty
All `get_` functions are chainable by using the magrittr pipe operator.
```{r}
c("switch off") %>% get_lemmas(pos = "v") %>% get_synonyms
```## `get_outdegrees`
WordNet is indeed a network. synsetids are connected to each other in a directed graph. An node (a synsetid) is linked to another with different link (edge) types labelling with different `linkid`s. You can list out all available `linkid`s with the function `list_linktypes`.
```{r}
list_linktypes()
``````{r}
## all hypernyms
get_lemmas("dog", pos = "n", sensenum = 1) %>% get_outdegrees(linkid = 1)
``````{r}
## all hyponymes
get_lemmas("dog", pos = "n", sensenum = 1) %>% get_outdegrees(linkid = 2)
``````{r}
## all antonyms
get_lemmas("nice", pos = "a", sensenum = 1) %>% get_outdegrees(linkid = 30)
```### Sugars
`sehrnett` provides several syntactic sugars as `get_` functions. For example:
```{r}
## all hyponymes
get_lemmas("dog", pos = "n", sensenum = 1) %>% get_hyponyms()
``````{r}
get_lemmas("nice", pos = "a", sensenum = 1) %>% get_antonyms()
``````{r}
get_lemmas("nice", pos = "a", sensenum = 1) %>% get_derivatives()
```---
[^1]: Yes, the plural of *lemma* can also be *lemmata*, you Latin-speaking people.
[^2]: Like many implementations (e.g. NLTK, Ruby's rwordnet and node-wordnet-magic), the morpological processing is only partial. Collocations and hyphenation are not supported. Therefore, please don't expect that lemmatizing *asking for it* would obtain *ask for it* (as documented in Wordnet's website).