Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/ColinFay/lexiquer

Access Lexique3.81, a Natural Language Processing Database for French
https://github.com/ColinFay/lexiquer

Last synced: about 2 months ago
JSON representation

Access Lexique3.81, a Natural Language Processing Database for French

Awesome Lists containing this project

README

        

---
output:
md_document:
variant: markdown_github
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-",
warning = FALSE,
message = FALSE
)
```

Notes: this package is still under development.

# lexiquer

This package is a wrapper around Lexique 3.81, a database for natural language processing in French.

More info on: [http://www.lexique.org](http://www.lexique.org)

## What's in this db?

Lexique gives access to ~ 150 000 french words with several annotations: lemme, phoneme, genre, frequency, number of letters, word neighbours...

## Getting started

The full corpus is a data object contained inside the package, which you can call with :

```{r}
library(lexiquer)
data("lexique")
```

You can then left join it with a one-word-per-row data.frame:

```{r message=FALSE}
library(tidytext)
library(proustr)
library(tidyverse)

sw <- proust_stopwords()
ds <- ducotedechezswann

tm <- unnest_tokens(ds, word, text) %>%
slice(1:10) %>%
select(word)
tm %>%
left_join(lexique, by = c("word" = "ortho")) %>%
select(lemme, cgramortho) %>%
na.omit() %>%
count(lemme, cgramortho) %>%
top_n(10, n) %>%
arrange(desc(n))
```

### `bind_*` wrappers

`{lexiquer}` provides a series of wrapper to bind specific part of the corpus to your text. See the `bind_*` functions for more details.

For example, you can binf the grammatical category of the word:

```{r}
bind_gram_cat(tm, word)
```

Or the lemme

```{r}
bind_lemme(tm, word)
```

### `is_lemme`

Test if a word is a lemme :

```{r}
is_lemme(tm, word)
```

### `count_*` wrappers

Several counting functions are available:

```{r}
count_syll(tm, word)
```

## Install

```{r eval = FALSE}
devtools::install_github("ColinFay/lexiquer")
```

## Feedbacks

Questions and feedbacks [welcome](mailto:[email protected])!

You want to contribute ? Open a [PR](https://github.com/ColinFay/lexiquer/pulls) :) If you encounter a bug or want to suggest an enhancement, please [open an issue](https://github.com/ColinFay/lexiquer/issues).