https://github.com/ColinFay/lexiquer
Access Lexique3.81, a Natural Language Processing Database for French
- Host: GitHub
- URL: https://github.com/ColinFay/lexiquer
- Owner: ColinFay
- License: other
- Created: 2017-10-05T20:07:24.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2017-11-13T17:17:34.000Z (about 7 years ago)
- Last Synced: 2024-08-03T17:12:27.690Z (5 months ago)
- Language: R
- Homepage:
- Size: 14.4 MB
- Stars: 4
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
Awesome Lists containing this project
- frrrenchies - lexiquer
README
---
output:
  md_document:
    variant: markdown_github
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-",
  warning = FALSE,
  message = FALSE
)
```

Note: this package is still under development.
# lexiquer
This package is a wrapper around Lexique 3.81, a database for natural language processing in French.
More info on: [http://www.lexique.org](http://www.lexique.org)
## What's in this db?
Lexique gives access to about 150,000 French words with several annotations: lemma (`lemme`), phonemes, gender, frequency, number of letters, word neighbours...
## Getting started
The full corpus ships as a data object inside the package. Load it with:
```{r}
library(lexiquer)
data("lexique")
```

You can then left join it with a one-word-per-row data.frame:
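
As a minimal sketch of what such a join does, the base-R snippet below runs it on toy data. `toy_lexique` and `toy_words` are hypothetical, hand-written tables, not extracts from Lexique; only the column names `ortho` and `lemme` come from this README. It illustrates two things worth expecting from the real join: words absent from the lexicon end up with `NA` annotations, and a word with several lexicon entries produces several rows.

```r
# Toy stand-in for the lexicon. `ortho` and `lemme` are column names used in
# this README; the rows are hand-written for illustration only.
toy_lexique <- data.frame(
  ortho = c("les", "portes", "portes"),
  lemme = c("le", "porte", "porter"),
  stringsAsFactors = FALSE
)

# One-word-per-row input; "xyzzy" is deliberately absent from the toy lexicon.
toy_words <- data.frame(
  word = c("les", "portes", "xyzzy"),
  stringsAsFactors = FALSE
)

# Base-R counterpart of
#   left_join(toy_words, toy_lexique, by = c("word" = "ortho"))
# all.x = TRUE keeps every input row, matched or not.
# Unlike left_join(), merge() sorts the result by the key column.
joined <- merge(toy_words, toy_lexique,
                by.x = "word", by.y = "ortho", all.x = TRUE)

# Words with no lexicon entry carry NA in the lexicon columns.
unmatched <- joined$word[is.na(joined$lemme)]

# A word with several lexicon entries yields one output row per entry.
rows_per_word <- table(joined$word)

print(joined)
print(unmatched)
print(rows_per_word)
```

The package's own example, on real text from `{proustr}`, follows. Its `na.omit()` step drops the unmatched rows that this sketch makes visible.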
```{r message=FALSE}
library(tidytext)
library(proustr)
library(tidyverse)

sw <- proust_stopwords()
ds <- ducotedechezswann

# Tokenise the text and keep the first ten tokens, one word per row
tm <- unnest_tokens(ds, word, text) %>%
  slice(1:10) %>%
  select(word)

# Join each word to the lexicon on its orthographic form, then count lemmas
tm %>%
  left_join(lexique, by = c("word" = "ortho")) %>%
  select(lemme, cgramortho) %>%
  na.omit() %>%
  count(lemme, cgramortho) %>%
  top_n(10, n) %>%
  arrange(desc(n))
```

### `bind_*` wrappers
`{lexiquer}` provides a series of wrappers that bind specific parts of the corpus to your text. See the `bind_*` functions for more details.
For example, you can bind the grammatical category of each word:
```{r}
bind_gram_cat(tm, word)
```

Or the lemma:
```{r}
bind_lemme(tm, word)
```

### `is_lemme`

Test whether a word is a lemma:
```{r}
is_lemme(tm, word)
```

### `count_*` wrappers
Several counting functions are available:
```{r}
count_syll(tm, word)
```

## Install
```{r eval = FALSE}
devtools::install_github("ColinFay/lexiquer")
```

## Feedback

Questions and feedback are [welcome](mailto:[email protected])!

Want to contribute? Open a [PR](https://github.com/ColinFay/lexiquer/pulls) :) If you encounter a bug or want to suggest an enhancement, please [open an issue](https://github.com/ColinFay/lexiquer/issues).