https://github.com/ColinFay/lexiquer

Access Lexique3.81, a Natural Language Processing Database for French

Host: GitHub
URL: https://github.com/ColinFay/lexiquer
Owner: ColinFay
License: other
Created: 2017-10-05T20:07:24.000Z (about 7 years ago)
Default Branch: master
Last Pushed: 2017-11-13T17:17:34.000Z (about 7 years ago)
Last Synced: 2024-08-03T17:12:27.690Z (5 months ago)
Language: R
Homepage:
Size: 14.4 MB
Stars: 4
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- License: LICENSE

Awesome Lists containing this project

frrrenchies - lexiquer

README

---
output:
  md_document:
    variant: markdown_github
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-",
  warning = FALSE,
  message = FALSE
)
```

Note: this package is still under development.

# lexiquer

This package is a wrapper around Lexique 3.81, a database for natural language processing in French. 

More info at [http://www.lexique.org](http://www.lexique.org).

## What's in this db?

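Lexique gives access to about 150,000 French words with several annotations: lemma (*lemme* in French, the spelling used in this package's function and column names), phonemes, grammatical gender, frequency, number of letters, word neighbours, and more.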

## Getting started

The full corpus ships as a data object inside the package, which you can load with:

```{r}
library(lexiquer)
data("lexique")
```

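You can then left join it to a one-word-per-row data.frame, matching your word column against the `ortho` (spelling) column of `lexique`:

```{r message=FALSE}
library(tidytext)
library(proustr)
library(tidyverse)

# Stop words from {proustr} (loaded here but not used in this example)
sw <- proust_stopwords()
# Text of "Du côté de chez Swann", shipped with {proustr}
ds <- ducotedechezswann

# Tokenise into one word per row and keep the first ten words
tm <- unnest_tokens(ds, word, text) %>%
  slice(1:10) %>%
  select(word)

# Join each word to its Lexique entry, then count lemma / category pairs
tm %>%
  left_join(lexique, by = c("word" = "ortho")) %>%
  select(lemme, cgramortho) %>%
  na.omit() %>%
  count(lemme, cgramortho) %>%
  top_n(10, n) %>%
  arrange(desc(n))
```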
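
To see what the join does without loading the full corpus, here is a minimal sketch on a tiny hand-made stand-in for `lexique`. The `toy_lexique` and `toy_words` objects are hypothetical and their values are made up for illustration; only the column names (`ortho`, `lemme`, `cgramortho`) come from the example above.

```r
library(dplyr)

# Hypothetical three-row stand-in for `lexique`; values are illustrative only
toy_lexique <- tibble(
  ortho      = c("longtemps", "couché", "bonne"),
  lemme      = c("longtemps", "coucher", "bon"),
  cgramortho = c("ADV", "VER", "ADJ")
)

# One word per row, the shape produced by tidytext::unnest_tokens()
toy_words <- tibble(word = c("longtemps", "couché", "swann"))

# Match `word` against `ortho`; unmatched words keep their row with NA annotations
joined <- left_join(toy_words, toy_lexique, by = c("word" = "ortho"))
joined
```

Words missing from the lexicon are kept with `NA` annotations, which is why the pipeline above calls `na.omit()` before counting. If a spelling appears on several rows of the lexicon, a left join returns one row per match, so the result can be longer than the input.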

### `bind_*` wrappers

`{lexiquer}` provides a series of wrappers that bind specific parts of the corpus to your text. See the `bind_*` functions for more details.

For example, you can bind the grammatical category of each word:

```{r}
bind_gram_cat(tm, word)
```

Or the lemma:

```{r}
bind_lemme(tm, word)
```
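
The package source is not reproduced here, but a wrapper of this kind could plausibly be built as a left join on the spelling column followed by a column selection. The sketch below is an assumption about the general technique, not the package's actual implementation: `bind_lemme_sketch()` and `toy_lexique` are hypothetical names, and the data are made up.

```r
library(dplyr)

# Hypothetical stand-in for `lexique`; values are made up for illustration.
# "couché" is listed twice to show how duplicated spellings are handled.
toy_lexique <- tibble(
  ortho = c("longtemps", "couché", "couché"),
  lemme = c("longtemps", "coucher", "couché")
)

# Hypothetical helper, NOT the package's implementation of bind_lemme()
bind_lemme_sketch <- function(df, col, lexicon) {
  # Capture the bare column name (e.g. `word`) as a string
  col_name <- rlang::as_name(rlang::enquo(col))
  # Keep the first lemma found for each spelling so rows are not duplicated
  lookup <- distinct(lexicon, ortho, .keep_all = TRUE) %>%
    select(ortho, lemme)
  # Join `col_name` in `df` against `ortho` in the lookup table
  left_join(df, lookup, by = setNames("ortho", col_name))
}

toy_words <- tibble(word = c("longtemps", "couché", "swann"))
bound <- bind_lemme_sketch(toy_words, word, toy_lexique)
bound
```

Keeping only the first entry per spelling is a simplification chosen so the output has exactly one row per input word; the real `bind_*` functions may resolve ambiguous spellings differently.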

### `is_lemme` 

Test whether a word is a lemma:

```{r}
is_lemme(tm, word)
```

### `count_*` wrappers

Several counting functions are available, for example to count syllables:

```{r}
count_syll(tm, word)
```

## Install 

```{r eval = FALSE}
devtools::install_github("ColinFay/lexiquer")
```

## Feedback

Questions and feedback are [welcome](mailto:[email protected])!

Want to contribute? Open a [PR](https://github.com/ColinFay/lexiquer/pulls) :) If you encounter a bug or want to suggest an enhancement, please [open an issue](https://github.com/ColinFay/lexiquer/issues).