Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/tidymodels/textrecipes
Extra recipes for Text Processing
https://github.com/tidymodels/textrecipes
Last synced: 3 days ago
JSON representation
Extra recipes for Text Processing
- Host: GitHub
- URL: https://github.com/tidymodels/textrecipes
- Owner: tidymodels
- License: other
- Created: 2018-09-10T23:15:56.000Z (about 6 years ago)
- Default Branch: main
- Last Pushed: 2024-11-08T21:59:53.000Z (4 days ago)
- Last Synced: 2024-11-08T22:32:05.032Z (4 days ago)
- Language: R
- Homepage: https://textrecipes.tidymodels.org/
- Size: 70 MB
- Stars: 160
- Watchers: 10
- Forks: 14
- Open Issues: 29
-
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
Awesome Lists containing this project
README
---
output: github_document
---```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```[![R-CMD-check](https://github.com/tidymodels/textrecipes/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidymodels/textrecipes/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/tidymodels/textrecipes/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidymodels/textrecipes?branch=main)
[![CRAN status](http://www.r-pkg.org/badges/version/textrecipes)](https://CRAN.R-project.org/package=textrecipes)
[![Downloads](http://cranlogs.r-pkg.org/badges/textrecipes)](https://CRAN.R-project.org/package=textrecipes)
[![Lifecycle: maturing](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://lifecycle.r-lib.org/articles/stages.html)## Introduction
**textrecipes** contain extra steps for the [`recipes`](https://CRAN.R-project.org/package=recipes) package for preprocessing text data.
## Installation
You can install the released version of textrecipes from [CRAN](https://CRAN.R-project.org) with:
```{r, eval=FALSE}
install.packages("textrecipes")
```Install the development version from GitHub with:
```{r installation, eval=FALSE}
# install.packages("pak")
pak::pak("tidymodels/textrecipes")
```## Example
In the following example we will go through the steps needed, to convert a character variable to the TF-IDF of its tokenized words after removing stopwords, and, limiting ourself to only the 10 most used words. The preprocessing will be conducted on the variable `medium` and `artist`.
```{r, message=FALSE}
library(recipes)
library(textrecipes)
library(modeldata)data("tate_text")
okc_rec <- recipe(~ medium + artist, data = tate_text) %>%
step_tokenize(medium, artist) %>%
step_stopwords(medium, artist) %>%
step_tokenfilter(medium, artist, max_tokens = 10) %>%
step_tfidf(medium, artist)okc_obj <- okc_rec %>%
prep()str(bake(okc_obj, tate_text))
```## Breaking changes
As of version 0.4.0, `step_lda()` no longer accepts character variables and instead takes tokenlist variables.
the following recipe
```{r, eval=FALSE}
recipe(~text_var, data = data) %>%
step_lda(text_var)
```can be replaced with the following recipe to achive the same results
```{r, eval=FALSE}
lda_tokenizer <- function(x) text2vec::word_tokenizer(tolower(x))
recipe(~text_var, data = data) %>%
step_tokenize(text_var,
custom_token = lda_tokenizer
) %>%
step_lda(text_var)
```## Contributing
This project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.
- For questions and discussions about tidymodels packages, modeling, and machine learning, please [post on RStudio Community](https://community.rstudio.com/new-topic?category_id=15&tags=tidymodels,question).
- If you think you have encountered a bug, please [submit an issue](https://github.com/tidymodels/textrecipes/issues).
- Either way, learn how to create and share a [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html) (a minimal, reproducible example), to clearly communicate about your code.
- Check out further details on [contributing guidelines for tidymodels packages](https://www.tidymodels.org/contribute/) and [how to get help](https://www.tidymodels.org/help/).