https://github.com/tidymodels/textrecipes

Extra recipes for Text Processing
---
output: github_document
---

```{r}
#| label: setup
#| include: false
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# textrecipes

[![R-CMD-check](https://github.com/tidymodels/textrecipes/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidymodels/textrecipes/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/tidymodels/textrecipes/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidymodels/textrecipes?branch=main)
[![CRAN status](http://www.r-pkg.org/badges/version/textrecipes)](https://CRAN.R-project.org/package=textrecipes)
[![Downloads](http://cranlogs.r-pkg.org/badges/textrecipes)](https://CRAN.R-project.org/package=textrecipes)
[![Lifecycle: maturing](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://lifecycle.r-lib.org/articles/stages.html)

## Introduction

**textrecipes** contains extra steps for the [`recipes`](https://CRAN.R-project.org/package=recipes) package for preprocessing text data.

## Installation

You can install the released version of textrecipes from [CRAN](https://CRAN.R-project.org) with:

```{r}
#| eval: false
install.packages("textrecipes")
```

Install the development version from GitHub with:

```{r}
#| label: installation
#| eval: false
# install.packages("pak")
pak::pak("tidymodels/textrecipes")
```

## Example

In the following example, we go through the steps needed to convert a character variable to the TF-IDF of its tokenized words after removing stop words and limiting ourselves to the 10 most used words. The preprocessing is applied to the variables `medium` and `artist`.

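```{r}
#| message: false
library(recipes)
library(textrecipes)
library(modeldata)

data("tate_text")

tate_rec <- recipe(~ medium + artist, data = tate_text) |>
  # Split each string into a list of word tokens
  step_tokenize(medium, artist) |>
  # Remove stop words from the tokens
  step_stopwords(medium, artist) |>
  # Keep only the 10 most frequent tokens of each variable
  step_tokenfilter(medium, artist, max_tokens = 10) |>
  # Replace the remaining tokens with their TF-IDF values
  step_tfidf(medium, artist)

# Estimate the steps (token counts, kept tokens, IDF weights) from the data
tate_obj <- tate_rec |>
  prep()

# Apply the trained recipe and inspect the resulting columns
str(bake(tate_obj, tate_text))
```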
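To make the four steps concrete, here is a minimal base-R sketch of what they compute on a toy corpus. The documents, the stop-word list, and `max_tokens = 3` are hypothetical, tokenization is a plain whitespace split (a simplification of `step_tokenize()`'s default word tokenizer), and TF-IDF uses the textbook definition tf = count / document length, idf = log(N / df). textrecipes' own implementation may differ (for example in smoothing or normalization), so treat this as an illustration rather than a reproduction of the `bake()` output.

```r
# Hypothetical toy corpus standing in for a text column such as `medium`
docs <- c(
  "oil paint on canvas",
  "oil paint on board",
  "oil on canvas",
  "ink on paper"
)

# Hypothetical stop word list; step_stopwords() uses a full list instead
stop_words <- c("on", "and", "the")

# 1. Tokenize: lowercase and split on whitespace
tokens <- strsplit(tolower(docs), "\\s+")

# 2. Remove stop words
tokens <- lapply(tokens, function(tk) tk[!tk %in% stop_words])

# 3. Keep only the `max_tokens` most frequent tokens in the corpus
max_tokens <- 3
counts <- sort(table(unlist(tokens)), decreasing = TRUE)
keep <- names(counts)[seq_len(min(max_tokens, length(counts)))]
tokens <- lapply(tokens, function(tk) tk[tk %in% keep])

# 4. TF-IDF: count each kept token per document (documents x tokens)...
dtm <- t(vapply(
  tokens,
  function(tk) as.numeric(table(factor(tk, levels = keep))),
  numeric(length(keep))
))
colnames(dtm) <- keep

# ...term frequency relative to document length (empty documents stay 0)...
tf <- dtm / pmax(rowSums(dtm), 1)
# ...weighted by inverse document frequency log(N / df)
idf <- log(nrow(dtm) / colSums(dtm > 0))
tfidf <- sweep(tf, 2, idf, `*`)

round(tfidf, 3)
```

The last document keeps none of the top tokens, so its row is all zeros; in the same way, a row in the baked `tate_text` data can be zero across all TF-IDF columns.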

## Breaking changes

As of version 0.4.0, `step_lda()` no longer accepts character variables and instead takes tokenlist variables.

The following recipe

```{r}
#| eval: false
recipe(~text_var, data = data) |>
  step_lda(text_var)
```

can be replaced with the following recipe to achieve the same result:

```{r}
#| eval: false
lda_tokenizer <- function(x) text2vec::word_tokenizer(tolower(x))

recipe(~text_var, data = data) |>
  step_tokenize(text_var,
    custom_token = lda_tokenizer
  ) |>
  step_lda(text_var)
```
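The `custom_token` argument of `step_tokenize()` takes a function that accepts a character vector and returns a list of character vectors, one per input element; `lda_tokenizer` above follows that shape. The sketch below is a hypothetical base-R tokenizer (`simple_tokenizer` is an illustrative name, not part of textrecipes) that satisfies the same contract. Its splitting rule is an assumption and is not guaranteed to match `text2vec::word_tokenizer()`, so LDA results obtained with it may differ.

```r
# Hypothetical base-R tokenizer with the same shape as lda_tokenizer:
# character vector in, list of character vectors out (one per element).
simple_tokenizer <- function(x) {
  # Lowercase, then split on any run of non-alphanumeric characters
  words <- strsplit(tolower(x), "[^[:alnum:]]+")
  # Drop the empty strings strsplit() leaves for leading separators
  lapply(words, function(w) w[nzchar(w)])
}

simple_tokenizer(c("Oil paint on canvas", "  Graphite, on paper"))
```

Such a function could be passed as `custom_token = simple_tokenizer` in place of `lda_tokenizer` in the recipe above.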

## Contributing

This project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.

- For questions and discussions about tidymodels packages, modeling, and machine learning, please [post on Posit Community](https://forum.posit.co/new-topic?category_id=15&tags=tidymodels,question).

- If you think you have encountered a bug, please [submit an issue](https://github.com/tidymodels/textrecipes/issues).

- Either way, learn how to create and share a [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html) (a minimal, reproducible example), to clearly communicate about your code.

- Check out further details on [contributing guidelines for tidymodels packages](https://www.tidymodels.org/contribute/) and [how to get help](https://www.tidymodels.org/help/).
