{"id":13857800,"url":"https://github.com/tidymodels/textrecipes","last_synced_at":"2025-05-16T06:03:30.441Z","repository":{"id":39707733,"uuid":"148230862","full_name":"tidymodels/textrecipes","owner":"tidymodels","description":"Extra recipes for Text Processing","archived":false,"fork":false,"pushed_at":"2025-04-22T23:29:14.000Z","size":74849,"stargazers_count":161,"open_issues_count":33,"forks_count":14,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-04-22T23:35:36.960Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://textrecipes.tidymodels.org/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tidymodels.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-09-10T23:15:56.000Z","updated_at":"2025-04-22T23:09:02.000Z","dependencies_parsed_at":"2024-02-09T01:57:35.261Z","dependency_job_id":"38285628-6d71-462e-b985-1e58f0e1c2fb","html_url":"https://github.com/tidymodels/textrecipes","commit_stats":{"total_commits":609,"total_committers":12,"mean_commits":50.75,"dds":0.08702791461412152,"last_synced_commit":"30e607a06fa8a34efa959d256a71301905bd068d"},"previous_names":[],"tags_count":20,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tidymodels%2Ftextrecipes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tidymodels%2Ftextrecipes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tidymodels%2Ftextrecipes/releases","manifests
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tidymodels%2Ftextrecipes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tidymodels","download_url":"https://codeload.github.com/tidymodels/textrecipes/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254478160,"owners_count":22077675,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-05T03:01:47.306Z","updated_at":"2025-05-16T06:03:30.423Z","avatar_url":"https://github.com/tidymodels.png","language":"R","funding_links":[],"categories":["R"],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r}\n#| label: setup\n#| include: false\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n\n# textrecipes \u003ca href='https://textrecipes.tidymodels.org'\u003e\u003cimg src='man/figures/logo.png' align=\"right\" height=\"139\" /\u003e\u003c/a\u003e\n\n\u003c!-- badges: start --\u003e\n[![R-CMD-check](https://github.com/tidymodels/textrecipes/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidymodels/textrecipes/actions/workflows/R-CMD-check.yaml)\n[![Codecov test coverage](https://codecov.io/gh/tidymodels/textrecipes/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidymodels/textrecipes?branch=main)\n[![CRAN status](http://www.r-pkg.org/badges/version/textrecipes)](https://CRAN.R-project.org/package=textrecipes)\n[![Downloads](http://cranlogs.r-pkg.org/badges/textrecipes)](https://CRAN.R-project.org/package=textrecipes)\n[![Lifecycle: maturing](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://lifecycle.r-lib.org/articles/stages.html)\n\u003c!-- badges: end --\u003e\n\n## Introduction\n\n**textrecipes** contains extra steps for the [`recipes`](https://CRAN.R-project.org/package=recipes) package for preprocessing text data. \n\n## Installation\n\nYou can install the released version of textrecipes from [CRAN](https://CRAN.R-project.org) with:\n\n```{r}\n#| eval: false\ninstall.packages(\"textrecipes\")\n```\n\nInstall the development version from GitHub with:\n\n```{r}\n#| label: installation\n#| eval: false\n# install.packages(\"pak\")\npak::pak(\"tidymodels/textrecipes\")\n```\n\n## Example\n\nThe following example walks through the steps needed to convert a character variable to the TF-IDF of its tokenized words, after removing stopwords and keeping only the 10 most used words. 
The preprocessing will be conducted on the variables `medium` and `artist`.\n\n```{r}\n#| message: false\nlibrary(recipes)\nlibrary(textrecipes)\nlibrary(modeldata)\n\ndata(\"tate_text\")\n\nokc_rec \u003c- recipe(~ medium + artist, data = tate_text) |\u003e\n  step_tokenize(medium, artist) |\u003e\n  step_stopwords(medium, artist) |\u003e\n  step_tokenfilter(medium, artist, max_tokens = 10) |\u003e\n  step_tfidf(medium, artist)\n\nokc_obj \u003c- okc_rec |\u003e\n  prep()\n\nstr(bake(okc_obj, tate_text))\n```\n\n## Breaking changes\n\nAs of version 0.4.0, `step_lda()` no longer accepts character variables and instead takes tokenlist variables.\n\nThe following recipe\n\n```{r}\n#| eval: false\nrecipe(~text_var, data = data) |\u003e\n  step_lda(text_var)\n```\n\ncan be replaced with the following recipe to achieve the same results:\n\n```{r}\n#| eval: false\nlda_tokenizer \u003c- function(x) text2vec::word_tokenizer(tolower(x))\nrecipe(~text_var, data = data) |\u003e\n  step_tokenize(text_var,\n    custom_token = lda_tokenizer\n  ) |\u003e\n  step_lda(text_var)\n```\n\n## Contributing\n\nThis project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). 
By contributing to this project, you agree to abide by its terms.\n\n- For questions and discussions about tidymodels packages, modeling, and machine learning, please [post on Posit Community](https://forum.posit.co/new-topic?category_id=15\u0026tags=tidymodels,question).\n\n- If you think you have encountered a bug, please [submit an issue](https://github.com/tidymodels/textrecipes/issues).\n\n- Either way, learn how to create and share a [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html) (a minimal, reproducible example) to clearly communicate about your code.\n\n- Check out further details on [contributing guidelines for tidymodels packages](https://www.tidymodels.org/contribute/) and [how to get help](https://www.tidymodels.org/help/).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftidymodels%2Ftextrecipes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftidymodels%2Ftextrecipes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftidymodels%2Ftextrecipes/lists"}