{"id":19253593,"url":"https://github.com/tidymodels/tidyclust","last_synced_at":"2025-04-12T19:48:59.776Z","repository":{"id":37053087,"uuid":"429917806","full_name":"tidymodels/tidyclust","owner":"tidymodels","description":"A tidy unified interface to clustering models","archived":false,"fork":false,"pushed_at":"2025-01-27T23:38:32.000Z","size":9690,"stargazers_count":112,"open_issues_count":34,"forks_count":17,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-04-12T19:48:54.361Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://tidyclust.tidymodels.org/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tidymodels.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-11-19T19:52:49.000Z","updated_at":"2025-03-21T18:39:36.000Z","dependencies_parsed_at":"2024-06-17T18:31:54.989Z","dependency_job_id":"c4eae988-7954-451d-8f54-f53adfa814a3","html_url":"https://github.com/tidymodels/tidyclust","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tidymodels%2Ftidyclust","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tidymodels%2Ftidyclust/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tidymodels%2Ftidyclust/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tidymodels%2Ftidyclust/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/owners/tidymodels","download_url":"https://codeload.github.com/tidymodels/tidyclust/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248625501,"owners_count":21135513,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T18:31:54.397Z","updated_at":"2025-04-12T19:48:59.753Z","avatar_url":"https://github.com/tidymodels.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n\n# tidyclust \u003cimg src=\"man/figures/logo.svg\" align=\"right\" height=\"139\" /\u003e\n\n\u003c!-- badges: start --\u003e\n[![Codecov test coverage](https://codecov.io/gh/tidymodels/tidyclust/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidymodels/tidyclust?branch=main)\n[![R-CMD-check](https://github.com/tidymodels/tidyclust/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidymodels/tidyclust/actions/workflows/R-CMD-check.yaml)\n\u003c!-- badges: end --\u003e\n\nThe goal of tidyclust is to provide a tidy, unified interface to clustering models. 
The package is closely modeled after the [parsnip](https://parsnip.tidymodels.org/) package.\n\n## Installation\n\nYou can install the released version of tidyclust from [CRAN](https://CRAN.R-project.org) with:\n\n``` r\ninstall.packages(\"tidyclust\")\n```\n\n\nand the development version of tidyclust from [GitHub](https://github.com/) with:\n\n``` r\n# install.packages(\"pak\")\npak::pak(\"tidymodels/tidyclust\")\n```\n\n## Example\n\nThe first thing you do is to create a `cluster specification`. For this example we are creating a K-means model, using the `stats` engine.\n\n```{r}\nlibrary(tidyclust)\nset.seed(1234)\n\nkmeans_spec \u003c- k_means(num_clusters = 3) %\u003e%\n  set_engine(\"stats\")\n\nkmeans_spec\n```\n\nThis specification can then be fit using data.\n\n```{r}\nkmeans_spec_fit \u003c- kmeans_spec %\u003e%\n  fit(~., data = mtcars)\nkmeans_spec_fit\n```\n\nOnce you have a fitted tidyclust object, you can do a number of things. `predict()` returns the cluster a new observation belongs to\n\n```{r}\npredict(kmeans_spec_fit, mtcars[1:4, ])\n```\n\n`extract_cluster_assignment()` returns the cluster assignments of the training observations\n\n```{r}\nextract_cluster_assignment(kmeans_spec_fit)\n```\n\nand `extract_centroids()` returns the locations of the clusters\n\n```{r}\nextract_centroids(kmeans_spec_fit)\n```\n\n## Visual comparison of clustering methods\n\nBelow is a visualization of the available models and how they compare using two-dimensional toy data sets.\n\n```{r comparison, echo=FALSE, message=FALSE, fig.asp=1/2, dpi=105, dev=\"svglite\"}\n#| fig-alt: \"Mock comparison for different clustering methods for different data sets. 
Each row corresponds to a clustering method, each column corresponds to a data set type.\"\nlibrary(tidymodels)\nlibrary(tidyclust)\nset.seed(1234)\n\nmake_circles \u003c- function(n) {\n  x \u003c- seq(0, pi * 2, length.out = n)\n  x \u003c- cos(x) * c(0.5, 1)\n  x \u003c- x + rnorm(n, sd = 0.05)\n\n  y \u003c- seq(0, pi * 2, length.out = n)\n  y \u003c- sin(y) * c(0.5, 1)\n  y \u003c- y + rnorm(n, sd = 0.05)\n\n  out \u003c- data.frame(x, y)\n  attr(out, \"name\") \u003c- \"circles\"\n  out\n}\n\nmake_halves \u003c- function(n) {\n  x \u003c- seq(0, pi * 2, length.out = n)\n  x \u003c- cos(x) + rep(c(0, 1), each = n / 2)\n  x \u003c- x + rnorm(n, sd = 0.05)\n\n  y \u003c- seq(0, pi * 2, length.out = n)\n  y \u003c- sin(y) + rep(c(0, 0.5), each = n / 2)\n  y \u003c- y + rnorm(n, sd = 0.05)\n  y \u003c- y - 0.25\n  y \u003c- y * (1 / 0.75)\n\n  out \u003c- data.frame(x, y)\n  attr(out, \"name\") \u003c- \"halves\"\n  out\n}\n\nmake_uniform \u003c- function(n) {\n  x \u003c- runif(n, min = -1, max = 1)\n  y \u003c- runif(n, min = -1, max = 1)\n\n  out \u003c- data.frame(x, y)\n  attr(out, \"name\") \u003c- \"uniform\"\n  out\n}\n\nmake_blobs \u003c- function(n) {\n  x \u003c- rep(c(1, 2, 3), length.out = n)\n  x \u003c- x + rnorm(n, sd = 0.5)\n  x \u003c- (x - min(x)) / (max(x) - min(x)) * 2 - 1\n\n  y \u003c- rep(c(1, 4, 2), length.out = n)\n  y \u003c- y + rnorm(n, sd = 0.5)\n  y \u003c- (y - min(y)) / (max(y) - min(y)) * 2 - 1\n\n  out \u003c- data.frame(x, y)\n  attr(out, \"name\") \u003c- \"blobs\"\n  out\n}\n\naugment_model \u003c- function(model, data) {\n  model |\u003e\n    fit(~., data = data) |\u003e\n    augment(new_data = data) |\u003e\n    mutate(\n      data_name = attr(data, \"name\"),\n      model_name = class(model)[1]\n    )\n}\n\ncircle_data \u003c- make_circles(500)\nhalves_data \u003c- make_halves(500)\nuniform_data \u003c- make_uniform(500)\nblobs_data \u003c- make_blobs(500)\n\ncolors \u003c- c(\"#E49E68\", \"#6899E4\", 
\"#E068E4\")\n\nexpand_grid(\n  models = list(\n    k_means(num_clusters = 3),\n    hier_clust(num_clusters = 3)\n  ),\n  datasets = list(\n    circle_data,\n    halves_data,\n    blobs_data,\n    uniform_data\n  )\n) |\u003e\n  pmap_dfr(~ augment_model(.x, .y)) |\u003e\n  ggplot(aes(x, y, color = .pred_cluster)) +\n  geom_point(alpha = 0.5) +\n  facet_grid(model_name ~ data_name, scales = \"free\", switch = \"y\") +\n  scale_color_manual(values = colors) +\n  theme_void() +\n  theme(\n    strip.text.x = element_blank(),\n    strip.text.y = element_text(size = 15)\n  ) +\n  guides(color = \"none\")\n```\n\n## Contributing\n\nThis project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.\n\n- For questions and discussions about tidymodels packages, modeling, and machine learning, please [post on RStudio Community](https://forum.posit.co/new-topic?category_id=15\u0026tags=tidymodels,question).\n\n- If you think you have encountered a bug, please [submit an issue](https://github.com/tidymodels/tidyclust/issues).\n\n- Either way, learn how to create and share a [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html) (a minimal, reproducible example), to clearly communicate about your code.\n\n- Check out further details on [contributing guidelines for tidymodels packages](https://www.tidymodels.org/contribute/) and [how to get help](https://www.tidymodels.org/help/).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftidymodels%2Ftidyclust","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftidymodels%2Ftidyclust","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftidymodels%2Ftidyclust/lists"}