https://github.com/corybrunson/tdarec
recipes extension for persistent homology and vectorizations thereof
- Host: GitHub
- URL: https://github.com/corybrunson/tdarec
- Owner: corybrunson
- License: gpl-3.0
- Created: 2024-10-07T20:20:14.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-02-03T20:00:31.000Z (4 months ago)
- Last Synced: 2025-02-03T21:19:22.950Z (4 months ago)
- Topics: machine-learning, persistent-homology, recipes, tidymodels, topological-data-analysis, vectorization
- Language: R
- Homepage:
- Size: 721 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 4
Metadata Files:
- Readme: README.Rmd
- License: LICENSE.md
- Code of conduct: CODE_OF_CONDUCT.md
README
---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# tdarec
[Lifecycle: experimental](https://lifecycle.r-lib.org/articles/stages.html#experimental)
The goal of {tdarec} is to extend [{recipes}](https://github.com/tidymodels/recipes) with preprocessing steps that compute persistent homology (PH) and calculate vectorizations of persistence data (persistence diagrams; PDs).
The current prototype provides one engine to compute PH:
* Vietoris--Rips filtrations of point clouds using [{ripserr}](https://github.com/tdaverse/ripserr)
and one engine to vectorize PDs:
* Euler characteristic curves using [{TDAvec}](https://github.com/uislambekov/TDAvec).
The eventual goal is to support every PH and PD vectorization engine published on CRAN.
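As a sketch of how these steps compose (not run here; the toy data, the fixed parameter values, and the derived column name `cloud_phom`, which assumes the `*_phom` naming seen in the example below, are illustrative assumptions rather than package defaults), a recipe can chain a PH step into a vectorization step and be prepped and baked like any other {recipes} pipeline:

``` r
# illustrative sketch: two tiny labeled point clouds
library(recipes)
library(tdarec)

toy_data <- tibble::tibble(
  cloud = list(
    tdaunif::sample_circle(24),
    tdaunif::sample_torus_tube(24, ar = 2)
  ),
  class = factor(c("circle", "torus"))
)

# compute PH up to degree 1, then vectorize the diagrams as
# Euler characteristic curves over a fixed scale sequence
recipe(class ~ cloud, data = toy_data) |>
  step_phom_point_cloud(cloud, max_hom_degree = 1) |>
  step_vpd_ecc(cloud_phom, xseq = seq(0, 2, by = 0.1)) |>
  prep() |>
  bake(new_data = NULL)
```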
## Installation
You can install the development version of tdarec from [GitHub](https://github.com/) with:
``` r
# install.packages("pak")
pak::pak("corybrunson/tdarec")
```

## Example
This example uses existing engines in a full Tidyverse workflow to optimize a simple classification model for point clouds sampled from different embeddings of the Klein bottle:
```{r example}
# prepare a Tidymodels session and attach {tdarec}
library(tidyverse)
library(tidymodels)
library(tdarec)

# generate samples from two embeddings
set.seed(20024L)
tibble(embedding = sample(c("flat", "tube"), size = 48, replace = TRUE)) |>
  mutate(sample = lapply(embedding, function(emb) {
    switch(
      emb,
      flat = tdaunif::sample_klein_flat(60, sd = .5),
      tube = tdaunif::sample_klein_tube(60, sd = .5)
    )
  })) |>
  # encode the outcome as a factor, as required for classification models
  mutate(embedding = factor(embedding)) |>
  print() -> klein_data

# partition the data
klein_split <- initial_split(klein_data, prop = .8)
klein_train <- training(klein_split)
klein_test <- testing(klein_split)
klein_folds <- vfold_cv(klein_train, v = 3L)

# specify a pre-processing recipe
scale_seq <- seq(0, 3, by = .05)
recipe(embedding ~ sample, data = klein_train) |>
  step_phom_point_cloud(
    sample, max_hom_degree = tune("vr_degree"),
    keep_original_cols = FALSE
  ) |>
  step_vpd_ecc(
    sample_phom, xseq = scale_seq,
    keep_original_cols = FALSE
  ) |>
  print() -> klein_rec

# specify a classification model
logistic_reg(penalty = tune(), mixture = 1) |>
  set_mode("classification") |>
  set_engine("glmnet") |>
  print() -> klein_lm

# generate a hyperparameter tuning grid
klein_rec_grid <- grid_regular(
  extract_parameter_set_dials(klein_rec), levels = 3,
  filter = c(vr_degree > 0)
)
klein_lm_grid <- grid_regular(
  extract_parameter_set_dials(klein_lm), levels = 5
)
klein_grid <- merge(klein_rec_grid, klein_lm_grid)

# optimize the model performance
klein_res <- tune_grid(
  klein_lm,
  preprocessor = klein_rec,
  resamples = klein_folds,
  grid = klein_grid,
  metrics = metric_set(roc_auc, pr_auc)
)
klein_res |>
  collect_metrics()
klein_res |>
  select_best(metric = "roc_auc")
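
# a possible next step (a sketch, not part of the original example):
# bundle the recipe and model into a workflow, finalize it with the
# selected parameters, and evaluate on the hold-out set created above
klein_wf <- workflow() |>
  add_recipe(klein_rec) |>
  add_model(klein_lm)
klein_wf |>
  finalize_workflow(select_best(klein_res, metric = "roc_auc")) |>
  last_fit(klein_split) |>
  collect_metrics()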
```

## Code of Conduct
Please note that the tdarec project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/1/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.