https://github.com/corybrunson/tdarec
recipes extension for persistent homology and vectorizations thereof
- Host: GitHub
- URL: https://github.com/corybrunson/tdarec
- Owner: corybrunson
- License: gpl-3.0
- Created: 2024-10-07T20:20:14.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-02-03T20:00:31.000Z (4 months ago)
- Last Synced: 2025-02-03T21:19:22.950Z (4 months ago)
- Topics: machine-learning, persistent-homology, recipes, tidymodels, topological-data-analysis, vectorization
- Language: R
- Homepage:
- Size: 721 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 4
Metadata Files:
- Readme: README.Rmd
- License: LICENSE.md
- Code of conduct: CODE_OF_CONDUCT.md
README
---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# tdarec
[Lifecycle: experimental](https://lifecycle.r-lib.org/articles/stages.html#experimental)
The goal of {tdarec} is to extend [{recipes}](https://github.com/tidymodels/recipes) with preprocessing steps that compute persistent homology (PH) and calculate vectorizations of persistence data (persistence diagrams; PDs).
The current prototype provides one engine to compute PH:
* Vietoris--Rips filtrations of point clouds using [{ripserr}](https://github.com/tdaverse/ripserr)
and one engine to vectorize PDs:
* Euler characteristic curves using [{TDAvec}](https://github.com/uislambekov/TDAvec).
The eventual goal is to support every PH and PD vectorization engine published on CRAN.
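As a sketch of how these steps compose (not run here; the toy data, the fixed parameter values, and the derived column name `cloud_phom`, which assumes the `*_phom` naming seen in the example below, are illustrative assumptions rather than package defaults), a recipe can chain a PH step into a vectorization step and be prepped and baked like any other {recipes} pipeline:

``` r
# illustrative sketch: two tiny labeled point clouds
library(recipes)
library(tdarec)

toy_data <- tibble::tibble(
  cloud = list(
    tdaunif::sample_circle(24),
    tdaunif::sample_torus_tube(24, ar = 2)
  ),
  class = factor(c("circle", "torus"))
)

# compute PH up to degree 1, then vectorize the diagrams as
# Euler characteristic curves over a fixed scale sequence
recipe(class ~ cloud, data = toy_data) |>
  step_phom_point_cloud(cloud, max_hom_degree = 1) |>
  step_vpd_ecc(cloud_phom, xseq = seq(0, 2, by = 0.1)) |>
  prep() |>
  bake(new_data = NULL)
```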
## Installation
You can install the development version of tdarec from [GitHub](https://github.com/) with:
``` r
# install.packages("pak")
pak::pak("corybrunson/tdarec")
```

## Example
This example uses existing engines in a full Tidyverse workflow to optimize a simple classification model for point clouds sampled from different embeddings of the Klein bottle:
```{r example}
# prepare a Tidymodels session and attach {tdarec}
library(tidyverse)
library(tidymodels)
library(tdarec)

# generate samples from two embeddings
set.seed(20024L)
tibble(embedding = sample(c("flat", "tube"), size = 48, replace = TRUE)) |>
  mutate(sample = lapply(embedding, function(emb) {
    switch(
      emb,
      flat = tdaunif::sample_klein_flat(60, sd = .5),
      tube = tdaunif::sample_klein_tube(60, sd = .5)
    )
  })) |>
  # encode the outcome as a factor, as required for classification models
  mutate(embedding = factor(embedding)) |>
  print() -> klein_data

# partition the data
klein_split <- initial_split(klein_data, prop = .8)
klein_train <- training(klein_split)
klein_test <- testing(klein_split)
klein_folds <- vfold_cv(klein_train, v = 3L)

# specify a pre-processing recipe
scale_seq <- seq(0, 3, by = .05)
recipe(embedding ~ sample, data = klein_train) |>
  step_phom_point_cloud(
    sample, max_hom_degree = tune("vr_degree"),
    keep_original_cols = FALSE
  ) |>
  step_vpd_ecc(
    sample_phom, xseq = scale_seq,
    keep_original_cols = FALSE
  ) |>
  print() -> klein_rec

# specify a classification model
logistic_reg(penalty = tune(), mixture = 1) |>
  set_mode("classification") |>
  set_engine("glmnet") |>
  print() -> klein_lm

# generate a hyperparameter tuning grid
klein_rec_grid <- grid_regular(
  extract_parameter_set_dials(klein_rec), levels = 3,
  filter = c(vr_degree > 0)
)
klein_lm_grid <- grid_regular(
  extract_parameter_set_dials(klein_lm), levels = 5
)
klein_grid <- merge(klein_rec_grid, klein_lm_grid)

# optimize the model performance
klein_res <- tune_grid(
  klein_lm,
  preprocessor = klein_rec,
  resamples = klein_folds,
  grid = klein_grid,
  metrics = metric_set(roc_auc, pr_auc)
)
klein_res |>
  collect_metrics()
klein_res |>
  select_best(metric = "roc_auc")
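
# a possible next step (a sketch, not part of the original example):
# bundle the recipe and model into a workflow, finalize it with the
# selected parameters, and evaluate on the hold-out set created above
klein_wf <- workflow() |>
  add_recipe(klein_rec) |>
  add_model(klein_lm)
klein_wf |>
  finalize_workflow(select_best(klein_res, metric = "roc_auc")) |>
  last_fit(klein_split) |>
  collect_metrics()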
```

## Code of Conduct
Please note that the tdarec project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/1/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.