https://github.com/tidymodels/filtro
Tidy tools to apply filter-based supervised feature selection methods
- Host: GitHub
- URL: https://github.com/tidymodels/filtro
- Owner: tidymodels
- License: other
- Created: 2025-06-12T20:39:50.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-08-15T21:59:03.000Z (10 months ago)
- Last Synced: 2025-08-15T23:35:49.303Z (10 months ago)
- Language: R
- Homepage: https://filtro.tidymodels.org/dev/
- Size: 1.92 MB
- Stars: 5
- Watchers: 1
- Forks: 0
- Open Issues: 22
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# filtro
[R-CMD-check](https://github.com/tidymodels/filtro/actions/workflows/R-CMD-check.yaml)
[Codecov test coverage](https://app.codecov.io/gh/tidymodels/filtro)
[CRAN status](https://CRAN.R-project.org/package=filtro)
[Lifecycle: experimental](https://lifecycle.r-lib.org/articles/stages.html#experimental)
> ⚠️ **filtro is under active development; breaking changes may occur.**
## Overview
filtro provides tidy tools for applying filter-based supervised feature
selection methods. These methods score and rank the relevance of each feature
using metrics such as p-values, correlation, feature importance, and
information gain.
The package provides functions to rank features and select a top proportion or
number of them using built-in methods and the
[desirability2](https://desirability2.tidymodels.org) package. It also
supports streamlined preprocessing, either standalone or within tidymodels
workflows, for example through the [recipes](https://recipes.tidymodels.org) package.
For a detailed introduction, please see [vignette("filtro")](https://filtro.tidymodels.org/dev/articles/filtro.html).
## Installation
Install the released version of filtro from [CRAN](https://CRAN.R-project.org) with:
``` r
install.packages("filtro")
```
Install the development version from GitHub with:
``` r
# install.packages("pak")
pak::pak("tidymodels/filtro")
```
## Feature selection methods
Currently, the implemented filters include:
1. ANOVA F-test
2. Correlation
3. Random forest feature importance
4. Information gain
5. Area under the ROC curve
6. Cross tabulation (Chi-squared test and Fisher's exact test)
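
To make the idea of a filter concrete, here is a minimal base R sketch that scores each predictor against the outcome independently of the others. It is not filtro's implementation: the helpers `score_cor()` and `score_aov()` are hypothetical names introduced for illustration, and the built-in `mtcars` data stands in for a real modeling data set.

```r
# Base R illustration of filter-based scoring; not filtro's implementation.
dat <- mtcars[, c("mpg", "disp", "hp", "wt")]
dat$cyl <- factor(mtcars$cyl)

# Hypothetical helper: Pearson correlation, defined for numeric predictors only
score_cor <- function(x, y) {
  if (is.numeric(x)) cor(x, y, method = "pearson") else NA_real_
}

# Hypothetical helper: one-way ANOVA F-test p-value, for factor predictors only
score_aov <- function(x, y) {
  if (!is.factor(x)) return(NA_real_)
  anova(lm(y ~ x))[["Pr(>F)"]][1]
}

predictors <- setdiff(names(dat), "mpg")

# One row per predictor, one column per scoring method;
# NA marks a method that does not apply to that predictor type
scores <- data.frame(
  predictor = predictors,
  cor_pearson = unname(vapply(dat[predictors], score_cor, numeric(1), y = dat$mpg)),
  aov_pval = unname(vapply(dat[predictors], score_aov, numeric(1), y = dat$mpg))
)
scores
```

A method that does not apply to a predictor type leaves a missing score, which is why a fill step is useful before ranking.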
## Scoring examples
```{r}
#| label: start
#| include: false
library(filtro)
library(desirability2)
library(dplyr)
library(modeldata)
```
```{r}
library(filtro)
library(desirability2)
library(dplyr)
library(modeldata)
```
```{r}
ames_subset <- modeldata::ames |>
  # Use a subset of data for demonstration
  dplyr::select(
    Sale_Price,
    MS_SubClass,
    MS_Zoning,
    Lot_Frontage,
    Lot_Area,
    Street
  )
ames_subset <- ames_subset |>
  dplyr::mutate(Sale_Price = log10(Sale_Price))
```
```{r}
# ANOVA p-value
ames_aov_pval_res <-
  score_aov_pval |>
  fit(Sale_Price ~ ., data = ames_subset)
ames_aov_pval_res@results
```
```{r}
# Pearson correlation
ames_cor_pearson_res <-
  score_cor_pearson |>
  fit(Sale_Price ~ ., data = ames_subset)
ames_cor_pearson_res@results
```
```{r}
# Forest importance
ames_imp_rf_reg_res <-
  score_imp_rf |>
  fit(Sale_Price ~ ., data = ames_subset, seed = 42)
ames_imp_rf_reg_res@results
```
```{r}
# Information gain
ames_info_gain_reg_res <-
  score_info_gain |>
  fit(Sale_Price ~ ., data = ames_subset)
ames_info_gain_reg_res@results
```
## Filtering examples for score *singular*
```{r}
ames_aov_pval_res@results
```
```{r}
# Show best score, based on proportion of predictors
ames_aov_pval_res |> show_best_score_prop(prop_terms = 0.2)
```
```{r}
# Fill safe value, then show best score
ames_aov_pval_res <- ames_aov_pval_res |> fill_safe_value()
ames_aov_pval_res |> show_best_score_prop(prop_terms = 0.2)
```
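
The filtering step above can be pictured with a small base R sketch: fill missing scores, rank, and keep the top share of predictors. The helper `top_prop()` and its `fill` argument are hypothetical, the score values are made up, and the rounding rule (round down, keep at least one) and the fallback value are assumptions rather than filtro's documented behavior.

```r
# Hypothetical score table; the values are made up for illustration only
scores <- data.frame(
  predictor = c("a", "b", "c", "d", "e"),
  score = c(0.10, 2.50, NA, 1.20, 0.70)
)

# Hypothetical helper: keep the top `prop_terms` share of predictors,
# assuming larger scores are better and at least one predictor is kept
top_prop <- function(scores, prop_terms, fill = 0) {
  # Replace missing scores with a fallback so every predictor can be ranked
  scores$score[is.na(scores$score)] <- fill
  n_keep <- max(1L, floor(nrow(scores) * prop_terms))
  ranked <- scores[order(scores$score, decreasing = TRUE), ]
  head(ranked, n_keep)
}

# Keep the best 40% of five predictors, that is, two of them
top_prop(scores, prop_terms = 0.4)
```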
## Filtering examples for scores *plural*
```{r}
# Create a list
class_score_list <- list(
  ames_cor_pearson_res,
  ames_imp_rf_reg_res,
  ames_info_gain_reg_res
)
```
```{r}
# Fill safe values
ames_scores_results <- class_score_list |>
  fill_safe_values() |>
  # Remove outcome
  dplyr::select(-outcome)
ames_scores_results
```
```{r}
# Single and multi-parameter optimization using desirability functions

# Optimize correlation alone
ames_scores_results |>
  show_best_desirability_prop(
    maximize(cor_pearson, low = 0, high = 1)
  )

# Optimize correlation and forest importance
ames_scores_results |>
  show_best_desirability_prop(
    maximize(cor_pearson, low = 0, high = 1),
    maximize(imp_rf)
  )

# Optimize correlation, forest importance and information gain
ames_scores_results |>
  show_best_desirability_prop(
    maximize(cor_pearson, low = 0, high = 1),
    maximize(imp_rf),
    maximize(infogain)
  )

# Same as above, but retain only a proportion of predictors
ames_scores_results |>
  show_best_desirability_prop(
    maximize(cor_pearson, low = 0, high = 1),
    maximize(imp_rf),
    maximize(infogain),
    prop_terms = 0.2
  )

# Optimize toward a target
ames_scores_results |>
  show_best_desirability_prop(
    target(cor_pearson, low = 0.2, target = 0.255, high = 0.9)
  )

# Optimize with box constraints
ames_scores_results |>
  show_best_desirability_prop(
    constrain(cor_pearson, low = 0.2, high = 1)
  )
```
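
As a rough picture of what combining several scores through desirability functions involves, the base R sketch below maps each score onto a 0 to 1 scale and combines the results with a geometric mean. The helper `d_maximize()` and the score values are hypothetical, and the linear scaling is an assumption for illustration; the desirability2 package may differ in its details.

```r
# Hypothetical helper: linear "larger is better" desirability,
# 0 at or below `low`, 1 at or above `high`
d_maximize <- function(x, low = min(x), high = max(x)) {
  pmin(pmax((x - low) / (high - low), 0), 1)
}

# Hypothetical scores; the values are made up for illustration only
scores <- data.frame(
  predictor = c("a", "b", "c", "d"),
  cor_pearson = c(0.80, 0.30, 0.55, 0.10),
  imp_rf = c(12, 40, 25, 3)
)

d_cor <- d_maximize(scores$cor_pearson, low = 0, high = 1)
d_imp <- d_maximize(scores$imp_rf)  # limits default to the observed range

# Overall desirability: geometric mean of the individual desirabilities,
# so a predictor that scores 0 on any criterion gets 0 overall
scores$d_all <- sqrt(d_cor * d_imp)
scores[order(scores$d_all, decreasing = TRUE), ]
```

The geometric mean rewards predictors that do reasonably well on every criterion over those that excel on one and fail another.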
## Contributing
Please note that the filtro project is released with a [Contributor Code of Conduct](https://filtro.tidymodels.org/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.
- For questions and discussions about tidymodels packages, modeling, and machine learning, please [post on Posit Community](https://forum.posit.co/new-topic?category_id=15&tags=tidymodels,question).
- If you think you have encountered a bug, please [submit an issue](https://github.com/tidymodels/filtro/issues).
- Either way, learn how to create and share a [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html) (a minimal, reproducible example), to clearly communicate about your code.
- Check out further details on [contributing guidelines for tidymodels packages](https://www.tidymodels.org/contribute/) and [how to get help](https://www.tidymodels.org/help/).