{"id":32203903,"url":"https://github.com/massimoaria/e2tree","last_synced_at":"2026-04-01T21:53:26.057Z","repository":{"id":61364190,"uuid":"550971311","full_name":"massimoaria/e2tree","owner":"massimoaria","description":"Explainable Ensemble Trees","archived":false,"fork":false,"pushed_at":"2026-03-21T14:32:37.000Z","size":9980,"stargazers_count":8,"open_issues_count":0,"forks_count":3,"subscribers_count":4,"default_branch":"main","last_synced_at":"2026-03-22T03:53:22.866Z","etag":null,"topics":["explainable-machine-learning"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/massimoaria.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-10-13T16:14:46.000Z","updated_at":"2026-03-21T14:32:41.000Z","dependencies_parsed_at":"2024-04-16T13:44:52.047Z","dependency_job_id":"e560797a-a614-48ff-b97b-1e9394fae673","html_url":"https://github.com/massimoaria/e2tree","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/massimoaria/e2tree","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/massimoaria%2Fe2tree","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/massimoaria%2Fe2tree/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/massimoaria%2Fe2tree/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/massimoar
ia%2Fe2tree/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/massimoaria","download_url":"https://codeload.github.com/massimoaria/e2tree/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/massimoaria%2Fe2tree/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31292607,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-01T21:15:39.731Z","status":"ssl_error","status_checked_at":"2026-04-01T21:15:34.046Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["explainable-machine-learning"],"created_at":"2025-10-22T04:47:43.689Z","updated_at":"2026-04-01T21:53:26.049Z","avatar_url":"https://github.com/massimoaria.png","language":"R","readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n# Explainable Ensemble Trees (e2tree)\n\n\u003c!-- badges: start --\u003e\n[![R-CMD-check](https://github.com/massimoaria/e2tree/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/massimoaria/e2tree/actions/workflows/R-CMD-check.yaml)\n[![CRAN status](https://www.r-pkg.org/badges/version/e2tree)](https://CRAN.R-project.org/package=e2tree) `r badger::badge_cran_download(\"e2tree\", \"grand-total\")`\n\n\u003c!-- badges: end --\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"man/figures/e2tree_logo.png\" width=\"400\"  /\u003e\n\u003c/p\u003e\n\n\nThe key idea behind **Explainable Ensemble Trees** (**e2tree**) is an algorithm that represents any ensemble approach based on decision trees with a single tree-like structure. The goal is to explain the results of the ensemble algorithm while preserving its level of accuracy, which typically exceeds that of a single decision tree. The proposed method identifies a tree-like structure that explains the classification or regression paths and summarizes the whole ensemble process. 
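As a conceptual illustration of the kind of tree-based dissimilarity the algorithm builds on, the sketch below (plain base R, *not* the package's actual `createDisMatrix()` computation) measures how often an ensemble's trees send two observations to different leaves:

```r
# Conceptual sketch (not e2tree's implementation): pairwise dissimilarity
# between observations from an ensemble's leaf assignments.
# Rows = observations, columns = trees, entries = leaf id assigned by each tree.
leaf_ids <- matrix(c(1, 1, 2,
                     1, 2, 2,
                     3, 3, 1,
                     3, 3, 1), nrow = 4, byrow = TRUE)

n <- nrow(leaf_ids)
D_toy <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    # share of trees in which i and j fall in different leaves
    D_toy[i, j] <- mean(leaf_ids[i, ] != leaf_ids[j, ])
  }
}
D_toy  # obs 3 and 4 agree in every tree, so D_toy[3, 4] == 0
```

Observations that the ensemble keeps together have low dissimilarity, and it is this pairwise structure that a single explainable tree then tries to reproduce.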
There are two main advantages of e2tree:\n\n- building an explainable tree that preserves the predictive performance of the ensemble (e.g., an RF model);\n- allowing the decision-maker to work with an intuitive tree-like structure.\n\nIn this example we focus on Random Forests, but the algorithm can be generalized to any ensemble approach based on decision trees.\n\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\",\n  dpi = 300\n)\n```\n\n## Setup\n\nYou can install the **development version** of e2tree from [GitHub](https://github.com) with:\n\n```{r eval=FALSE}\ninstall.packages(\"remotes\")\nremotes::install_github(\"massimoaria/e2tree\")\n```\n\nYou can install the **released version** of e2tree from [CRAN](https://CRAN.R-project.org) with:\n\n```{r eval=FALSE}\nif (!require(\"e2tree\", quietly=TRUE)) install.packages(\"e2tree\")\n```\n\n```{r warning=FALSE, message=FALSE}\nrequire(e2tree)\nrequire(randomForest)\nrequire(ranger)\nrequire(dplyr)\nrequire(ggplot2)\nif (!(require(rsample, quietly=TRUE))){install.packages(\"rsample\"); require(rsample, quietly=TRUE)}\noptions(dplyr.summarise.inform = FALSE)\n```\n\n```{r set-theme, include=FALSE}\ntheme_set(\n  theme_classic() +\n    theme(\n      plot.background = element_rect(fill = \"transparent\", colour = NA),\n      panel.background = element_rect(fill = \"transparent\", colour = NA)\n    )\n)\nknitr::opts_chunk$set(dev.args = list(bg = \"transparent\"))\n```\n\n\n## S3 Classes and Methods\n\nThe **e2tree** package uses a proper S3 class system. 
The main classes and their associated methods are:\n\n| Class | Methods |\n|-------|---------|\n| `e2tree` | `print`, `summary`, `plot`, `predict`, `fitted`, `residuals`, `as.rpart`, `nodes`, `e2splits` |\n| `eValidation` | `print`, `summary`, `plot`, `measures`, `proximity` |\n| `loi` | `print`, `summary`, `plot` |\n| `loi_perm` | `print`, `summary`, `plot` |\n\nE2Tree objects can also be converted to other formats for interoperability:\n\n- `as.rpart()` converts to `rpart` format for use with `rpart.plot`\n- `as.party()` converts to `partykit`'s `constparty` format (if partykit is installed)\n\n\n## Example 1: IRIS dataset (Classification)\n\nStarting from the IRIS dataset, we train a random forest with the ranger package (the randomForest package is also supported) and then use e2tree to obtain an explainable tree synthesis of the ensemble classifier.\n\n```{r}\n# Set random seed to make results reproducible:\nset.seed(0)\n\n# Initialize the split\niris_split \u003c- iris %\u003e% initial_split(prop = 0.6)\niris_split\n# Assign the data to the correct sets\ntraining \u003c- iris_split %\u003e% training()\nvalidation \u003c- iris_split %\u003e% testing()\nresponse_training \u003c- training[,5]\nresponse_validation \u003c- validation[,5]\n\n```\n\n\nTrain a Random Forest model with 1000 weak learners:\n\n```{r}\n# Perform training with \"ranger\" or \"randomForest\" package:\n## RF with \"ranger\" package\nensemble \u003c- ranger(Species ~ ., data = training, num.trees = 1000, importance = 'impurity')\n\n## RF with \"randomForest\" package\n#ensemble = randomForest(Species ~ ., data = training, importance = TRUE, proximity = TRUE)\n```\n\nCreate the dissimilarity matrix between observations:\n\n```{r}\nD \u003c- createDisMatrix(ensemble, data = training, label = \"Species\", parallel = list(active = FALSE, no_cores = NULL))\n```\n\nBuild an explainable tree for RF:\n\n```{r}\nsetting \u003c- list(impTotal = 0.1, maxDec = 0.01, n = 2, level = 5)\ntree \u003c- e2tree(Species ~ ., data = training, D, ensemble, 
setting)\n```\n\n\n### S3 methods for e2tree objects\n\nThe `e2tree` class supports standard S3 methods for inspecting the fitted model:\n\n**Print** --- compact model overview:\n\n```{r}\nprint(tree)\n```\n\n**Summary** --- full model details including terminal nodes and decision rules:\n\n```{r}\nsummary(tree)\n```\n\n**Plot** --- tree visualization via `rpart.plot`:\n\n```{r}\nplot(tree, ensemble)\n```\n\n### Accessor functions\n\nAccessor functions provide a clean interface to extract components without exposing the internal structure:\n\n```{r}\n# Extract terminal nodes\nnodes(tree, terminal = TRUE)\n\n# Extract split information\nstr(e2splits(tree), max.level = 1)\n```\n\n### Coercion to other formats\n\nE2Tree objects can be converted to standard tree formats for use with other packages:\n\n```{r}\n# Convert to rpart format\nrpart_obj \u003c- as.rpart(tree, ensemble)\n\n# Convert to partykit format (if installed)\nif (requireNamespace(\"partykit\", quietly = TRUE)) {\n  party_obj \u003c- partykit::as.party(tree)\n  plot(party_obj)\n}\n```\n\n\n### Prediction\n\nUse the standard `predict()` method for prediction on new data:\n\n```{r}\n# Predict on validation set\npred \u003c- predict(tree, newdata = validation, target = \"virginica\")\nhead(pred)\n```\n\nComparison of predictions (training sample) of RF and e2tree\n\n```{r}\n# Training predictions\npred_train \u003c- predict(tree, newdata = training, target = \"virginica\")\n\n# \"ranger\" package\ntable(pred_train$fit, ensemble$predictions)\n\n# \"randomForest\" package\n#table(pred_train$fit, ensemble$predicted)\n```\n\nComparison of predictions (training sample) of RF and correct response\n\n```{r}\n# \"ranger\" package\ntable(ensemble$predictions, response_training)\n\n## \"randomForest\" package\n#table(ensemble$predicted, response_training)\n```\n\nComparison of predictions (training sample) of e2tree and correct response\n\n```{r}\ntable(pred_train$fit, response_training)\n```\n\nFitted values for the 
training data:\n\n```{r}\nhead(fitted(tree))\n```\n\n\n### Variable importance\n\nThe `vimp()` function automatically detects whether importance should be computed for classification or regression:\n\n```{r}\nV \u003c- vimp(tree, training)\nV$vimp\nV$g_imp\nV$g_acc\n```\n\n\n### Prediction on validation sample\n\n```{r}\nensemble.pred \u003c- predict(ensemble, validation[,-5])\n\npred_val \u003c- predict(tree, newdata = validation, target = \"virginica\")\n```\n\nComparison of predictions (validation sample) of RF and e2tree\n\n```{r}\n## \"ranger\" package\ntable(pred_val$fit, ensemble.pred$predictions)\n\n## \"randomForest\" package\n#table(pred_val$fit, ensemble.pred$predicted)\n```\n\n\nComparison of predictions (validation sample) of e2tree and correct response\n\n```{r}\ntable(pred_val$fit, response_validation)\nroc_res \u003c- roc(response_validation, pred_val$score, target=\"virginica\")\nroc_res$auc\n```\n\n\n## Validation of the E2Tree Structure\n\nA critical question when using E2Tree is: *how well does the single tree capture the structure of the original ensemble?*\n\nAssessing the fidelity of this reconstruction requires measuring **agreement** between the ensemble and E2Tree proximity matrices --- a fundamentally different question from measuring their **association**. The distinction parallels the classical one between *correlation* and *concordance* in method comparison studies (Bland \u0026 Altman, 1986; Lin, 1989): two proximity matrices can be perfectly correlated yet systematically disagree in their actual values. The Mantel test, being scale-invariant, would declare perfect association in such a case. 
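The correlation-versus-agreement point can be seen in a toy example (synthetic matrices, not e2tree output):

```r
# Two 3x3 proximity matrices: O_hat is a rescaled copy of O, so they are
# perfectly correlated (a Mantel-style test would report perfect
# association) yet systematically disagree in their actual values.
O <- matrix(c(1.0, 0.8, 0.2,
              0.8, 1.0, 0.4,
              0.2, 0.4, 1.0), nrow = 3)
O_hat <- 0.5 * O

lower <- lower.tri(O)
cor(O[lower], O_hat[lower])         # exactly 1: perfect association
mean(abs(O[lower] - O_hat[lower]))  # ~0.23: clear disagreement in values
```

Any scale-invariant statistic is blind to the factor of 0.5, which is exactly the kind of systematic discrepancy an agreement measure must detect.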
But for E2Tree validation, we need to know whether the *actual proximity values* are faithfully reproduced.\n\nThe `eValidation()` function supports two approaches via the `test` argument:\n\n- `test = \"mantel\"`: The classical Mantel test for *association*\n- `test = \"measures\"`: A family of divergence/similarity measures for *agreement*\n- `test = \"both\"` (default): Both approaches\n\n### Divergence and similarity measures\n\n| Measure | Type | Range | What it measures |\n|---------|------|-------|------------------|\n| **nLoI** | divergence | [0, 1] | Normalized Loss of Interpretability --- weighted divergence with diagnostic decomposition |\n| **Hellinger** | divergence | [0, 1] | Hellinger distance --- robust to sparse matrices |\n| **wRMSE** | divergence | [0, 1] | Weighted RMSE --- emphasizes high-proximity regions |\n| **RV** | similarity | [0, 1] | RV coefficient --- global structural similarity (scale-invariant) |\n| **SSIM** | similarity | [-1, 1] | Structural Similarity Index --- captures local block patterns |\n\nAll measures are tested simultaneously using a **unified row/column permutation test**.\n\n\n### Running the validation\n\n```{r}\nval \u003c- eValidation(training, tree, D, test = \"both\", graph = FALSE, n_perm = 999, seed = 42)\n```\n\n**Print** --- compact results with Mantel test and all measures:\n\n```{r}\nprint(val)\n```\n\n**Summary** --- includes the LoI diagnostic decomposition:\n\n```{r}\nsummary(val)\n```\n\n**Plot** --- heatmaps, null distribution, and LoI decomposition:\n\n```{r fig.width=10, fig.height=8}\nplot(val)\n```\n\n### Extracting results with accessors\n\nUse accessor functions instead of direct `$` access:\n\n```{r}\n# Validation measures table\nmeasures(val)\n\n# Proximity matrices\nprox \u003c- proximity(val, type = \"both\")\nstr(prox, max.level = 1)\n```\n\n\n### The nLoI Decomposition\n\nThe nLoI is unique among the measures because it decomposes into two interpretable components:\n\n- **LoI_in** 
(within-node): measures how well the E2Tree reproduces the ensemble's proximity values for pairs it groups *together*.\n\n- **LoI_out** (between-node): measures the ensemble proximity lost for pairs that E2Tree *separates* into different nodes.\n\nSince the number of within-node and between-node pairs can differ dramatically, the `loi()` function reports **per-pair averages** (`mean_in` and `mean_out`) that enable meaningful comparison:\n\n```{r}\nO \u003c- proximity(val, type = \"ensemble\")\nO_hat \u003c- proximity(val, type = \"e2tree\")\n\nresult \u003c- loi(O, O_hat)\nsummary(result)\n```\n\nThe per-pair averages provide actionable diagnostics:\n\n- **mean_out \u003e 0.3**: the tree is splitting apart pairs with substantial ensemble proximity --- consider more terminal nodes\n- **mean_out \u003c 0.1**: the partition correctly separates low-proximity pairs --- tree structure is well-placed\n- **mean_in \u003e 0.1**: within-node calibration error is high --- check proximity estimation\n- **mean_in \u003c 0.01**: excellent within-node match between E2Tree and ensemble\n\n\n### Standalone LoI permutation test\n\nFor a quick significance assessment:\n\n```{r}\nperm \u003c- loi_perm(O, O_hat, n_perm = 999, seed = 42)\nprint(perm)\n```\n\n```{r fig.width=10, fig.height=5}\nplot(perm)\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmassimoaria%2Fe2tree","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmassimoaria%2Fe2tree","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmassimoaria%2Fe2tree/lists"}