{"id":13704327,"url":"https://modeloriented.github.io/DALEXtra/","last_synced_at":"2025-05-05T09:33:41.417Z","repository":{"id":35122939,"uuid":"196374651","full_name":"ModelOriented/DALEXtra","owner":"ModelOriented","description":"Extensions for the DALEX package","archived":false,"fork":false,"pushed_at":"2023-05-25T22:53:24.000Z","size":32827,"stargazers_count":62,"open_issues_count":3,"forks_count":9,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-05-23T03:18:12.331Z","etag":null,"topics":["extension-for-dalex-package"],"latest_commit_sha":null,"homepage":"https://ModelOriented.github.io/DALEXtra/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ModelOriented.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2019-07-11T10:38:29.000Z","updated_at":"2024-02-22T13:18:32.000Z","dependencies_parsed_at":"2024-01-14T20:49:16.864Z","dependency_job_id":"d7ad892b-f3ae-483b-9725-11871709c995","html_url":"https://github.com/ModelOriented/DALEXtra","commit_stats":{"total_commits":186,"total_committers":9,"mean_commits":"20.666666666666668","dds":"0.23655913978494625","last_synced_commit":"a8baf5791b8d9565ca857c670c3678c568c8b3d0"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ModelOriented%2FDALEXtra","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ModelOriented%2FDALEXtra/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ModelOriented%2FDALEXtra/releases","manifests_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/ModelOriented%2FDALEXtra/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ModelOriented","download_url":"https://codeload.github.com/ModelOriented/DALEXtra/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252471724,"owners_count":21753239,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["extension-for-dalex-package"],"created_at":"2024-08-02T21:01:07.557Z","updated_at":"2025-05-05T09:33:40.351Z","avatar_url":"https://github.com/ModelOriented.png","language":"R","funding_links":[],"categories":["Tools"],"sub_categories":["Interpretability/Explicability"],"readme":"\n# DALEXtra \u003cimg src=\"man/figures/logo.png\" align=\"right\" width=\"150\"/\u003e\n\n[![R build\nstatus](https://github.com/maksymiuks/DALEXtra/workflows/R-CMD-check/badge.svg)](https://github.com/maksymiuks/DALEXtra/actions?query=workflow%3AR-CMD-check)\n[![Coverage\nStatus](https://img.shields.io/codecov/c/github/ModelOriented/DALEXtra/master.svg)](https://codecov.io/github/ModelOriented/DALEXtra?branch=master)\n[![CRAN\\_Status\\_Badge](http://www.r-pkg.org/badges/version/DALEXtra)](https://cran.r-project.org/package=DALEXtra)\n[![Total\nDownloads](http://cranlogs.r-pkg.org/badges/grand-total/DALEXtra?color=orange)](http://cranlogs.r-pkg.org/badges/grand-total/DALEXtra)\n[![DrWhy-eXtrAI](https://img.shields.io/badge/DrWhy-BackBone-373589)](http://drwhy.ai/#BackBone)\n\n# Overview\n\nThe `DALEXtra` package is an extension pack for\n[DALEX](https://modeloriented.github.io/DALEX) 
package. It contains\nvarious tools for XAI (eXplainable Artificial Intelligence) that can\nhelp us inspect and improve our model. The functionality of `DALEXtra`\ncan be divided into two areas.\n\n  - Champion-Challenger analysis\n      - Lets us compare two or more Machine-Learning models, determine\n        which one is better and improve both of them.\n      - Funnel Plot of performance measures as an innovative approach to\n        measure comparison.\n      - Automatic HTML report.\n  - Cross language comparison\n      - Creating explainers for models created in different languages so\n        they can be explained using R tools such as the\n        [DrWhy.AI](https://github.com/ModelOriented/DrWhy) family.\n      - Currently supported are **Python** *scikit-learn* and *keras*,\n        **Java** *h2o*, **R** *xgboost*, *mlr*, *mlr3* and *tidymodels*.\n\n## Installation\n\n``` r\n# Install the development version from GitHub:\n# install.packages(\"devtools\")\n\n# it is recommended to install the latest version of DALEX from GitHub\ndevtools::install_github(\"ModelOriented/DALEX\")\ndevtools::install_github(\"ModelOriented/DALEXtra\")\n```\n\nor the latest CRAN version\n\n``` r\ninstall.packages(\"DALEX\")\ninstall.packages(\"DALEXtra\")\n```\n\nOther packages useful with explanations.\n\n``` r\ndevtools::install_github(\"ModelOriented/ingredients\")\ndevtools::install_github(\"ModelOriented/iBreakDown\")\ndevtools::install_github(\"ModelOriented/shapper\")\ndevtools::install_github(\"ModelOriented/auditor\")\ndevtools::install_github(\"ModelOriented/modelStudio\")\n```\n\nThe packages above can be used along with the `explain` object to create\nexplanations (ingredients, iBreakDown, shapper), audit our model\n(auditor) or automate the model exploration process (modelStudio).\n\n# Champion-Challenger analysis\n\nComparing models, especially black-box ones, is\nan important use case nowadays. 
Every day new models are being created\nand we need tools that allow us to determine which one is better.\nFor this purpose we present Champion-Challenger analysis. It is a set of\nfunctions that create comparisons of models, which can later be gathered\ninto a single report with generic comments. An example report can be\nfound\n[here](http://htmlpreview.github.io/?https://github.com/ModelOriented/DALEXtra/blob/master/inst/ChampionChallenger/DALEXtra_example_of_report.html).\nAs you can see, any explanation that has a generic `plot()` function can be\nplotted.\n\n## Funnel Plot\n\nThe core of our analysis is the funnel plot. It lets us find subsets of data\nwhere one of the models is significantly better than the other ones. That\nability is especially useful when we have models with similar\noverall performance and want to know which one to use.\n\n``` r\n library(\"mlr\")\n library(\"DALEXtra\")\n task \u003c- mlr::makeRegrTask(\n   id = \"R\",\n   data = apartments,\n   target = \"m2.price\"\n )\n learner_lm \u003c- mlr::makeLearner(\n   \"regr.lm\"\n )\n model_lm \u003c- mlr::train(learner_lm, task)\n explainer_lm \u003c- explain_mlr(model_lm, apartmentsTest, apartmentsTest$m2.price, label = \"LM\", \n                             verbose = FALSE, precalculate = FALSE)\n\n learner_rf \u003c- mlr::makeLearner(\n   \"regr.randomForest\"\n )\n model_rf \u003c- mlr::train(learner_rf, task)\n explainer_rf \u003c- explain_mlr(model_rf, apartmentsTest, apartmentsTest$m2.price, label = \"RF\",\n                             verbose = FALSE, precalculate = FALSE)\n\n plot_data \u003c- funnel_measure(explainer_lm, explainer_rf, \n                             partition_data = cbind(apartmentsTest, \n                                                    \"m2.per.room\" = apartmentsTest$surface/apartmentsTest$no.rooms),\n                             nbins = 5, measure_function = DALEX::loss_root_mean_square, show_info = FALSE)\n```\n\n``` 
r\nplot(plot_data)[[1]]\n```\n\n\u003cimg src=\"man/figures/unnamed-chunk-5-1.png\" style=\"display: block; margin: auto;\" /\u003e\nSuch a situation is shown in the plot above. The `LM` and `RF`\nmodels have similar RMSE, but the Funnel Plot shows that if we want to\npredict expensive or cheap apartments, we should definitely use `LM`,\nand `RF` for average-priced apartments. `LM` is also\nmuch better than `RF` for the `Srodmiescie` district. This use case\nshows how powerful a tool the Funnel Plot can be: for example, we can\ncombine two or more models into one based on the areas identified in the plot and\nthus improve our models. Another advantage of the Funnel Plot is that it\ndoesn’t require the model to be fitted with the variables shown on the plot: as\nyou can see, `m2.per.room` is an artificial variable.\n\n# Cross language comparison\n\nHere we will present a short use case for our package and its\ncompatibility with Python.\n\n## How to set up Anaconda\n\nTo use some features of `DALEXtra`,\nAnaconda is needed. The easiest way to get it is to visit the [Anaconda\nwebsite](https://www.anaconda.com/distribution) and choose the proper OS,\nas shown in the following picture.\n![](https://raw.githubusercontent.com/ModelOriented/DALEXtra/master/README_files/figure-gfm/anaconda1.png)\nThere is no big difference between Python versions when downloading\nAnaconda. You can always create a virtual environment with any version of\nPython, no matter which version was downloaded first.\n\n### Windows\n\nThe crucial thing is adding conda to the PATH environment variable when using\nWindows. 
You can do it during the installation by marking this\ncheckbox.\n\n\u003ccenter\u003e\n\n![](https://raw.githubusercontent.com/ModelOriented/DALEXtra/master/README_files/figure-gfm/anaconda2.png)\n\n\u003c/center\u003e\n\nor, if conda is already installed, follow [these\ninstructions](https://stackoverflow.com/a/44597801/9717584).\n\n### Unix\n\nOn Unix-like operating systems, adding conda to PATH is not required.\n\n### Loading data\n\nFirst we need to provide the data; an explainer is useless without it. A\nPython object does not store training data, so we always have to provide\na dataset. Feel free to use the datasets attached to the `DALEX` package or those\nstored in `DALEXtra` files.\n\n``` r\ntitanic_test \u003c- read.csv(system.file(\"extdata\", \"titanic_test.csv\", package = \"DALEXtra\"))\n```\n\nKeep in mind that the data frame includes the target variable (18th column) and\nscikit-learn models cannot work with it.\n\n### Creating explainer\n\nCreating an explainer from a scikit-learn Python model is very simple thanks\nto `DALEXtra`. The only thing you need to provide is the path to the pickle and,\nif necessary, something that identifies the Python environment. It may\nbe a .yml file with a package specification, the name of an existing conda\nenvironment, or the path to a Python virtual environment. 
Calling\n`explain_scikitlearn` with only the .pkl file and data will use the\ndefault Python.\n\n``` r\nlibrary(DALEXtra)\nexplainer \u003c- explain_scikitlearn(system.file(\"extdata\", \"scikitlearn.pkl\", package = \"DALEXtra\"),\nyml = system.file(\"extdata\", \"testing_environment.yml\", package = \"DALEXtra\"), \ndata = titanic_test[,1:17], y = titanic_test$survived, colorize = FALSE)\n```\n\n    ## Preparation of a new explainer is initiated\n    ##   -\u003e model label       :  scikitlearn_model  (  default  )\n    ##   -\u003e data              :  524  rows  17  cols \n    ##   -\u003e target variable   :  524  values \n    ##   -\u003e predict function  :  yhat.scikitlearn_model  will be used (  default  )\n    ##   -\u003e predicted values  :  numerical, min =  0.02086126 , mean =  0.288584 , max =  0.9119996  \n    ##   -\u003e model_info        :  package reticulate , ver. 1.16 , task classification (  default  ) \n    ##   -\u003e residual function :  difference between y and yhat (  default  )\n    ##   -\u003e residuals         :  numerical, min =  -0.8669431 , mean =  0.02248468 , max =  0.9791387  \n    ##   A new explainer has been created!\n\nNow, with the explainer ready, we can use any of the\n[DrWhy.AI](https://github.com/ModelOriented/DrWhy/blob/master/README.md)\nuniverse tools to create explanations. Here is a small demo.\n\n### Creating explanations\n\n``` r\nlibrary(DALEX)\nplot(model_performance(explainer))\n```\n\n\u003cimg src=\"man/figures/unnamed-chunk-8-1.png\" style=\"display: block; margin: auto;\" /\u003e\n\n``` r\nlibrary(ingredients)\nplot(feature_importance(explainer))\n```\n\n\u003cimg src=\"man/figures/unnamed-chunk-8-2.png\" style=\"display: block; margin: auto;\" /\u003e\n\n``` r\ndescribe(feature_importance(explainer))\n```\n\n    ## The number of important variables for scikitlearn_model's prediction is 3 out of 17. 
\n    ##  Variables gender.female, gender.male, age have the highest importantance.\n\n``` r\nlibrary(iBreakDown)\nplot(break_down(explainer, titanic_test[2, 1:17]))\n```\n\n\u003cimg src=\"man/figures/unnamed-chunk-8-3.png\" style=\"display: block; margin: auto;\" /\u003e\n\n``` r\ndescribe(break_down(explainer, titanic_test[2, 1:17]))\n```\n\n    ## Scikitlearn_model predicts, that the prediction for the selected instance is 0.132 which is lower than the average model prediction.\n    ## \n    ## The most important variable that decrease the prediction is class.3rd.\n    ## \n    ## Other variables are with less importance. The contribution of all other variables is -0.108.\n\n``` r\nlibrary(auditor)\neval \u003c- model_evaluation(explainer)\nplot_roc(eval)\n```\n\n\u003cimg src=\"man/figures/unnamed-chunk-8-4.png\" style=\"display: block; margin: auto;\" /\u003e\n\n``` r\n# Predictions with newdata\npredict(explainer, titanic_test[1:10, 1:17])\n```\n\n    ##  [1] 0.3565896 0.1321947 0.7638813 0.1037486 0.1265221 0.2949228 0.1421281\n    ##  [8] 0.1421281 0.4154695 0.1321947\n\n# Acknowledgments\n\nWork on this package was financially supported by the `NCN Opus\ngrant 2016/21/B/ST6/02176`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/modeloriented.github.io%2FDALEXtra%2F","html_url":"https://awesome.ecosyste.ms/projects/modeloriented.github.io%2FDALEXtra%2F","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/modeloriented.github.io%2FDALEXtra%2F/lists"}