{"id":15657556,"url":"https://github.com/csinva/disentangled-attribution-curves","last_synced_at":"2025-05-05T15:51:32.341Z","repository":{"id":89624932,"uuid":"146961274","full_name":"csinva/disentangled-attribution-curves","owner":"csinva","description":"Using / reproducing DAC from the paper \"Disentangled Attribution Curves for Interpreting Random Forests and  Boosted Trees\"","archived":false,"fork":false,"pushed_at":"2021-02-11T21:17:52.000Z","size":4850,"stargazers_count":27,"open_issues_count":1,"forks_count":4,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-03-30T22:11:13.435Z","etag":null,"topics":["ai","artificial-intelligence","boosting","ensemble-model","explainable-ai","feature-engineering","feature-importance","interpretability","machine-learning","ml","python","random-forest","random-forests","scikit-learn","statistics","stats"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/1905.07631","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/csinva.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-09-01T02:41:23.000Z","updated_at":"2024-11-06T14:36:19.000Z","dependencies_parsed_at":null,"dependency_job_id":"23218c1f-98ad-4e2e-b063-66716a62b7f1","html_url":"https://github.com/csinva/disentangled-attribution-curves","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/csinva%2Fdisentangled-attribution-curves","tags_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/csinva%2Fdisentangled-attribution-curves/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/csinva%2Fdisentangled-attribution-curves/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/csinva%2Fdisentangled-attribution-curves/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/csinva","download_url":"https://codeload.github.com/csinva/disentangled-attribution-curves/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252526641,"owners_count":21762542,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","artificial-intelligence","boosting","ensemble-model","explainable-ai","feature-engineering","feature-importance","interpretability","machine-learning","ml","python","random-forest","random-forests","scikit-learn","statistics","stats"],"created_at":"2024-10-03T13:08:07.440Z","updated_at":"2025-05-05T15:51:32.273Z","avatar_url":"https://github.com/csinva.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003e Disentangled attribution curves (DAC) 🔎\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e Official code for using / reproducing DAC from the paper \u003ci\u003eDisentangled Attribution Curves for Interpreting Random Forests and Boosted Trees\u003c/i\u003e (arXiv 2019 \u003ca href=\"https://arxiv.org/abs/1905.07631v1\"\u003epdf\u003c/a\u003e)\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg 
src=\"https://img.shields.io/badge/license-mit-blue.svg\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/python-3.6--3.8-blue\"\u003e\n  \u003cimg src=\"https://img.shields.io/github/checks-status/csinva/disentangled-attribution-curves/master\"\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n\t\u003ci\u003eNote: this repo is actively maintained. For any questions, please file an issue.\u003c/i\u003e\n\u003c/p\u003e\n\n![](figs/fig_xor.png)\n\n# documentation\n\n## using DAC on new models\n- quick install: `pip install git+https://github.com/csinva/disentangled-attribution-curves`\n- the core method code lives in the [dac](dac) folder and is compatible with scikit-learn\n- the [examples/xor_dac.ipynb](examples/simple_ex.py) example shows how to use DAC on new data, using simple datasets (e.g. XOR)\n- the basic API consists of two functions: `from dac import dac, dac_plot`\n- ```dac(forest, input_space_x, outcome_space_y, assignment, S, continuous_y=True, class_id=1)```\n  - inputs:\n      - `forest`: an sklearn ensemble of decision trees\n      - `input_space_x`: the matrix of training data (feature values), a numpy 2D array\n      - `outcome_space_y`: the array of training data (labels/regression targets), a numpy 1D array\n      - `assignment`: a matrix of feature values that will have their DAC importance score evaluated, a numpy 2D array\n      - `S`: a binary indicator of whether to include each feature in the importance calculation, a numpy 1D array with values 0 and 1 only\n      - `continuous_y`: a boolean indicator of whether the y targets are regression (`True`) or classification (`False`), defaults to `True`\n      - `class_id`: if classification, the class value to return proportions for, defaults to 1\n  - returns:\n    - `dac_curve`\n      - for regression: a numpy array whose length corresponds to the number of samples in the assignment input.  
Each entry is a DAC importance score, a\n        float between min(outcome_space_y) and max(outcome_space_y)\n      - for classification: a numpy array whose length corresponds to the number of samples in the assignment input.  Each entry is a DAC importance score, a\n        float between 0 and 1\n- ```dac_plot(forest, input_space_x, outcome_space_y, S, interval_x, interval_y, di_x, di_y, C, continuous_y, weights)```\n    - inputs\n      - `forest`: an sklearn ensemble of decision trees (random forest or adaboosted forest)\n      - `input_space_x`: the matrix of training data (feature values), a numpy 2D array\n      - `outcome_space_y`: the array of training data (labels/regression targets), a numpy 1D array\n      - `S`: a binary indicator of whether to include each feature in the importance calculation, a numpy 1D array with values 0 and 1 only\n      - `interval_x`: an interval for the x axis of the plot, defaults to None.  If None, a reasonable interval will be extrapolated from the range\n        of the first feature specified in S.\n      - `interval_y`: an interval for the y axis of the plot (only applicable to heat maps), defaults to None.  \n        If None, a reasonable interval will be extrapolated from the range of the second feature specified in S.\n      - `di_x`: a step length for the x axis of the plot, defaults to None.  If None, a reasonable step length will be extrapolated from the range\n        of the first feature specified in S.\n      - `di_y`: a step length for the y axis of the plot (only applicable to heat maps), defaults to None.  If None, a reasonable step\n        length will be extrapolated from the range of the second feature specified in S.\n      - `C`: a hyper-parameter specifying the number of standard deviations samples can be from the mean of the leaf and be counted into the curve.  
Smaller values\n        yield a more sensitive curve, larger values yield a smoother curve.\n      - `continuous_y`: a boolean indicator of whether the y targets are regression (`True`) or classification (`False`), defaults to `True`\n      - `weights`: weights for the individual estimators' contributions to the curve, defaults to None.  If None, weights will be extrapolated from the forest type.\n    - returns\n      - `dac_curve`: a numpy array containing values for the DAC curve or heatmap describing the interaction of the variables specified in S\n\n## reproducing results from the paper\n\n\u003cimg src=\"figs/fig_hr_holiday.png\" width=\"50%\"\u003e\n\n- the [examples/bike_sharing_dac.ipynb](examples/bike_sharing_dac.ipynb) notebook shows how to use DAC to reproduce the qualitative curves on the bike-sharing dataset from the paper\n- the [simulation script](experiments/simulation/run_sim_synthetic.py) replicates the simulation experiments\n- the [pmlb script](experiments/pmlb/run_dac_feature_engineered.py) replicates the experiments of automatic feature engineering on pmlb datasets\n\n\n\n## dac animation\n\n*a gif demonstrating calculating a DAC curve for a simple tree*\n\n![](figs/dac_animated.gif)\n\n\n\n# related work\n\n- this work is part of an overarching project on interpretable machine learning, guided by the [PDR framework](https://arxiv.org/abs/1901.04592) for interpretable machine learning\n- for related work, see the [github repo](https://github.com/csinva/hierarchical-dnn-interpretations) for disentangled hierarchical dnn interpretations ([ICLR 2019](https://arxiv.org/abs/1806.05337))\n\n# reference\n\n- feel free to use/share this code openly\n\n- citation for this work:\n\n  ```bibtex\n  @article{devlin2019disentangled,\n      title={Disentangled Attribution Curves for Interpreting Random Forests and Boosted Trees},\n      author={Devlin, Summer and Singh, Chandan and Murdoch, W James and Yu, Bin},\n      journal={arXiv preprint 
arXiv:1905.07631},\n      year={2019}\n  }\n  ```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcsinva%2Fdisentangled-attribution-curves","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcsinva%2Fdisentangled-attribution-curves","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcsinva%2Fdisentangled-attribution-curves/lists"}