# EDiT

This project is a PyTorch implementation of [EDiT: Interpreting Ensemble Models via Compact Soft Decision Trees](docs/YooS19.pdf), published in the proceedings of [ICDM 2019](http://icdm2019.bigke.org/).
The paper proposes a novel approach that distills the knowledge of an ensemble model into a soft decision tree (SDT) with fewer parameters, maximizing its interpretability.
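
To make the idea concrete, here is a minimal, illustrative sketch of the two ingredients involved: a soft decision tree whose inner nodes are sigmoid gates, and a distillation loss that matches the tree's output to an ensemble's soft predictions. This is not the authors' implementation, and it omits EDiT's sparsification techniques; every name in it is a stand-in.

```python
# Illustrative sketch only (not the authors' code): a soft decision tree routes
# each sample through sigmoid-gated inner nodes, and knowledge distillation
# trains it on an ensemble's predicted probabilities instead of hard labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySDT(nn.Module):
    def __init__(self, in_dim, num_classes, depth=3):
        super().__init__()
        self.depth = depth
        self.num_inner = 2 ** depth - 1           # gating (inner) nodes
        self.num_leaves = 2 ** depth              # leaf nodes
        self.gates = nn.Linear(in_dim, self.num_inner)
        self.leaves = nn.Parameter(torch.zeros(self.num_leaves, num_classes))

    def forward(self, x):
        probs = torch.sigmoid(self.gates(x))      # P(go right) at each inner node
        # Path probability of each leaf = product of gate decisions on its path.
        path = x.new_ones(x.size(0), 1)
        for d in range(self.depth):
            start = 2 ** d - 1                    # first inner node at depth d
            p = probs[:, start:start + 2 ** d]
            path = torch.stack([path * (1 - p), path * p], dim=2).flatten(1)
        return path @ F.softmax(self.leaves, dim=1)

def distill_loss(sdt_out, ensemble_probs):
    # Match the SDT's output distribution to the ensemble's soft predictions.
    return F.kl_div(sdt_out.clamp_min(1e-8).log(), ensemble_probs,
                    reduction="batchmean")
```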

## Prerequisites

- Python 3.6+
- [PyTorch](https://pytorch.org/) 1.2.0+
- [NumPy](https://numpy.org)
- [scikit-learn](https://scikit-learn.org/stable/)
- [joblib](https://joblib.readthedocs.io/en/latest/)
- [pandas](https://pandas.pydata.org/)

## Usage

You should first download the datasets from [this website](http://persoal.citius.usc.es/manuel.fernandez.delgado/papers/jmlr/) and place them in `data/`.
On Linux, you can simply run `down.sh` in `data/` to do this.
Although the website provides over a hundred datasets used in previous works, we use only 8 of them in our work.
The list of target datasets is described in `datasets.txt`.
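
For illustration, the snippet below shows one way such a dataset could be loaded after downloading. It assumes a whitespace-separated table with the class label in the last column; the path and file layout are guesses, so check the actual files in `data/` first.

```python
# Hypothetical loading example; the file name and layout below are assumptions,
# not something this repository guarantees.
import pandas as pd

df = pd.read_csv("data/abalone/abalone.txt", sep=r"\s+", header=None)
X, y = df.iloc[:, :-1].to_numpy(), df.iloc[:, -1].to_numpy()
print(X.shape, y.shape)
```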

Then, move to `src/` and run `python main.py` to run EDiT.
By default, it trains a vanilla SDT on the `abalone` dataset, but you can easily change the hyperparameters in `src/main.py`, including the dataset, sparsification technique, and training procedure.
For instance, changing `tree_threshold` from `0` to a desired threshold such as `1e-4` enables the tree pruning technique.
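
As a rough illustration, such an edit to `src/main.py` might look like the following; only `tree_threshold` and the `abalone` dataset are named in this README, so treat everything else as a placeholder rather than the script's real interface.

```python
# Hypothetical hyperparameter edit in src/main.py; names other than
# tree_threshold and the dataset are placeholders, not the script's real API.
data = "abalone"        # any of the 8 datasets listed in datasets.txt
tree_threshold = 1e-4   # 0 disables tree pruning; a small value enables it
```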

If you want to enable the knowledge distillation technique, you should first run `python rf.py` to train and save random forests (RF), which are not included in this repository.
The trained RF models are saved in `out/rf/models`.
Other results, such as intermediate training logs and the trained compact soft decision trees, are saved in `out/edit`.
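
For reference, here is a minimal sketch of what a script like `rf.py` plausibly does, built only from the scikit-learn and joblib packages listed under Prerequisites; the data, hyperparameters, and file name below are assumptions, not the authors' choices.

```python
# Guessed sketch, not the repository's rf.py: train a random forest with
# scikit-learn and save it with joblib under out/rf/models (the output
# directory named in this README). Data and settings are placeholders.
import os
import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X, y)

os.makedirs("out/rf/models", exist_ok=True)
joblib.dump(rf, "out/rf/models/example.joblib")

# rf.predict_proba(X) gives the soft targets that the distillation step
# would train the compact SDT to match.
```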

## Reference

You can download [this bib file](docs/YooS19.bib) or copy the following information:

```
@inproceedings{YooS19,
  author    = {Jaemin Yoo and Lee Sael},
  title     = {EDiT: Interpreting Ensemble Models via Compact Soft Decision Trees},
  booktitle = {IEEE International Conference on Data Mining (ICDM)},
  year      = {2019}
}
```