# Efficient Inference of Optimal Decision Trees

The code of the AAAI-20 paper "Efficient Inference of Optimal Decision Trees".

Paper link: http://florent.avellaneda.free.fr/dl/AAAI20.pdf

The source code is designed to be compiled and executed on GNU/Linux.

## Dependencies

- g++
- cmake
- minisat: http://minisat.se/
- xdot: https://github.com/jrfonseca/xdot.py
- CLI11: https://github.com/CLIUtils/CLI11

### Example of installing dependencies for Ubuntu

The following instructions have been tested on Ubuntu 19.04:

```bash
sudo apt install g++ cmake minisat2 xdot libzip-dev libboost-dev
```

## Build

```bash
cmake .
make
```

## Running

Infer a decision tree with the algorithm *DT_depth* for the dataset "mouse":

```bash
./InferDT -d data/mouse.csv infer
```

Infer a decision tree with the algorithm *DT_size* for the dataset "car":

```bash
./InferDT data/car.csv infer
```

Run a 10-fold cross-validation on the dataset "mouse" with the algorithm *DT_size*:

```bash
./InferDT data/mouse.csv bench
```

Infer a decision tree with the algorithm *DT_size* from the training set balance-scale.csv.train1 and evaluate the model on the testing set balance-scale.csv.test1:

```bash
./InferDT data/balance-scale.csv.train1.csv infer -t data/balance-scale.csv.test1.csv
```

Print a help message:

```bash
./InferDT --help
```

## Benchmarks

We ran the experiments on Ubuntu with an Intel Core i7-2600K CPU @ 3.40 GHz.

### Verwer and Zhang Datasets

The datasets we used are taken from the paper by Verwer and Zhang and are available at https://github.com/SiccoVerwer/binoct.

| Dataset       |  S   |  B   | Time DT_depth | Accuracy DT_depth |  k   | n for DT_size | Time DT_size | Accuracy DT_size | Accuracy BinOCT* |
| ------------- | :--: | :--: | :-----------: | :---------------: | :--: | :-----------: | :----------: | :--------------: | ---------------- |
| iris          | 150  | 114  |     18 ms     |      92.9 %       |  3   |     10.6      |    30 ms     |      93.2 %      | **98.4 %**       |
| monks1        | 124  |  17  |     24 ms     |      90.3 %       | 4.4  |      17       |    80 ms     |    **95.5 %**    | 87.1 %           |
| monks2        | 169  |  17  |    190 ms     |      70.2 %       | 5.8  |     47.8      |   9.1 sec    |    **74.0 %**    | 63.3 %           |
| monks3        | 122  |  17  |     30 ms     |      78.1 %       | 4.8  |     23.4      |    210 ms    |      82.6 %      | **93.5 %**       |
| wine          | 178  | 1276 |    600 ms     |      89.3 %       |  3   |      7.8      |   1.2 sec    |    **92.0 %**    | 88.9 %           |
| balance-scale | 625  |  20  |    50 sec     |    **93.0 %**     |  8   |      268      |   183 sec    |      92.6 %      | 77.5 %           |

**S**: Number of examples in the dataset

**B**: Number of Boolean features in the dataset

**Time DT_depth**: Time used by our algorithm with parameter "-d"

**Accuracy DT_depth**: Accuracy of our algorithm with parameter "-d"

**k**: Depth of inferred decision trees

**n**: Number of nodes in inferred decision trees

**Time DT_size**: Time used by our algorithm without parameter "-d"

**Accuracy DT_size**: Accuracy of our algorithm without parameter "-d"

**Accuracy BinOCT**: Accuracy of the algorithm from https://github.com/SiccoVerwer/binoct

### Mouse

We used the dataset [Mouse](https://raw.githubusercontent.com/FlorentAvellaneda/InferDT/master/data/mouse.csv), which the authors Bessiere, Hebrard and O'Sullivan shared with us. Each entry in the rows DT_size and DT_depth is the average over 100 runs. The first column gives the name of each algorithm. The next three columns (Time, k, n) correspond to inferring a decision tree from the whole dataset. The last column (Accuracy) corresponds to 10-fold cross-validation.

| Algorithm                                                    |   Time   |  k   |  n   | Accuracy |
| ------------------------------------------------------------ | :------: | :--: | :--: | :------: |
| DT2 from the paper [Minimising Decision Tree Size as Combinatorial Optimisation](http://homepages.laas.fr/ehebrard/papers/cp2009b.pdf) | 577 sec  |  4   |  15  |  83.8 %  |
| DT1 from the paper [Learning Optimal Decision Trees with SAT](https://www.ijcai.org/proceedings/2018/0189.pdf) | 12.9 sec |  4   |  15  |  83.8 %  |
| DT_size: our algorithm without parameter "-d"                |  70 ms   |  4   |  15  |  83.5 %  |
| DT_depth: our algorithm with parameter "-d"                  |  20 ms   |  4   |  31  |  85.8 %  |

### Other

In this section, we perform **x** randomized 10-fold cross-validations and report the average.

| Dataset                                                      |  S   |  B   | Time DT_depth | Accuracy DT_depth |  k   | n for DT_size | Time DT_size | Accuracy DT_size | x    |
| ------------------------------------------------------------ | :--: | :--: | :-----------: | :---------------: | :--: | :-----------: | :----------: | :--------------: | ---- |
| [zoo](http://archive.ics.uci.edu/ml/datasets/Zoo)            | 101  | 136  |     47 ms     |      91.7 %       |  4   |      20       |    200 ms    |       91 %       | 200  |
| [BodyMassIndex](https://www.kaggle.com/yersever/500-person-gender-height-weight-bodymassindex) | 500  | 172  |    53 sec     |       85 %        | 6.6  |      109      |    6.4 h     |      85.4 %      | 1    |
| [lungCancerDataset](https://www.kaggle.com/yusufdede/lung-cancer-dataset) |  59  |  72  |    3.8 ms     |      89.6 %       | 2.6  |       7       |    7.5 ms    |      90.6 %      | 200  |
| [iris](https://raw.githubusercontent.com/FlorentAvellaneda/InferDT/master/data/iris.csv) | 113  | 114  |     30 ms     |      93.6 %       | 3.6  |      14       |    120 ms    |      94.0 %      | 200  |
| [monks1](https://raw.githubusercontent.com/FlorentAvellaneda/InferDT/master/data/monks1.csv) | 124  |  17  |     30 ms     |      97.0 %       | 4.9  |      15       |    140 ms    |      99.5 %      | 200  |
| [monks2](https://raw.githubusercontent.com/FlorentAvellaneda/InferDT/master/data/monks2.csv) | 169  |  17  |    330 ms     |      88.0 %       |  6   |      64       |   9.3 sec    |      88.7 %      | 100  |
| [monks3](https://raw.githubusercontent.com/FlorentAvellaneda/InferDT/master/data/monks3.csv) |  92  |  17  |    125 ms     |      79.6 %       | 5.8  |     35.5      |   2.5 sec    |      81.5 %      | 800  |
| [balance-scale](https://raw.githubusercontent.com/FlorentAvellaneda/InferDT/master/data/balance-scale.csv) | 625  |  20  |    200 sec    |      71.8 %       |  9   |      276      |     11 h     |      73.6 %      | 1    |
| [wine](https://raw.githubusercontent.com/FlorentAvellaneda/InferDT/master/data/wine.csv) | 178  | 1276 |    4.5 sec    |      91.5 %       |  3   |     12.2      |   14.5 sec   |      91.2 %      | 200  |

**S**: Number of examples in the dataset

**B**: Number of Boolean features in the dataset

**k**: Average depth of inferred decision trees

**n**: Average number of nodes in inferred decision trees

**x**: Number of 10-fold cross-validations performed
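The `bench` command and the **x** column report accuracy from repeated 10-fold cross-validation. The C++ sketch below illustrates that evaluation protocol under explicit assumptions. The fold assignment (shuffle, then cut into contiguous near-equal slices), the use of the repetition index as the shuffle seed, and the `majorityLabel` placeholder model are hypothetical and are not taken from InferDT's source. A real run would infer a decision tree on each training fold instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// HYPOTHETICAL placeholder model (not part of InferDT): predicts the most
// frequent label of the training fold. Labels are assumed to be 0, 1, 2, ...
int majorityLabel(const std::vector<int>& labels,
                  const std::vector<std::size_t>& train) {
    std::vector<std::size_t> counts;
    for (std::size_t i : train) {
        const std::size_t y = static_cast<std::size_t>(labels[i]);
        if (y >= counts.size()) counts.resize(y + 1, 0);
        ++counts[y];
    }
    if (counts.empty()) return 0;
    return static_cast<int>(std::max_element(counts.begin(), counts.end()) - counts.begin());
}

// Accuracy in [0, 1] of one k-fold cross-validation. ASSUMPTION: examples are
// shuffled with the given seed and cut into k contiguous, near-equal folds.
double kFoldAccuracy(const std::vector<int>& labels, std::size_t k, unsigned seed) {
    const std::size_t n = labels.size();
    assert(k >= 1 && k <= n);
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);

    std::size_t correct = 0;
    for (std::size_t f = 0; f < k; ++f) {
        // Fold f is the slice [lo, hi) of the shuffled order.
        const std::size_t lo = f * n / k;
        const std::size_t hi = (f + 1) * n / k;
        std::vector<std::size_t> train;
        std::vector<std::size_t> test;
        for (std::size_t p = 0; p < n; ++p) {
            if (p >= lo && p < hi) test.push_back(order[p]);
            else train.push_back(order[p]);
        }
        // A real run would infer a decision tree on `train` here.
        const int prediction = majorityLabel(labels, train);
        for (std::size_t i : test)
            if (labels[i] == prediction) ++correct;
    }
    // Each example lands in exactly one test fold, so divide by n.
    return static_cast<double>(correct) / static_cast<double>(n);
}

// Mean accuracy over x independent cross-validations (the "x" column),
// each with a different shuffle seed.
double repeatedKFold(const std::vector<int>& labels, std::size_t k, std::size_t x) {
    double sum = 0.0;
    for (std::size_t r = 0; r < x; ++r)
        sum += kFoldAccuracy(labels, k, static_cast<unsigned>(r));
    return sum / static_cast<double>(x);
}
```

For instance, with 20 examples of which 15 share one label, leave-one-out (k = 20) gives this placeholder an accuracy of exactly 0.75. It always predicts the majority label, so only the 5 minority examples are misclassified. Averaging over several shuffles, as `repeatedKFold` does, only matters when k is smaller than the number of examples and the fold contents change from run to run.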
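In the Mouse table, DT_size and DT_depth both reach depth k = 4, but with n = 15 and n = 31 nodes respectively. The short C++ sketch below spells out the standard arithmetic linking depth and node count for binary trees. It assumes that n counts both test nodes and leaves and that every test node on a Boolean feature has exactly two children. It is illustrative arithmetic, not code from InferDT.

```cpp
#include <cstdint>

// Largest number of nodes (tests + leaves) in a binary tree of depth k,
// where a lone leaf has depth 0: 1 + 2 + ... + 2^k = 2^(k+1) - 1.
std::uint64_t maxNodes(unsigned k) {
    return (std::uint64_t{1} << (k + 1)) - 1;
}

// ASSUMPTION: every test node has exactly two children, so n = 2 * tests + 1
// and the number of leaves is (n + 1) / 2.
std::uint64_t leavesOfFullTree(std::uint64_t n) {
    return (n + 1) / 2;
}

// Smallest depth a full binary tree with n nodes can have: it needs
// (n + 1) / 2 leaves, and depth d allows at most 2^d leaves.
unsigned minDepth(std::uint64_t n) {
    unsigned d = 0;
    while ((std::uint64_t{1} << d) < leavesOfFullTree(n)) ++d;
    return d;
}
```

For k = 4, `maxNodes` gives 31, so the DT_depth tree on Mouse is as large as a depth-4 tree can be. A 15-node tree has 8 leaves and therefore needs depth at least 3, which is consistent with the depth of 4 reported for DT_size.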