{"id":21615989,"url":"https://github.com/pharo-ai/decision-tree-model","last_synced_at":"2025-09-08T04:41:03.462Z","repository":{"id":65631330,"uuid":"235577142","full_name":"pharo-ai/decision-tree-model","owner":"pharo-ai","description":"Model for Decision Trees Learning in Pharo","archived":false,"fork":false,"pushed_at":"2023-12-19T14:51:05.000Z","size":213,"stargazers_count":3,"open_issues_count":1,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-11T08:26:31.189Z","etag":null,"topics":["decision-trees-learning","decisiontreemodel","pharo"],"latest_commit_sha":null,"homepage":null,"language":"Smalltalk","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pharo-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-01-22T13:15:56.000Z","updated_at":"2022-01-23T22:58:44.000Z","dependencies_parsed_at":"2025-04-11T07:46:11.891Z","dependency_job_id":"44bda4f5-5a0d-443c-bf2d-34595e8581b6","html_url":"https://github.com/pharo-ai/decision-tree-model","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/pharo-ai/decision-tree-model","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pharo-ai%2Fdecision-tree-model","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pharo-ai%2Fdecision-tree-model/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pharo-ai%2Fdecision-tree-model/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/pharo-ai%2Fdecision-tree-model/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pharo-ai","download_url":"https://codeload.github.com/pharo-ai/decision-tree-model/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pharo-ai%2Fdecision-tree-model/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274135309,"owners_count":25228203,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-08T02:00:09.813Z","response_time":121,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["decision-trees-learning","decisiontreemodel","pharo"],"created_at":"2024-11-24T22:13:19.606Z","updated_at":"2025-09-08T04:41:03.423Z","avatar_url":"https://github.com/pharo-ai.png","language":"Smalltalk","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Build status](https://github.com/pharo-ai/DecisionTreeModel/workflows/CI/badge.svg)](https://github.com/pharo-ai/DecisionTreeModel/actions/workflows/test.yml)\n[![Coverage Status](https://coveralls.io/repos/github/pharo-ai/DecisionTreeModel/badge.svg?branch=master)](https://coveralls.io/github/pharo-ai/DecisionTreeModel?branch=master)\n[![Pharo version](https://img.shields.io/badge/Pharo-10-%23aac9ff.svg)](https://pharo.org/download)\n[![Pharo 
version](https://img.shields.io/badge/Pharo-11-%23aac9ff.svg)](https://pharo.org/download)\n[![Pharo version](https://img.shields.io/badge/Pharo-12-%23aac9ff.svg)](https://pharo.org/download)\n[![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://raw.githubusercontent.com/pharo-ai/DecisionTreeModel/master/LICENSE)\n\n## Description\n\nModel for Decision Trees Learning in Pharo\n\n## Installation\n\nTo install DecisionTreeModel, go to the Playground (Ctrl+OW) in your Pharo image and execute the following Metacello script (select it and press the Do-it button or Ctrl+D):\n\n```Smalltalk\nMetacello new\n  baseline: 'AIDecisionTreeModel';\n  repository: 'github://pharo-ai/DecisionTreeModel/src';\n  load.\n```\n\n## How to depend on it?\n\nIf you want to add a dependency on DecisionTreeModel to your project, include the following lines in your baseline method:\n\n```Smalltalk\nspec\n  baseline: 'AIDecisionTreeModel'\n  with: [ spec repository: 'github://pharo-ai/DecisionTreeModel/src' ].\n```\n\nIf you are new to baselines and Metacello, check out the [Baselines](https://github.com/pharo-open-documentation/pharo-wiki/blob/master/General/Baselines.md) tutorial on the Pharo Wiki.\n\n## How to use it\n\nNote: This documentation uses datasets from the pharo-ai/datasets project. 
To load this project and reproduce the examples, execute:\n\n```st\nMetacello new\n  baseline: 'AIDatasets';\n  repository: 'github://pharo-ai/datasets';\n  load.\n```\n\n### DecisionTree\n\nA simple example of how to create a DecisionTree (not a decision tree model):\n\n```Smalltalk\n| waterDecisionTree |\nwaterDecisionTree := AIDTBinaryDecisionTree withCondition: [ :value | value \u003c 0 ].\nwaterDecisionTree trueBranch: (AIDTDecision withLabel: 'ice').\nwaterDecisionTree falseBranch: (AIDTDecision withLabel: 'liquid').\n```\n\n### AIDTDataset\n\nAn AIDTDataset can be initialized from a DataFrame:\n\n```Smalltalk\niris := AIDTDataset fromDataFrame: AIDatasets loadIris.\n```\n\nOr from an array of objects:\n\n```Smalltalk\narrayOfPoints := {Point x: 10 y: 12 . Point x: 5 y: 7} asArray.\nnewDataset := AIDTDataset fromArray: arrayOfPoints withColumns: #(degrees min max).\n```\n\nSince AIDTDataset is used for supervised learning, one can set the features and target that one wants to use.\n\n```Smalltalk\niris := AIDTDataset fromDataFrame: AIDatasets loadIris.\n\n\"Setting features and target in the dataset\"\niris target: #species.\niris features: #('sepal length (cm)' 'petal width (cm)').\n```\n\nWhen initializing from an array, this can be done directly:\n\n```Smalltalk\narrayOfPoints := {Point x: 10 y: 12 . Point x: 5 y: 7} asArray.\nnewDataset := AIDTDataset \n                  fromArray: arrayOfPoints \n                  withFeatures: #(degrees min) \n                  withTarget: #max.\n```\n\nIf the features are not specified, all columns other than the target are used as features by default.\n\n### DecisionTreeModel - ID3\n\nThe ID3 algorithm treats all columns as categorical. At each split the tree creates a branch for each possible value the variable can take. If one wishes to use a numerical column, it should be discretized beforehand; otherwise, each distinct numerical value will be treated as its own category. 
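\n\nTo make the discretization advice concrete, here is a minimal sketch of equal-width binning written in plain Pharo. It is only an illustration under assumed values (the sample numbers and the bin count of 3 are hypothetical) and is not necessarily how AIDTDiscretizer works internally.\n\n```Smalltalk\n\"Hypothetical sketch: equal-width binning in plain Pharo, independent of AIDTDiscretizer\"\n| values binCount low width bins |\nvalues := #(4.3 5.1 5.8 6.4 7.9).\nbinCount := 3.\nlow := values min.\nwidth := (values max - low) / binCount.\n\"Map each value to a bin index in 1..binCount; the maximum is clamped into the last bin\"\nbins := values collect: [ :each |\n    (((each - low) / width) floor + 1) min: binCount ].\nbins. \"#(1 1 2 2 3)\"\n```\n\nAfter such a step each number is replaced by one of a few bin indices, so ID3 would create one branch per bin instead of one branch per distinct value.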
\n\nExample on the Iris dataset:\n\n```Smalltalk\niris := AIDTDataset fromDataFrame: AIDatasets loadIris.\niris target: #species.\n\n\"Training - Preprocessing\"\ndiscretizer := AIDTDiscretizer new.\ndiscretizer fitTransform: iris.\n\n\"Training - Model\"\naTreeModel := AIDTID3DecisionTreeModel new.\naTreeModel fit: iris. \n\n\"Predicting\"\ntestDataset := AIDTDataset \n                   withRows: #(#(8.0 3.8 1.2 0.6) \n                               #(4.5 2.6 3.0 0.7))\n                   withFeatures: (iris features copyWithout: #species) .\ndiscretizer transform: testDataset.\naTreeModel decisionsForAll: testDataset. \n\"an Array(AIDTDecision(setosa) AIDTDecision(versicolor))\"\n```\n\nA decision tree can also explain why it reached a conclusion:\n\n```Smalltalk\n(aTreeModel decisionsForAll: testDataset) anyOne why. \n\"an OrderedCollection(\n  AIDTMultiwaySplitter(petal width (cm))-\u003eAIDTInterval( [0.58, 1.06) ) \n  AIDTMultiwaySplitter(sepal width (cm))-\u003eAIDTInterval( [3.44, 3.92) ))\"\n```\n\nThis means that the first split was made over `petal width (cm)`, where the example fell in the interval [0.58, 1.06).\n\nThen, another split was made over `sepal width (cm)`, where the example fell in the interval [3.44, 3.92).\n\n### DecisionTreeModel - C4.5\n\nThe C4.5 algorithm is an extension of ID3. It makes a few improvements, such as being able to handle both numerical and categorical variables. For numerical variables a threshold is applied, and the data is split into the examples that satisfy the threshold and the ones that do not.\n\nWith C4.5 we no longer need to discretize numerical values.\n\n```Smalltalk\niris := AIDTDataset fromDataFrame: AIDatasets loadIris.\niris target: #species.\n\n\"Training - Model\"\naTreeModel := AIDTC45DecisionTreeModel new.\naTreeModel fit: iris. 
\n\n\"Predicting\"\ntestDataset := AIDTDataset \n                   withRows: #(#(8.0 3.8 1.2 0.6) \n                               #(4.5 2.6 3.0 0.7))\n                   withFeatures: (iris features copyWithout: #species) .\naTreeModel decisionsForAll: testDataset. \n\"an Array(AIDTDecision(setosa) AIDTDecision(versicolor))\"\n```\n\nThis decision tree can also explain why it reached a conclusion:\n\n```Smalltalk\n(aTreeModel decisionsForAll: testDataset) anyOne why. \n\"an OrderedCollection(AIDTThresholdSplitter(petal width (cm) \u003c= 0.6)-\u003etrue)\"\n```\n\nThis means that the first split was made on `petal width (cm)`, with a threshold of 0.6. This example satisfied the condition (its value did not exceed the threshold), which led to the decision.\n\nWe can also handle a dataset that has both numerical and categorical variables:\n\n```Smalltalk\n\"Build dataset\"\ntennisDataFrame := DataFrame withRows: #(\n    (sunny 85 85 weak false)\n    (sunny 80 90 strong false)\n    (cloudy 83 78 weak true)\n    (rainy 70 96 weak true)\n    (rainy 68 80 weak true)\n    (rainy 65 70 strong false)\n    (cloudy 64 65 strong true)\n    (sunny 72 95 weak false)\n    (sunny 69 70 weak true)\n    (rainy 75 80 weak true)\n    (sunny 75 70 strong true)\n    (cloudy 72 90 strong true)\n    (rainy 71 80 strong false)).\n    \ntennisDataFrame columnNames: #(outlook temperature humidity wind playTennis).\ntennisDataset := AIDTDataset fromDataFrame: tennisDataFrame.\ntennisDataset target: #playTennis.\n\n\"Training - Model\"\naTreeModel := AIDTC45DecisionTreeModel new.\naTreeModel fit: tennisDataset. \n\n\"Predicting\"\ntestDataset := AIDTDataset \n                   withRows: #(#(cloudy 71 70 weak)\n                               #(rainy  65 94 strong))\n                   withFeatures: #(outlook temperature humidity wind) .\naTreeModel decisionsForAll: testDataset. 
\"an Array(AIDTDecision(true) AIDTDecision(false))\"\n```\n\nWe can again see why a decision was made. Note that several splits can be made on the same numerical variable by using different thresholds.\n\n```Smalltalk\n(aTreeModel decisionsForAll: testDataset) anyOne why.\n \"an OrderedCollection(\n   AIDTThresholdSplitter(temperature \u003c= 83)-\u003etrue \n   AIDTThresholdSplitter(temperature \u003c= 80)-\u003etrue \n   AIDTThresholdSplitter(temperature \u003c= 75)-\u003etrue \n   AIDTThresholdSplitter(temperature \u003c= 72)-\u003etrue \n   AIDTThresholdSplitter(temperature \u003c= 64)-\u003efalse \n   AIDTThresholdSplitter(temperature \u003c= 65)-\u003efalse \n   AIDTThresholdSplitter(temperature \u003c= 70)-\u003efalse \n   AIDTMultiwaySplitter(outlook)-\u003e#cloudy)\"\n```\n\n### DecisionTreeModel - CART\n\nAnother algorithm for building decision trees is CART. It can also handle numerical and categorical variables, but it only performs binary splits on the data. For numerical variables it splits over a threshold, and for categorical variables it performs a test of the form: is `a value` in `a subset of values`? Since the number of possible subsets grows exponentially with the number of values a variable can take, we only check subsets with a single value. This means that we split into the examples that satisfy `aVariable=aValue` and the ones that do not. 
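\n\nThe two kinds of binary test can be sketched in plain Pharo as follows. This is a hypothetical illustration that uses literal arrays as rows; it does not use the library's splitter classes, and the block names and sample rows are made up for the example.\n\n```Smalltalk\n\"Hypothetical sketch in plain Pharo; it does not use the library's splitter classes\"\n| rows isCloudy isAtMost75 |\nrows := #(#(sunny 85) #(cloudy 83) #(rainy 70) #(cloudy 64)).\n\"One-vs-all test on a categorical column: outlook = cloudy\"\nisCloudy := [ :row | row first = #cloudy ].\n\"Threshold test on a numerical column: temperature \u003c= 75\"\nisAtMost75 := [ :row | row second \u003c= 75 ].\n\"Each test partitions the rows into exactly two groups\"\nrows select: isCloudy. \"the two cloudy rows\"\nrows reject: isCloudy. \"the sunny and rainy rows\"\nrows select: isAtMost75. \"the rows with temperatures 70 and 64\"\n```\n\nIn both cases every row lands in exactly one of two groups, which is what keeps the resulting tree binary.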
\n\nGoing back to our tennis example:\n\n```Smalltalk\n\"Build dataset\"\ntennisDataFrame := DataFrame withRows: #(\n    (sunny 85 85 weak false)\n    (sunny 80 90 strong false)\n    (cloudy 83 78 weak true)\n    (rainy 70 96 weak true)\n    (rainy 68 80 weak true)\n    (rainy 65 70 strong false)\n    (cloudy 64 65 strong true)\n    (sunny 72 95 weak false)\n    (sunny 69 70 weak true)\n    (rainy 75 80 weak true)\n    (sunny 75 70 strong true)\n    (cloudy 72 90 strong true)\n    (rainy 71 80 strong false)).\n    \ntennisDataFrame columnNames: #(outlook temperature humidity wind playTennis).\ntennisDataset := AIDTDataset fromDataFrame: tennisDataFrame.\ntennisDataset target: #playTennis.\n\n\"Training - Model\"\naTreeModel := AIDTCARTDecisionTreeModel new.\naTreeModel fit: tennisDataset. \n\n\"Predicting\"\ntestDataset := AIDTDataset \n                   withRows: #(#(sunny 80 70 strong)\n                               #(cloudy  70 94 strong))\n                   withFeatures: #(outlook temperature humidity wind) .\naTreeModel decisionsForAll: testDataset. \"an Array(AIDTDecision(false) AIDTDecision(true))\"\n```\n\nIf we look at the explanation of a decision, we can find both kinds of split: a categorical variable being compared for equality with a single value (OneVsAll), and a numerical variable being compared against a threshold.\n\n```Smalltalk\n(aTreeModel decisionsForAll: testDataset) anyOne why.\n \"an OrderedCollection(\n    AIDTOneVsAllSplitter(outlook = cloudy)-\u003efalse \n    AIDTThresholdSplitter(temperature \u003c= 75)-\u003efalse)\"\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpharo-ai%2Fdecision-tree-model","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpharo-ai%2Fdecision-tree-model","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpharo-ai%2Fdecision-tree-model/lists"}