{"id":15501638,"url":"https://github.com/cerlymarco/linear-tree","last_synced_at":"2025-05-16T12:12:06.913Z","repository":{"id":46086754,"uuid":"352026071","full_name":"cerlymarco/linear-tree","owner":"cerlymarco","description":"A python library to build Model Trees with Linear Models at the leaves.","archived":false,"fork":false,"pushed_at":"2024-07-19T04:16:16.000Z","size":6210,"stargazers_count":373,"open_issues_count":4,"forks_count":56,"subscribers_count":12,"default_branch":"main","last_synced_at":"2025-05-12T06:31:14.940Z","etag":null,"topics":["boosting-tree","decision-trees","linear-models","machine-learning","model-trees","random-forest","scikit-learn","tree"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cerlymarco.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-03-27T08:57:48.000Z","updated_at":"2025-04-22T07:00:45.000Z","dependencies_parsed_at":"2022-08-12T12:40:32.234Z","dependency_job_id":"8b08b375-b154-4a92-9444-cd6a910cfbd3","html_url":"https://github.com/cerlymarco/linear-tree","commit_stats":{"total_commits":29,"total_committers":2,"mean_commits":14.5,"dds":0.03448275862068961,"last_synced_commit":"2982edc050206521fa9cde7df1b1f88ab7b2183d"},"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cerlymarco%2Flinear-tree","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cerlymarco%2Flinear-tree/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cerlymarco%2Flinear-tree/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cerlymarco%2Flinear-tree/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cerlymarco","download_url":"https://codeload.github.com/cerlymarco/linear-tree/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254527099,"owners_count":22085919,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["boosting-tree","decision-trees","linear-models","machine-learning","model-trees","random-forest","scikit-learn","tree"],"created_at":"2024-10-02T09:05:07.969Z","updated_at":"2025-05-16T12:12:06.888Z","avatar_url":"https://github.com/cerlymarco.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# linear-tree\nA Python library to build Model Trees with Linear Models at the leaves.\n\nlinear-tree also provides implementations of _LinearForest_ and _LinearBoost_, inspired by [these works](https://github.com/cerlymarco/linear-tree#references).\n\n## Overview\n**Linear Trees** combine the learning ability of Decision Trees with the predictive and explanatory power of Linear Models. \nAs in other tree-based algorithms, the data are split according to simple decision rules. The quality of each split is evaluated in terms of gain, by fitting Linear Models in the nodes. As a result, the models in the leaves are linear, instead of the constant approximations used in classical Decision Trees. 
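
The split evaluation described above can be sketched in a few lines of NumPy and scikit-learn. This is a minimal sketch that assumes a squared-error loss and a single candidate rule of the form `X[:, feature] <= threshold`. The helpers `node_loss` and `split_gain` are hypothetical and are not part of the linear-tree API. The library's actual criterion and split search may differ.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def node_loss(X, y):
    # Hypothetical helper: fit a linear model on the samples of one node
    # and return its sum of squared errors (squared-error loss, assumed).
    model = LinearRegression().fit(X, y)
    residuals = y - model.predict(X)
    return float(np.sum(residuals ** 2))


def split_gain(X, y, feature, threshold):
    # Hypothetical helper: gain of the rule X[:, feature] <= threshold, i.e. how much
    # fitting one linear model per child reduces the error of a single linear
    # model fitted on the parent node. Assumes both children are non-empty.
    left = X[:, feature] <= threshold
    right = ~left
    parent = node_loss(X, y)
    children = node_loss(X[left], y[left]) + node_loss(X[right], y[right])
    return parent - children


# Toy piecewise-linear target: the slope on feature 1 flips sign at X[:, 0] == 0.
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.where(X[:, 0] <= 0, 3 * X[:, 1], -3 * X[:, 1])

# Splitting on feature 0 leaves an exactly linear target in each child,
# so its gain is expected to exceed the gain of splitting on feature 1.
print(split_gain(X, y, feature=0, threshold=0.0))
print(split_gain(X, y, feature=1, threshold=0.0))
```

Each child is fitted with its own linear model, so a split that separates two linear regimes wins even when neither side would look special to a constant-leaf tree. A full model tree would presumably repeat this scoring over candidate features and thresholds and recurse on the best children. That greedy search is an assumption about the general technique, not a description of linear-tree internals.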
\n\n**Linear Forests** generalize the well-known Random Forests by combining them with Linear Models. The key idea is to use the strength of Linear Models to improve the nonparametric learning ability of tree-based algorithms. First, a Linear Model is fitted on the whole dataset; then a Random Forest is trained on the same dataset, using the residuals of the previous step as the target. The final predictions are the sum of the raw linear predictions and the residuals modeled by the Random Forest.\n\n**Linear Boosting** is a two-stage learning process. First, a linear model is trained on the initial dataset to obtain predictions. Second, the residuals of the previous step are modeled with a decision tree using all the available features. The tree identifies the path leading to the highest error (i.e. the worst leaf). The leaf contributing most to the error is used to generate a new binary feature, which is added to the inputs of the first stage. The iterations continue until a stopping criterion is met.\n\n**linear-tree is developed to be fully integrable with scikit-learn**. ```LinearTreeRegressor``` and ```LinearTreeClassifier``` are provided as scikit-learn _BaseEstimator_ classes to build decision trees using linear estimators. ```LinearForestRegressor``` and ```LinearForestClassifier``` use the sklearn _RandomForest_ to model residuals. ```LinearBoostRegressor``` and ```LinearBoostClassifier``` are also available as _TransformerMixin_, so they can be integrated into any pipeline, including for automated feature engineering. All the models available in [sklearn.linear_model](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.linear_model) can be used as base learners. \n\n## Installation\n```shell\npip install --upgrade linear-tree\n```\nThe module depends on NumPy, SciPy and Scikit-Learn (\u003e=0.24.2). Python 3.6 or above is supported.\n\n## Media\n- [Linear Tree: the perfect mix of Linear Model and Decision Tree](https://towardsdatascience.com/linear-tree-the-perfect-mix-of-linear-model-and-decision-tree-2eaed21936b7)\n- [Model Tree: handle Data Shifts mixing Linear Model and Decision Tree](https://towardsdatascience.com/model-tree-handle-data-shifts-mixing-linear-model-and-decision-tree-facfd642e42b)\n- [Explainable AI with Linear Trees](https://towardsdatascience.com/explainable-ai-with-linear-trees-7e30a6f067d7)\n- [Improve Linear Regression for Time Series Forecasting](https://towardsdatascience.com/improve-linear-regression-for-time-series-forecasting-e36f3c3e3534#a80b-b6010ccb1c21)\n- [Linear Boosting with Automated Features Engineering](https://towardsdatascience.com/linear-boosting-with-automated-features-engineering-894962c3ba84)\n- [Improve Random Forest with Linear Models](https://towardsdatascience.com/improve-random-forest-with-linear-models-1fa789691e18)\n\n## Usage\n##### Linear Tree Regression\n```python\nfrom sklearn.linear_model import LinearRegression\nfrom lineartree import LinearTreeRegressor\nfrom sklearn.datasets import make_regression\nX, y = make_regression(n_samples=100, n_features=4,\n                       n_informative=2, n_targets=1,\n                       random_state=0, shuffle=False)\nregr = LinearTreeRegressor(base_estimator=LinearRegression())\nregr.fit(X, y)\n```\n##### Linear Tree Classification\n```python\nfrom sklearn.linear_model import RidgeClassifier\nfrom lineartree import LinearTreeClassifier\nfrom sklearn.datasets import make_classification\nX, y = make_classification(n_samples=100, n_features=4,\n                           n_informative=2, n_redundant=0,\n                           random_state=0, shuffle=False)\nclf = LinearTreeClassifier(base_estimator=RidgeClassifier())\nclf.fit(X, y)\n```\n##### Linear Forest Regression\n```python\nfrom sklearn.linear_model import LinearRegression\nfrom lineartree import LinearForestRegressor\nfrom sklearn.datasets import make_regression\nX, y = make_regression(n_samples=100, n_features=4,\n                       n_informative=2, n_targets=1,\n                       random_state=0, shuffle=False)\nregr = LinearForestRegressor(base_estimator=LinearRegression())\nregr.fit(X, y)\n```\n##### Linear Forest Classification\n```python\nfrom sklearn.linear_model import LinearRegression\nfrom lineartree import LinearForestClassifier\nfrom sklearn.datasets import make_classification\nX, y = make_classification(n_samples=100, n_features=4,\n                           n_informative=2, n_redundant=0,\n                           random_state=0, shuffle=False)\n# base_estimator is a linear regressor here, not a classifier\nclf = LinearForestClassifier(base_estimator=LinearRegression())\nclf.fit(X, y)\n```\n##### Linear Boosting Regression\n```python\nfrom sklearn.linear_model import LinearRegression\nfrom lineartree import LinearBoostRegressor\nfrom sklearn.datasets import make_regression\nX, y = make_regression(n_samples=100, n_features=4,\n                       n_informative=2, n_targets=1,\n                       random_state=0, shuffle=False)\nregr = LinearBoostRegressor(base_estimator=LinearRegression())\nregr.fit(X, y)\n```\n##### Linear Boosting Classification\n```python\nfrom sklearn.linear_model import RidgeClassifier\nfrom lineartree import LinearBoostClassifier\nfrom sklearn.datasets import make_classification\nX, y = make_classification(n_samples=100, n_features=4,\n                           n_informative=2, n_redundant=0,\n                           random_state=0, shuffle=False)\nclf = LinearBoostClassifier(base_estimator=RidgeClassifier())\nclf.fit(X, y)\n```\n\nMore examples are available in the [notebooks folder](https://github.com/cerlymarco/linear-tree/tree/main/notebooks).\n\nCheck the [API Reference](https://github.com/cerlymarco/linear-tree/blob/main/notebooks/README.md) to see the parameter configurations and the available methods.\n\n## Examples\nShow the linear tree learning path:\n\n![plot tree](https://raw.githubusercontent.com/cerlymarco/linear-tree/master/imgs/plot_tree.png)\n\nLinear Tree Regressor at work:\n\n![linear tree regressor](https://raw.githubusercontent.com/cerlymarco/linear-tree/master/imgs/linear_tree_reg.png)\n\nLinear Tree Classifier at work:\n\n![linear tree classifier](https://raw.githubusercontent.com/cerlymarco/linear-tree/master/imgs/linear_tree_class.png)\n\nExtract and examine coefficients at the leaves:\n\n![leaf coefficients](https://raw.githubusercontent.com/cerlymarco/linear-tree/master/imgs/leaf_coefficients.png)\n\nImpact of the features automatically generated with Linear Boosting:\n\n![linear_boost_importances](https://raw.githubusercontent.com/cerlymarco/linear-tree/master/imgs/linear_boost_importances.png)\n\nComparing predictions of Linear Forest and Random Forest:\n\n![linear_forest_predictions](https://raw.githubusercontent.com/cerlymarco/linear-tree/master/imgs/linear_forest_predictions.png)\n\n## References\n- Regression-Enhanced Random Forests. Haozhe Zhang, Dan Nettleton, Zhengyuan Zhu.\n- Explainable boosted linear regression for time series forecasting. Igor Ilic, Berk Gorgulu, Mucahit Cevik, Mustafa Gokce Baydogan.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcerlymarco%2Flinear-tree","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcerlymarco%2Flinear-tree","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcerlymarco%2Flinear-tree/lists"}