{"id":13671585,"url":"https://github.com/jinlow/forust","last_synced_at":"2025-04-27T18:31:25.723Z","repository":{"id":37028636,"uuid":"486014834","full_name":"jinlow/forust","owner":"jinlow","description":"A lightweight gradient boosted decision tree package.","archived":false,"fork":false,"pushed_at":"2025-04-25T16:49:22.000Z","size":14017,"stargazers_count":70,"open_issues_count":10,"forks_count":7,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-04-25T17:51:03.215Z","etag":null,"topics":["ai","machine-learning","pyo3","python","rust","xgboost","xgboost-algorithm"],"latest_commit_sha":null,"homepage":"https://jinlow.github.io/forust/","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jinlow.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-04-27T02:15:38.000Z","updated_at":"2025-03-13T19:42:05.000Z","dependencies_parsed_at":"2023-02-01T05:01:00.852Z","dependency_job_id":"2fafedae-5655-4043-a761-4fc69a17b857","html_url":"https://github.com/jinlow/forust","commit_stats":{"total_commits":160,"total_committers":1,"mean_commits":160.0,"dds":0.0,"last_synced_commit":"9155f608025a7feb8781553bcbc1ed594b03c303"},"previous_names":[],"tags_count":48,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jinlow%2Fforust","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jinlow%2Fforust/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jinlow%2Fforust/releases","manife
sts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jinlow%2Fforust/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jinlow","download_url":"https://codeload.github.com/jinlow/forust/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251187178,"owners_count":21549597,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","machine-learning","pyo3","python","rust","xgboost","xgboost-algorithm"],"created_at":"2024-08-02T09:01:13.877Z","updated_at":"2025-04-27T18:31:25.096Z","avatar_url":"https://github.com/jinlow.png","language":"Rust","funding_links":[],"categories":["Rust","Scientific Computing"],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg  height=\"340\" src=\"https://github.com/jinlow/forust/raw/main/resources/tree-image-crop.png\"\u003e\n\u003c/p\u003e\n\n\n\u003cdiv align=\"center\"\u003e\n\n  \u003ca href=\"https://pypi.org/project/forust/\"\u003e![PyPI](https://img.shields.io/pypi/v/forust?color=gr\u0026style=for-the-badge)\u003c/a\u003e\n  \u003ca href=\"https://crates.io/crates/forust-ml\"\u003e![Crates.io](https://img.shields.io/crates/v/forust-ml?color=gr\u0026style=for-the-badge)\u003c/a\u003e\n\n\u003c/div\u003e\n\n# Forust\n## _A lightweight gradient boosting package_\nForust is a lightweight package for building gradient boosted decision tree ensembles. All of the algorithm code is written in [Rust](https://www.rust-lang.org/), with a python wrapper. 
The rust package can be used directly; however, most examples shown here are for the python wrapper. For a self-contained rust example, [see here](rs-example.md). Forust implements the same algorithm as the [XGBoost](https://xgboost.readthedocs.io/en/stable/) package, and in many cases will give nearly identical results.\n\nI developed this package for a few reasons: mainly to better understand the XGBoost algorithm, but also to have a fun project to work on in rust, and to be able to experiment with adding new features to the algorithm in a smaller, simpler codebase.\n\nAll of the rust code for the package can be found in the [src](src/) directory, while all of the python wrapper code is in the [py-forust](py-forust/) directory.\n\n## Documentation\nDocumentation for the python API can be found [here](https://jinlow.github.io/forust/).\n\n## Installation\nThe package can be installed directly from [pypi](https://pypi.org/project/forust/).\n```shell\npip install forust\n```\n\nTo use in a rust project add the following to your Cargo.toml file.\n```toml\nforust-ml = \"0.4.8\"\n```\n\n## Usage\n\nFor details on all of the methods and their respective parameters, see the [python api documentation](https://jinlow.github.io/forust/).\n\nThe [`GradientBooster`](https://jinlow.github.io/forust/#forust.GradientBooster) class is currently the only public-facing class in the package, and can be used to train gradient boosted decision tree ensembles with multiple objective functions.\n\n### Training and Predicting\n\nOnce the booster has been initialized, it can be fit on a provided dataset and performance field. 
After fitting, the model can be used to predict on a dataset.\nIn the case of this example, the predictions are the log odds of a given record being 1.\n\n```python\n# Small example dataset\nfrom seaborn import load_dataset\n\ndf = load_dataset(\"titanic\")\nX = df.select_dtypes(\"number\").drop(columns=[\"survived\"])\ny = df[\"survived\"]\n\n# Initialize a booster with defaults.\nfrom forust import GradientBooster\nmodel = GradientBooster(objective_type=\"LogLoss\")\nmodel.fit(X, y)\n\n# Predict on data\nmodel.predict(X.head())\n# array([-1.94919663,  2.25863229,  0.32963671,  2.48732194, -3.00371813])\n\n# predict contributions\nmodel.predict_contributions(X.head())\n# array([[-0.63014213,  0.33880048, -0.16520798, -0.07798772, -0.85083578,\n#        -1.07720813],\n#       [ 1.05406709,  0.08825999,  0.21662544, -0.12083538,  0.35209258,\n#        -1.07720813],\n```\n\nWhen predicting, the maximum iteration used can be set with the [`set_prediction_iteration`](https://jinlow.github.io/forust/#forust.GradientBooster.set_prediction_iteration) method. If `early_stopping_rounds` has been set, this defaults to the best iteration; otherwise, all of the trees are used.\n\nIf early stopping was used, the evaluation history can be retrieved with the [`get_evaluation_history`](https://jinlow.github.io/forust/#forust.GradientBooster.get_evaluation_history) method.\n\n```python\nmodel = GradientBooster(objective_type=\"LogLoss\")\nmodel.fit(X, y, evaluation_data=[(X, y)])\n\nmodel.get_evaluation_history()[0:3]\n\n# array([[588.9158873 ],\n#        [532.01055803],\n#        [496.76933646]])\n```\n\n### Inspecting the Model\n\nOnce the booster has been fit, each individual tree structure can be retrieved in text form using the [`text_dump`](https://jinlow.github.io/forust/#forust.GradientBooster.text_dump) method. 
This method returns a list with the same length as the number of trees in the model.\n\n```python\nmodel.text_dump()[0]\n# 0:[0 \u003c 3] yes=1,no=2,missing=2,gain=91.50833,cover=209.388307\n#       1:[4 \u003c 13.7917] yes=3,no=4,missing=4,gain=28.185467,cover=94.00148\n#             3:[1 \u003c 18] yes=7,no=8,missing=8,gain=1.4576768,cover=22.090348\n#                   7:[1 \u003c 17] yes=15,no=16,missing=16,gain=0.691266,cover=0.705011\n#                         15:leaf=-0.15120,cover=0.23500\n#                         16:leaf=0.154097,cover=0.470007\n```\n\nThe [`json_dump`](https://jinlow.github.io/forust/#forust.GradientBooster.json_dump) method performs the same action, but returns the model as a json representation rather than a text string.\n\nTo see an estimate of how a given feature is used in the model, the `partial_dependence` method is provided. This method calculates the partial dependence values of a feature. For each unique value of the feature, it gives an estimate of the predicted value at that feature value, with the effects of all other features averaged out. 
This information gives an estimate of how a given feature impacts the model.\n\nThe values can be plotted to visualize how a feature is used in the model, like so.\n\n```python\nfrom seaborn import lineplot\nimport matplotlib.pyplot as plt\n\npd_values = model.partial_dependence(X=X, feature=\"age\", samples=None)\n\nfig = lineplot(x=pd_values[:,0], y=pd_values[:,1],)\nplt.title(\"Partial Dependence Plot\")\nplt.xlabel(\"Age\")\nplt.ylabel(\"Log Odds\")\n```\n\u003cimg  height=\"340\" src=\"https://github.com/jinlow/forust/raw/main/resources/pdp_plot_age.png\"\u003e\n\nWe can see how this changes when a model is created with a monotonicity constraint applied to the feature using the `monotone_constraints` parameter.\n\n```python\nmodel = GradientBooster(\n    objective_type=\"LogLoss\",\n    monotone_constraints={\"age\": -1},\n)\nmodel.fit(X, y)\n\npd_values = model.partial_dependence(X=X, feature=\"age\")\nfig = lineplot(\n    x=pd_values[:, 0],\n    y=pd_values[:, 1],\n)\nplt.title(\"Partial Dependence Plot with Monotonicity\")\nplt.xlabel(\"Age\")\nplt.ylabel(\"Log Odds\")\n```\n\u003cimg  height=\"340\" src=\"https://github.com/jinlow/forust/raw/main/resources/pdp_plot_age_mono.png\"\u003e\n\nFeature importance values can be calculated with the [`calculate_feature_importance`](https://jinlow.github.io/forust/#forust.GradientBooster.calculate_feature_importance) method. This method returns a dictionary of the features and their importances. Note that a feature that was never used for splitting will not appear in the importance dictionary. 
The type of importance to calculate is passed as an argument, `Gain` in the example below.\n\n```python\nmodel.calculate_feature_importance(\"Gain\")\n# {\n#   'parch': 0.0713072270154953,\n#   'age': 0.11609109491109848,\n#   'sibsp': 0.1486879289150238,\n#   'fare': 0.14309120178222656,\n#   'pclass': 0.5208225250244141\n# }\n```\n\n### Saving the model\nTo save and subsequently load a trained booster, the `save_booster` and `load_booster` methods can be used. Each accepts a path: `save_booster` writes the model to it, and `load_booster` reads the model from it. The model is saved and loaded as a json object.\n\n```python\n# Save the fitted booster to a json file.\nmodel.save_booster(\"model_path.json\")\n\n# To load a model from a json path.\nloaded_model = GradientBooster.load_booster(\"model_path.json\")\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjinlow%2Fforust","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjinlow%2Fforust","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjinlow%2Fforust/lists"}