{"id":20025343,"url":"https://github.com/lgmoneda/time-robust-forest","last_synced_at":"2025-08-05T11:22:26.129Z","repository":{"id":38235978,"uuid":"374349712","full_name":"lgmoneda/time-robust-forest","owner":"lgmoneda","description":"Leverage timestamp information to improve Random Forest out of distribution generalization.","archived":false,"fork":false,"pushed_at":"2024-06-10T09:27:59.000Z","size":236,"stargazers_count":14,"open_issues_count":9,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-08T15:45:41.448Z","etag":null,"topics":["causality","machine-learning","machine-learning-algorithms"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lgmoneda.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2021-06-06T12:03:19.000Z","updated_at":"2025-03-21T14:39:51.000Z","dependencies_parsed_at":"2023-12-15T22:43:02.188Z","dependency_job_id":"a8954a9a-c593-404e-a0d0-8ff9431f70cd","html_url":"https://github.com/lgmoneda/time-robust-forest","commit_stats":{"total_commits":24,"total_committers":1,"mean_commits":24.0,"dds":0.0,"last_synced_commit":"f598b7d5530767092cb556982b293589993b7454"},"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lgmoneda%2Ftime-robust-forest","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lgmoneda%2Ftime-robust-forest/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/rep
ositories/lgmoneda%2Ftime-robust-forest/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lgmoneda%2Ftime-robust-forest/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lgmoneda","download_url":"https://codeload.github.com/lgmoneda/time-robust-forest/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252427774,"owners_count":21746269,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["causality","machine-learning","machine-learning-algorithms"],"created_at":"2024-11-13T08:55:02.423Z","updated_at":"2025-05-05T02:31:06.721Z","avatar_url":"https://github.com/lgmoneda.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# time-robust-forest\n\n\u003cdiv align=\"center\"\u003e\n\n[![Build status](https://github.com/lgmoneda/time-robust-forest/workflows/build/badge.svg?branch=main\u0026event=push)](https://github.com/lgmoneda/time-robust-forest/actions?query=workflow%3Abuild)\n[![Python Version](https://img.shields.io/pypi/pyversions/time-robust-forest.svg)](https://pypi.org/project/time-robust-forest/)\n[![Dependencies Status](https://img.shields.io/badge/dependencies-up%20to%20date-brightgreen.svg)](https://github.com/lgmoneda/time-robust-forest/pulls?utf8=%E2%9C%93\u0026q=is%3Apr%20author%3Aapp%2Fdependabot)\n\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n[![Security: 
bandit](https://img.shields.io/badge/security-bandit-green.svg)](https://github.com/PyCQA/bandit)\n[![Pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit\u0026logoColor=white)](https://github.com/lgmoneda/time-robust-forest/blob/main/.pre-commit-config.yaml)\n[![Semantic Versions](https://img.shields.io/badge/%F0%9F%9A%80-semantic%20versions-informational.svg)](https://github.com/lgmoneda/time-robust-forest/releases)\n[![License](https://img.shields.io/github/license/lgmoneda/time-robust-forest)](https://github.com/lgmoneda/time-robust-forest/blob/main/LICENSE)\n\n\u003c/div\u003e\n\nA proof-of-concept model that uses timestamp information to train a random forest with better out-of-distribution generalization.\n\n## Installation\n\n```bash\npip install -U time-robust-forest\n```\n\n## How to use it\n\nA classifier and a regressor are available under `time_robust_forest.models`. They follow the sklearn interface, so you can quickly fit and use a model:\n\n```python\nfrom time_robust_forest.models import TimeForestClassifier\n\n# training_data and test_data are data frames holding the columns below\nfeatures = [\"x_1\", \"x_2\"]\ntime_column = \"periods\"\ntarget = \"y\"\n\nmodel = TimeForestClassifier(time_column=time_column)\n\nmodel.fit(training_data[features + [time_column]], training_data[target])\npredictions = model.predict_proba(test_data[features])[:, 1]\n```\n\nOnly three arguments differ from a traditional Random Forest:\n\n- `time_column`: the column of the input data frame containing the periods the model iterates over to find the best splits (default: \"period\").\n- `min_sample_periods`: the number of examples from every period the model needs to keep when it splits.\n- `period_criterion`: how the model aggregates the performance across periods. 
Options: \"avg\" (the average) or \"max\" (the maximum, that is, the worst case) (default: \"avg\").\n\nTo use the environment-wise optimization:\n\n```python\nfrom sklearn.metrics import make_scorer, roc_auc_score\n\nfrom time_robust_forest.hyper_opt import env_wise_hyper_opt\nfrom time_robust_forest.models import TimeForestClassifier\n\n# features, time_column, target, and training_data are defined as in the example above\nparams_grid = {\"n_estimators\": [30, 60, 120],\n              \"max_depth\": [5, 10],\n              \"min_impurity_decrease\": [1e-1, 1e-3, 0],\n              \"min_sample_periods\": [5, 10, 30],\n              \"period_criterion\": [\"max\", \"avg\"]}\n\nmodel = TimeForestClassifier(time_column=time_column)\n\nopt_param = env_wise_hyper_opt(training_data[features + [time_column]],\n                               training_data[target],\n                               model,\n                               time_column,\n                               params_grid,\n                               cv=5,\n                               scorer=make_scorer(roc_auc_score,\n                                                  needs_proba=True))\n```\n\n### Make sure you have a good choice for the time column\n\nDon't simply use a raw timestamp column from the dataset; discretize it first and make sure there is a reasonable number of data points in every period. For example, use the year if you have 3+ years of data. Note that how you discretize it becomes a modeling choice you can optimize.\n\n### Random segments\n\n#### Selecting randomly from multiple time columns\n\nThe user can pass a list instead of a string as the `time_column` argument. 
The model will randomly select one column from the list for each of the `n_estimators` estimators it builds.\n\n```python\nfrom time_robust_forest.models import TimeForestClassifier\n\nfeatures = [\"x_1\", \"x_2\"]\ntime_columns = [\"periods\", \"periods_2\"]\ntarget = \"y\"\n\nmodel = TimeForestClassifier(time_column=time_columns)\n\nmodel.fit(training_data[features + time_columns], training_data[target])\npredictions = model.predict_proba(test_data[features])[:, 1]\n```\n\n#### Generating random segments from a timestamp column\n\nThe user can define a maximum number of segments (`random_segments`), and the model will split the data using the timestamp information. In the following example, the model segments the data into 1, 2, 3, and so on up to 10 parts. For every estimator, it randomly picks one of the ten resulting columns to act as the `time_column`. In this case, `time_column` should be the timestamp column itself.\n\n```python\nfrom time_robust_forest.models import TimeForestClassifier\n\nfeatures = [\"x_1\", \"x_2\"]\ntime_column = \"time_stamp\"\ntarget = \"y\"\n\nmodel = TimeForestClassifier(time_column=time_column, random_segments=10)\n\nmodel.fit(training_data[features + [time_column]], training_data[target])\npredictions = model.predict_proba(test_data[features])[:, 1]\n```\n\n## License\n\n[![License](https://img.shields.io/github/license/lgmoneda/time-robust-forest)](https://github.com/lgmoneda/time-robust-forest/blob/main/LICENSE)\n\nThis project is licensed under the terms of the `BSD-3` license. 
See [LICENSE](https://github.com/lgmoneda/time-robust-forest/blob/main/LICENSE) for more details.\n\n## Useful links\n\n- [Introducing the Time Robust Tree blog post](http://lgmoneda.github.io/2021/12/03/introducing-time-robust-tree.html)\n- [Paper](http://lgmoneda.github.io/resources/papers/Time_Robust_Tree.pdf)\n\n## Citation\n\n```\n@inproceedings{moneda2022time,\n  title={Time Robust Trees: Using Temporal Invariance to Improve Generalization},\n  author={Moneda, Luis and Mauá, Denis},\n  booktitle={Brazilian Conference on Intelligent Systems},\n  pages={385--397},\n  year={2022},\n  organization={Springer}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flgmoneda%2Ftime-robust-forest","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flgmoneda%2Ftime-robust-forest","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flgmoneda%2Ftime-robust-forest/lists"}