{"id":17263733,"url":"https://github.com/Luwen-Zhang/tabular_ensemble","last_synced_at":"2025-08-21T02:32:25.272Z","repository":{"id":257800078,"uuid":"668992992","full_name":"LuoXueling/tabular_ensemble","owner":"LuoXueling","description":"A framework to evaluate various models for tabular regression and classification tasks.","archived":false,"fork":false,"pushed_at":"2024-09-25T16:33:36.000Z","size":24929,"stargazers_count":1,"open_issues_count":5,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-09-26T01:48:50.488Z","etag":null,"topics":["machine-learning","tabular-model"],"latest_commit_sha":null,"homepage":"https://tabular-ensemble.readthedocs.io/en/latest/index.html","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LuoXueling.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-07-21T05:07:35.000Z","updated_at":"2024-09-25T17:12:47.000Z","dependencies_parsed_at":"2024-09-26T01:49:04.387Z","dependency_job_id":null,"html_url":"https://github.com/LuoXueling/tabular_ensemble","commit_stats":null,"previous_names":["luoxueling/tabular_ensemble"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LuoXueling%2Ftabular_ensemble","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LuoXueling%2Ftabular_ensemble/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LuoXueling%2Ftabular_ensemble/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/repositories/LuoXueling%2Ftabular_ensemble/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LuoXueling","download_url":"https://codeload.github.com/LuoXueling/tabular_ensemble/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":219844809,"owners_count":16556479,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","tabular-model"],"created_at":"2024-10-15T07:57:21.885Z","updated_at":"2025-08-21T02:32:25.253Z","avatar_url":"https://github.com/LuoXueling.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# tabular_ensemble\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n[![codecov](https://codecov.io/gh/Luwen-Zhang/tabular_ensemble/graph/badge.svg?token=APnN7LFtv9)](https://codecov.io/gh/Luwen-Zhang/tabular_ensemble)\n[![Test](https://github.com/Luwen-Zhang/tabular_ensemble/actions/workflows/python-package.yml/badge.svg)](https://github.com/Luwen-Zhang/tabular_ensemble/actions/workflows/python-package.yml)\n[![](https://img.shields.io/badge/Python-3.10-blue)](https://github.com/Luwen-Zhang/tabular_ensemble)\n[![Documentation Status](https://readthedocs.org/projects/tabular-ensemble/badge/?version=latest)](https://tabular-ensemble.readthedocs.io/en/latest/?badge=latest)\n\nA framework to evaluate various models for tabular regression and classification tasks. 
The package integrates 25 machine learning (including deep learning) models for tabular prediction \ntasks from the following well-established model bases:\n\n* [`autogluon`](https://github.com/autogluon/autogluon)\n  * `\"LightGBM\"`, `\"CatBoost\"`, `\"XGBoost\"`, `\"Random Forest\"`, `\"Extremely Randomized Trees\"`, `\"K-Nearest Neighbors\"`, `\"Linear Regression\"`, `\"Neural Network with MXNet\"`, `\"Neural Network with PyTorch\"`, `\"Neural Network with FastAI\"`.\n* [`pytorch_widedeep`](https://github.com/jrzaurin/pytorch-widedeep)\n  * `\"TabMlp\"`, `\"TabResnet\"`, `\"TabTransformer\"`, `\"TabNet\"`, `\"SAINT\"`, `\"ContextAttentionMLP\"`, `\"SelfAttentionMLP\"`, `\"FTTransformer\"`, `\"TabPerceiver\"`, `\"TabFastFormer\"`.\n* [`pytorch_tabular`](https://github.com/manujosephv/pytorch_tabular)\n  * `\"Category Embedding\"`, `\"NODE\"`, `\"TabNet\"`, `\"TabTransformer\"`, `\"AutoInt\"`, `\"FTTransformer\"`.\n\nYou are able to implement your own models, data processing pipelines, and datasets under the flexible and \nwell-tested framework for consistent comparisons with baseline models, which is even easier when your own model is \nbased on `pytorch`. 
\n\n\u003cimg width=\"600\" alt=\"image\" src=\"https://github.com/user-attachments/assets/0fe47266-ae58-4e6b-bcf6-1108ebd762bc\"\u003e\n\nSupported features for all model bases:\n\n* Data processing\n  * Data splitting (training/validation/testing sets)\n  * Data imputation\n  * Data filtering\n  * Data scaling\n  * Data augmentation\n  * Feature augmentation\n  * Feature selection\n  * etc.\n* Multi-modal data\n* Loading [UCI datasets](https://archive.ics.uci.edu/datasets)\n* Data/result analysis\n  * Leaderboard\n  * Box plot\n  * Pair plot\n  * Pearson correlation\n  * Partial dependence plot (with bootstrapping)\n  * Feature importance (Permutation and SHAP)\n  * etc.\n* Building models upon other trained models\n* `pytorch_lightning`-based training for `pytorch` models\n* Gaussian-process-based Bayesian hyperparameter optimization\n* Cross-validation (including continuing from a cross-validation checkpoint)\n* Saving, loading, and migrating models\n\nThe package stands on the shoulders of giants:\n\n* [scikit-learn](https://scikit-learn.org/)\n* [PyTorch](https://pytorch.org/)\n* [PyTorch Lightning](https://lightning.ai/)\n* etc. (See `requirements.txt`)\n\n\n## Installation/Usage\n\nFull documentation is available [here](https://tabular-ensemble.readthedocs.io/en/latest/index.html). For a quick start:\n\n1. `tabular_ensemble` can be installed from PyPI by running the following command:\n\n```shell\npip install tabensemb[torch]\n```\n\nPlease use `pip install tabensemb` instead if you already have `torch\u003e=1.12.0` installed. Use `pip install tabensemb[test]` if you want to run unit tests. \n\nTo install from source:\n\n```shell\npip install -e .[torch]\n```\n\n2. (Optional) Run unit tests after installing `tabensemb[test]`:\n\n```shell\ncd test\npytest .\n```\n\n3. 
Place your `.csv` or `.xlsx` file in a `data` subfolder (e.g., `data/sample.csv`), and generate a configuration file in a `configs` subfolder (e.g., `configs/sample.py`) with the following content:\n```python\ncfg = {\n    \"database\": \"sample\",\n    \"continuous_feature_names\": [\"cont_0\", \"cont_1\", \"cont_2\", \"cont_3\", \"cont_4\"],\n    \"categorical_feature_names\": [\"cat_0\", \"cat_1\", \"cat_2\"],\n    \"label_name\": [\"target\"],\n}\n```\n\n4. Run the experiment with this configuration and data:\n```shell\npython main.py --base sample --epoch 10\n```\nwhere `--base` refers to the configuration file, and additional arguments (such as `--epoch` here) refer to those in `config/default.py`.\n\nSee the [documentation pages](https://tabular-ensemble.readthedocs.io/en/latest/index.html) for details.\n\n## Citation\n\nIf you use this repository, please cite us as:\n\n```text\n(Will be updated after release on arXiv or publication)\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLuwen-Zhang%2Ftabular_ensemble","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FLuwen-Zhang%2Ftabular_ensemble","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLuwen-Zhang%2Ftabular_ensemble/lists"}