{"id":20313486,"url":"https://github.com/autogluon/tabrepo","last_synced_at":"2025-10-06T11:30:19.210Z","repository":{"id":204753541,"uuid":"641024512","full_name":"autogluon/tabrepo","owner":"autogluon","description":null,"archived":false,"fork":false,"pushed_at":"2025-01-15T21:10:48.000Z","size":102457,"stargazers_count":44,"open_issues_count":19,"forks_count":10,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-01-20T23:17:10.529Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/autogluon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-15T16:05:20.000Z","updated_at":"2025-01-15T19:47:40.000Z","dependencies_parsed_at":"2024-02-26T21:54:46.955Z","dependency_job_id":"b3b973e4-f0e7-4d33-b964-8305940a2e33","html_url":"https://github.com/autogluon/tabrepo","commit_stats":{"total_commits":406,"total_committers":5,"mean_commits":81.2,"dds":0.270935960591133,"last_synced_commit":"16fffe64b0865b229b1a6458ad2c2bee04ba6dec"},"previous_names":["autogluon/tabrepo"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autogluon%2Ftabrepo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autogluon%2Ftabrepo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autogluon%2Ftabrepo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autogluon%2Ftabrepo/manifests","own
er_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/autogluon","download_url":"https://codeload.github.com/autogluon/tabrepo/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":235521067,"owners_count":19003380,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-14T18:11:16.521Z","updated_at":"2025-10-06T11:30:19.204Z","avatar_url":"https://github.com/autogluon.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\u003cdiv align=\"center\"\u003e\n\n\u003cdiv id=\"user-content-toc\"\u003e\n  \u003cul align=\"center\" style=\"list-style: none;\"\u003e\n    \u003csummary\u003e\n      \u003cimg src=\"https://avatars.githubusercontent.com/u/210855230\" width=\"175\" alt=\"TabArena Logo\"/\u003e\n    \u003c/summary\u003e\n  \u003c/ul\u003e\n\u003c/div\u003e\n\n## A Living Benchmark for Machine Learning on Tabular Data 💫\n\n---\n\n| 🚀 [Leaderboard](https://huggingface.co/spaces/TabArena/leaderboard) | 📂 [Example Scripts](https://github.com/TabArena/tabarena_benchmarking_examples/tree/main) | 📊 [Dataset Curation](https://github.com/TabArena/tabarena_dataset_curation) | 📄 [ArXiv Paper](https://arxiv.org/abs/2506.16791) 
|\n|:-------------------------------------------------------------------:|:----------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------:|\n\n---\n\u003c/div\u003e\n\nTabArena is a living benchmarking system that makes benchmarking tabular machine learning models a reliable experience. TabArena implements best practices to ensure methods are represented at their peak potential, including cross-validated ensembles, strong hyperparameter search spaces contributed by the method authors, early stopping, model refitting, parallel bagging, memory usage estimation, and more.\n\nTabArena currently consists of:\n\n- 51 manually curated tabular datasets representing real-world tabular data tasks.\n- 9 to 30 evaluated splits per dataset.\n- 16 tabular machine learning methods, including 3 tabular foundation models.\n- 25,000,000 trained models across the benchmark, with all validation and test predictions cached to enable tuning and post-hoc ensembling analysis.\n- A [live TabArena leaderboard](https://huggingface.co/spaces/TabArena/leaderboard) showcasing the results.\n\n\n## 🕹️ Quickstart\n\n### Benchmarking and Running TabArena Models\nPlease refer to our [example scripts](https://github.com/TabArena/tabarena_benchmarking_examples/tree/main) for using TabArena.\n\n### Datasets \nPlease refer to our [dataset curation repository](https://github.com/TabArena/tabarena_dataset_curation) to learn more about the datasets or to contribute data! 
\n\n### Evaluation \u0026 Reproducing Results\nTo locally reproduce individual configurations and compare with the TabArena results of those configurations, refer to [examples/tabarena/run_quickstart_tabarena.py](examples/tabarena/run_quickstart_tabarena.py).\n\nTo locally reproduce all tables and figures in the paper using the raw results data, run [examples/tabarena/run_generate_paper_figures.py](examples/tabarena/run_generate_paper_figures.py).\n\n### More Documentation\nTabArena code is currently being polished. Documentation for TabArena will be available soon.\n\n# 🪄 Installation\n\nTo install TabArena, ensure you are using Python 3.9-3.11. Then, run the following:\n\n```\ngit clone https://github.com/autogluon/tabrepo.git\npip install -e tabrepo/[benchmark]\n```\n\n# 📄 Publication for TabArena\n\nIf you use TabArena in a scientific publication, we would appreciate a reference to the following paper:\n\n**TabArena: A Living Benchmark for Machine Learning on Tabular Data**, \nNick Erickson, Lennart Purucker, Andrej Tschalzev, David Holzmüller, Prateek Mutalik Desai, David Salinas, Frank Hutter, Preprint, 2025\n\nLink to publication: [arXiv](https://arxiv.org/abs/2506.16791)\n\nBibtex entry:\n```bibtex\n@article{erickson2025tabarena,\n  title={TabArena: A Living Benchmark for Machine Learning on Tabular Data}, \n  author={Nick Erickson and Lennart Purucker and Andrej Tschalzev and David Holzmüller and Prateek Mutalik Desai and David Salinas and Frank Hutter},\n  year={2025},\n  journal={arXiv preprint arXiv:2506.16791},\n  url={https://arxiv.org/abs/2506.16791}, \n}\n```\n\n\n--- \n## Relation to TabRepo \n\nTabArena was built upon [TabRepo](https://arxiv.org/pdf/2311.02971) and now replaces TabRepo. 
To see details about TabRepo, the portfolio simulation repository, refer to [tabrepo.md](tabrepo.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fautogluon%2Ftabrepo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fautogluon%2Ftabrepo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fautogluon%2Ftabrepo/lists"}