{"id":37633068,"url":"https://github.com/ai-sandbox/iltm","last_synced_at":"2026-01-16T11:00:06.608Z","repository":{"id":332498802,"uuid":"1098653256","full_name":"AI-sandbox/iLTM","owner":"AI-sandbox","description":"iLTM: Integrated Large Tabular Model","archived":false,"fork":false,"pushed_at":"2025-11-21T10:16:49.000Z","size":153,"stargazers_count":11,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-14T09:14:29.505Z","etag":null,"topics":["deep-learning","machine-learning","meta-learning","pretraining","python","pytorch","tabular-data"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2511.15941","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AI-sandbox.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-18T01:10:01.000Z","updated_at":"2025-11-29T16:42:35.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/AI-sandbox/iLTM","commit_stats":null,"previous_names":["ai-sandbox/iltm"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/AI-sandbox/iLTM","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AI-sandbox%2FiLTM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AI-sandbox%2FiLTM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AI-sandbox%2FiLTM/releases","manifests_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/repositories/AI-sandbox%2FiLTM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AI-sandbox","download_url":"https://codeload.github.com/AI-sandbox/iLTM/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AI-sandbox%2FiLTM/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28478106,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T06:30:42.265Z","status":"ssl_error","status_checked_at":"2026-01-16T06:30:16.248Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","machine-learning","meta-learning","pretraining","python","pytorch","tabular-data"],"created_at":"2026-01-16T11:00:06.230Z","updated_at":"2026-01-16T11:00:06.530Z","avatar_url":"https://github.com/AI-sandbox.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# iLTM: Integrated Large Tabular Model\n\n[![PyPI](https://img.shields.io/pypi/v/iltm.svg?color=green)](https://pypi.org/project/iltm)\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/AI-sandbox/iLTM/blob/main/LICENSE)\n[![Downloads](https://img.shields.io/pypi/dm/iltm)](https://pypistats.org/packages/iltm)\n[![Python 
Versions](https://img.shields.io/badge/python-3.11%20%7C%203.12%20%7C%203.13-blue)](https://pypi.org/project/iltm/)\n[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-dbonet%2FiLTM-yellow)](https://huggingface.co/dbonet/iLTM)\n\n\niLTM is a foundation model for tabular data that integrates tree-derived embeddings, dimensionality-agnostic representations, a meta-trained hypernetwork, multilayer perceptron (MLP) neural networks, and retrieval. iLTM automatically handles feature scaling, categorical features, and missing values.\n\nWe release open weights of pre-trained model checkpoints that consistently achieve superior performance across tabular classification and regression tasks, from small to large and high-dimensional tasks.\n\n![iLTM architecture diagram](https://github.com/AI-sandbox/iLTM/raw/main/iltm-diagram.svg)\n\n### Install\n\niLTM is accessed through Python. You can install the package via pip:\n```\npip install iltm\n```\n\niLTM works on Linux, macOS and Windows, and can be executed on CPU and GPU, although GPU is **highly recommended** for faster execution.\n\nPre-trained model checkpoints are automatically downloaded from [Hugging Face](https://huggingface.co/dbonet/iLTM) on first use.\nBy default, checkpoints are stored in platform-specific cache directories (e.g., `~/.cache/iltm` on Linux, `~/Library/Caches/iltm` on macOS).\nYou can specify where model checkpoints are stored by setting the `ILTM_CKPT_DIR` environment variable:\n\n```bash\nexport ILTM_CKPT_DIR=/path/to/checkpoints\n```\n\n\u003e [!NOTE]\n\u003e The first call to `iLTMRegressor` or `iLTMClassifier` downloads the selected\n\u003e checkpoint. 
Later runs reuse the cached weights from `ILTM_CKPT_DIR` or the\n\u003e default cache location.\n\n\u003e [!TIP]\n\u003e For interactive work on a local machine it is often worth pointing\n\u003e `ILTM_CKPT_DIR` to a fast local disk to avoid repeated downloads across\n\u003e environments.\n\n### Quick Start\n\niLTM is designed to be easy to use, with an API similar to scikit-learn.\n\n```py\nfrom iltm import iLTMRegressor, iLTMClassifier\n\n# Regression\nreg = iLTMRegressor().fit(X_train, y_train)\ny_pred = reg.predict(X_test)\n\n# Classification\nclf = iLTMClassifier().fit(X_train, y_train)\nproba = clf.predict_proba(X_test)\ny_hat = clf.predict(X_test)\n\n# With time limit (returns partial ensemble if time runs out)\nreg = iLTMRegressor().fit(X_train, y_train, fit_max_time=3600)  # 1 hour limit\n```\n\n### Model Checkpoints\n\nAvailable checkpoint names:\n- `\"xgbrconcat\"` (default): Robust preprocessing + XGBoost embeddings + concatenation\n- `\"cbrconcat\"`: Robust preprocessing + CatBoost embeddings + concatenation\n- `\"r128bn\"`: Robust preprocessing with 128-dim bottleneck\n- `\"rnobn\"`: Robust preprocessing without bottleneck\n- `\"xgb\"`: XGBoost embeddings only\n- `\"catb\"`: CatBoost embeddings only\n- `\"rtr\"`: Robust preprocessing with retrieval\n- `\"rtrcb\"`: CatBoost embeddings with retrieval\n\nYou can also provide a local path to a checkpoint file.\n\nCommon key args:\n- checkpoint: checkpoint name or path to model file. Default \"xgbrconcat\".\n- device: torch device string. 
Default \"cuda:0\".\n- n_ensemble: number of generated predictors.\n- batch_size: batch size for weight prediction and inference.\n- preprocessing: \"realmlp_td_s_v0\" or \"minimal\" or \"none\".\n- cat_features: list of categorical column indices.\n- tree_embedding: enable GBDT leaf embeddings.\n- tree_model: \"XGBoost_hist\" or \"CatBoost\".\n- concat_tree_with_orig_features: concatenate original features with embeddings.\n- finetuning: end-to-end finetuning.\n- Retrieval: do_retrieval, retrieval_alpha, retrieval_temperature, retrieval_distance.\n\nRegressor only:\n- clip_predictions: clip to train target range.\n- normalize_predictions: z-normalize outputs before unscaling.\n\nClassifier only:\n- voting: \"soft\" or \"hard\".\n\n## Hyperparameter Optimization\n\niLTM performs best when you tune its hyperparameters.\n\n### Recommended search space\n\nThe package exposes a recommended search space via `iltm.get_hyperparameter_search_space()`, which returns a plain dictionary that maps hyperparameter names to small specs.\n\n\u003e [!TIP]\n\u003e When running hyperparameter optimization with time constraints, you can use the `fit_max_time` parameter in `fit()` to limit training time per configuration. The model will return a partial ensemble if the time limit is reached.\n\nThe `checkpoint` parameter is part of this space. It selects one of the built-in model checkpoints, which in turn sets other fields such as `preprocessing` and `tree_embedding`.\n\nThe specification format is intentionally minimal so that it can be reused in any hyperparameter optimization library or custom tuning procedure.\n\n\n- `iltm.get_hyperparameter_search_space()` gives you the canonical space definition.\n- `iltm.sample_hyperparameters(rng)` draws a single random configuration from that space for quick baselines and smoke tests.\n\n\u003e [!TIP]\n\u003e `sample_hyperparameters` is mainly intended for quick baselines, smoke\n\u003e tests, or simple random search. 
For more serious tuning runs it is\n\u003e usually better to adapt the search space from\n\u003e `get_hyperparameter_search_space` into your optimization method of\n\u003e choice, and let that method decide which configurations to try.\n\n\n## Development\n\nTo run the tests:\n\n```bash\npip install -e \".[dev]\"\npytest tests/\n```\n\n## Citation\nIf you use iLTM in your research, please cite our [paper](https://arxiv.org/abs/2511.15941):\n\n```bibtex\n@article{bonet2025iltm,\n  title={iLTM: Integrated Large Tabular Model},\n  author={Bonet, David and Comajoan Cara, Marçal and Calafell, Alvaro and Mas Montserrat, Daniel and Ioannidis, Alexander G},\n  journal={arXiv preprint arXiv:2511.15941},\n  year={2025},\n}\n```\n\n## License\n\n© Contributors, 2025. Licensed under the [Apache-2.0](https://github.com/AI-sandbox/iLTM/blob/main/LICENSE) license.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fai-sandbox%2Filtm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fai-sandbox%2Filtm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fai-sandbox%2Filtm/lists"}