{"id":25128652,"url":"https://github.com/PriorLabs/TabPFN","last_synced_at":"2025-10-23T08:31:08.639Z","repository":{"id":41398136,"uuid":"509436902","full_name":"PriorLabs/TabPFN","owner":"PriorLabs","description":"⚡ TabPFN: Foundation Model for Tabular Data ⚡","archived":false,"fork":false,"pushed_at":"2025-02-05T17:00:26.000Z","size":265249,"stargazers_count":2463,"open_issues_count":27,"forks_count":198,"subscribers_count":30,"default_branch":"main","last_synced_at":"2025-02-07T09:00:11.346Z","etag":null,"topics":["data-science","foundation-models","machine-learning","tabpfn","tabular-data"],"latest_commit_sha":null,"homepage":"http://priorlabs.ai","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PriorLabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-07-01T11:54:47.000Z","updated_at":"2025-02-07T08:15:20.000Z","dependencies_parsed_at":"2023-02-19T10:45:53.593Z","dependency_job_id":"1a6bc8b5-9c07-46cd-969a-a26510c4dbf9","html_url":"https://github.com/PriorLabs/TabPFN","commit_stats":null,"previous_names":["priorlabs/tabpfn"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PriorLabs%2FTabPFN","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PriorLabs%2FTabPFN/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PriorLabs%2FTabPFN/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PriorLabs%2FTabPFN/manifests","owner_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PriorLabs","download_url":"https://codeload.github.com/PriorLabs/TabPFN/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":237801562,"owners_count":19368576,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","foundation-models","machine-learning","tabpfn","tabular-data"],"created_at":"2025-02-08T12:01:17.418Z","updated_at":"2025-10-23T08:31:08.633Z","avatar_url":"https://github.com/PriorLabs.png","language":"Python","funding_links":[],"categories":["其他_机器学习与深度学习","Categories","Repos","🤖 AI \u0026 Machine Learning","Python"],"sub_categories":["🧪 General Machine Learning"],"readme":"# TabPFN\n\n[![PyPI version](https://badge.fury.io/py/tabpfn.svg)](https://badge.fury.io/py/tabpfn)\n[![Downloads](https://pepy.tech/badge/tabpfn)](https://pepy.tech/project/tabpfn)\n[![Discord](https://img.shields.io/discord/1285598202732482621?color=7289da\u0026label=Discord\u0026logo=discord\u0026logoColor=ffffff)](https://discord.gg/BHnX2Ptf4j)\n[![Documentation](https://img.shields.io/badge/docs-priorlabs.ai-blue)](https://priorlabs.ai/docs)\n[![colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/PriorLabs/TabPFN/blob/main/examples/notebooks/TabPFN_Demo_Local.ipynb)\n[![Python Versions](https://img.shields.io/badge/python-3.9%20%7C%203.10%20%7C%203.11%20%7C%203.12%20%7C%203.13-blue)](https://pypi.org/project/tabpfn/)\n\n\u003cimg src=\"https://github.com/PriorLabs/tabpfn-extensions/blob/main/tabpfn_summary.webp\" 
width=\"80%\" alt=\"TabPFN Summary\"\u003e\n\n## 🏁 Quick Start\n\n### Interactive Notebook Tutorial\n\u003e [!TIP]\n\u003e\n\u003e Dive right in with our interactive Colab notebook! It's the best way to get a hands-on feel for TabPFN, walking you through installation, classification, and regression examples.\n\u003e\n\u003e [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/PriorLabs/TabPFN/blob/main/examples/notebooks/TabPFN_Demo_Local.ipynb)\n\n\u003e ⚡ **GPU Recommended**:\n\u003e For optimal performance, use a GPU (even older ones with ~8GB VRAM work well; 16GB needed for some large datasets).\n\u003e On CPU, only small datasets (≲1000 samples) are feasible.\n\u003e No GPU? Use our free hosted inference via [TabPFN Client](https://github.com/PriorLabs/tabpfn-client).\n\n### Installation\nOfficial installation (pip)\n```bash\npip install tabpfn\n```\nOR installation from source\n```bash\npip install \"tabpfn @ git+https://github.com/PriorLabs/TabPFN.git\"\n```\nOR local development installation\n```bash\n\ngit clone https://github.com/PriorLabs/TabPFN.git --depth 1\npip install -e \"TabPFN[dev]\"\n```\n\n### Basic Usage\n\n#### Classification\n```python\nfrom sklearn.datasets import load_breast_cancer\nfrom sklearn.metrics import accuracy_score, roc_auc_score\nfrom sklearn.model_selection import train_test_split\n\nfrom tabpfn import TabPFNClassifier\n\n# Load data\nX, y = load_breast_cancer(return_X_y=True)\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)\n\n# Initialize a classifier\nclf = TabPFNClassifier()\nclf.fit(X_train, y_train)\n\n# Predict probabilities\nprediction_probabilities = clf.predict_proba(X_test)\nprint(\"ROC AUC:\", roc_auc_score(y_test, prediction_probabilities[:, 1]))\n\n# Predict labels\npredictions = clf.predict(X_test)\nprint(\"Accuracy\", accuracy_score(y_test, predictions))\n```\n\n#### Regression\n```python\nfrom 
sklearn.datasets import fetch_openml\nfrom sklearn.metrics import mean_squared_error, r2_score\nfrom sklearn.model_selection import train_test_split\n\nfrom tabpfn import TabPFNRegressor\n\n# Load Boston Housing data\ndf = fetch_openml(data_id=531, as_frame=True)  # Boston Housing dataset\nX = df.data\ny = df.target.astype(float)  # Ensure target is float for regression\n\n# Train-test split\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)\n\n# Initialize the regressor\nregressor = TabPFNRegressor()\nregressor.fit(X_train, y_train)\n\n# Predict on the test set\npredictions = regressor.predict(X_test)\n\n# Evaluate the model\nmse = mean_squared_error(y_test, predictions)\nr2 = r2_score(y_test, predictions)\n\nprint(\"Mean Squared Error (MSE):\", mse)\nprint(\"R² Score:\", r2)\n```\n\n### Best Results\n\nFor optimal performance, use the `AutoTabPFNClassifier` or `AutoTabPFNRegressor` for post-hoc ensembling. These can be found in the [TabPFN Extensions](https://github.com/PriorLabs/tabpfn-extensions) repository. Post-hoc ensembling combines multiple TabPFN models into an ensemble.\n\n**Steps for Best Results:**\n1. 
Install the extensions:\n   ```bash\n   git clone https://github.com/priorlabs/tabpfn-extensions.git\n   pip install -e tabpfn-extensions\n   ```\n\n2. Fit the ensemble classifier:\n   ```python\n   from tabpfn_extensions.post_hoc_ensembles.sklearn_interface import AutoTabPFNClassifier\n\n   clf = AutoTabPFNClassifier(max_time=120, device=\"cuda\") # 120 seconds tuning time\n   clf.fit(X_train, y_train)\n   predictions = clf.predict(X_test)\n   ```\n\n## 🌐 TabPFN Ecosystem\n\nChoose the right TabPFN implementation for your needs:\n\n- **[TabPFN Client](https://github.com/priorlabs/tabpfn-client)**\n  Simple API client for using TabPFN via cloud-based inference.\n\n- **[TabPFN Extensions](https://github.com/priorlabs/tabpfn-extensions)**\n  A companion repository with advanced utilities, integrations, and features, and a great place to contribute:\n\n  - 🔍 **`interpretability`**: Gain insights with SHAP-based explanations, feature importance, and selection tools.\n  - 🕵️‍♂️ **`unsupervised`**: Tools for outlier detection and synthetic tabular data generation.\n  - 🧬 **`embeddings`**: Extract and use TabPFN’s internal learned embeddings for downstream tasks or analysis.\n  - 🧠 **`many_class`**: Handle multi-class classification problems that exceed TabPFN's built-in class limit.\n  - 🌲 **`rf_pfn`**: Combine TabPFN with traditional models like Random Forests for hybrid approaches.\n  - ⚙️ **`hpo`**: Automated hyperparameter optimization tailored to TabPFN.\n  - 🔁 **`post_hoc_ensembles`**: Boost performance by ensembling multiple TabPFN models post-training.\n\n  ✨ To install:\n  ```bash\n  git clone https://github.com/priorlabs/tabpfn-extensions.git\n  pip install -e tabpfn-extensions\n  ```\n\n- **[TabPFN (this repo)](https://github.com/priorlabs/tabpfn)**\n  Core implementation for fast and local inference with PyTorch and CUDA support.\n\n- **[TabPFN UX](https://ux.priorlabs.ai)**\n  No-code graphical interface to explore TabPFN capabilities—ideal for business users and 
prototyping.\n\n## 🔀 TabPFN Workflow at a Glance\nFollow this decision tree to build your model and choose the right extensions from our ecosystem. It walks you through critical questions about your data, hardware, and performance needs, guiding you to the best solution for your specific use case.\n\n```mermaid\n---\nconfig:\n  theme: 'default'\n  themeVariables:\n    edgeLabelBackground: 'white'\n---\ngraph LR\n    %% 1. DEFINE COLOR SCHEME \u0026 STYLES\n    classDef default fill:#fff,stroke:#333,stroke-width:2px,color:#333;\n    classDef start_node fill:#e8f5e9,stroke:#43a047,stroke-width:2px,color:#333;\n    classDef process_node fill:#e0f2f1,stroke:#00796b,stroke-width:2px,color:#333;\n    classDef decision_node fill:#fff8e1,stroke:#ffa000,stroke-width:2px,color:#333;\n\n    style Infrastructure fill:#fff,stroke:#ccc,stroke-width:5px;\n    style Unsupervised fill:#fff,stroke:#ccc,stroke-width:5px;\n    style Data fill:#fff,stroke:#ccc,stroke-width:5px;\n    style Performance fill:#fff,stroke:#ccc,stroke-width:5px;\n    style Interpretability fill:#fff,stroke:#ccc,stroke-width:5px;\n\n    %% 2. 
DEFINE GRAPH STRUCTURE\n    subgraph Infrastructure\n        start((Start)) --\u003e gpu_check[\"GPU available?\"];\n        gpu_check -- Yes --\u003e local_version[\"Use TabPFN\u003cbr/\u003e(local PyTorch)\"];\n        gpu_check -- No --\u003e api_client[\"Use TabPFN-Client\u003cbr/\u003e(cloud API)\"];\n        task_type[\"What is\u003cbr/\u003eyour task?\"]\n    end\n\n    local_version --\u003e task_type\n    api_client --\u003e task_type\n\n    end_node((Workflow\u003cbr/\u003eComplete));\n\n    subgraph Unsupervised\n        unsupervised_type[\"Select\u003cbr/\u003eUnsupervised Task\"];\n        unsupervised_type --\u003e imputation[\"Imputation\"]\n        unsupervised_type --\u003e data_gen[\"Data\u003cbr/\u003eGeneration\"];\n        unsupervised_type --\u003e tabebm[\"Data\u003cbr/\u003eAugmentation\"];\n        unsupervised_type --\u003e density[\"Outlier\u003cbr/\u003eDetection\"];\n        unsupervised_type --\u003e embedding[\"Get\u003cbr/\u003eEmbeddings\"];\n    end\n\n\n    subgraph Data\n        data_check[\"Data Checks\"];\n        model_choice[\"Samples \u003e 10k or\u003cbr/\u003eClasses \u003e 10?\"]\n        data_check -- \"Table Contains Text Data?\" --\u003e api_backend_note[\"Note: API client has\u003cbr/\u003enative text support\"];\n        api_backend_note --\u003e model_choice;\n        data_check -- \"Time-Series Data?\" --\u003e ts_features[\"Use Time-Series\u003cbr/\u003eFeatures\"];\n        ts_features --\u003e model_choice;\n        data_check -- \"Purely Tabular\" --\u003e model_choice;\n        model_choice -- \"No\" --\u003e finetune_check;\n        model_choice -- \"Yes, \u003e10k samples\" --\u003e subsample[\"Large Datasets Guide\u003cbr/\u003e\"];\n        model_choice -- \"Yes, \u003e10 classes\" --\u003e many_class[\"Many-Class\u003cbr/\u003eMethod\"];\n    end\n\n    subgraph Performance\n        finetune_check[\"Need\u003cbr/\u003eFinetuning?\"];\n        performance_check[\"Need Even Better Performance?\"];\n        
speed_check[\"Need faster inference\u003cbr/\u003eat prediction time?\"];\n        kv_cache[\"Enable KV Cache\u003cbr/\u003e(fit_mode='fit_with_cache')\u003cbr/\u003e\u003csmall\u003eFaster predict; +Memory ~O(N×F)\u003c/small\u003e\"];\n        tuning_complete[\"Tuning Complete\"];\n\n        finetune_check -- Yes --\u003e finetuning[\"Finetuning\"];\n        finetune_check -- No --\u003e performance_check;\n\n        finetuning --\u003e performance_check;\n\n        performance_check -- No --\u003e tuning_complete;\n        performance_check -- Yes --\u003e hpo[\"HPO\"];\n        performance_check -- Yes --\u003e post_hoc[\"Post-Hoc\u003cbr/\u003eEnsembling\"];\n        performance_check -- Yes --\u003e more_estimators[\"More\u003cbr/\u003eEstimators\"];\n        performance_check -- Yes --\u003e speed_check;\n\n        speed_check -- Yes --\u003e kv_cache;\n        speed_check -- No --\u003e tuning_complete;\n\n        hpo --\u003e tuning_complete;\n        post_hoc --\u003e tuning_complete;\n        more_estimators --\u003e tuning_complete;\n        kv_cache --\u003e tuning_complete;\n    end\n\n    subgraph Interpretability\n\n        tuning_complete --\u003e interpretability_check;\n\n        interpretability_check[\"Need\u003cbr/\u003eInterpretability?\"];\n\n        interpretability_check --\u003e feature_selection[\"Feature Selection\"];\n        interpretability_check --\u003e partial_dependence[\"Partial Dependence Plots\"];\n        interpretability_check --\u003e shapley[\"Explain with\u003cbr/\u003eSHAP\"];\n        interpretability_check --\u003e shap_iq[\"Explain with\u003cbr/\u003eSHAP IQ\"];\n        interpretability_check -- No --\u003e end_node;\n\n        feature_selection --\u003e end_node;\n        partial_dependence --\u003e end_node;\n        shapley --\u003e end_node;\n        shap_iq --\u003e end_node;\n\n    end\n\n    %% 3. 
LINK SUBGRAPHS AND PATHS\n    task_type -- \"Classification or Regression\" --\u003e data_check;\n    task_type -- \"Unsupervised\" --\u003e unsupervised_type;\n\n    subsample --\u003e finetune_check;\n    many_class --\u003e finetune_check;\n\n    %% 4. APPLY STYLES\n    class start,end_node start_node;\n    class local_version,api_client,imputation,data_gen,tabebm,density,embedding,api_backend_note,ts_features,subsample,many_class,finetuning,feature_selection,partial_dependence,shapley,shap_iq,hpo,post_hoc,more_estimators,kv_cache process_node;\n    class gpu_check,task_type,unsupervised_type,data_check,model_choice,finetune_check,interpretability_check,performance_check,speed_check decision_node;\n    class tuning_complete process_node;\n\n    %% 5. ADD CLICKABLE LINKS (INCLUDING KV CACHE EXAMPLE)\n    click local_version \"https://github.com/PriorLabs/TabPFN\" \"TabPFN Backend Options\" _blank\n    click api_client \"https://github.com/PriorLabs/tabpfn-client\" \"TabPFN API Client\" _blank\n    click api_backend_note \"https://github.com/PriorLabs/tabpfn-client\" \"TabPFN API Backend\" _blank\n    click unsupervised_type \"https://github.com/PriorLabs/tabpfn-extensions\" \"TabPFN Extensions\" _blank\n    click imputation \"https://github.com/PriorLabs/tabpfn-extensions/blob/main/examples/unsupervised/imputation.py\" \"TabPFN Imputation Example\" _blank\n    click data_gen \"https://github.com/PriorLabs/tabpfn-extensions/blob/main/examples/unsupervised/generate_data.py\" \"TabPFN Data Generation Example\" _blank\n    click tabebm \"https://github.com/PriorLabs/tabpfn-extensions/blob/main/examples/tabebm/tabebm_augment_real_world_data.ipynb\" \"TabEBM Data Augmentation Example\" _blank\n    click density \"https://github.com/PriorLabs/tabpfn-extensions/blob/main/examples/unsupervised/density_estimation_outlier_detection.py\" \"TabPFN Density Estimation/Outlier Detection Example\" _blank\n    click embedding 
\"https://github.com/PriorLabs/tabpfn-extensions/tree/main/examples/embedding\" \"TabPFN Embedding Example\" _blank\n    click ts_features \"https://github.com/PriorLabs/tabpfn-time-series\" \"TabPFN Time-Series Example\" _blank\n    click many_class \"https://github.com/PriorLabs/tabpfn-extensions/blob/main/examples/many_class/many_class_classifier_example.py\" \"Many Class Example\" _blank\n    click finetuning \"https://github.com/PriorLabs/TabPFN/blob/main/examples/finetune_classifier.py\" \"Finetuning Example\" _blank\n    click feature_selection \"https://github.com/PriorLabs/tabpfn-extensions/blob/main/examples/interpretability/feature_selection.py\" \"Feature Selection Example\" _blank\n    click partial_dependence \"https://github.com/PriorLabs/tabpfn-extensions/blob/main/examples/interpretability/pdp_example.py\" \"Partial Dependence Plots Example\" _blank\n    click shapley \"https://github.com/PriorLabs/tabpfn-extensions/blob/main/examples/interpretability/shap_example.py\" \"Shapley Values Example\" _blank\n    click shap_iq \"https://github.com/PriorLabs/tabpfn-extensions/blob/main/examples/interpretability/shapiq_example.py\" \"SHAP IQ Example\" _blank\n    click post_hoc \"https://github.com/PriorLabs/tabpfn-extensions/blob/main/examples/phe/phe_example.py\" \"Post-Hoc Ensemble Example\" _blank\n    click hpo \"https://github.com/PriorLabs/tabpfn-extensions/blob/main/examples/hpo/tuned_tabpfn.py\" \"HPO Example\" _blank\n    click subsample \"https://github.com/PriorLabs/tabpfn-extensions/blob/main/examples/large_datasets/large_datasets_example.py\" \"Large Datasets Example\" _blank\n    click kv_cache \"https://github.com/PriorLabs/TabPFN/blob/main/examples/kv_cache_fast_prediction.py\" \"KV Cache Fast Prediction Example\" _blank\n\n```\n\n## 📜 License\n\nPrior Labs License (Apache 2.0 with additional attribution requirement): [here](https://priorlabs.ai/tabpfn-license/)\n\n## 🤝 Join Our Community\n\nWe're building the future of tabular machine 
learning and would love your involvement:\n\n1. **Connect \u0026 Learn**:\n   - Join our [Discord Community](https://discord.gg/VJRuU3bSxt)\n   - Read our [Documentation](https://priorlabs.ai/docs)\n   - Check out [GitHub Issues](https://github.com/priorlabs/tabpfn/issues)\n\n2. **Contribute**:\n   - Report bugs or request features\n   - Submit pull requests (please make sure to open an issue discussing the feature/bug first if none exists)\n   - Share your research and use cases\n\n3. **Stay Updated**: Star the repo and join Discord for the latest updates\n\n## 📚 Citation\n\nYou can read our paper explaining TabPFN [here](https://doi.org/10.1038/s41586-024-08328-6).\n\n```bibtex\n@article{hollmann2025tabpfn,\n title={Accurate predictions on small data with a tabular foundation model},\n author={Hollmann, Noah and M{\\\"u}ller, Samuel and Purucker, Lennart and\n         Krishnakumar, Arjun and K{\\\"o}rfer, Max and Hoo, Shi Bin and\n         Schirrmeister, Robin Tibor and Hutter, Frank},\n journal={Nature},\n year={2025},\n month={01},\n day={09},\n doi={10.1038/s41586-024-08328-6},\n publisher={Springer Nature},\n url={https://www.nature.com/articles/s41586-024-08328-6},\n}\n\n@inproceedings{hollmann2023tabpfn,\n  title={TabPFN: A transformer that solves small tabular classification problems in a second},\n  author={Hollmann, Noah and M{\\\"u}ller, Samuel and Eggensperger, Katharina and Hutter, Frank},\n  booktitle={International Conference on Learning Representations 2023},\n  year={2023}\n}\n```\n\n\n\n## ❓ FAQ\n\n### **Usage \u0026 Compatibility**\n\n**Q: What dataset sizes work best with TabPFN?**\nA: TabPFN is optimized for **datasets up to 10,000 rows**. For larger datasets, consider using **Random Forest preprocessing** or other extensions. 
See our [Colab notebook](https://colab.research.google.com/drive/154SoIzNW1LHBWyrxNwmBqtFAr1uZRZ6a#scrollTo=OwaXfEIWlhC8) for strategies.\n\n**Q: Why can't I use TabPFN with Python 3.8?**\nA: TabPFN v2 requires **Python 3.9+** due to newer language features. Compatible versions: **3.9, 3.10, 3.11, 3.12, 3.13**.\n\n### **Installation \u0026 Setup**\n\n**Q: How do I use TabPFN without an internet connection?**\n\nTabPFN automatically downloads model weights when first used. For offline usage:\n\n**Using the Provided Download Script**\n\nIf you have the TabPFN repository, you can use the included script to download all models (including ensemble variants):\n\n```bash\n# After installing TabPFN\npython scripts/download_all_models.py\n```\n\nThis script will download the main classifier and regressor models, as well as all ensemble variant models to your system's default cache directory.\n\n**Manual Download**\n\n1. Download the model files manually from HuggingFace (or use the S3 fallback if HuggingFace is unavailable):\n   - Classifier: [tabpfn-v2-classifier.ckpt](https://huggingface.co/Prior-Labs/TabPFN-v2-clf/resolve/main/tabpfn-v2-classifier.ckpt) ([S3 fallback](https://storage.googleapis.com/tabpfn-v2-model-files/05152025/tabpfn-v2-classifier.ckpt))\n   - Regressor: [tabpfn-v2-regressor.ckpt](https://huggingface.co/Prior-Labs/TabPFN-v2-reg/resolve/main/tabpfn-v2-regressor.ckpt) ([S3 fallback](https://storage.googleapis.com/tabpfn-v2-model-files/05152025/tabpfn-v2-regressor.ckpt))\n\n2. Place the file in one of these locations:\n   - Specify directly: `TabPFNClassifier(model_path=\"/path/to/model.ckpt\")`\n   - Set environment variable: `export TABPFN_MODEL_CACHE_DIR=\"/path/to/dir\"` (see environment variables FAQ below)\n   - Default OS cache directory:\n     - Windows: `%APPDATA%\\tabpfn\\`\n     - macOS: `~/Library/Caches/tabpfn/`\n     - Linux: `~/.cache/tabpfn/`\n\n**Q: I'm getting a `pickle` error when loading the model. 
What should I do?**\nA: Try the following:\n- Upgrade to the newest version of tabpfn: `pip install tabpfn --upgrade`\n- Ensure the model files downloaded correctly (re-download if needed)\n\n**Q: What environment variables can I use to configure TabPFN?**\nA: TabPFN uses Pydantic settings for configuration, supporting environment variables and `.env` files:\n\n**Model Configuration:**\n- `TABPFN_MODEL_CACHE_DIR`: Custom directory for caching downloaded TabPFN models (default: platform-specific user cache directory)\n- `TABPFN_ALLOW_CPU_LARGE_DATASET`: Allow running TabPFN on CPU with large datasets (\u003e1000 samples). Set to `true` to override the CPU limitation. Note: This will be very slow!\n\n**PyTorch Settings:**\n- `PYTORCH_CUDA_ALLOC_CONF`: PyTorch CUDA memory allocation configuration to optimize GPU memory usage (default: `max_split_size_mb:512`). See [PyTorch CUDA documentation](https://docs.pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf) for more information.\n\nExample:\n```bash\nexport TABPFN_MODEL_CACHE_DIR=\"/path/to/models\"\nexport TABPFN_ALLOW_CPU_LARGE_DATASET=true\nexport PYTORCH_CUDA_ALLOC_CONF=\"max_split_size_mb:512\"\n```\n\nOr simply set them in your `.env` file.\n\n**Q: How do I save and load a trained TabPFN model?**\nA: Use `save_fitted_tabpfn_model` to persist a fitted estimator and reload\nit later with `load_fitted_tabpfn_model` (or the corresponding\n`load_from_fit_state` class methods).\n\n```python\nfrom tabpfn import TabPFNRegressor\nfrom tabpfn.model_loading import (\n    load_fitted_tabpfn_model,\n    save_fitted_tabpfn_model,\n)\n\n# Train the regressor on GPU\nreg = TabPFNRegressor(device=\"cuda\")\nreg.fit(X_train, y_train)\nsave_fitted_tabpfn_model(reg, \"my_reg.tabpfn_fit\")\n\n# Later or on a CPU-only machine\nreg_cpu = load_fitted_tabpfn_model(\"my_reg.tabpfn_fit\", device=\"cpu\")\n```\n\nTo store just the foundation model weights (without a fitted estimator) 
use\n`save_tabpfn_model(reg.model_, \"my_tabpfn.ckpt\")`. This saves only a\ncheckpoint of the pre-trained weights so you can later create and fit a fresh\nestimator. Reload the checkpoint with `load_model_criterion_config`.\n\n### **Performance \u0026 Limitations**\n\n**Q: Can TabPFN handle missing values?**\nA: **Yes!**\n\n**Q: How can I improve TabPFN’s performance?**\nA: Best practices:\n- Use **AutoTabPFNClassifier** from [TabPFN Extensions](https://github.com/priorlabs/tabpfn-extensions) for post-hoc ensembling\n- Feature engineering: Add domain-specific features to improve model performance\n\nNot effective:\n- Adapting feature scaling\n- Converting categorical features to numerical values (e.g., one-hot encoding)\n\n## 🛠️ Development\n\n1. Set up the environment:\n```bash\npython -m venv venv\nsource venv/bin/activate  # On Windows: venv\\Scripts\\activate\ngit clone https://github.com/PriorLabs/TabPFN.git\ncd TabPFN\npip install -e \".[dev]\"\npre-commit install\n```\n\n2. Before committing:\n```bash\npre-commit run --all-files\n```\n\n3. Run tests:\n```bash\npytest tests/\n```\n\n## 📊 Anonymous Telemetry\n\nThis project collects **anonymous usage telemetry** by default.\n\nThe data is used exclusively to help us understand how the library is being used and to guide future improvements.\n\n- **No personal data is collected**\n- **No code, model inputs, or outputs are ever sent**\n- **Data is strictly anonymous and cannot be linked to individuals**\n\n### What we collect\nWe only collect high-level, non-identifying information such as:\n- Package version\n- Python version\n- How often fit and inference are called, including simple metadata like the dimensionality of the input and the type of task (e.g., classification vs. 
regression) (:warning: never the data itself)\n\nThis data is processed in compliance with the **General Data Protection Regulation (GDPR)** principles of data minimization and purpose limitation.\n\nFor more details, please see our [Privacy Policy](https://priorlabs.ai/privacy_policy/).\n\n### How to opt out\nIf you prefer not to send telemetry, you can disable it by setting the following environment variable:\n\n```bash\nexport TABPFN_DISABLE_TELEMETRY=1\n```\n---\n\nBuilt with ❤️ by [Prior Labs](https://priorlabs.ai) - Copyright (c) 2025 Prior Labs GmbH\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FPriorLabs%2FTabPFN","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FPriorLabs%2FTabPFN","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FPriorLabs%2FTabPFN/lists"}