https://github.com/tidymodels/tabpfn

Foundation Model for Tabular Data via reticulate

Last synced: 6 months ago

Host: GitHub
URL: https://github.com/tidymodels/tabpfn
Owner: tidymodels
License: apache-2.0
Created: 2025-01-27T16:59:44.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-12-04T14:34:18.000Z (8 months ago)
Last Synced: 2026-01-14T21:39:20.853Z (6 months ago)
Language: R
Homepage: http://tabpfn.tidymodels.org/
Size: 1.7 MB
Stars: 19
Watchers: 3
Forks: 3
Open Issues: 4
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE.md
- Code of conduct: CODE_OF_CONDUCT.md

README

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# tabpfn

[![CRAN status](https://www.r-pkg.org/badges/version/tabpfn)](https://CRAN.R-project.org/package=tabpfn)

[![R-CMD-check](https://github.com/tidymodels/tabpfn/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidymodels/tabpfn/actions/workflows/R-CMD-check.yaml)

[![Codecov test coverage](https://codecov.io/gh/tidymodels/tabpfn/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidymodels/tabpfn?branch=main)

TabPFN (Tabular Prior-data Fitted Network) is a transformer-based deep-learning foundation model for tabular data. See:

- [_Transformers Can Do Bayesian Inference_](https://arxiv.org/abs/2112.10510) (arXiv, 2021)

- [_TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second_](https://arxiv.org/abs/2207.01848) (arXiv, 2022)

- [_Accurate predictions on small data with a tabular foundation model_](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C7&q=%22Accurate+predictions+on+small+data+with+a+tabular+foundation+model%22) (Nature, 2025)
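The first reference describes what a prior-data fitted network does. As an informal summary: for a training set $D$ and a new input $x$, the network is trained to approximate the posterior predictive distribution under a prior over datasets,

$$
p(y \mid x, D) = \int p(y \mid x, \phi)\, p(\phi \mid D)\, d\phi ,
$$

and it is fitted once, on many synthetic datasets drawn from that prior, by minimizing

$$
\ell(\theta) = \mathbb{E}_{D \cup \{(x, y)\} \sim p(\mathcal{D})}\left[-\log q_\theta(y \mid x, D)\right].
$$

Applying the pretrained network $q_\theta$ to a new dataset therefore does not update its weights: the training rows are supplied as context alongside the rows to predict.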

This R package wraps the [Python library](https://github.com/PriorLabs/tabpfn) via reticulate and provides an idiomatic R interface built on standard S3 methods.

## Installation

You can install the development version of tabpfn like so:

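```{r}
#| eval: false
# Install pak if it is missing, then install tabpfn from GitHub
if (!requireNamespace("pak", quietly = TRUE)) {
  install.packages("pak")
}
pak::pak("tidymodels/tabpfn", ask = FALSE)
```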

Access to the underlying library requires a Python virtual environment. After you install the R package, tabpfn installs the required Python components the first time you fit a model, as in this session:

```
> library(tabpfn)
>
> predictors <- mtcars[, -1]
> outcome <- mtcars[, 1]
>
> # XY interface
> mod <- tab_pfn(predictors, outcome)
Downloading uv...Done!
Downloading cpython-3.12.12 (download) (15.9MiB)
 Downloading cpython-3.12.12 (download)
Downloading setuptools (1.1MiB)
Downloading scikit-learn (8.2MiB)
Downloading numpy (4.9MiB)
 Downloading llvmlite
 Downloading torch
Installed 58 packages in 350ms
> mod
tabpfn Regression Model
Training set
i 32 data points
i 10 predictors
```

## Example

```{r}
#| label: tab-start-up
library(tabpfn)
```

To fit a regression model for `mpg` using the first 25 rows of `mtcars`:

```{r}
#| label: mtcars
set.seed(364)
reg_mod <- tab_pfn(mtcars[1:25, -1], mtcars$mpg[1:25])
reg_mod
```

Besides the x/y interface shown above, `tab_pfn()` also has formula and recipe interfaces.
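Offering several interfaces through one function is the standard S3 pattern: a generic dispatches on the class of its first argument. The sketch below illustrates the idea with a hypothetical generic, `toy_fit()`, that is not part of tabpfn. Its formula method splits the data into predictors and an outcome with base R's `model.frame()` and then reuses the data-frame (x/y) method; tabpfn's own methods may preprocess the data differently.

```r
# Hypothetical generic illustrating S3 dispatch; this is not tabpfn's code.
toy_fit <- function(x, ...) {
  UseMethod("toy_fit")
}

# x/y method: `x` is a data frame of predictors, `y` is the outcome vector.
toy_fit.data.frame <- function(x, y, ...) {
  mode <- if (is.factor(y)) "Classification" else "Regression"
  structure(list(mode = mode, n = nrow(x), p = ncol(x)), class = "toy_fit")
}

# Formula method: build a model frame, pull out the outcome, and pass the
# remaining columns on to the x/y method.
toy_fit.formula <- function(formula, data, ...) {
  mf <- model.frame(formula, data = data)
  y <- model.response(mf)
  x <- mf[, -1, drop = FALSE]  # the response is the first model-frame column
  toy_fit.data.frame(x, y, ...)
}

# print() dispatches here for objects of class "toy_fit".
print.toy_fit <- function(x, ...) {
  cat("toy", x$mode, "Model\n")
  cat(x$n, "data points,", x$p, "predictors\n")
  invisible(x)
}

fit_xy <- toy_fit(mtcars[, -1], mtcars$mpg)
fit_formula <- toy_fit(mpg ~ ., data = mtcars)
fit_formula
#> toy Regression Model
#> 32 data points, 10 predictors
```

Both calls produce the same object, which is the point of the design: each interface only translates its inputs, and a single method does the work.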

Prediction follows the usual S3 `predict()` method: 

```{r}
#| label: mtcars-pred
predict(reg_mod, mtcars[26:32, -1])
```

tabpfn follows the tidymodels prediction convention: a data frame is always returned with a standard set of column names. 
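Because the result is a regular data frame, it can go straight into metric code. A minimal base R sketch of root mean squared error for the seven held-out cars follows; the helper name `rmse_base()` is hypothetical, and the commented usage assumes the numeric predictions sit in a column named `.pred`, the usual tidymodels name. The yardstick package also provides a ready-made `rmse()`.

```r
# Root mean squared error between observed and predicted values.
rmse_base <- function(truth, estimate) {
  sqrt(mean((truth - estimate)^2))
}

# Toy check: every prediction is off by exactly 1, so the RMSE is 1.
rmse_base(c(2, 4, 6, 8), c(3, 3, 7, 7))
#> [1] 1

# With the model fitted above (assumes the `.pred` column name):
# pred <- predict(reg_mod, mtcars[26:32, -1])
# rmse_base(mtcars$mpg[26:32], pred$.pred)
```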

For classification models, the outcome must be a factor. For example, using the `parabolic` data from the modeldata package:

```{r}
#| label: cls
#| results: none
library(modeldata)
library(ggplot2)

# First 400 rows for training, the remaining 100 held out for validation
two_cls_train <- parabolic[1:400, ]
two_cls_val   <- parabolic[401:500, ]

# Regular 25 x 25 grid spanning the predictor space, used to map the class boundary
grid <- expand.grid(X1 = seq(-5.1, 5.0, length.out = 25),
                    X2 = seq(-5.5, 4.0, length.out = 25))

set.seed(3824)
cls_mod <- tab_pfn(class ~ ., data = two_cls_train)
grid_pred <- predict(cls_mod, grid)
grid_pred
```
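Outcomes stored as 0/1 integers or character strings need converting to a factor before fitting. A base R sketch with a made-up vector; the labels `no` and `yes` are placeholders chosen for this example.

```r
# Toy 0/1 outcome (hypothetical data).
y_int <- c(1L, 0L, 0L, 1L, 1L)

# Recode to a factor with explicit, readable labels; `levels` fixes the
# order of the classes instead of leaving it to sorting.
y_fct <- factor(y_int, levels = c(0L, 1L), labels = c("no", "yes"))

levels(y_fct)
table(y_fct)
```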

Plotting the class boundary (where the predicted probability of `Class1` is 1/2) over the held-out validation points shows a reasonably good fit:

```{r}
#| label: boundaries
#| fig.width: 5
#| fig.height: 4
#| fig.align: "center"
#| out.width: 50%
cbind(grid, grid_pred) |>
  ggplot(aes(X1, X2)) +
  geom_point(data = two_cls_val, aes(colour = class, shape = class),
             alpha = 3 / 4, size = 3) +
  # The 1/2 contour of the Class1 probability is the predicted class boundary
  geom_contour(aes(z = .pred_Class1), breaks = 1 / 2, colour = "black", linewidth = 1) +
  coord_equal(ratio = 1)
```
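To back the visual impression with a number, one could classify the validation rows with the same 1/2 cut-off the contour uses and compute the proportion classified correctly. The helpers `classify_at()` and `accuracy_base()` below are hypothetical base R functions, checked here on made-up probabilities; the commented lines show how they might be applied to the model fitted above.

```r
# Hard classes from Class1 probabilities, using the same 1/2 cut-off as the
# contour in the plot.
classify_at <- function(prob_class1, threshold = 0.5,
                        levels = c("Class1", "Class2")) {
  factor(ifelse(prob_class1 >= threshold, levels[1], levels[2]), levels = levels)
}

# Proportion of predictions that match the observed class.
accuracy_base <- function(truth, estimate) {
  mean(as.character(truth) == as.character(estimate))
}

# Toy check with made-up probabilities: three of four are classified correctly.
truth <- factor(c("Class1", "Class2", "Class2", "Class2"),
                levels = c("Class1", "Class2"))
est <- classify_at(c(0.9, 0.4, 0.6, 0.1))
accuracy_base(truth, est)
#> [1] 0.75

# With the model fitted above:
# val_pred <- predict(cls_mod, two_cls_val)
# accuracy_base(two_cls_val$class, classify_at(val_pred$.pred_Class1))
```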

## Code of Conduct

  

Please note that the tabpfn project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/1/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.
